Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-07 Thread Jingbai Ma

On 11/08/2013 01:21 PM, HATAYAMA Daisuke wrote:
> (2013/11/08 14:12), Atsushi Kumagai wrote:
>> Hello Jingbai,
>>
>> (2013/11/07 17:58), Jingbai Ma wrote:
>>> On 11/06/2013 10:23 PM, Vivek Goyal wrote:
>>>> On Wed, Nov 06, 2013 at 02:21:39AM +, Atsushi Kumagai wrote:
>>>>> (2013/11/06 5:27), Vivek Goyal wrote:
>>>>>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>>>>>>> This patch set intends to exclude unnecessary hugepages from the
>>>>>>> vmcore dump file.
>>>>>>>
>>>>>>> This patch requires the kernel patch that exports the necessary data
>>>>>>> structures into
>>>>>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>>>>>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>>>>>>
>>>>>>> This patch introduces two new dump levels, 32 and 64, to exclude all
>>>>>>> unused and active hugepages. The level to exclude all unnecessary
>>>>>>> pages will now be 127.
>>>>>>
>>>>>> Interesting. Why should hugepages be treated any differently than normal
>>>>>> pages?
>>>>>>
>>>>>> If the user asked to filter out free pages, then they should be filtered,
>>>>>> and it should not matter whether a page is a huge page or not.
>>>>>
>>>>> I'm making an RFC patch for hugepage filtering based on that policy.
>>>>>
>>>>> I attach the prototype version.
>>>>> It can also filter out THPs, and it is suitable for cyclic processing
>>>>> because it depends on mem_map, and the mem_map lookup can be divided
>>>>> into cycles. This is the same idea as page_is_buddy().
>>>>>
>>>>> So I think it's better.
>>>>
>>>> Agreed. Being able to treat hugepages in the same manner as other pages
>>>> sounds good.
>>>>
>>>> Jingbai, does this look good to you?
>>>
>>> It looks good to me.
>>>
>>> My only concern is that this way we can only exclude all hugepages
>>> together, and can't exclude only the free hugepages. I'm not sure whether
>>> users need to dump only the active hugepages.
>>>
>>> Kumagai-san, please correct me if I'm wrong.
>>
>> Yes, my patch treats all allocated hugetlbfs pages as user pages and
>> doesn't distinguish whether the pages are actually used or not.
>> I did so because I guess it's enough for almost all users.
>>
>> We can introduce a new dump level once it's actually needed,
>> but I don't think now is the time. Introducing it without
>> demand would just make this tool more complex.
>>
> 
> Typically, users would allocate only as many huge pages as they actually use,
> in order not to waste system memory. So, this design seems reasonable.
> 

OK, it looks reasonable.
Thanks!

-- 
Thanks,
Jingbai Ma
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
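
The mem_map-based approach discussed in this message can be sketched roughly as follows. This is not the attached prototype: the page descriptor, the PGF_HUGE flag and the cycle size are all hypothetical stand-ins, and a real tool would read struct page fields from the dump through vmcoreinfo offsets. The point it illustrates is only that when every page can be judged from its own mem_map entry, the scan can be cut into independent fixed-size cycles, whereas walking a free list cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified page descriptor. Real struct page layouts
 * differ between kernel versions. */
struct page_desc {
    unsigned long flags;
};

#define PGF_HUGE   0x1UL  /* hypothetical: page is part of a hugepage */
#define CYCLE_PFNS 4      /* pages per cycle; tiny for illustration */

/* Process one cycle [start, end). Each page is judged from its own
 * descriptor only, so no state is carried between cycles.
 * keep[i] is set to 1 if page (start + i) stays in the dump, else 0.
 * Returns the number of pages excluded in this cycle. */
static size_t filter_cycle(const struct page_desc *mem_map,
                           size_t start, size_t end, unsigned char *keep)
{
    size_t excluded = 0;

    assert(end >= start && end - start <= CYCLE_PFNS);
    for (size_t pfn = start; pfn < end; pfn++) {
        if (mem_map[pfn].flags & PGF_HUGE) {
            keep[pfn - start] = 0;
            excluded++;
        } else {
            keep[pfn - start] = 1;
        }
    }
    return excluded;
}

/* Walk the whole mem_map in fixed-size cycles, reusing one small buffer.
 * Returns the total number of excluded pages. */
size_t filter_all(const struct page_desc *mem_map, size_t nr_pages)
{
    unsigned char keep[CYCLE_PFNS];
    size_t total = 0;

    for (size_t start = 0; start < nr_pages; start += CYCLE_PFNS) {
        size_t end = start + CYCLE_PFNS;

        if (end > nr_pages)
            end = nr_pages;
        total += filter_cycle(mem_map, start, end, keep);
        /* A real tool would write out this cycle's kept pages here. */
    }
    return total;
}
```

A hugepage that straddles a cycle boundary needs no special handling in this sketch, because each of its pages carries the flag itself.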

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-07 Thread Jingbai Ma


On 11/06/2013 10:23 PM, Vivek Goyal wrote:
> On Wed, Nov 06, 2013 at 02:21:39AM +, Atsushi Kumagai wrote:
>> (2013/11/06 5:27), Vivek Goyal wrote:
>>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>>>> This patch set intends to exclude unnecessary hugepages from the vmcore
>>>> dump file.
>>>>
>>>> This patch requires the kernel patch that exports the necessary data
>>>> structures into
>>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>>>
>>>> This patch introduces two new dump levels, 32 and 64, to exclude all
>>>> unused and active hugepages. The level to exclude all unnecessary pages
>>>> will now be 127.
>>>
>>> Interesting. Why should hugepages be treated any differently than normal
>>> pages?
>>>
>>> If the user asked to filter out free pages, then they should be filtered,
>>> and it should not matter whether a page is a huge page or not.
>>
>> I'm making an RFC patch for hugepage filtering based on that policy.
>>
>> I attach the prototype version.
>> It can also filter out THPs, and it is suitable for cyclic processing
>> because it depends on mem_map, and the mem_map lookup can be divided
>> into cycles. This is the same idea as page_is_buddy().
>>
>> So I think it's better.
>
> Agreed. Being able to treat hugepages in the same manner as other pages
> sounds good.
>
> Jingbai, does this look good to you?

It looks good to me.

My only concern is that this way we can only exclude all hugepages
together, and can't exclude only the free hugepages. I'm not sure whether
users need to dump only the active hugepages.

Kumagai-san, please correct me if I'm wrong.

>
> Thanks
> Vivek
>
>> --
>> Thanks
>> Atsushi Kumagai

--
Thanks,
Jingbai Ma

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Jingbai Ma


On 11/06/2013 04:26 AM, Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>> This patch set intends to exclude unnecessary hugepages from the vmcore
>> dump file.
>>
>> This patch requires the kernel patch that exports the necessary data
>> structures into
>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>
>> This patch introduces two new dump levels, 32 and 64, to exclude all
>> unused and active hugepages. The level to exclude all unnecessary pages
>> will now be 127.
>
> Interesting. Why should hugepages be treated any differently than normal
> pages?
>
> If the user asked to filter out free pages, then they should be filtered,
> and it should not matter whether a page is a huge page or not.

Yes, free hugepages should be filtered out with other free pages. It
sounds reasonable.

But for active hugepages, I would offer users more choice/flexibility
(maybe bad).

I'm OK with filtering active hugepages together with other user data pages.

Any other comments?

>
> Thanks
> Vivek

--
Thanks,
Jingbai Ma

[PATCH 2/3] makedumpfile: hugepage filtering: add excluding hugepage messages

2013-11-05 Thread Jingbai Ma

Add messages for print_info.

Signed-off-by: Jingbai Ma 
---
 print_info.c |   12 +++++++-----
 print_info.h |    2 ++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/print_info.c b/print_info.c
index 06939e0..978d9fb 100644
--- a/print_info.c
+++ b/print_info.c
@@ -103,17 +103,19 @@ print_usage(void)
MSG("  The maximum of Dump_Level is 31.\n");
MSG("  Note that Dump_Level for Xen dump filtering is 0 or 1.\n");
MSG("\n");
-   MSG("            |         cache    cache\n");
-   MSG("      Dump  |  zero   without  with     user    free\n");
-   MSG("      Level |  page   private  private  data    page\n");
-   MSG("     -------+---------------------------------------\n");
+   MSG("            |         cache    cache                    free    active\n");
+   MSG("      Dump  |  zero   without  with     user    free    huge    huge\n");
+   MSG("      Level |  page   private  private  data    page    page    page\n");
+   MSG("     -------+---------------------------------------------------------\n");
MSG("         0  |\n");
MSG("         1  |   X\n");
MSG("         2  |          X\n");
MSG("         4  |          X        X\n");
MSG("         8  |                            X\n");
MSG("        16  |                                    X\n");
-   MSG("        31  |   X      X        X        X       X\n");
+   MSG("        32  |                                            X\n");
+   MSG("        64  |                                            X       X\n");
+   MSG("       127  |   X      X        X        X       X       X       X\n");
MSG("\n");
MSG("  [-E]:\n");
MSG("  Create DUMPFILE in the ELF format.\n");
diff --git a/print_info.h b/print_info.h
index 01e3706..8461df6 100644
--- a/print_info.h
+++ b/print_info.h
@@ -35,6 +35,8 @@ void print_execution_time(char *step_name, struct timeval *tv_start);
 #define PROGRESS_HOLES       "Checking for memory holes  "
 #define PROGRESS_UNN_PAGES   "Excluding unnecessary pages"
 #define PROGRESS_FREE_PAGES  "Excluding free pages       "
+#define PROGRESS_FREE_HUGE   "Excluding free huge pages  "
+#define PROGRESS_ACTIVE_HUGE "Excluding active huge pages"
 #define PROGRESS_ZERO_PAGES  "Excluding zero pages       "
 #define PROGRESS_XEN_DOMAIN  "Excluding xen user domain  "
 


[PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Jingbai Ma

This patch set intends to exclude unnecessary hugepages from the vmcore dump
file.

This patch requires the kernel patch that exports the necessary data structures
into vmcore: "kexec: export hugepage data structure into vmcoreinfo"
http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

This patch introduces two new dump levels, 32 and 64, to exclude all unused and
active hugepages. The level to exclude all unnecessary pages will now be 127.

            |         cache    cache                    free    active
      Dump  |  zero   without  with     user    free    huge    huge
      Level |  page   private  private  data    page    page    page
     -------+---------------------------------------------------------
         0  |
         1  |   X
         2  |          X
         4  |          X        X
         8  |                            X
        16  |                                    X
        32  |                                            X
        64  |                                            X       X
       127  |   X      X        X        X       X       X       X

example:
To exclude all unnecessary pages:
makedumpfile -c --message-level 23 -d 127 /proc/vmcore /var/crash/kdump

To exclude all unnecessary pages but keep active hugepages:
makedumpfile -c --message-level 23 -d 63 /proc/vmcore /var/crash/kdump
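
As the table shows, a dump level is a sum of per-type bit values. A minimal sketch of decoding one is given below. The enum names are hypothetical and are not taken from the patch. The sketch also assumes, as the rows for 4 and 64 in the table suggest, that those two levels additionally cover the column to their left.

```c
#include <assert.h>
#include <stdio.h>

/* Bit values follow the table above. All names are hypothetical,
 * chosen for this sketch only. */
enum dump_bit {
    DL_ZERO        = 1,   /* zero pages */
    DL_CACHE       = 2,   /* cache pages without private */
    DL_CACHE_PRI   = 4,   /* cache pages with private */
    DL_USER_DATA   = 8,   /* user data pages */
    DL_FREE        = 16,  /* free pages */
    DL_FREE_HUGE   = 32,  /* free huge pages (proposed) */
    DL_ACTIVE_HUGE = 64,  /* active huge pages (proposed) */
    DL_MAX         = 127
};

/* Expand a dump level into the set of excluded table columns. */
unsigned excluded_types(unsigned level)
{
    unsigned mask = level;

    assert(level <= DL_MAX);
    if (mask & DL_CACHE_PRI)    /* row 4 marks both cache columns */
        mask |= DL_CACHE;
    if (mask & DL_ACTIVE_HUGE)  /* row 64 marks both huge columns */
        mask |= DL_FREE_HUGE;
    return mask;
}

/* Print the excluded page types for one dump level. */
void print_excluded(unsigned level)
{
    static const char *names[] = {
        "zero", "cache", "cache+private", "user data",
        "free", "free huge", "active huge"
    };
    unsigned mask = excluded_types(level);

    printf("-d %u excludes:", level);
    for (unsigned i = 0; i < 7; i++)
        if (mask & (1u << i))
            printf(" [%s]", names[i]);
    printf("\n");
}
```

With these assumptions, level 63 leaves the active-huge column unset while 127 sets every column, which matches the two command lines above.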

---

Jingbai Ma (3):
  makedumpfile: hugepage filtering: add hugepage filtering functions
  makedumpfile: hugepage filtering: add excluding hugepage messages
  makedumpfile: hugepage filtering: add new dump levels for manual page


 makedumpfile.8 |  170 +++
 makedumpfile.c |  272 
 makedumpfile.h |   19 
 print_info.c   |   12 +-
 print_info.h   |2 
 5 files changed, 431 insertions(+), 44 deletions(-)

--


[PATCH 3/3] makedumpfile: hugepage filtering: add new dump levels for manual page

2013-11-05 Thread Jingbai Ma

Add new dump levels for makedumpfile manual page.

Signed-off-by: Jingbai Ma 
---
 makedumpfile.8 |  170 
 1 files changed, 133 insertions(+), 37 deletions(-)

diff --git a/makedumpfile.8 b/makedumpfile.8
index adeb811..70e8732 100644
--- a/makedumpfile.8
+++ b/makedumpfile.8
@@ -164,43 +164,139 @@ by dump_level 11, makedumpfile retries it by dump_level 31.
 .br
 # makedumpfile \-d 11,31 \-x vmlinux /proc/vmcore dumpfile
 
-   |  |cache  |cache  |  |
-  dump | zero |without|with   | user | free
- level | page |private|private| data | page
-.br
-\-\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-\-+\-\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-
- 0 |  |   |   |  |
- 1 |  X   |   |   |  |
- 2 |  |   X   |   |  |
- 3 |  X   |   X   |   |  |
- 4 |  |   X   |   X   |  |
- 5 |  X   |   X   |   X   |  |
- 6 |  |   X   |   X   |  |
- 7 |  X   |   X   |   X   |  |
- 8 |  |   |   |  X   |
- 9 |  X   |   |   |  X   |
-10 |  |   X   |   |  X   |
-11 |  X   |   X   |   |  X   |
-12 |  |   X   |   X   |  X   |
-13 |  X   |   X   |   X   |  X   |
-14 |  |   X   |   X   |  X   |
-15 |  X   |   X   |   X   |  X   |
-16 |  |   |   |  |  X
-17 |  X   |   |   |  |  X
-18 |  |   X   |   |  |  X
-19 |  X   |   X   |   |  |  X
-20 |  |   X   |   X   |  |  X
-21 |  X   |   X   |   X   |  |  X
-22 |  |   X   |   X   |  |  X
-23 |  X   |   X   |   X   |  |  X
-24 |  |   |   |  X   |  X
-25 |  X   |   |   |  X   |  X
-26 |  |   X   |   |  X   |  X
-27 |  X   |   X   |   |  X   |  X
-28 |  |   X   |   X   |  X   |  X
-29 |  X   |   X   |   X   |  X   |  X
-30 |  |   X   |   X   |  X   |  X
-31 |  X   |   X   |   X   |  X   |  X
+   |  |cache  |cache  |  |  | free | active
+  dump | zero |without|with   | user | free | huge | huge
+ level | page |private|private| data | page | page | page
+.br
+\-\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-\-+\-\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-\-\-
+ 0 |  |   |   |  |  |  |
+ 1 |  X   |   |   |  |  |  |
+ 2 |  |   X   |   |  |  |  |
+ 3 |  X   |   X   |   |  |  |  |
+ 4 |  |   X   |   X   |  |  |  |
+ 5 |  X   |   X   |   X   |  |  |  |
+ 6 |  |   X   |   X   |  |  |  |
+ 7 |  X   |   X   |   X   |  |  |  |
+ 8 |  |   |   |  X   |  |  |
+ 9 |  X   |   |   |  X   |  |  |
+10 |  |   X   |   |  X   |  |  |
+11 |  X   |   X   |   |  X   |  |  |
+12 |  |   X   |   X   |  X   |  |  |
+13 |  X   |   X   |   X   |  X   |  |  |
+14 |  |   X   |   X   |  X   |  |  |
+15 |  X   |   X   |   X   |  X   |  |  |
+16 |  |   |   |  |  X   |  |
+17 |  X   |   |   |  |  X   |  |
+18 |  |   X   |   |  |  X   |  |
+19 |  X   |   X   |   |  |  X   |  |
+20 |  |   X   |   X   |  |  X   |  |
+21 |  X   |   X   |   X   |  |  X   |  |
+22 |  |   X   |   X   |  |  X   |  |
+23 |  X   |   X   |   X   |  |  X   |  |
+24 |  |   |   |  X   |  X   |  |
+25 |  X   |   |   |  X   |  X   |  |
+26 |  |   X   |   |  X   |  X   |  |
+27 |  X   |   X   |   |  X   |  X   |  |
+28 |  |   X   |   X   |  X   |  X   |  |
+29 |  X   |   X   |   X   |  X   |  X   |  |
+30 |  |   X   |   X   |  X   |  X   |  |
+31 |  X   |   X   |   X   |  X   |  X   |  |
+32 |  |   |   |  |  |  X   |
+33 |  X   |   |   |  |  |  X   |
+34 |  |   X   |   |  |  |  X   |
+35 |  X   |   X   |   |  |  |  X   |
+36 |  |   X   |   X   |  |  |  X   |
+37 |  X   |   X   |   X   |  |  |  X   |
+38 |  |   X   |   X   |  |  |  X   |
+39 |  X   |   X   |   X   |  |  |  X   |
+40 |  |   |   |  X   |  |  X   |
+41 |  X   |   |   |  X   |  |  X   |
+42 |  |   X   |   |  X   |  |  X   |
+43 |  X   |   X   |   |  X   |  |  X   |
+44 |  |   X   |   X   |  X   |  |  X   |
+45 |  X   |   X   |   X   |  X   |  |  X   |
+46 |  |   X   |   X   |  X   |  |  X   |
+47 |  X   |   X   |   X   |  X   |  |  X   |
+48 |  |   |   |  |  X   |  X   |
+49 |  X

[PATCH 1/3] makedumpfile: hugepage filtering: add hugepage filtering functions

2013-11-05 Thread Jingbai Ma

Add functions to exclude hugepage from vmcore dump.

Signed-off-by: Jingbai Ma 
---
 makedumpfile.c |  272 
 makedumpfile.h |   19 
 2 files changed, 289 insertions(+), 2 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index b42565c..f0b2531 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -46,6 +46,8 @@ unsigned long long pfn_cache_private;
 unsigned long long pfn_user;
 unsigned long long pfn_free;
 unsigned long long pfn_hwpoison;
+unsigned long long pfn_free_huge;
+unsigned long long pfn_active_huge;
 
 unsigned long long num_dumped;
 
@@ -1038,6 +1040,7 @@ get_symbol_info(void)
SYMBOL_INIT(mem_map, "mem_map");
SYMBOL_INIT(vmem_map, "vmem_map");
SYMBOL_INIT(mem_section, "mem_section");
+   SYMBOL_INIT(hstates, "hstates");
SYMBOL_INIT(pkmap_count, "pkmap_count");
SYMBOL_INIT_NEXT(pkmap_count_next, "pkmap_count");
SYMBOL_INIT(system_utsname, "system_utsname");
@@ -1174,6 +1177,19 @@ get_structure_info(void)
OFFSET_INIT(list_head.prev, "list_head", "prev");
 
/*
+* Get offsets of the hstate's members.
+*/
+   SIZE_INIT(hstate, "hstate");
+   OFFSET_INIT(hstate.order, "hstate", "order");
+   OFFSET_INIT(hstate.nr_huge_pages, "hstate", "nr_huge_pages");
+   OFFSET_INIT(hstate.free_huge_pages, "hstate", "free_huge_pages");
+   OFFSET_INIT(hstate.hugepage_activelist, "hstate",
+   "hugepage_activelist");
+   OFFSET_INIT(hstate.hugepage_freelists, "hstate", "hugepage_freelists");
+   MEMBER_ARRAY_LENGTH_INIT(hstate.hugepage_freelists, "hstate",
+   "hugepage_freelists");
+
+   /*
 * Get offsets of the node_memblk_s's members.
 */
SIZE_INIT(node_memblk_s, "node_memblk_s");
@@ -1555,6 +1571,7 @@ write_vmcoreinfo_data(void)
WRITE_SYMBOL("mem_map", mem_map);
WRITE_SYMBOL("vmem_map", vmem_map);
WRITE_SYMBOL("mem_section", mem_section);
+   WRITE_SYMBOL("hstates", hstates);
WRITE_SYMBOL("pkmap_count", pkmap_count);
WRITE_SYMBOL("pkmap_count_next", pkmap_count_next);
WRITE_SYMBOL("system_utsname", system_utsname);
@@ -1590,6 +1607,7 @@ write_vmcoreinfo_data(void)
WRITE_STRUCTURE_SIZE("zone", zone);
WRITE_STRUCTURE_SIZE("free_area", free_area);
WRITE_STRUCTURE_SIZE("list_head", list_head);
+   WRITE_STRUCTURE_SIZE("hstate", hstate);
WRITE_STRUCTURE_SIZE("node_memblk_s", node_memblk_s);
WRITE_STRUCTURE_SIZE("nodemask_t", nodemask_t);
WRITE_STRUCTURE_SIZE("pageflags", pageflags);
@@ -1628,6 +1646,13 @@ write_vmcoreinfo_data(void)
WRITE_MEMBER_OFFSET("vm_struct.addr", vm_struct.addr);
WRITE_MEMBER_OFFSET("vmap_area.va_start", vmap_area.va_start);
WRITE_MEMBER_OFFSET("vmap_area.list", vmap_area.list);
+   WRITE_MEMBER_OFFSET("hstate.order", hstate.order);
+   WRITE_MEMBER_OFFSET("hstate.nr_huge_pages", hstate.nr_huge_pages);
+   WRITE_MEMBER_OFFSET("hstate.free_huge_pages", hstate.free_huge_pages);
+   WRITE_MEMBER_OFFSET("hstate.hugepage_activelist",
+   hstate.hugepage_activelist);
+   WRITE_MEMBER_OFFSET("hstate.hugepage_freelists",
+   hstate.hugepage_freelists);
WRITE_MEMBER_OFFSET("log.ts_nsec", log.ts_nsec);
WRITE_MEMBER_OFFSET("log.len", log.len);
WRITE_MEMBER_OFFSET("log.text_len", log.text_len);
@@ -1647,6 +1672,9 @@ write_vmcoreinfo_data(void)
WRITE_ARRAY_LENGTH("zone.free_area", zone.free_area);
WRITE_ARRAY_LENGTH("free_area.free_list", free_area.free_list);
 
+   WRITE_ARRAY_LENGTH("hstate.hugepage_freelists",
+   hstate.hugepage_freelists);
+
WRITE_NUMBER("NR_FREE_PAGES", NR_FREE_PAGES);
WRITE_NUMBER("N_ONLINE", N_ONLINE);
 
@@ -1659,6 +1687,8 @@ write_vmcoreinfo_data(void)
 
WRITE_NUMBER("PAGE_BUDDY_MAPCOUNT_VALUE", PAGE_BUDDY_MAPCOUNT_VALUE);
 
+   WRITE_NUMBER("HUGE_MAX_HSTATE", HUGE_MAX_HSTATE);
+
/*
 * write the source file of 1st kernel
 */
@@ -1874,6 +1904,7 @@ read_vmcoreinfo(void)
READ_SYMBOL("mem_map", mem_map);
READ_SYMBOL("vmem_map", vmem_map);
READ_SYMBOL("mem_section", mem_section);
+   READ_SYMBOL("hstates", hstates);
READ_SYMBOL("pkmap_count", pkmap_co

[PATCH] kexec: export hugepage data structure into vmcoreinfo

2013-11-05 Thread Jingbai Ma

This patch exports the hstates data structure into vmcoreinfo when
CONFIG_HUGETLB_PAGE is defined. makedumpfile needs to read information about
the hugepage-related data structures.

We are introducing a function in "makedumpfile" to exclude hugepages from the
vmcore dump. In order to introduce this function, the hstates data structure
has to be exported into vmcoreinfo.

This patch is based on Linux 3.12.

The patch set for makedumpfile to filter hugepages will be sent separately.

Signed-off-by: Jingbai Ma 
---
 kernel/kexec.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 2a74f30..766c7c8 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -38,6 +38,9 @@
 #include <asm/io.h>
 #include <asm/sections.h>
 
+#include <linux/hugetlb.h>
+
+
 /* Per cpu memory for storing cpu states in case of system crash. */
 note_buf_t __percpu *crash_notes;
 
@@ -1578,11 +1581,17 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_STRUCT_SIZE(mem_section);
VMCOREINFO_OFFSET(mem_section, section_mem_map);
 #endif
+#ifdef CONFIG_HUGETLB_PAGE
+   VMCOREINFO_SYMBOL(hstates);
+#endif
VMCOREINFO_STRUCT_SIZE(page);
VMCOREINFO_STRUCT_SIZE(pglist_data);
VMCOREINFO_STRUCT_SIZE(zone);
VMCOREINFO_STRUCT_SIZE(free_area);
VMCOREINFO_STRUCT_SIZE(list_head);
+#ifdef CONFIG_HUGETLB_PAGE
+   VMCOREINFO_STRUCT_SIZE(hstate);
+#endif
VMCOREINFO_SIZE(nodemask_t);
VMCOREINFO_OFFSET(page, flags);
VMCOREINFO_OFFSET(page, _count);
@@ -1606,9 +1615,19 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_OFFSET(list_head, prev);
VMCOREINFO_OFFSET(vmap_area, va_start);
VMCOREINFO_OFFSET(vmap_area, list);
+#ifdef CONFIG_HUGETLB_PAGE
+   VMCOREINFO_OFFSET(hstate, order);
+   VMCOREINFO_OFFSET(hstate, nr_huge_pages);
+   VMCOREINFO_OFFSET(hstate, free_huge_pages);
+   VMCOREINFO_OFFSET(hstate, hugepage_activelist);
+   VMCOREINFO_OFFSET(hstate, hugepage_freelists);
+#endif
VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER);
log_buf_kexec_setup();
VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES);
+#ifdef CONFIG_HUGETLB_PAGE
+   VMCOREINFO_LENGTH(hstate.hugepage_freelists, MAX_NUMNODES);
+#endif
VMCOREINFO_NUMBER(NR_FREE_PAGES);
VMCOREINFO_NUMBER(PG_lru);
VMCOREINFO_NUMBER(PG_private);
@@ -1618,6 +1637,9 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_NUMBER(PG_hwpoison);
 #endif
VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
+#ifdef CONFIG_HUGETLB_PAGE
+   VMCOREINFO_NUMBER(HUGE_MAX_HSTATE);
+#endif
 
arch_crash_save_vmcoreinfo();
update_vmcoreinfo_note();
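
For readers unfamiliar with vmcoreinfo: the VMCOREINFO_* macros used in the diff above emit plain "KIND(name)=value" text lines, for example SYMBOL(...), SIZE(...), OFFSET(...), LENGTH(...) and NUMBER(...). A minimal sketch of how a consumer might look up the new hstate entries is shown below. The function name and the sample values are made up for illustration, and this is not makedumpfile's actual parser.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sample vmcoreinfo text. The keys correspond to the entries this patch
 * adds; every value here is invented for illustration. */
static const char sample_vmcoreinfo[] =
    "SYMBOL(hstates)=ffffffff81c0f3c0\n"
    "SIZE(hstate)=4216\n"
    "OFFSET(hstate.order)=8\n"
    "LENGTH(hstate.hugepage_freelists)=64\n"
    "NUMBER(HUGE_MAX_HSTATE)=2\n";

/* Find the line "KIND(name)=value" in blob and parse value in the given
 * base (16 for SYMBOL addresses, 10 for the other kinds).
 * Returns 1 and stores the value in *out if found, 0 otherwise. */
int vmcoreinfo_lookup(const char *blob, const char *kind, const char *name,
                      int base, unsigned long long *out)
{
    char key[128];
    int n = snprintf(key, sizeof(key), "%s(%s)=", kind, name);

    if (n < 0 || (size_t)n >= sizeof(key))
        return 0;

    for (const char *line = blob; line && *line; ) {
        if (strncmp(line, key, (size_t)n) == 0) {
            *out = strtoull(line + n, NULL, base);
            return 1;
        }
        line = strchr(line, '\n');  /* advance to the next line */
        if (line)
            line++;
    }
    return 0;
}
```

Because the kernel side wraps the new entries in #ifdef CONFIG_HUGETLB_PAGE, a consumer has to treat a failed lookup as "hugepage data not exported" rather than as an error.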


[PATCH 1/3] makedumpfile: hugepage filtering: add hugepage filtering functions

2013-11-05 Thread Jingbai Ma

Add functions to exclude hugepage from vmcore dump.

Signed-off-by: Jingbai Ma jingbai...@hp.com
---
 makedumpfile.c |  272 
 makedumpfile.h |   19 
 2 files changed, 289 insertions(+), 2 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index b42565c..f0b2531 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -46,6 +46,8 @@ unsigned long long pfn_cache_private;
 unsigned long long pfn_user;
 unsigned long long pfn_free;
 unsigned long long pfn_hwpoison;
+unsigned long long pfn_free_huge;
+unsigned long long pfn_active_huge;
 
 unsigned long long num_dumped;
 
@@ -1038,6 +1040,7 @@ get_symbol_info(void)
SYMBOL_INIT(mem_map, mem_map);
SYMBOL_INIT(vmem_map, vmem_map);
SYMBOL_INIT(mem_section, mem_section);
+   SYMBOL_INIT(hstates, hstates);
SYMBOL_INIT(pkmap_count, pkmap_count);
SYMBOL_INIT_NEXT(pkmap_count_next, pkmap_count);
SYMBOL_INIT(system_utsname, system_utsname);
@@ -1174,6 +1177,19 @@ get_structure_info(void)
OFFSET_INIT(list_head.prev, list_head, prev);
 
/*
+* Get offsets of the hstate's members.
+*/
+   SIZE_INIT(hstate, hstate);
+   OFFSET_INIT(hstate.order, hstate, order);
+   OFFSET_INIT(hstate.nr_huge_pages, hstate, nr_huge_pages);
+   OFFSET_INIT(hstate.free_huge_pages, hstate, free_huge_pages);
+   OFFSET_INIT(hstate.hugepage_activelist, hstate,
+   hugepage_activelist);
+   OFFSET_INIT(hstate.hugepage_freelists, hstate, hugepage_freelists);
+   MEMBER_ARRAY_LENGTH_INIT(hstate.hugepage_freelists, hstate,
+   hugepage_freelists);
+
+   /*
 * Get offsets of the node_memblk_s's members.
 */
SIZE_INIT(node_memblk_s, node_memblk_s);
@@ -1555,6 +1571,7 @@ write_vmcoreinfo_data(void)
WRITE_SYMBOL(mem_map, mem_map);
WRITE_SYMBOL(vmem_map, vmem_map);
WRITE_SYMBOL(mem_section, mem_section);
+   WRITE_SYMBOL(hstates, hstates);
WRITE_SYMBOL(pkmap_count, pkmap_count);
WRITE_SYMBOL(pkmap_count_next, pkmap_count_next);
WRITE_SYMBOL(system_utsname, system_utsname);
@@ -1590,6 +1607,7 @@ write_vmcoreinfo_data(void)
WRITE_STRUCTURE_SIZE(zone, zone);
WRITE_STRUCTURE_SIZE(free_area, free_area);
WRITE_STRUCTURE_SIZE(list_head, list_head);
+   WRITE_STRUCTURE_SIZE(hstate, hstate);
WRITE_STRUCTURE_SIZE(node_memblk_s, node_memblk_s);
WRITE_STRUCTURE_SIZE(nodemask_t, nodemask_t);
WRITE_STRUCTURE_SIZE(pageflags, pageflags);
@@ -1628,6 +1646,13 @@ write_vmcoreinfo_data(void)
WRITE_MEMBER_OFFSET(vm_struct.addr, vm_struct.addr);
WRITE_MEMBER_OFFSET(vmap_area.va_start, vmap_area.va_start);
WRITE_MEMBER_OFFSET(vmap_area.list, vmap_area.list);
+   WRITE_MEMBER_OFFSET(hstate.order, hstate.order);
+   WRITE_MEMBER_OFFSET(hstate.nr_huge_pages, hstate.nr_huge_pages);
+   WRITE_MEMBER_OFFSET(hstate.free_huge_pages, hstate.free_huge_pages);
+   WRITE_MEMBER_OFFSET(hstate.hugepage_activelist,
+   hstate.hugepage_activelist);
+   WRITE_MEMBER_OFFSET(hstate.hugepage_freelists,
+   hstate.hugepage_freelists);
WRITE_MEMBER_OFFSET(log.ts_nsec, log.ts_nsec);
WRITE_MEMBER_OFFSET(log.len, log.len);
WRITE_MEMBER_OFFSET(log.text_len, log.text_len);
@@ -1647,6 +1672,9 @@ write_vmcoreinfo_data(void)
WRITE_ARRAY_LENGTH(zone.free_area, zone.free_area);
WRITE_ARRAY_LENGTH(free_area.free_list, free_area.free_list);
 
+   WRITE_ARRAY_LENGTH(hstate.hugepage_freelists,
+   hstate.hugepage_freelists);
+
WRITE_NUMBER(NR_FREE_PAGES, NR_FREE_PAGES);
WRITE_NUMBER(N_ONLINE, N_ONLINE);
 
@@ -1659,6 +1687,8 @@ write_vmcoreinfo_data(void)
 
WRITE_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE, PAGE_BUDDY_MAPCOUNT_VALUE);
 
+   WRITE_NUMBER(HUGE_MAX_HSTATE, HUGE_MAX_HSTATE);
+
/*
 * write the source file of 1st kernel
 */
@@ -1874,6 +1904,7 @@ read_vmcoreinfo(void)
READ_SYMBOL(mem_map, mem_map);
READ_SYMBOL(vmem_map, vmem_map);
READ_SYMBOL(mem_section, mem_section);
+   READ_SYMBOL(hstates, hstates);
READ_SYMBOL(pkmap_count, pkmap_count);
READ_SYMBOL(pkmap_count_next, pkmap_count_next);
READ_SYMBOL(system_utsname, system_utsname);
@@ -1906,6 +1937,7 @@ read_vmcoreinfo(void)
READ_STRUCTURE_SIZE(zone, zone);
READ_STRUCTURE_SIZE(free_area, free_area);
READ_STRUCTURE_SIZE(list_head, list_head);
+   READ_STRUCTURE_SIZE(hstate, hstate);
READ_STRUCTURE_SIZE(node_memblk_s, node_memblk_s);
READ_STRUCTURE_SIZE(nodemask_t, nodemask_t);
READ_STRUCTURE_SIZE(pageflags, pageflags);
@@ -1940,6 +1972,13 @@ read_vmcoreinfo(void)
READ_MEMBER_OFFSET(vm_struct.addr, vm_struct.addr);
READ_MEMBER_OFFSET(vmap_area.va_start, vmap_area.va_start);
READ_MEMBER_OFFSET
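
To make the purpose of the new hstate offsets concrete: once the offset of
hstate.order is known, the number of base pages covered by one hugepage of
that hstate is 1 << order, which gives the PFN range a filter would have to
mark. The following is only a rough sketch of that arithmetic, not code from
this patch. Both helper names are hypothetical, and it assumes that `order`
is an unsigned int and that the dump has the same byte order as the host.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: given a raw copy of one
 * struct hstate read from the vmcore and the offset of its 'order'
 * member (as recorded via OFFSET_INIT(hstate.order, ...)), return how
 * many base pages one hugepage of that hstate covers.
 * Assumes 'order' is an unsigned int in host byte order.
 */
unsigned long long hstate_pages_per_hugepage(const unsigned char *hstate_buf,
                                             size_t order_offset)
{
    unsigned int order;

    /* memcpy avoids an unaligned load from the raw buffer. */
    memcpy(&order, hstate_buf + order_offset, sizeof(order));
    return 1ULL << order;
}

/*
 * Hypothetical helper: first PFN after a hugepage that starts at
 * head_pfn, so the hugepage covers [head_pfn, returned value).
 */
unsigned long long hugepage_end_pfn(unsigned long long head_pfn,
                                    unsigned long long pages)
{
    return head_pfn + pages;
}
```

With order 9 (a 2MB hugepage on x86_64 with 4KB base pages) this yields 512
base pages per hugepage.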

[PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Jingbai Ma

This patch set intends to exclude unnecessary hugepages from the vmcore dump
file.

This patch requires the kernel patch that exports the necessary data
structures into vmcore: "kexec: export hugepage data structure into vmcoreinfo"
http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

This patch introduces two new dump levels, 32 and 64, to exclude all unused and
active hugepages. The level to exclude all unnecessary pages will now be 127.

            |         cache    cache                    free    active
      Dump  |  zero   without  with     user    free    huge    huge
      Level |  page   private  private  data    page    page    page
     -------+---------------------------------------------------------
         0  |
         1  |   X
         2  |           X
         4  |           X        X
         8  |                            X
        16  |                                    X
        32  |                                            X
        64  |                                            X       X
       127  |   X       X        X       X       X       X       X

example:
To exclude all unnecessary pages:
makedumpfile -c --message-level 23 -d 127 /proc/vmcore /var/crash/kdump

To exclude all unnecessary pages but keep active hugepages:
makedumpfile -c --message-level 23 -d 63 /proc/vmcore /var/crash/kdump
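
Since the dump level is a sum of bits, the table can also be read as a
bitmask. Below is a small sketch in C of that decoding, written only from
the table in this mail. The macro and function names are made up for
illustration and are not the ones used in makedumpfile.h. It assumes, as the
table shows, that level 4 also marks "cache without private" and that
level 64 also marks "free huge page".

```c
#include <stdio.h>

/* Illustrative bit values read from the dump level table; the names
 * are hypothetical, not copied from makedumpfile.h. */
#define EX_ZERO         0x01  /* zero pages */
#define EX_CACHE        0x02  /* cache pages without private */
#define EX_CACHE_PRI    0x04  /* cache pages with private */
#define EX_USER_DATA    0x08  /* user data pages */
#define EX_FREE         0x10  /* free pages */
#define EX_FREE_HUGE    0x20  /* free hugepages (proposed level 32) */
#define EX_ACTIVE_HUGE  0x40  /* active hugepages (proposed level 64) */

/* Return the set of page types marked for a dump level, following the
 * table: 4 implies the "cache without private" column as well, and
 * 64 implies the "free huge page" column as well. */
unsigned int excluded_page_types(unsigned int level)
{
    unsigned int types = level & 0x7f;

    if (level & EX_CACHE_PRI)
        types |= EX_CACHE;
    if (level & EX_ACTIVE_HUGE)
        types |= EX_FREE_HUGE;
    return types;
}

/* Print one line per page type excluded at the given level. */
void print_excluded(unsigned int level)
{
    static const char *names[] = {
        "zero page", "cache without private", "cache with private",
        "user data", "free page", "free huge page", "active huge page",
    };
    unsigned int types = excluded_page_types(level);
    int i;

    for (i = 0; i < 7; i++)
        if (types & (1u << i))
            printf("-d %u excludes: %s\n", level, names[i]);
}
```

Level 63 leaves the 64 bit clear, which is why the second example above
keeps active hugepages.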

---

Jingbai Ma (3):
  makedumpfile: hugepage filtering: add hugepage filtering functions
  makedumpfile: hugepage filtering: add excluding hugepage messages
  makedumpfile: hugepage filtering: add new dump levels for manual page


 makedumpfile.8 |  170 +++
 makedumpfile.c |  272 
 makedumpfile.h |   19 
 print_info.c   |   12 +-
 print_info.h   |2 
 5 files changed, 431 insertions(+), 44 deletions(-)

--

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/3] makedumpfile: hugepage filtering: add excluding hugepage messages

2013-11-05 Thread Jingbai Ma

Add messages for print_info.

Signed-off-by: Jingbai Ma jingbai...@hp.com
---
 print_info.c |   12 +++-
 print_info.h |2 ++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/print_info.c b/print_info.c
index 06939e0..978d9fb 100644
--- a/print_info.c
+++ b/print_info.c
@@ -103,17 +103,19 @@ print_usage(void)
    MSG("      The maximum of Dump_Level is 31.\n");
    MSG("      Note that Dump_Level for Xen dump filtering is 0 or 1.\n");
    MSG("\n");
-   MSG("            |         cache    cache\n");
-   MSG("      Dump  |  zero   without  with     user    free\n");
-   MSG("      Level |  page   private  private  data    page\n");
-   MSG("     -------+---------------------------------------\n");
+   MSG("            |         cache    cache                    free    active\n");
+   MSG("      Dump  |  zero   without  with     user    free    huge    huge\n");
+   MSG("      Level |  page   private  private  data    page    page    page\n");
+   MSG("     -------+---------------------------------------------------------\n");
    MSG("         0  |\n");
    MSG("         1  |   X\n");
    MSG("         2  |           X\n");
    MSG("         4  |           X        X\n");
    MSG("         8  |                            X\n");
    MSG("        16  |                                    X\n");
-   MSG("        31  |   X       X        X       X       X\n");
+   MSG("        32  |                                            X\n");
+   MSG("        64  |                                            X       X\n");
+   MSG("       127  |   X       X        X       X       X       X       X\n");
    MSG("\n");
    MSG("  [-E]:\n");
    MSG("      Create DUMPFILE in the ELF format.\n");
diff --git a/print_info.h b/print_info.h
index 01e3706..8461df6 100644
--- a/print_info.h
+++ b/print_info.h
@@ -35,6 +35,8 @@ void print_execution_time(char *step_name, struct timeval *tv_start);
 #define PROGRESS_HOLES         "Checking for memory holes  "
 #define PROGRESS_UNN_PAGES     "Excluding unnecessary pages"
 #define PROGRESS_FREE_PAGES    "Excluding free pages       "
+#define PROGRESS_FREE_HUGE     "Excluding free huge pages  "
+#define PROGRESS_ACTIVE_HUGE   "Excluding active huge pages"
 #define PROGRESS_ZERO_PAGES    "Excluding zero pages       "
 #define PROGRESS_XEN_DOMAIN    "Excluding xen user domain  "
 


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Jingbai Ma


On 11/06/2013 04:26 AM, Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>> This patch set intend to exclude unnecessary hugepages from vmcore dump file.
>>
>> This patch requires the kernel patch to export necessary data structures into
>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>
>> This patch introduce two new dump levels 32 and 64 to exclude all unused and
>> active hugepages. The level to exclude all unnecessary pages will be 127 now.
>
> Interesting. Why hugepages should be treated any differentely than normal
> pages?
>
> If user asked to filter out free page, then it should be filtered and
> it should not matter whether it is a huge page or not?

Yes, free hugepages should be filtered out with other free pages. It
sounds reasonable.

But for active hugepages, I would offer user more choices/flexibility.
(maybe bad).

I'm OK to filter active hugepages with other user data page.

Any other comments?

>
> Thanks
> Vivek

--
Thanks,
Jingbai Ma

Re: [Help Test] kdump, x86, acpi: Reproduce CPU0 SMI corruption issue after unsetting BSP flag

2013-08-14 Thread Jingbai Ma

On 08/13/2013 06:55 PM, Jingbai Ma wrote:
> On 08/06/2013 05:19 PM, HATAYAMA Daisuke wrote:
>> Hello,
>>
>> I've addressing kdump restriction that there's only one cpu available
>> on the kdump 2nd kernel. Now I need to check if the following CPU0 SMI
>> corruption issue fixed in the following commit can again be reproduced
>> by unsetting BSP flag of the boot cpu:
>>
>> commit 74b5820808215f65b70b05a099d6d3c969b82689
>> Author: Bjorn Helgaas
>> Date:   Wed Jul 29 15:54:25 2009 -0600
>>
>>   ACPI: bind workqueues to CPU 0 to avoid SMI corruption
>>
>>   On some machines, a software-initiated SMI causes corruption unless the
>>   SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically 
>> it's
>>   done in GPE-related methods that are run via workqueues, so we can 
>> avoid
>>   the known corruption cases by binding the workqueues to CPU 0.
>>
>>   References:
>>   http://bugzilla.kernel.org/show_bug.cgi?id=13751
>>   https://bugs.launchpad.net/bugs/157171
>>   https://bugs.launchpad.net/bugs/157691
>>
>>   Signed-off-by: Bjorn Helgaas
>>   Signed-off-by: Len Brown
>>
>> The reason is that in the current situation, I have two ideas to deal
>> with the avove kdump restriction:
>>
>> 1) Disable BSP at the 2nd kernel, posted at:
>>   [PATCH v1 0/2] x86, apic: Disable BSP if boot cpu is AP
>>   https://lkml.org/lkml/2012/10/16/15
>>
>> 2) Unset BSP flag at the 1st kernel, suggested by Eric Biederman
>>during the discussion of the idea 1).
>>
>> On the idea 1), BSP is disabled on the kdump 2nd kernel. My conclusion
>> is that we have no method to reset BSP, i.e. recover BPS's healthy
>> state, while we can recover AP by means of INIT as described in MP
>> specification.
>>
>> The idea 2) is simpler. We unset BSP flag of the boot cpu at 1st
>> kernel. The behaviour when receiving INIT depends on whether or not
>> BSP flag is set or not on its MSR; we can set and unset BSP flag of
>> MSR freely at runtime. (I don't mean we should).
>>
>> So, next thing I should do is to evalute risk of the idea 2). In fact,
>> during the discussion of the idea 1), HPA pointed out that some kind
>> of firmware affects if BSP flag is unset. Also, maybe from the same
>> reason, recently introduced cpu0 hot-plugging feature by Fenghua Yu
>> doesn't appear to unset BSP flag.
>>
>> The biggest problem next is that I don't have any machines reported in
>> the bugzilla articles; this issue inherently depends on firmware.
>>
>> So, could anyone help testing the idea 2) above if you have which of
>> the following machines? (or other ones that can lead to the same bug)
>>
>> - HP Compaq 6910p
>> - HP Compaq 6710b
>> - HP Compaq 6710s
>> - HP Compaq 6510b
>> - HP Compaq 2510p
>>
>> I prepared a small programs for this test. See the attached file.
>> The steps to try to reproduce the bug is as follows:
>>
>> 1. $ tar xf bsp_flag_modules.tar.gz; cd bsp_flag_modules
>> 2. $ make # to build these programs
>> 3. $ insmod unsetbspflag.ko # to unset BSP flag of the boot cpu
>> 4. $ insmod getcpuinfo.ko # to confirm if BSP flag of the boot cpu has
>>   # been unset.
>>$ dmesg | tail
>> 5. Close the lid of the machine.
>> 6. Wait some minutes if necessary.
>> 7. Open the lid and you can see oops on the screen if bug has
>>   successfully been reproduced.
>>
> 
> I couldn't find any model list above, but found one HP EliteBook 6930p.
> I tested this machine with kernel 2.6.30 first. After resuming from
> suspend, system hang.
> 
> Then, I tested with kernel 3.11.0-rc5, it worked well, could resume from
> suspend without any problem.
> 
> Next, I tested your program to clear BSP flag, I found the
> unsetbspflag.ko didn't work everytime, sometimes I have to execute
> insmod/rmmod several times to clear the BSP flag. (I used your
> getcpuinfo.ko to check the BSP flag)
> 
> cpu: 0 bios_apic: 0 apic: 0 AP
> cpu: 1 bios_apic: 1 apic: 1 AP
> 
> I suspended it, and them resumed it. This machine resumed from suspend
> successfully, but the BSP flag has been set back:
> 
> cpu: 0 bios_apic: 0 apic: 0 BSP
> cpu: 1 bios_apic: 1 apic: 1 AP
> 
> That's all my observation. Hope it's helpful.
> 

I found a side effect of unsetting the BSP flag: it affects rebooting.
Once the BSP flag has been cleared, issuing the reboot command makes the
system hang after the message:

Restarting system.
A hardware reset is then needed to recover.

I have reproduced this problem on the following systems:
HP EliteBook 6930p
HP Compaq DC7700
HP ProLiant DL980 (4 sockets, 40 cores)

I have an idea: to avoid this kind of issue, we could unset the BSP flag in
the first kernel during crash processing and restore it in the second
kernel during AP initialization.

-- 
Thanks,
Jingbai Ma

Re: [Help Test] kdump, x86, acpi: Reproduce CPU0 SMI corruption issue after unsetting BSP flag

2013-08-13 Thread Jingbai Ma

On 08/06/2013 05:19 PM, HATAYAMA Daisuke wrote:
> Hello,
> 
> I've addressing kdump restriction that there's only one cpu available
> on the kdump 2nd kernel. Now I need to check if the following CPU0 SMI
> corruption issue fixed in the following commit can again be reproduced
> by unsetting BSP flag of the boot cpu:
> 
> commit 74b5820808215f65b70b05a099d6d3c969b82689
> Author: Bjorn Helgaas
> Date:   Wed Jul 29 15:54:25 2009 -0600
> 
>  ACPI: bind workqueues to CPU 0 to avoid SMI corruption
> 
>  On some machines, a software-initiated SMI causes corruption unless the
>  SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically 
> it's
>  done in GPE-related methods that are run via workqueues, so we can avoid
>  the known corruption cases by binding the workqueues to CPU 0.
> 
>  References:
>  http://bugzilla.kernel.org/show_bug.cgi?id=13751
>  https://bugs.launchpad.net/bugs/157171
>  https://bugs.launchpad.net/bugs/157691
> 
>  Signed-off-by: Bjorn Helgaas
>  Signed-off-by: Len Brown
> 
> The reason is that in the current situation, I have two ideas to deal
> with the avove kdump restriction:
> 
>1) Disable BSP at the 2nd kernel, posted at:
>  [PATCH v1 0/2] x86, apic: Disable BSP if boot cpu is AP
>  https://lkml.org/lkml/2012/10/16/15
> 
>2) Unset BSP flag at the 1st kernel, suggested by Eric Biederman
>   during the discussion of the idea 1).
> 
> On the idea 1), BSP is disabled on the kdump 2nd kernel. My conclusion
> is that we have no method to reset BSP, i.e. recover BPS's healthy
> state, while we can recover AP by means of INIT as described in MP
> specification.
> 
> The idea 2) is simpler. We unset BSP flag of the boot cpu at 1st
> kernel. The behaviour when receiving INIT depends on whether or not
> BSP flag is set or not on its MSR; we can set and unset BSP flag of
> MSR freely at runtime. (I don't mean we should).
> 
> So, next thing I should do is to evalute risk of the idea 2). In fact,
> during the discussion of the idea 1), HPA pointed out that some kind
> of firmware affects if BSP flag is unset. Also, maybe from the same
> reason, recently introduced cpu0 hot-plugging feature by Fenghua Yu
> doesn't appear to unset BSP flag.
> 
> The biggest problem next is that I don't have any machines reported in
> the bugzilla articles; this issue inherently depends on firmware.
> 
> So, could anyone help testing the idea 2) above if you have which of
> the following machines? (or other ones that can lead to the same bug)
> 
> - HP Compaq 6910p
> - HP Compaq 6710b
> - HP Compaq 6710s
> - HP Compaq 6510b
> - HP Compaq 2510p
> 
> I prepared a small programs for this test. See the attached file.
> The steps to try to reproduce the bug is as follows:
> 
>1. $ tar xf bsp_flag_modules.tar.gz; cd bsp_flag_modules
>2. $ make # to build these programs
>3. $ insmod unsetbspflag.ko # to unset BSP flag of the boot cpu
>4. $ insmod getcpuinfo.ko # to confirm if BSP flag of the boot cpu has
>  # been unset.
>   $ dmesg | tail
>5. Close the lid of the machine.
>6. Wait some minutes if necessary.
>7. Open the lid and you can see oops on the screen if bug has
>  successfully been reproduced.
> 

I couldn't find any of the models listed above, but I found an HP EliteBook
6930p. I tested this machine with kernel 2.6.30 first. After resuming from
suspend, the system hung.

Then I tested with kernel 3.11.0-rc5. It worked well and resumed from
suspend without any problem.

Next, I tested your program to clear the BSP flag. I found that
unsetbspflag.ko didn't work every time; sometimes I had to run
insmod/rmmod several times to clear the BSP flag. (I used your
getcpuinfo.ko to check the BSP flag.)

cpu: 0 bios_apic: 0 apic: 0 AP
cpu: 1 bios_apic: 1 apic: 1 AP

I suspended it and then resumed it. The machine resumed from suspend
successfully, but the BSP flag had been set again:

cpu: 0 bios_apic: 0 apic: 0 BSP
cpu: 1 bios_apic: 1 apic: 1 AP

Those are all my observations. I hope they're helpful.
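
For anyone reading along without the attachment: the BSP flag is bit 8 of
the IA32_APIC_BASE MSR (address 0x1b). The fragment below is a user-space
sketch of the bit test and the bit clear only. It is not the source of
unsetbspflag.ko or getcpuinfo.ko, the function names are my own, and the
actual MSR read and write has to be done in kernel context on the CPU in
question.

```c
#include <stdint.h>

/* Architectural constants for the x86 IA32_APIC_BASE MSR. */
#define MSR_IA32_APICBASE  0x1b
#define APICBASE_BSP       (1ULL << 8)  /* set on the bootstrap processor */

/* Return nonzero if the given MSR value carries the BSP flag.
 * (Illustrative name, not taken from the attached modules.) */
int apicbase_is_bsp(uint64_t msr_value)
{
    return (msr_value & APICBASE_BSP) != 0;
}

/* Return the MSR value with the BSP flag cleared; all other bits,
 * including the APIC enable bit and base address, are left untouched. */
uint64_t apicbase_clear_bsp(uint64_t msr_value)
{
    return msr_value & ~APICBASE_BSP;
}
```

For example, a value of 0xfee00900 (base 0xfee00000, APIC enabled, BSP set)
becomes 0xfee00800 after clearing, which corresponds to the "AP" state
printed for cpu 0 above.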

-- 
Thanks,
Jingbai Ma


makedumpfile 1.5.4 + kernel 3.11-rc2+ 4TB tests

2013-07-26 Thread Jingbai Ma


Hi,

I have run some tests with makedumpfile 1.5.4 and upstream kernel
3.11-rc2+ on a machine with 4TB of memory. Here are the test results:


Test environment:
Machine: HP ProLiant DL980 G7 with 4TB RAM.
CPU: Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz (8 sockets, 10 cores)
(Only 1 CPU was enabled in the 2nd kernel)
Kernel: 3.11.0-rc2+ (at commit b3a3a9c441e2c8f6b6760de9331023a7906a4ac6)
crashkernel=384MB
vmcore size: 4.0TB
Dump file size: 15GB
All times were taken from the debug messages of makedumpfile.
For comparison, I also tested makedumpfile 1.5.3.
As a comparison, I also have tested makedumpfile 1.5.3.

(all times in seconds)
                     Excluding pages   Copy data   Total
makedumpfile 1.5.3               468        1182    1650
makedumpfile 1.5.4                93         518     611


So it seems there is a great performance improvement by the mmap mechanism.
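
To put numbers on that: dividing the 1.5.3 times by the 1.5.4 times gives
roughly 5.0x for excluding pages, 2.3x for copying data and 2.7x overall.
A trivial sketch of the calculation, with names of my own choosing:

```c
#include <stdio.h>

/* Ratio of old to new elapsed time; a value above 1.0 means the new
 * version is faster. */
double speedup(double old_seconds, double new_seconds)
{
    return old_seconds / new_seconds;
}

/* Print the per-phase speedup of 1.5.4 over 1.5.3 using the
 * measurements from the table in this mail. */
void print_speedups(void)
{
    printf("Excluding pages: %.2fx\n", speedup(468.0, 93.0));
    printf("Copy data:       %.2fx\n", speedup(1182.0, 518.0));
    printf("Total:           %.2fx\n", speedup(1650.0, 611.0));
}
```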

--
Thanks,
Jingbai Ma


makedumpfile parallel dumping test

2013-04-22 Thread Jingbai Ma


Hi all,

I have done some experiments on parallel kernel dumping and would like to
share the test results with you. I hope they help.


Test environment:
Machine: HP ProLiant DL980 G7 with 4TB RAM.
CPU: Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz (8 sockets, 10 cores) (4
CPUs were enabled in the 2nd kernel by nr_cpus=4)

Kernel 3.9.0-rc7
kexec-tools 2.0.4
makedumpfile v1.5.3 with lzo library
crashkernel=4096M (I also tested with 2048M, but it failed with OOM when
dumping with 3 or 4 parallels in cyclic mode)


I didn't have a real multipath storage device, so I put the dump files
on 4 different disks behind 3 HP Smart Array controllers (mounted on /0,
/1, /2 and /3 in the capture kernel).


Times were measured like this (for example: lzo compression, non-cyclic, 4
parallels):
time makedumpfile -l --non-cyclic --split --message-level 23 -d 31 \
/proc/vmcore /0/vmcore_0 /1/vmcore_1 /2/vmcore_2 /3/vmcore_3


I ran several tests with different options: 1 to 4 parallels, combined
with zlib and lzo compression.


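Test result:
-----------------------------------------------------------------
|               |Parallels 1|Parallels 2|Parallels 3|Parallels 4|
-----------------------------------------------------------------
|zlib cyclic    | 42m25.321s|  34m0.168s| 29m44.908s| 28m50.387s|
-----------------------------------------------------------------
|zlib non-cyclic|  42m7.842s| 28m28.275s| 23m25.750s|  21m6.476s|
-----------------------------------------------------------------
|lzo cyclic     | 23m40.010s| 18m19.932s| 21m47.903s| 22m47.605s|
-----------------------------------------------------------------
|lzo non-cyclic | 20m45.749s| 16m42.045s| 15m41.070s| 15m18.605s|
-----------------------------------------------------------------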

--
Thanks,
Jingbai Ma


Re: makedumpfile mmap() benchmark

2013-03-27 Thread Jingbai Ma

On 03/27/2013 02:23 PM, HATAYAMA Daisuke wrote:

From: Jingbai Ma
Subject: makedumpfile mmap() benchmark
Date: Wed, 27 Mar 2013 13:51:37 +0800

Hi,

I have tested the makedumpfile mmap patch on a machine with 2TB
memory, here are the test results:

Thanks for your benchmark. It's very helpful to see the benchmark on
different environments.

Thanks for your patch, there is a great performance improvement, very 
impressive!

Test environment:
Machine: HP ProLiant DL980 G7 with 2TB RAM.
CPU: Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz (8 sockets, 10 cores)
(Only 1 CPU was enabled in the 2nd kernel)
Kernel: 3.9.0-rc3+ with mmap kernel patch v3
vmcore size: 2.0TB
Dump file size: 3.6GB
makedumpfile mmap branch with parameters: -c --message-level 23 -d 31
--map-size <map-size>

To reduce the benchmark time, I recommend LZO or snappy compression
rather than zlib. zlib is used when the -c option is specified, and it's
too slow for crash dump use.

That's a very helpful suggestion, I will try it again with the LZO/snappy 
libraries.

To build makedumpfile with support for each compression format, set
USELZO=on or USESNAPPY=on after installing the necessary libraries.

All times were taken from makedumpfile's debug messages.

As a comparison, I have also tested with the original kernel and the
original makedumpfile 1.5.1 and 1.5.3.
I added all [Excluding unnecessary pages] and [Excluding free pages]
times together as "Filter pages", and [Copying data] as "Copy data"
here.

makedumpfile  Kernel                 map-size (KB)  Filter pages (s)  Copy data (s)  Total (s)
1.5.1         3.7.0-0.36.el7.x86_64  N/A            940.28            1269.25        2209.53
1.5.3         3.7.0-0.36.el7.x86_64  N/A            380.09            992.77         1372.86
1.5.3         v3.9-rc3               N/A            197.77            892.27         1090.04
1.5.3+mmap    v3.9-rc3+mmap          0              164.87            606.06         770.93
1.5.3+mmap    v3.9-rc3+mmap          4              88.62             576.07         664.69
1.5.3+mmap    v3.9-rc3+mmap          1024           83.66             477.23         560.89
1.5.3+mmap    v3.9-rc3+mmap          2048           83.44             477.21         560.65
1.5.3+mmap    v3.9-rc3+mmap          10240          83.84             476.56         560.4

Did you calculate "Filter pages" by adding two [Excluding unnecessary
pages] lines? The first of the two lines is printed by
get_num_dumpable_cyclic() during the calculation of the total number
of dumpable pages, which is later used to print the progress of writing
pages as a percentage.

For example, here is the log, where the number of cycles is 3, and

mem_map (16399)
   mem_map: ea0801e0
   pfn_start  : 20078000
   pfn_end: 2008
read /proc/vmcore with mmap()
STEP [Excluding unnecessary pages] : 13.703842 seconds <-- this part is by get_num_dumpable_cyclic()
STEP [Excluding unnecessary pages] : 13.842656 seconds
STEP [Excluding unnecessary pages] : 6.857910 seconds
STEP [Excluding unnecessary pages] : 13.554281 seconds <-- this part is by the main filtering processing.
STEP [Excluding unnecessary pages] : 14.103593 seconds
STEP [Excluding unnecessary pages] : 7.114239 seconds
STEP [Copying data   ] : 138.442116 seconds
Writing erase info...
offset_eraseinfo: 1f4680e40, size_eraseinfo: 0

Original pages  : 0x1ffc28a4

So, get_num_dumpable_cyclic() actually performs a filtering pass, but its
time should not be included here.

If so, I guess each measured time would be about 42 seconds, right?
Then, it's almost the same as the result I posted today: 35 seconds.
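
The correction described above can be sketched in a few lines: collect every "STEP [Excluding unnecessary pages]" time from the log and keep only the second half, on the assumption (taken from the explanation in this mail) that in cyclic mode the first half is printed by get_num_dumpable_cyclic(). The function name `main_filtering_seconds` is hypothetical, and the sample values are copied from the two logs quoted in this message.

```python
import re

# Matches the per-cycle filtering lines printed with --message-level 23.
STEP_RE = re.compile(
    r"STEP \[Excluding unnecessary pages\s*\] : (\d+(?:\.\d+)?) seconds"
)


def main_filtering_seconds(log):
    """Sum only the second half of the filtering STEP lines.

    Assumption: the log has one line per cycle from the counting pass
    followed by one line per cycle from the main filtering pass.
    """
    times = [float(t) for t in STEP_RE.findall(log)]
    if not times or len(times) % 2 != 0:
        raise ValueError("expected an even, non-zero number of STEP lines")
    half = len(times) // 2
    return sum(times[half:])


# Three-cycle example log from the explanation above.
SAMPLE_3_CYCLES = """\
STEP [Excluding unnecessary pages] : 13.703842 seconds
STEP [Excluding unnecessary pages] : 13.842656 seconds
STEP [Excluding unnecessary pages] : 6.857910 seconds
STEP [Excluding unnecessary pages] : 13.554281 seconds
STEP [Excluding unnecessary pages] : 14.103593 seconds
STEP [Excluding unnecessary pages] : 7.114239 seconds
"""

# Two-cycle log from the --map-size 10240 run.
SAMPLE_2_CYCLES = """\
STEP [Excluding unnecessary pages] : 24.17 seconds
STEP [Excluding unnecessary pages] : 17.291935 seconds
STEP [Excluding unnecessary pages] : 24.498559 seconds
STEP [Excluding unnecessary pages] : 17.278414 seconds
"""

if __name__ == "__main__":
    print(f"{main_filtering_seconds(SAMPLE_3_CYCLES):.2f} s")
    print(f"{main_filtering_seconds(SAMPLE_2_CYCLES):.2f} s")
```

With these inputs the two sums land near the 35 and 42 second figures mentioned above.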

Yes, I added them together, the following is one dump message log:

makedumpfile  -c --message-level 23 -d 31 --map-size 10240 /proc/vmcore 
/sysroot/var/crash/vmcore_10240

cyclic buffer size has been changed: 77661798 => 77661184
Excluding unnecessary pages: [100 %] STEP [Excluding unnecessary pages] : 24.17 seconds
Excluding unnecessary pages: [100 %] STEP [Excluding unnecessary pages] : 17.291935 seconds
Excluding unnecessary pages: [100 %] STEP [Excluding unnecessary pages] : 24.498559 seconds
Excluding unnecessary pages: [100 %] STEP [Excluding unnecessary pages] : 17.278414 seconds
Copying data   : [100 %] STEP [Copying data   ] : 476.563428 seconds

Original pages  : 0x1ffe874d
  Excluded pages   : 0x1f79429e
    Pages filled with zero  : 0x002b4c9c
    Cache pages             : 0x000493bc
    Cache pages + private   : 0x000011f3
    User process data pages : 0x00005c55
    Free pages              : 0x1f48f3fe
    Hwpoison pages          : 0x00000000
  Remaining pages  : 0x008544af
  (The number of pages is reduced to 1%.)
Memory Hole     : 0x1c0178b3
--------------------------------------------------
Total pages     : 0x3c000000
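
The page counters in this summary are internally consistent, which can be checked with a little arithmetic. The sketch below is not makedumpfile code; the constants are transcribed by hand from the log (so treat them as an assumption), and the identities are simply what the numbers in this particular log satisfy.

```python
# Page counts transcribed from the summary above (hexadecimal page counts).
ORIGINAL_PAGES = 0x1FFE874D
EXCLUDED = {
    "pages filled with zero": 0x002B4C9C,
    "cache pages": 0x000493BC,
    "cache pages + private": 0x000011F3,
    "user process data pages": 0x00005C55,
    "free pages": 0x1F48F3FE,
    "hwpoison pages": 0x00000000,
}
MEMORY_HOLE = 0x1C0178B3


def summarize(original, excluded, hole):
    """Recompute the derived counters from the per-category counts."""
    excluded_total = sum(excluded.values())
    remaining = original - excluded_total
    return {
        "excluded": excluded_total,
        "remaining": remaining,
        "total": original + hole,
        # Integer percentage of pages that still have to be written.
        "percent": remaining * 100 // original,
    }


if __name__ == "__main__":
    for key, value in summarize(ORIGINAL_PAGES, EXCLUDED, MEMORY_HOLE).items():
        print(f"{key:10s}: {value:#010x}" if key != "percent" else f"{key:10s}: {value}%")
```

The per-category counts add up to the "Excluded pages" line, and the integer percentage agrees with the "reduced to 1%" message.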

Thanks.
HATAYAMA, Daisuke

--
Thanks,
Jingbai Ma

makedumpfile mmap() benchmark

2013-03-26 Thread Jingbai Ma


Hi,

I have tested the makedumpfile mmap patch on a machine with 2TB memory, 
here are the test results:

Test environment:
Machine: HP ProLiant DL980 G7 with 2TB RAM.
CPU: Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz (8 sockets, 10 cores)
(Only 1 CPU was enabled in the 2nd kernel)
Kernel: 3.9.0-rc3+ with mmap kernel patch v3
vmcore size: 2.0TB
Dump file size: 3.6GB
makedumpfile mmap branch with parameters: -c --message-level 23 -d 31 
--map-size <map-size>

All times were taken from makedumpfile's debug messages.

As a comparison, I have also tested with the original kernel and the 
original makedumpfile 1.5.1 and 1.5.3.
I added all [Excluding unnecessary pages] and [Excluding free pages] 
times together as "Filter pages", and [Copying data] as "Copy data" here.


makedumpfile  Kernel                 map-size (KB)  Filter pages (s)  Copy data (s)  Total (s)
1.5.1         3.7.0-0.36.el7.x86_64  N/A            940.28            1269.25        2209.53
1.5.3         3.7.0-0.36.el7.x86_64  N/A            380.09            992.77         1372.86
1.5.3         v3.9-rc3               N/A            197.77            892.27         1090.04
1.5.3+mmap    v3.9-rc3+mmap          0              164.87            606.06         770.93
1.5.3+mmap    v3.9-rc3+mmap          4              88.62             576.07         664.69
1.5.3+mmap    v3.9-rc3+mmap          1024           83.66             477.23         560.89
1.5.3+mmap    v3.9-rc3+mmap          2048           83.44             477.21         560.65
1.5.3+mmap    v3.9-rc3+mmap          10240          83.84             476.56         560.4
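
As a quick sanity check on the table, the snippet below verifies that each "Total" is the sum of "Filter pages" and "Copy data", and expresses every row as a speedup over the first one. The tuple layout and function names are illustrative only; the numbers are copied from the table.

```python
# (makedumpfile, kernel, map-size in KB or None, filter s, copy s, total s)
ROWS = [
    ("1.5.1", "3.7.0-0.36.el7.x86_64", None, 940.28, 1269.25, 2209.53),
    ("1.5.3", "3.7.0-0.36.el7.x86_64", None, 380.09, 992.77, 1372.86),
    ("1.5.3", "v3.9-rc3", None, 197.77, 892.27, 1090.04),
    ("1.5.3+mmap", "v3.9-rc3+mmap", 0, 164.87, 606.06, 770.93),
    ("1.5.3+mmap", "v3.9-rc3+mmap", 4, 88.62, 576.07, 664.69),
    ("1.5.3+mmap", "v3.9-rc3+mmap", 1024, 83.66, 477.23, 560.89),
    ("1.5.3+mmap", "v3.9-rc3+mmap", 2048, 83.44, 477.21, 560.65),
    ("1.5.3+mmap", "v3.9-rc3+mmap", 10240, 83.84, 476.56, 560.4),
]


def totals_consistent(rows, tolerance=0.005):
    """True if filter + copy matches the reported total in every row."""
    return all(abs(row[3] + row[4] - row[5]) <= tolerance for row in rows)


def speedup_over_first(rows):
    """Total-time speedup of each row relative to the first row."""
    baseline = rows[0][5]
    return [baseline / row[5] for row in rows]


if __name__ == "__main__":
    print("totals consistent:", totals_consistent(ROWS))
    for row, factor in zip(ROWS, speedup_over_first(ROWS)):
        print(f"{row[0]:11s} {row[1]:22s} map-size={row[2]!s:6s} {factor:.2f}x")
```

The speedup flattens out once map-size reaches 1024 KB; larger mappings change the total by well under a second in this data.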


Thanks,
Jingbai Ma

Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-12 Thread Jingbai Ma


On 03/11/2013 05:42 PM, Eric W. Biederman wrote:

Jingbai Ma  writes:


On 03/08/2013 11:52 PM, Vivek Goyal wrote:

On Thu, Mar 07, 2013 at 01:54:45PM -0800, Eric W. Biederman wrote:

Vivek Goyal   writes:


On Thu, Mar 07, 2013 at 10:58:18PM +0800, Jingbai Ma wrote:

This patch intends to speed up the memory page scanning process in
selective dump mode.

Test result (On HP ProLiant DL980 G7 with 1TB RAM, makedumpfile
v1.5.3):

                                                        Total scan time
Original kernel + makedumpfile v1.5.3 cyclic mode       1958.05 seconds
Original kernel + makedumpfile v1.5.3 non-cyclic mode   1151.50 seconds
Patched kernel + patched makedumpfile v1.5.3            17.50 seconds
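
For a sense of scale, the three scan times can be turned into ratios. This is plain arithmetic on the numbers quoted above; the dictionary keys and the `ratio` helper are made-up names.

```python
# Total scan times in seconds, as quoted in the test result above.
SCAN_SECONDS = {
    "cyclic": 1958.05,      # original kernel, makedumpfile cyclic mode
    "non-cyclic": 1151.50,  # original kernel, makedumpfile non-cyclic mode
    "patched": 17.50,       # patched kernel + patched makedumpfile
}


def ratio(slower, faster):
    """How many times longer the `slower` configuration takes."""
    return SCAN_SECONDS[slower] / SCAN_SECONDS[faster]


if __name__ == "__main__":
    print(f"cyclic vs patched:     {ratio('cyclic', 'patched'):.1f}x")
    print(f"non-cyclic vs patched: {ratio('non-cyclic', 'patched'):.1f}x")
    print(f"cyclic vs non-cyclic:  {ratio('cyclic', 'non-cyclic'):.2f}x")
```

The last ratio is roughly 1.7, meaning cyclic mode took about 70% longer than non-cyclic mode in this measurement.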

Traditionally, to reduce the size of the dump file, the dumper scans all
memory pages to exclude the unnecessary ones after the capture kernel
has booted, and the scan is done in userspace code (makedumpfile).


I think this is not a good idea. It has several issues.


Actually it does not appear to be doing any work in the first kernel.


Looks like patch3 in series is doing that.

  machine_crash_shutdown(&fixed_regs);
+   generate_crash_dump_bitmap();
  machine_kexec(kexec_crash_image);

So this bitmap seems to be being set just before transitioning into
second kernel.

I am sure you would not like this extra code in this path. :-)


I thought this function was pretty simple and could be called
here safely.
If this is not the proper place for it, how about calling it before
machine_crash_shutdown(&fixed_regs)?
Furthermore, could you explain the real risks of executing more code here?


The kernel is known bad.  What is bad is unclear.
Executing any extra code is a bad idea.

The history here is that before kexec-on-panic there were lots of dump
routines that did all of the crashdump logic in the kernel before they
shutdown.  They all worked beautifully during development, and on
developers test machines and were absolutely worthless in real world
situations.


I have also learned something from the old-style kernel dumps. Yes, they 
do have some problems in real-world situations. The primary problems come 
from I/O operations (disk writes/network sends) and invalid page tables.




A piece of code that walks all of the page tables is most definitely
opening itself up to all kinds of failure situations I can't even
imagine.


Agreed, an invalid page table will cause a disaster.
But even in the capture kernel with a user space program, it may only 
cause a core dump; the user still has a chance to dump the crashed system 
by themselves with some special tools. It's possible, but should be very 
rare in the real world.

I doubt how many users would be able to handle such situations.
So in most cases, if the page tables are corrupted and the system cannot 
be dumped normally, the user would rather reboot the system directly.




The only way that it would be ok to do this would be to maintain the
bitmap in real time with the existing page table maintenance code,
and that would only be ok if it did not add a performance penalty.


I also have a prototype that can trace the page table changes in real 
time, but I haven't tested its performance penalty yet. I will test it 
when I have time.




Every once in a great while there is a new cpu architecture feature
we need to deal with, but otherwise the only thing that is ok to
do on that code path is to reduce it until it much more closely
resembles the glorified jump instruction that it really is.


Agreed. But if we can find a solution that can be proved to be as robust 
as a jump, it may be applicable.




Speaking of which, have you given this code any coverage testing with lkdtm?


Not yet, but I will test it with lkdtm.
Before that, I would like to test the mmap() solution first.

Thanks for your very valuable comments; they helped me a lot!



Eric




--
Jingbai Ma (jingbai...@hp.com)

Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-11 Thread Jingbai Ma

On 03/09/2013 12:31 PM, HATAYAMA Daisuke wrote:

From: Jingbai Ma
Subject: Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to 
speedup kernel dump process
Date: Fri, 8 Mar 2013 18:06:31 +0800

On 03/07/2013 11:21 PM, Vivek Goyal wrote:

On Thu, Mar 07, 2013 at 10:58:18PM +0800, Jingbai Ma wrote:

...

First of all 64MB per TB should not be a huge deal. And makedumpfile
also has this cyclic mode where you process a map, discard it and then
move on to next section. So memory usage remains constant at the
expense
of processing time.
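
The "64MB per TB" figure can be reproduced with a short calculation. The sketch below assumes 4 KiB pages and 2 bits of bitmap per page (for example, two bitmaps of 1 bit per page each); both are assumptions for illustration and are not stated in this message.

```python
TIB = 1 << 40
MIB = 1 << 20


def bitmap_bytes(ram_bytes, page_size=4096, bits_per_page=2):
    """Bytes of bitmap needed to describe every page of `ram_bytes`.

    page_size and bits_per_page are assumed values, see the note above.
    """
    pages = ram_bytes // page_size
    return pages * bits_per_page // 8


if __name__ == "__main__":
    for tib in (1, 2, 4):
        size = bitmap_bytes(tib * TIB)
        print(f"{tib} TiB RAM -> {size // MIB} MiB of bitmap")
```

Under these assumptions the bitmap grows linearly with RAM, which is the cost that cyclic mode avoids by processing one section of the bitmap at a time.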

Yes, that's true. But in cyclic mode, makedumpfile will have to
write/read the bitmap from storage, which also impacts performance.
I have measured the penalty of cyclic mode to be about a 70%
slowdown. Maybe it could be faster after mmap is implemented.

I guess the slowdown came from the issue that enough VMCOREINFO was
not provided from the kernel, and unnecessary filtering processing for
free pages is done multiple times.

Thanks for your comments! It would be very helpful.
I will test it on the machine again.

--
Jingbai Ma (jingbai...@hp.com)

Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-11 Thread Jingbai Ma


On 03/09/2013 12:19 AM, Vivek Goyal wrote:

On Fri, Mar 08, 2013 at 06:06:31PM +0800, Jingbai Ma wrote:

[..]

- First of all it is doing more stuff in first kernel. And that runs
   contrary to kdump design where we want to do stuff in second kernel.
   After a kernel crash, you can't trust running kernel's data structures.
   So to improve reliability just do minimal stuff in crashed kernel and
   get out quickly.


I agree with you, the first kernel should do as little as possible.
Intuitively, filtering memory pages in the first kernel will harm the
reliability of the kernel dump, but let's think it through:

1. It only relies on the memory management data structures that
makedumpfile also relies on, so there is no reliability degradation at
this point.


It's not the same. If there is something wrong with memory management
data structures, you can panic() again and lock yourself up and
never even transition to the second kernel.

With makedumpfile, if something is wrong, either we will save wrong
bits or get segmentation fault. But one can still try to be careful
or save whole dump and try to get specific pieces out.

So it is not an apples to apples comparison.



Understood, a double panic() does harm reliability. But considering 
the chance of a panic in the memory filtering code, it shouldn't increase 
the risk very much.
If the filtering code panicked, I doubt that the second kernel could 
have booted normally even without it.



[..]

Looks like now hpa and yinghai have done the work to be able to load
kdump kernel above 4GB. I am assuming this also removes the restriction
that we can only reserve 512MB or 896MB in second kernel. If that's
the case, then I don't see why people can't get away with reserving
64MB per TB.


That's true. With kernel 3.9-rc1 and kexec-tools 2.0.4, the capture
kernel will have enough memory to run, and makedumpfile could
always run in non-cyclic mode, but we are still concerned about kernel
dump performance on systems with huge memory (above 4TB).


I would think we should first try to make mmap() on /proc/vmcore work and
optimize makedumpfile to make use of it, and then see if performance is
acceptable or not on large machines. And then take it from there.


Sure, you are right. I'm going to test the mmap() solution first; if it 
doesn't meet the performance requirement on large machines, we will still 
need a solution here.


Thanks!



Thanks
Vivek



--
Jingbai Ma (jingbai...@hp.com)

Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-11 Thread Jingbai Ma


On 03/09/2013 12:13 AM, Eric W. Biederman wrote:

"Ma, Jingbai (Kingboard)"  writes:


On 3/8/13 6:33 PM, "H. Peter Anvin"  wrote:



On 03/08/2013 02:06 AM, Jingbai Ma wrote:


The kernel does have some abilities that user space doesn't. It's possible
to map the whole memory space of the first kernel into user space on the
second kernel. But the user space code has to re-implement some parts of
the kernel memory management system. And worse, it's architecture
dependent: the more architectures are supported, the more code has to be
implemented. All implementations in user space must be kept in sync with
the kernel implementation. It may be called "flexibility", but it's
painful to maintain the code.



What?  You are basically talking about /dev/mem... there is nothing
particularly magic about it at all.


What we are talking about is filtering memory pages (AKA memory page
classification).
makedumpfile (or any other dumper in user space) has to know the
exact memory layout of the memory management data structures, which is
not only architecture dependent but may also vary between kernel releases.
At this point, /dev/mem doesn't give any help.
So IMHO, I would like to do it in the kernel, rather than keep tracking
changes in user space code.


But the fact is there is no requirement that the crash dump capture
kernel is the same version as the kernel that crashed.  In fact it has
been common at some points in time to use slightly different build
options, or slightly different kernels.  Say a 32bit PAE kernel to
capture a 64bit x86_64 kernel.


The filtering code will be executed in the first kernel, so this problem 
will not exist.




So in fact performing this work in the kernel is actively harmful to
reliability and maintenance because it adds an incorrect assumption.

If you do want the benefit of shared maintenance with the kernel one
solution that has been suggested several times is to put code into
tools/makedumpfile (probably a library) that encapsulates the kernel
specific knowledge that can be loaded into the ramdisk when the
crashdump kernel is being loaded.

That would allow shared maintenance without breaking the
possibility of supporting different kernel versions.


Yes, you are right. But it requires significant changes to makedumpfile, 
and if we also want to share the code with the kernel memory management 
subsystem, I believe that's not an easy job (at least with my limited 
kernel knowledge).




Eric



--
Jingbai Ma (jingbai...@hp.com)

Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-11 Thread Jingbai Ma


On 03/08/2013 11:52 PM, Vivek Goyal wrote:

On Thu, Mar 07, 2013 at 01:54:45PM -0800, Eric W. Biederman wrote:

Vivek Goyal  writes:


On Thu, Mar 07, 2013 at 10:58:18PM +0800, Jingbai Ma wrote:

This patch intends to speed up the memory page scanning process in
selective dump mode.

Test result (On HP ProLiant DL980 G7 with 1TB RAM, makedumpfile
v1.5.3):

                                                        Total scan time
Original kernel + makedumpfile v1.5.3 cyclic mode       1958.05 seconds
Original kernel + makedumpfile v1.5.3 non-cyclic mode   1151.50 seconds
Patched kernel + patched makedumpfile v1.5.3            17.50 seconds

Traditionally, to reduce the size of the dump file, the dumper scans all
memory pages to exclude the unnecessary ones after the capture kernel
has booted, and the scan is done in userspace code (makedumpfile).


I think this is not a good idea. It has several issues.


Actually it does not appear to be doing any work in the first kernel.


Looks like patch3 in series is doing that.

 machine_crash_shutdown(&fixed_regs);
+   generate_crash_dump_bitmap();
 machine_kexec(kexec_crash_image);

So this bitmap seems to be being set just before transitioning into
second kernel.

I am sure you would not like this extra code in this path. :-)


I thought this function was pretty simple and could be called here 
safely.
If this is not the proper place for it, how about calling it before 
machine_crash_shutdown(&fixed_regs)?

Furthermore, could you explain the real risks of executing more code here?

Thanks!



Thanks
Vivek



--
Jingbai Ma (jingbai...@hp.com)

Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-11 Thread Jingbai Ma


On 03/08/2013 11:52 PM, Vivek Goyal wrote:
> On Thu, Mar 07, 2013 at 01:54:45PM -0800, Eric W. Biederman wrote:
>> Vivek Goyal <vgo...@redhat.com> writes:
>>
>>> On Thu, Mar 07, 2013 at 10:58:18PM +0800, Jingbai Ma wrote:
>>>> This patch intends to speed up the memory page scanning process in
>>>> selective dump mode.
>>>>
>>>> Test result (on HP ProLiant DL980 G7 with 1TB RAM, makedumpfile
>>>> v1.5.3):
>>>>
>>>>                                             Total scan time
>>>> Original kernel
>>>> + makedumpfile v1.5.3 cyclic mode           1958.05 seconds
>>>> Original kernel
>>>> + makedumpfile v1.5.3 non-cyclic mode       1151.50 seconds
>>>> Patched kernel
>>>> + patched makedumpfile v1.5.3               17.50 seconds
>>>>
>>>> Traditionally, to reduce the size of the dump file, the dumper scans
>>>> all memory pages to exclude the unnecessary ones after the capture
>>>> kernel has booted, and does the scan in userspace code (makedumpfile).
>>>
>>> I think this is not a good idea. It has several issues.
>>
>> Actually it does not appear to be doing any work in the first kernel.
>
> Looks like patch3 in series is doing that.
>
>  machine_crash_shutdown(&fixed_regs);
> +   generate_crash_dump_bitmap();
>  machine_kexec(kexec_crash_image);
>
> So this bitmap seems to be being set just before transitioning into
> second kernel.
>
> I am sure you would not like this extra code in this path. :-)

I thought this function's code was pretty simple and could be called here
safely.
If this is not the proper place for it, how about calling it before
machine_crash_shutdown(&fixed_regs)?

Furthermore, could you explain the real risks of executing more code here?

Thanks!

> Thanks
> Vivek



--
Jingbai Ma (jingbai...@hp.com)

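The ratios implied by the test result table quoted above can be worked out
directly. The following is only a small arithmetic sketch: the three timings
are copied from the table, while the function names are made up for this
illustration and do not come from the patch series.

```c
#include <assert.h>
#include <stdio.h>

/* Scan times in seconds, copied from the test result table. */
#define SCAN_CYCLIC      1958.05  /* original kernel, cyclic mode */
#define SCAN_NON_CYCLIC  1151.50  /* original kernel, non-cyclic mode */
#define SCAN_PATCHED     17.50    /* patched kernel + patched makedumpfile */

/* How many times faster the `fast` run is compared to the `slow` run. */
double speedup(double slow, double fast)
{
	assert(fast > 0.0);
	return slow / fast;
}

/* Extra time needed by `slow` relative to `base`, in percent. */
double extra_time_percent(double base, double slow)
{
	assert(base > 0.0);
	return (slow / base - 1.0) * 100.0;
}

/* Print the ratios implied by the table. */
void print_scan_ratios(void)
{
	printf("patched vs. cyclic:        %.1fx\n",
	       speedup(SCAN_CYCLIC, SCAN_PATCHED));
	printf("patched vs. non-cyclic:    %.1fx\n",
	       speedup(SCAN_NON_CYCLIC, SCAN_PATCHED));
	printf("cyclic vs. non-cyclic:     +%.1f%% time\n",
	       extra_time_percent(SCAN_NON_CYCLIC, SCAN_CYCLIC));
}
```

On these numbers the patched setup scans roughly 112 times faster than cyclic
mode and roughly 66 times faster than non-cyclic mode, and cyclic mode needs
about 70% more time than non-cyclic mode.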
Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-11 Thread Jingbai Ma


On 03/09/2013 12:13 AM, Eric W. Biederman wrote:
> Ma, Jingbai (Kingboard) <kingboard...@hp.com> writes:
>
>> On 3/8/13 6:33 PM, H. Peter Anvin <h...@zytor.com> wrote:
>>
>>> On 03/08/2013 02:06 AM, Jingbai Ma wrote:
>>>>
>>>> The kernel does have some abilities that user space doesn't. It's
>>>> possible to map the whole memory space of the first kernel into user
>>>> space on the second kernel. But the user space code has to
>>>> re-implement some parts of the kernel memory management system. And
>>>> worse, it's architecture dependent: the more architectures are
>>>> supported, the more code has to be implemented. Every implementation
>>>> in user space must be kept in sync with the kernel implementation.
>>>> It may be called flexibility, but it's painful to maintain the code.
>>>
>>> What?  You are basically talking about /dev/mem... there is nothing
>>> particularly magic about it at all.
>>
>> What we are talking about is filtering memory pages (AKA memory page
>> classification).
>> makedumpfile (or any other dumper in user space) has to know the exact
>> memory layout of the memory management data structures, which is not
>> only architecture dependent but may also vary between kernel releases.
>> At this point, /dev/mem doesn't give any help.
>> So IMHO, I would like to do it in the kernel rather than keep tracking
>> changes in user space code.
>
> But the fact is there is no requirement that the crash dump capture
> kernel is the same version as the kernel that crashed.  In fact it has
> been common at some points in time to use slightly different build
> options, or slightly different kernels.  Say a 32bit PAE kernel to
> capture a 64bit x86_64 kernel.

The filtering code will be executed in the first kernel, so this problem
will not exist.

> So in fact performing this work in the kernel is actively harmful to
> reliability and maintenance because it adds an incorrect assumption.
>
> If you do want the benefit of shared maintenance with the kernel one
> solution that has been suggested several times is to put code into
> tools/makedumpfile (probably a library) that encapsulates the kernel
> specific knowledge that can be loaded into the ramdisk when the
> crashdump kernel is being loaded.
>
> That would allow shared maintenance without breaking the possibility
> of supporting different kernel versions.

Yes, you are right. But it requires significant changes to makedumpfile,
and if we also want to share the code with the kernel memory management
subsystem, I believe that's not an easy job (at least with my limited
kernel knowledge).

> Eric



--
Jingbai Ma (jingbai...@hp.com)

Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-11 Thread Jingbai Ma


On 03/09/2013 12:19 AM, Vivek Goyal wrote:
> On Fri, Mar 08, 2013 at 06:06:31PM +0800, Jingbai Ma wrote:
>
> [..]
>>> - First of all it is doing more stuff in first kernel. And that runs
>>>    contrary to kdump design where we want to do stuff in second kernel.
>>>    After a kernel crash, you can't trust running kernel's data structures.
>>>    So to improve reliability just do minimal stuff in crashed kernel and
>>>    get out quickly.
>>
>> I agree with you, the first kernel should do as little as possible.
>> Intuitively, filtering memory pages in the first kernel will harm the
>> reliability of the kernel dump, but let's think it through:
>>
>> 1. It only relies on the memory management data structures that
>> makedumpfile also relies on, so there is no reliability degradation at
>> this point.
>
> It's not the same. If there is something wrong with memory management
> data structures, you can panic() again and self lock yourself and
> never even transition to the second kernel.
>
> With makedumpfile, if something is wrong, either we will save wrong
> bits or get segmentation fault. But one can still try to be careful
> or save whole dump and try to get specific pieces out.
>
> So it is not an apples to apples comparison.

Understood, a double panic() does harm reliability. But considering how
small the chance of a panic in the memory filtering code is, it shouldn't
increase the risk very much.
If the filtering code panicked, I doubt the second kernel could be booted
up normally even without it.

> [..]
>>> Looks like now hpa and yinghai have done the work to be able to load
>>> kdump kernel above 4GB. I am assuming this also removes the restriction
>>> that we can only reserve 512MB or 896MB in second kernel. If that's
>>> the case, then I don't see why people can't get away with reserving
>>> 64MB per TB.
>>
>> That's true. With kernel 3.9-rc1 and kexec-tools 2.0.4, the capture
>> kernel will have enough memory to run. And makedumpfile could always
>> be run in non-cyclic mode, but we are still concerned about kernel
>> dump performance on systems with huge memory (above 4TB).
>
> I would think that lets first try to make mmap() on /proc/vmcore work and
> optimize makedumpfile to make use of it and then see if performance is
> acceptable or not on large machines. And then take it from there.

Sure, you are right. I'm going to test the mmap() solution first. If it
doesn't meet the performance requirement on large machines, we will still
need a solution here.

Thanks!

> Thanks
> Vivek



--
Jingbai Ma (jingbai...@hp.com)

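For readers unfamiliar with the mmap() approach Vivek suggests in the message
above, here is a minimal user-space sketch of reading part of a file through a
mapping instead of read(). It is an illustration only: count_nonzero_mmap() is
a hypothetical helper, not makedumpfile code, and it assumes the target file
supports mmap() at all (which for /proc/vmcore is exactly what was still under
discussion in this thread).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes of `path` starting at `offset` (which must be a multiple
 * of the page size) and count the non-zero bytes in that window.
 * Returns the count, or -1 if the file cannot be opened or mapped.
 */
long count_nonzero_mmap(const char *path, off_t offset, size_t len)
{
	unsigned char *p;
	long count = 0;
	size_t i;
	int fd;

	assert(path != NULL && len > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, offset);
	close(fd);			/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return -1;

	/* The data is accessed in place; no copy into a user buffer. */
	for (i = 0; i < len; i++)
		if (p[i] != 0)
			count++;

	munmap(p, len);
	return count;
}
```

The point of the design is that the data is read through the mapping, so the
per-read() copy into a separate user buffer is avoided. Whether that is enough
on multi-terabyte machines is the open question in the thread.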
Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-11 Thread Jingbai Ma

On 03/09/2013 12:31 PM, HATAYAMA Daisuke wrote:
> From: Jingbai Ma <jingbai...@hp.com>
> Subject: Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process
> Date: Fri, 8 Mar 2013 18:06:31 +0800
>
>> On 03/07/2013 11:21 PM, Vivek Goyal wrote:
>>> On Thu, Mar 07, 2013 at 10:58:18PM +0800, Jingbai Ma wrote:
> ...
>>> First of all 64MB per TB should not be a huge deal. And makedumpfile
>>> also has this cyclic mode where you process a map, discard it and then
>>> move on to next section. So memory usage remains constant at the
>>> expense of processing time.
>>
>> Yes, that's true. But in cyclic mode, makedumpfile will have to
>> write/read the bitmap to/from storage, which will also impact
>> performance. I have measured the penalty for cyclic mode at about a
>> 70% slowdown. It may be faster after mmap is implemented.
>
> I guess the slowdown came from the issue that enough VMCOREINFO was
> not provided from the kernel, and unnecessary filtering processing for
> free pages is done multiple times.

Thanks for your comments! They are very helpful.
I will test it on the machine again.

--
Jingbai Ma (jingbai...@hp.com)

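The cyclic mode described in the quoted text above (process a map, discard it,
move on to the next section) can be sketched as follows. This is not
makedumpfile's implementation: CYCLE_PAGES, scan_cyclic() and the example
predicate keep_every_third() are names invented for this illustration, and the
"write out" step is reduced to counting the pages that would be kept.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CYCLE_PAGES 1024u	/* pages covered per cycle; arbitrary here */

/* Hypothetical predicate: non-zero if page frame `pfn` should be dumped. */
typedef int (*page_wanted_fn)(uint64_t pfn);

/* Example predicate used only to exercise the sketch. */
int keep_every_third(uint64_t pfn)
{
	return pfn % 3 == 0;
}

/*
 * Walk page frames [0, max_pfn) one cycle at a time. Each cycle refills the
 * same fixed-size bitmap, so memory use does not grow with max_pfn; the
 * price is one extra pass over the bitmap per cycle.
 */
uint64_t scan_cyclic(uint64_t max_pfn, page_wanted_fn wanted)
{
	unsigned char bitmap[CYCLE_PAGES / 8];
	uint64_t start, end, pfn, bit, kept = 0;

	assert(wanted != NULL);

	for (start = 0; start < max_pfn; start += CYCLE_PAGES) {
		end = start + CYCLE_PAGES;
		if (end > max_pfn)
			end = max_pfn;

		/* Discard the previous cycle's map. */
		memset(bitmap, 0, sizeof(bitmap));

		/* Filtering pass: mark the pages to keep in this cycle. */
		for (pfn = start; pfn < end; pfn++) {
			bit = pfn - start;
			if (wanted(pfn))
				bitmap[bit / 8] |= (unsigned char)(1u << (bit % 8));
		}

		/* Output pass: consume the map (here, just count set bits). */
		for (pfn = start; pfn < end; pfn++) {
			bit = pfn - start;
			if (bitmap[bit / 8] & (1u << (bit % 8)))
				kept++;
		}
	}
	return kept;
}
```

For example, scan_cyclic(3000, keep_every_third) visits three cycles with the
same 128-byte buffer and counts 1000 pages.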
Re: [RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-08 Thread Jingbai Ma


On 03/07/2013 11:21 PM, Vivek Goyal wrote:
> On Thu, Mar 07, 2013 at 10:58:18PM +0800, Jingbai Ma wrote:
>> This patch intends to speed up the memory page scanning process in
>> selective dump mode.
>>
>> Test result (on HP ProLiant DL980 G7 with 1TB RAM, makedumpfile
>> v1.5.3):
>>
>>                                             Total scan time
>> Original kernel
>> + makedumpfile v1.5.3 cyclic mode           1958.05 seconds
>> Original kernel
>> + makedumpfile v1.5.3 non-cyclic mode       1151.50 seconds
>> Patched kernel
>> + patched makedumpfile v1.5.3               17.50 seconds
>>
>> Traditionally, to reduce the size of the dump file, the dumper scans
>> all memory pages to exclude the unnecessary ones after the capture
>> kernel has booted, and does the scan in userspace code (makedumpfile).
>
> I think this is not a good idea. It has several issues.
>
> - First of all it is doing more stuff in first kernel. And that runs
>    contrary to kdump design where we want to do stuff in second kernel.
>    After a kernel crash, you can't trust running kernel's data structures.
>    So to improve reliability just do minimal stuff in crashed kernel and
>    get out quickly.

I agree with you, the first kernel should do as little as possible.
Intuitively, filtering memory pages in the first kernel will harm the
reliability of the kernel dump, but let's think it through:

1. It only relies on the memory management data structures that
makedumpfile also relies on, so there is no reliability degradation at
this point.

2. The filtering code itself is very simple and straightforward, and
doesn't depend on kernel functions too much. The current code calls
pgdat_resize_lock() and spin_lock_irqsave() for testing purposes in the
non-crash situation; these calls can be removed safely in crash
processing. It may affect reliability, but only to a very limited extent.

3. Before the filtering code is called, machine_crash_shutdown() has been
executed, so all IRQs have been disabled and all other CPUs have been
halted. We only need to make sure the NMI from the watchdog has been
disabled here.
At that point we are on a separate stack with no potential interrupts,
and we only execute a small piece of code that uses very few system
functions.
Compared to the complicated functions executed previously, the risks
from the filtering code should be acceptable.

> - Secondly, it moves filtering policy in kernel. I think keeping it
>    in user space gives us the extra flexibility.

It doesn't take any flexibility away from the user, it just adds another
possibility. I have added a flag in makedumpfile, so the user can decide
whether to filter memory pages in makedumpfile itself or just use the
bitmap that came from the first kernel.

>> It introduces several problems:
>>
>> 1. Requires more memory to store memory bitmap on systems with large
>> amount of memory installed. And in capture kernel there is only a little
>> free memory available, it will cause an out of memory error and fail.
>> (Non-cyclic mode)
>
> makedumpfile requires 2bits per 4K page. That is 64MB per TB. In your
> patches also you are reserving 1bit per page and that is 32MB per TB
> in first kernel.
>
> So memory is anyway being reserved, just that makedumpfile seems to be
> needing this extra bit. Not sure if that can be optimized or not.

Yes, you are right. It's only a POC (proof of concept) implementation
currently. I can add an mmap interface to allow makedumpfile to access
the bitmap memory directly without reserving memory for it again.

> First of all 64MB per TB should not be a huge deal. And makedumpfile
> also has this cyclic mode where you process a map, discard it and then
> move on to next section. So memory usage remains constant at the expense
> of processing time.

Yes, that's true. But in cyclic mode, makedumpfile will have to
write/read the bitmap to/from storage, which will also impact
performance. I have measured the penalty for cyclic mode at about a 70%
slowdown. It may be faster after mmap is implemented.

> Looks like now hpa and yinghai have done the work to be able to load
> kdump kernel above 4GB. I am assuming this also removes the restriction
> that we can only reserve 512MB or 896MB in second kernel. If that's
> the case, then I don't see why people can't get away with reserving
> 64MB per TB.

That's true. With kernel 3.9-rc1 and kexec-tools 2.0.4, the capture
kernel will have enough memory to run. And makedumpfile could always be
run in non-cyclic mode, but we are still concerned about kernel dump
performance on systems with huge memory (above 4TB).

>> 2. Scanning all memory pages in makedumpfile is a very slow process. On
>> a system with 1TB or more memory installed, the scanning process is very
>> long. Typically on a 1TB idle system, it takes about 19 minutes. On a
>> system with 4TB or more memory installed, it doesn't even work. To
>> address the out of memory issue on systems with big memory (4TB or more
>> memory installed), makedumpfile v1.5.1 introduces a new cyclic mode. It
>> only scans a piece of memory pages each time, and does so cyclically to
>> scan all memory pages. But it runs more slowly, on 1TB system, takes
>> about 33

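The bitmap sizes quoted in the message above (2 bits per 4K page is 64MB per
TB, 1 bit per page is 32MB per TB) follow from simple arithmetic, which can be
checked with a short sketch. bitmap_bytes() is a name made up for this
illustration; the page size and bit counts are the ones stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for a bitmap that stores `bits_per_page` bits for every
 * page of `mem_bytes` of memory, rounded up to whole bytes.
 */
uint64_t bitmap_bytes(uint64_t mem_bytes, uint64_t page_size,
		      unsigned int bits_per_page)
{
	uint64_t pages;

	assert(page_size != 0);
	pages = mem_bytes / page_size;
	return (pages * bits_per_page + 7) / 8;
}
```

With 1TB of memory and 4K pages there are 2^28 pages, so 2 bits per page gives
2^29 bits, which is 2^26 bytes or 64MB, and 1 bit per page gives 32MB. The
same function gives 256MB for a 4TB machine at 2 bits per page.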
[RFC PATCH 5/5] crash dump bitmap: workaround for kernel 3.9-rc1 kdump issue

2013-03-07 Thread Jingbai Ma

Linux kernel 3.9-rc1 allows crashkernel above 4GB, but the current
kexec-tools doesn't support it yet.
This patch is only a workaround to make kdump work again.
It should be removed after the kexec-tools 2.0.4 release.

Signed-off-by: Jingbai Ma 
---
 arch/x86/kernel/setup.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 165c831..15321d6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -506,7 +506,8 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 #ifdef CONFIG_X86_32
 # define CRASH_KERNEL_ADDR_MAX (512 << 20)
 #else
-# define CRASH_KERNEL_ADDR_MAX MAXMEM
+/* # define CRASH_KERNEL_ADDR_MAX  MAXMEM */
+# define CRASH_KERNEL_ADDR_MAX (896 << 20)
 #endif
 
 static void __init reserve_crashkernel_low(void)


[RFC PATCH 4/5] crash dump bitmap: add a proc interface for crash dump bitmap

2013-03-07 Thread Jingbai Ma

Add a procfs driver for selecting, from user space, which pages to exclude:
/proc/crash_dump_bitmap/

Signed-off-by: Jingbai Ma 
---
 fs/proc/Makefile|1 
 fs/proc/crash_dump_bitmap.c |  221 +++
 2 files changed, 222 insertions(+), 0 deletions(-)
 create mode 100644 fs/proc/crash_dump_bitmap.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 712f24d..2dfcff1 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -27,6 +27,7 @@ proc-$(CONFIG_PROC_SYSCTL)+= proc_sysctl.o
 proc-$(CONFIG_NET) += proc_net.o
 proc-$(CONFIG_PROC_KCORE)  += kcore.o
 proc-$(CONFIG_PROC_VMCORE) += vmcore.o
+proc-$(CONFIG_CRASH_DUMP_BITMAP)   += crash_dump_bitmap.o
 proc-$(CONFIG_PROC_DEVICETREE) += proc_devtree.o
 proc-$(CONFIG_PRINTK)  += kmsg.o
 proc-$(CONFIG_PROC_PAGE_MONITOR)   += page.o
diff --git a/fs/proc/crash_dump_bitmap.c b/fs/proc/crash_dump_bitmap.c
new file mode 100644
index 000..77ecaae
--- /dev/null
+++ b/fs/proc/crash_dump_bitmap.c
@@ -0,0 +1,221 @@
+/*
+ *fs/proc/crash_dump_bitmap.c
+ *Interface for controlling the crash dump bitmap from user space.
+ *
+ *(C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *Author: Jingbai Ma 
+ *
+ *This program is free software; you can redistribute it and/or modify
+ *it under the terms of version 2 of the GNU General Public License as
+ *published by the Free Software Foundation.
+ *
+ *This program is distributed in the hope that it will be useful,
+ *but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ *General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_CRASH_DUMP_BITMAP
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jingbai Ma ");
+MODULE_DESCRIPTION("Crash dump bitmap support driver");
+
+static const char *proc_dir_name = "crash_dump_bitmap";
+static const char *proc_page_status_name = "page_status";
+static const char *proc_dump_level_name = "dump_level";
+
+static struct proc_dir_entry *proc_dir, *proc_page_status, *proc_dump_level;
+
+static unsigned int get_dump_level(void)
+{
+   unsigned int dump_level;
+
+   dump_level = crash_dump_bitmap_ctrl.exclude_zero_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES : 0;
+   dump_level |= crash_dump_bitmap_ctrl.exclude_cache_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES : 0;
+   dump_level |= crash_dump_bitmap_ctrl.exclude_cache_private_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES : 0;
+   dump_level |= crash_dump_bitmap_ctrl.exclude_user_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES : 0;
+   dump_level |= crash_dump_bitmap_ctrl.exclude_free_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES : 0;
+
+   return dump_level;
+}
+
+static void set_dump_level(unsigned int dump_level)
+{
+   crash_dump_bitmap_ctrl.exclude_zero_pages =
+   (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES) ? 1 : 0;
+   crash_dump_bitmap_ctrl.exclude_cache_pages =
+   (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES) ? 1 : 0;
+   crash_dump_bitmap_ctrl.exclude_cache_private_pages =
+   (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES)
+? 1 : 0;
+   crash_dump_bitmap_ctrl.exclude_user_pages =
+   (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES) ? 1 : 0;
+   crash_dump_bitmap_ctrl.exclude_free_pages =
+   (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES) ? 1 : 0;
+}
+
+static int proc_page_status_show(struct seq_file *m, void *v)
+{
+   u64 start, duration;
+
+   if (!crash_dump_bitmap_mem) {
+   seq_printf(m,
+   "crash_dump_bitmap: crash_dump_bitmap_mem not found!\n");
+
+   return -EINVAL;
+   }
+
+   seq_printf(m, "Exclude page flag status:\n");
+   seq_printf(m, "exclude_dump_bitmap_pages=%d\n",
+   crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages);
+   seq_printf(m, "exclude_zero_pages=%d\n",
+   crash_dump_bitmap_ctrl.exclude_zero_pages);
+   seq_printf(m, "exclude_cache_pages=%d\n",
+   crash_dump_bitmap_ctrl.exclude_cache_pages);
+   seq_printf(m, "exclude_cache_private_pages=%d\n",
+   crash_dump_bitmap_ctrl.exclude_cache_private_pages);
+   seq_printf(m, "exclude_user_pages=%d\n",
+   crash_dump_bitmap_ctrl.exclude_user_pages);
+   seq_printf(m, "exclude_free_pages=%d\n",
+   crash_dump_bitmap_ctrl.exclude_free_pages);
+
+   seq_printf(m, "Scanning all memory pages:\n&quo
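
The dump level handled by get_dump_level() and set_dump_level() above is a plain bit mask. The following is a minimal user-space sketch of that encoding, not the kernel code itself: the bit values are copied from the enum in include/linux/crash_dump_bitmap.h, while struct dump_ctrl, dump_level_from_ctrl() and dump_ctrl_from_level() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bit values copied from the enum in include/linux/crash_dump_bitmap.h. */
enum {
    EXCLUDE_ZERO_PAGES          = 1,
    EXCLUDE_CACHE_PAGES         = 2,
    EXCLUDE_CACHE_PRIVATE_PAGES = 4,
    EXCLUDE_USER_PAGES          = 8,
    EXCLUDE_FREE_PAGES          = 16
};

/* Hypothetical stand-in for struct crash_dump_bitmap_ctrl. */
struct dump_ctrl {
    char zero;
    char cache;
    char cache_private;
    char user;
    char free_pages;
};

/* Pack the per-type flags into one dump level (what get_dump_level() does). */
static unsigned int dump_level_from_ctrl(const struct dump_ctrl *c)
{
    unsigned int level = 0;

    assert(c != NULL);
    if (c->zero)
        level |= EXCLUDE_ZERO_PAGES;
    if (c->cache)
        level |= EXCLUDE_CACHE_PAGES;
    if (c->cache_private)
        level |= EXCLUDE_CACHE_PRIVATE_PAGES;
    if (c->user)
        level |= EXCLUDE_USER_PAGES;
    if (c->free_pages)
        level |= EXCLUDE_FREE_PAGES;
    return level;
}

/* Unpack a dump level into per-type flags (what set_dump_level() does). */
static void dump_ctrl_from_level(struct dump_ctrl *c, unsigned int level)
{
    assert(c != NULL);
    c->zero          = (level & EXCLUDE_ZERO_PAGES) ? 1 : 0;
    c->cache         = (level & EXCLUDE_CACHE_PAGES) ? 1 : 0;
    c->cache_private = (level & EXCLUDE_CACHE_PRIVATE_PAGES) ? 1 : 0;
    c->user          = (level & EXCLUDE_USER_PAGES) ? 1 : 0;
    c->free_pages    = (level & EXCLUDE_FREE_PAGES) ? 1 : 0;
}
```

With this encoding, level 31 sets all five flags, and level 17 (1 + 16) excludes only zero pages and free pages.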

[RFC PATCH 2/5] crash dump bitmap: init crash dump bitmap in kernel booting process

2013-03-07 Thread Jingbai Ma

Reserve a memory block for crash_dump_bitmap during the kernel boot process.

Signed-off-by: Jingbai Ma 
---
 arch/x86/kernel/setup.c   |   59 +
 include/linux/crash_dump_bitmap.h |   59 +
 kernel/Makefile   |1 +
 kernel/crash_dump_bitmap.c|   45 
 4 files changed, 164 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/crash_dump_bitmap.h
 create mode 100644 kernel/crash_dump_bitmap.c

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 84d3285..165c831 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -67,6 +67,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -601,6 +602,62 @@ static void __init reserve_crashkernel(void)
 }
 #endif
 
+#ifdef CONFIG_CRASH_DUMP_BITMAP
+static void __init crash_dump_bitmap_init(void)
+{
+   static unsigned long BITSPERBYTE = 8;
+
+   unsigned long long mem_start;
+   unsigned long long mem_size;
+
+   if (is_kdump_kernel())
+   return;
+
+   mem_start = (1ULL << 24); /* 16MB */
+   mem_size = roundup((roundup(max_pfn, BITSPERBYTE) / BITSPERBYTE),
+   PAGE_SIZE);
+
+   crash_dump_bitmap_mem = memblock_find_in_range(mem_start,
+   MEMBLOCK_ALLOC_ACCESSIBLE, mem_size, PAGE_SIZE);
+
+   if (!crash_dump_bitmap_mem) {
+   pr_err(
+   "crash_dump_bitmap: allocate error! size=%lldkB, from=%lldMB\n",
+   mem_size >> 10, mem_start >> 20);
+
+   return;
+   }
+
+   crash_dump_bitmap_mem_size = mem_size;
+   memblock_reserve(crash_dump_bitmap_mem, crash_dump_bitmap_mem_size);
+   pr_info("crash_dump_bitmap: bitmap_mem=%lldMB. size=%lldkB\n",
+   (unsigned long long)crash_dump_bitmap_mem >> 20,
+   mem_size >> 10);
+
+   crash_dump_bitmap_res.start = crash_dump_bitmap_mem;
+   crash_dump_bitmap_res.end   = crash_dump_bitmap_mem + mem_size - 1;
+   insert_resource(&iomem_resource, &crash_dump_bitmap_res);
+
+   crash_dump_bitmap_info.version = CRASH_DUMP_BITMAP_VERSION;
+
+   crash_dump_bitmap_info.bitmap = crash_dump_bitmap_mem;
+   crash_dump_bitmap_info.bitmap_size = crash_dump_bitmap_mem_size;
+
+   crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_zero_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_cache_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_cache_private_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_user_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_free_pages = 1;
+
+   pr_info("crash_dump_bitmap: Initialized!\n");
+}
+#else
+static void __init crash_dump_bitmap_init(void)
+{
+}
+#endif
+
 static struct resource standard_io_resources[] = {
{ .name = "dma1", .start = 0x00, .end = 0x1f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO },
@@ -1094,6 +1151,8 @@ void __init setup_arch(char **cmdline_p)
 
reserve_crashkernel();
 
+   crash_dump_bitmap_init();
+
vsmp_init();
 
io_delay_init();
diff --git a/include/linux/crash_dump_bitmap.h b/include/linux/crash_dump_bitmap.h
new file mode 100644
index 000..63b1264
--- /dev/null
+++ b/include/linux/crash_dump_bitmap.h
@@ -0,0 +1,59 @@
+/*
+ *include/linux/crash_dump_bitmap.h
+ *Declaration of crash dump bitmap functions and data structures.
+ *
+ *(C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *Author: Jingbai Ma 
+ *
+ *This program is free software; you can redistribute it and/or modify
+ *it under the terms of version 2 of the GNU General Public License as
+ *published by the Free Software Foundation.
+ *
+ *This program is distributed in the hope that it will be useful,
+ *but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ *General Public License for more details.
+ */
+
+#ifndef _LINUX_CRASH_DUMP_BITMAP_H
+#define _LINUX_CRASH_DUMP_BITMAP_H
+
+#define CRASH_DUMP_BITMAP_VERSION 1
+
+enum {
+   CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES = 1,
+   CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES = 2,
+   CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES = 4,
+   CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES = 8,
+   CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES = 16
+};
+
+struct crash_dump_bitmap_ctrl {
+   char exclude_crash_dump_bitmap_pages;
+   char exclude_zero_pages;/* only for tracking dump level */
+   char exclude_cache_pages;
+   char exclude_cache_private_pages;
+   char exclude_user_pages;
+   char exclude_free_pages;
+};
+
+struct crash_dump_bitmap_info {
+   unsigned int version;
+   phys_addr_t bitmap;
+   phys_addr_t bitmap_size;
+   unsigned long cache_pages;
+   unsigned long cache_

[RFC PATCH 1/5] crash dump bitmap: add a kernel config and help document

2013-03-07 Thread Jingbai Ma

Add a kernel config and help document for CRASH_DUMP_BITMAP.

Signed-off-by: Jingbai Ma 
---
 Documentation/kdump/crash_dump_bitmap.txt |  378 +
 arch/x86/Kconfig  |   16 +
 2 files changed, 394 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kdump/crash_dump_bitmap.txt

diff --git a/Documentation/kdump/crash_dump_bitmap.txt b/Documentation/kdump/crash_dump_bitmap.txt
new file mode 100644
index 000..468cdf2
--- /dev/null
+++ b/Documentation/kdump/crash_dump_bitmap.txt
@@ -0,0 +1,378 @@
+===================================
+Documentation for Crash Dump Bitmap
+===================================
+
+This document includes overview, setup and installation, and analysis
+information.
+
+Overview
+========
+
+Traditionally, to reduce the size of the dump file, the dumper scans all
+memory pages after the capture kernel has booted and excludes the
+unnecessary ones. This scan runs in userspace code (makedumpfile).
+
+This approach has several problems:
+
+1. In non-cyclic mode, makedumpfile needs more memory to store its page
+bitmap as the amount of installed memory grows. The capture kernel has
+only a little free memory available, so makedumpfile can run out of
+memory and fail.
+
+2. Scanning all memory pages in makedumpfile is very slow. On an idle
+system with 1TB of memory, the scan typically takes about 19 minutes.
+On a system with 4TB or more memory, it does not work at all. To address
+this out-of-memory issue on systems with large memory (4TB or more),
+makedumpfile v1.5.1 introduced a cyclic mode, which scans only part of
+the memory pages at a time and repeats until all pages have been
+scanned. Cyclic mode is even slower: on a 1TB system it takes about 33
+minutes.
+
+3. The page scanning code in makedumpfile is very complicated. Without
+the kernel's memory management data structures, makedumpfile has to
+build up its own data structures. It cannot use macros that are only
+available in the kernel (e.g. page_to_pfn) and has to use slow lookup
+algorithms instead.
+
+This patch introduces a new way to scan memory pages. It reserves a piece of
+memory (1 bit for each page, 32MB per TB of memory on x86 systems) in the
+first kernel. During the kernel panic process, it scans all memory pages and
+clears the bit for every excluded memory page in the reserved memory.
+
+This new approach has several benefits:
+
+1. It is extremely fast: on a 1TB system it takes only about 17.5 seconds
+to scan all memory pages.
+
+2. It reduces the memory requirement of makedumpfile by putting the
+reserved memory in the first kernel's memory space.
+
+3. It simplifies the existing memory page scanning code in userspace.
+
+
+Usage
+=====
+
+1) Enable "kernel crash dump bitmap" in "Processor type and features", under
+"kernel crash dumps".
+
+CONFIG_CRASH_DUMP_BITMAP=y
+
+It depends on "kexec system call" and "kernel crash dumps", so these features
+must be enabled as well.
+
+CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
+
+2) Enable "sysfs file system support" in "File systems" -> "Pseudo filesystems".
+
+   CONFIG_SYSFS=y
+
+3) Compile and install the new kernel.
+
+4) Check the new kernel.
+Once the new kernel has booted, there will be a new folder,
+/proc/crash_dump_bitmap.
+Check the current dump level:
+cat /proc/crash_dump_bitmap/dump_level
+
+Set the dump level:
+echo "dump level" > /proc/crash_dump_bitmap/dump_level
+
+The dump level is the same as the dump_level parameter of makedumpfile -d.
+
+Run page scan and check page status:
+cat /proc/crash_dump_bitmap/page_status
+
+5) Download makedumpfile v1.5.3 or later from sourceforge:
+http://sourceforge.net/projects/makedumpfile/
+
+6) Patch it with the patch at the end of this file.
+
+7) Compile it and copy the patched makedumpfile into the right folder
+(/sbin or /usr/sbin)
+
+8) Edit /etc/kdump.conf and add "-q" to the makedumpfile parameter
+line. It tells makedumpfile to use the crash dump bitmap in the kernel.
+core_collector makedumpfile --non-cyclic -q -c -d 31 --message-level 23
+
+9) Regenerate the initramfs to make sure the patched makedumpfile and
+the config have been included in it.
+
+
+To Do
+=====
+
+It currently supports only the x86-64 architecture; support for other
+architectures needs to be added.
+
+
+Contact
+=======
+
+Jingbai Ma (jingbai...@hp.com)
+
+
+Patch (for makedumpfile v1.5.3)
+===============================
+Please forgive me: because of some format issues with the makedumpfile
+source, I have had to wrap each line of this patch in '#'. Please use
+this sed command to extract the patch for makedumpfile:
+
+sed -n -e "s/^#\(.*\)#$/\1/p" crash_dump_bitmap.txt > makedumpfile.patch
+
+=
+#diff --git a/makedumpfile.c b/makedumpfile.c#
+#index acb1b21..f29b6a5 100644#
+#--- a/makedumpfile.c#
+#+++ b/m

[RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-07 Thread Jingbai Ma

This patch set intends to speed up the memory page scanning process in
selective dump mode.

Test result (on an HP ProLiant DL980 G7 with 1TB RAM, makedumpfile
v1.5.3):

                                        Total scan time
Original kernel
+ makedumpfile v1.5.3 cyclic mode       1958.05 seconds
Original kernel
+ makedumpfile v1.5.3 non-cyclic mode   1151.50 seconds
Patched kernel
+ patched makedumpfile v1.5.3           17.50 seconds

Traditionally, to reduce the size of the dump file, the dumper scans all
memory pages after the capture kernel has booted and excludes the
unnecessary ones. This scan runs in userspace code (makedumpfile).

This approach has several problems:

1. In non-cyclic mode, makedumpfile needs more memory to store its page
bitmap as the amount of installed memory grows. The capture kernel has
only a little free memory available, so makedumpfile can run out of
memory and fail.

2. Scanning all memory pages in makedumpfile is very slow. On an idle
system with 1TB of memory, the scan typically takes about 19 minutes.
On a system with 4TB or more memory, it does not work at all. To address
this out-of-memory issue on systems with large memory (4TB or more),
makedumpfile v1.5.1 introduced a cyclic mode, which scans only part of
the memory pages at a time and repeats until all pages have been
scanned. Cyclic mode is even slower: on a 1TB system it takes about 33
minutes.

3. The page scanning code in makedumpfile is very complicated. Without
the kernel's memory management data structures, makedumpfile has to
build up its own data structures. It cannot use macros that are only
available in the kernel (e.g. page_to_pfn) and has to use slow lookup
algorithms instead.

This patch set introduces a new way to scan memory pages. It reserves a
piece of memory (1 bit for each page, 32MB per TB of memory on x86
systems) in the first kernel. During the kernel crash process, it scans
all memory pages and clears the bit for every excluded memory page in
the reserved memory.
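
The "32MB per TB" figure can be checked with a small user-space sketch. It assumes 4KB pages and a max_pfn equal to RAM size divided by page size (no holes), and it mirrors the mem_size calculation in patch 2/5 (bits rounded up to whole bytes, then bytes rounded up to whole pages). The function names round_up_to() and bitmap_bytes() are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of step (step must be non-zero). */
static uint64_t round_up_to(uint64_t x, uint64_t step)
{
    assert(step != 0);
    return (x + step - 1) / step * step;
}

/*
 * Bytes needed for a one-bit-per-page bitmap covering max_pfn pages,
 * rounded up to a whole number of pages.
 */
static uint64_t bitmap_bytes(uint64_t max_pfn, uint64_t page_size)
{
    uint64_t bytes = round_up_to(max_pfn, 8) / 8;

    return round_up_to(bytes, page_size);
}
```

For 1TB of RAM with 4KB pages there are 2^28 page frames, so the bitmap needs 2^28 / 8 = 2^25 bytes, which is 32MB; 4TB needs 128MB.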

This new approach has several benefits:

1. It is extremely fast: on a 1TB system it takes only about 17.5 seconds
to scan all memory pages.

2. It reduces the memory requirement of makedumpfile by putting the
reserved memory in the first kernel's memory space.

3. It simplifies the existing memory page scanning code in userspace.

To do:
1. It has only been verified on the x86 64-bit platform and needs to be
modified for other platforms (ARM, XEN, PPC, etc.).

---

Jingbai Ma (5):
  crash dump bitmap: add a kernel config and help document
  crash dump bitmap: init crash dump bitmap in kernel booting process
  crash dump bitmap: scan memory pages in kernel crash process
  crash dump bitmap: add a proc interface for crash dump bitmap
  crash dump bitmap: workaround for kernel 3.9-rc1 kdump issue


 Documentation/kdump/crash_dump_bitmap.txt |  378 +
 arch/x86/Kconfig  |   16 +
 arch/x86/kernel/setup.c   |   62 +
 fs/proc/Makefile  |1 
 fs/proc/crash_dump_bitmap.c   |  221 +
 include/linux/crash_dump_bitmap.h |   59 +
 kernel/Makefile   |1 
 kernel/crash_dump_bitmap.c|  201 +++
 kernel/kexec.c|5 
 9 files changed, 943 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/kdump/crash_dump_bitmap.txt
 create mode 100644 fs/proc/crash_dump_bitmap.c
 create mode 100644 include/linux/crash_dump_bitmap.h
 create mode 100644 kernel/crash_dump_bitmap.c

--
Jingbai Ma 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH 5/5] crash dump bitmap: workaround for kernel 3.9-rc1 kdump issue

2013-03-07 Thread Jingbai Ma

Linux kernel 3.9-rc1 allows the crashkernel region to be placed above
4GB, but current kexec-tools does not support that yet.
This patch is only a workaround to make kdump work again.
It should be removed after the kexec-tools 2.0.4 release.

Signed-off-by: Jingbai Ma 
---
 arch/x86/kernel/setup.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 165c831..15321d6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -506,7 +506,8 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 #ifdef CONFIG_X86_32
 # define CRASH_KERNEL_ADDR_MAX (512 << 20)
 #else
-# define CRASH_KERNEL_ADDR_MAX MAXMEM
+/* # define CRASH_KERNEL_ADDR_MAX  MAXMEM */
+# define CRASH_KERNEL_ADDR_MAX (896 << 20)
 #endif
 
 static void __init reserve_crashkernel_low(void)

[RFC PATCH 3/5] crash dump bitmap: scan memory pages in kernel crash process

2013-03-07 Thread Jingbai Ma

In the kernel crash process, call generate_crash_dump_bitmap() to scan
all memory pages and clear the bit for every excluded memory page in the
reserved memory.
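
The bit handling can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: it works on an ordinary byte array instead of the reserved physical memory, the function names are hypothetical, and the bounds check is written against the byte index so that an index equal to the bitmap size is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mark every page as "dump it", as the scan does before excluding pages. */
static void bitmap_mark_all(unsigned char *bitmap, size_t bitmap_size)
{
    assert(bitmap != NULL);
    memset(bitmap, 0xff, bitmap_size);
}

/*
 * Set (val != 0) or clear (val == 0) the bit for page frame pfn.
 * Returns 0 on success, -1 if pfn lies beyond the bitmap.
 */
static int bitmap_set_pfn(unsigned char *bitmap, size_t bitmap_size,
                          unsigned long pfn, int val)
{
    size_t byte = pfn >> 3;                            /* 8 pages per byte */
    unsigned char mask = (unsigned char)(1u << (pfn & 7));

    assert(bitmap != NULL);
    if (byte >= bitmap_size)
        return -1;

    if (val)
        bitmap[byte] |= mask;
    else
        bitmap[byte] &= (unsigned char)~mask;
    return 0;
}

/* Return 1 if the page is still marked for dumping, 0 otherwise. */
static int bitmap_test_pfn(const unsigned char *bitmap, size_t bitmap_size,
                           unsigned long pfn)
{
    size_t byte = pfn >> 3;

    assert(bitmap != NULL);
    if (byte >= bitmap_size)
        return 0;
    return (bitmap[byte] >> (pfn & 7)) & 1;
}
```

A reader of the bitmap (such as a patched makedumpfile) would then dump only the pages whose bit is still set.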

Signed-off-by: Jingbai Ma 
---
 kernel/crash_dump_bitmap.c |  156 
 kernel/kexec.c |5 +
 2 files changed, 161 insertions(+), 0 deletions(-)

diff --git a/kernel/crash_dump_bitmap.c b/kernel/crash_dump_bitmap.c
index e743cdd..eed13ca 100644
--- a/kernel/crash_dump_bitmap.c
+++ b/kernel/crash_dump_bitmap.c
@@ -23,6 +23,8 @@
 
 #ifdef CONFIG_CRASH_DUMP_BITMAP
 
+#define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
+
 phys_addr_t crash_dump_bitmap_mem;
 EXPORT_SYMBOL(crash_dump_bitmap_mem);
 
@@ -35,6 +37,7 @@ EXPORT_SYMBOL(crash_dump_bitmap_ctrl);
 struct crash_dump_bitmap_info crash_dump_bitmap_info;
 EXPORT_SYMBOL(crash_dump_bitmap_info);
 
+
 /* Location of the reserved area for the crash_dump_bitmap */
 struct resource crash_dump_bitmap_res = {
.name  = "Crash dump bitmap",
@@ -42,4 +45,157 @@ struct resource crash_dump_bitmap_res = {
.end   = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
 };
+
+inline void set_crash_dump_bitmap(unsigned long pfn, int val)
+{
+   phys_addr_t paddr = crash_dump_bitmap_info.bitmap + (pfn >> 3);
+   unsigned char *vaddr;
+   unsigned char bit = (pfn & 7);
+
+   if (unlikely(paddr >= (crash_dump_bitmap_mem
+   + crash_dump_bitmap_mem_size))) {
+   pr_err(
+   "crash_dump_bitmap: pfn exceeds limit. pfn=%lu, addr=0x%llX\n",
+   pfn, (unsigned long long)paddr);
+   return;
+   }
+
+   vaddr = (unsigned char *)__va(paddr);
+
+   if (val)
+   *vaddr |= (1U << bit);
+   else
+   *vaddr &= (~(1U << bit));
+}
+
+void generate_crash_dump_bitmap(void)
+{
+   pg_data_t *pgdat;
+   struct zone *zone;
+   unsigned long flags;
+   int order, t;
+   struct list_head *curr;
+   unsigned long zone_free_pages;
+   phys_addr_t addr;
+
+   if (!crash_dump_bitmap_mem) {
+   pr_info("crash_dump_bitmap: no crash_dump_bitmap memory.\n");
+   return;
+   }
+
+   pr_info(
+   "Excluding pages: bitmap=%d, cache=%d, private=%d, user=%d, free=%d\n",
+   crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages,
+   crash_dump_bitmap_ctrl.exclude_cache_pages,
+   crash_dump_bitmap_ctrl.exclude_cache_private_pages,
+   crash_dump_bitmap_ctrl.exclude_user_pages,
+   crash_dump_bitmap_ctrl.exclude_free_pages);
+
+   crash_dump_bitmap_info.free_pages = 0;
+   crash_dump_bitmap_info.cache_pages = 0;
+   crash_dump_bitmap_info.cache_private_pages = 0;
+   crash_dump_bitmap_info.user_pages = 0;
+   crash_dump_bitmap_info.hwpoison_pages = 0;
+
+   /* Set all bits on bitmap */
+   memset(__va(crash_dump_bitmap_info.bitmap), 0xff,
+   crash_dump_bitmap_info.bitmap_size);
+
+   /* Exclude all crash_dump_bitmap pages */
+   if (crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages) {
+   for (addr = crash_dump_bitmap_mem; addr <
+   crash_dump_bitmap_mem + crash_dump_bitmap_mem_size;
+   addr += PAGE_SIZE)
+   set_crash_dump_bitmap(
+   virt_to_pfn(__va(addr)), 0);
+   }
+
+   /* Exclude unnecessary pages */
+   for_each_online_pgdat(pgdat) {
+   unsigned long i;
+   unsigned long flags;
+
+   pgdat_resize_lock(pgdat, &flags);
+   for (i = 0; i < pgdat->node_spanned_pages; i++) {
+   struct page *page;
+   unsigned long pfn = pgdat->node_start_pfn + i;
+
+   if (!pfn_valid(pfn))
+   continue;
+
+   page = pfn_to_page(pfn);
+
+   /* Exclude the cache pages without the private page */
+   if (crash_dump_bitmap_ctrl.exclude_cache_pages
+   && (PageLRU(page) || PageSwapCache(page))
+   && !page_has_private(page) && !PageAnon(page)) {
+   set_crash_dump_bitmap(pfn, 0);
+   crash_dump_bitmap_info.cache_pages++;
+   }
+   /* Exclude the cache pages with private page */
+   else if (
+   crash_dump_bitmap_ctrl.exclude_cache_private_pages
+   && (PageLRU(page) || PageSwapCache(page))
+   && !PageAnon(page)) {
+   set_crash_dump_bitmap(pfn, 0);
+   crash_dump

[RFC PATCH 2/5] crash dump bitmap: init crash dump bitmap in kernel booting process

2013-03-07 Thread Jingbai Ma

Reserve a memory block for crash_dump_bitmap in kernel booting process.

Signed-off-by: Jingbai Ma 
---
 arch/x86/kernel/setup.c   |   59 +
 include/linux/crash_dump_bitmap.h |   59 +
 kernel/Makefile   |1 +
 kernel/crash_dump_bitmap.c|   45 
 4 files changed, 164 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/crash_dump_bitmap.h
 create mode 100644 kernel/crash_dump_bitmap.c

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 84d3285..165c831 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -67,6 +67,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -601,6 +602,62 @@ static void __init reserve_crashkernel(void)
 }
 #endif
 
+#ifdef CONFIG_CRASH_DUMP_BITMAP
+static void __init crash_dump_bitmap_init(void)
+{
+   static unsigned long BITSPERBYTE = 8;
+
+   unsigned long long mem_start;
+   unsigned long long mem_size;
+
+   if (is_kdump_kernel())
+   return;
+
+   mem_start = (1ULL << 24); /* 16MB */
+   mem_size = roundup((roundup(max_pfn, BITSPERBYTE) / BITSPERBYTE),
+   PAGE_SIZE);
+
+   crash_dump_bitmap_mem = memblock_find_in_range(mem_start,
+   MEMBLOCK_ALLOC_ACCESSIBLE, mem_size, PAGE_SIZE);
+
+   if (!crash_dump_bitmap_mem) {
+   pr_err(
+   "crash_dump_bitmap: allocate error! size=%lldkB, from=%lldMB\n",
+   mem_size >> 10, mem_start >> 20);
+
+   return;
+   }
+
+   crash_dump_bitmap_mem_size = mem_size;
+   memblock_reserve(crash_dump_bitmap_mem, crash_dump_bitmap_mem_size);
+   pr_info("crash_dump_bitmap: bitmap_mem=%lldMB. size=%lldkB\n",
+   (unsigned long long)crash_dump_bitmap_mem >> 20,
+   mem_size >> 10);
+
+   crash_dump_bitmap_res.start = crash_dump_bitmap_mem;
+   crash_dump_bitmap_res.end   = crash_dump_bitmap_mem + mem_size - 1;
+   insert_resource(_resource, _dump_bitmap_res);
+
+   crash_dump_bitmap_info.version = CRASH_DUMP_BITMAP_VERSION;
+
+   crash_dump_bitmap_info.bitmap = crash_dump_bitmap_mem;
+   crash_dump_bitmap_info.bitmap_size = crash_dump_bitmap_mem_size;
+
+   crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_zero_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_cache_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_cache_private_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_user_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_free_pages = 1;
+
+   pr_info("crash_dump_bitmap: Initialized!\n");
+}
+#else
+static void __init crash_dump_bitmap_init(void)
+{
+}
+#endif
+
 static struct resource standard_io_resources[] = {
{ .name = "dma1", .start = 0x00, .end = 0x1f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO },
@@ -1094,6 +1151,8 @@ void __init setup_arch(char **cmdline_p)
 
reserve_crashkernel();
 
+   crash_dump_bitmap_init();
+
vsmp_init();
 
io_delay_init();
diff --git a/include/linux/crash_dump_bitmap.h 
b/include/linux/crash_dump_bitmap.h
new file mode 100644
index 000..63b1264
--- /dev/null
+++ b/include/linux/crash_dump_bitmap.h
@@ -0,0 +1,59 @@
+/*
+ *include/linux/crash_dump_bitmap.h
+ *Declaration of crash dump bitmap functions and data structures.
+ *
+ *(C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *Author: Jingbai Ma 
+ *
+ *This program is free software; you can redistribute it and/or modify
+ *it under the terms of version 2 of the GNU General Public License as
+ *published by the Free Software Foundation.
+ *
+ *This program is distributed in the hope that it will be useful,
+ *but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ *General Public License for more details.
+ */
+
+#ifndef _LINUX_CRASH_DUMP_BITMAP_H
+#define _LINUX_CRASH_DUMP_BITMAP_H
+
+#define CRASH_DUMP_BITMAP_VERSION 1;
+
+enum {
+   CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES = 1,
+   CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES = 2,
+   CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES = 4,
+   CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES = 8,
+   CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES = 16
+};
+
+struct crash_dump_bitmap_ctrl {
+   char exclude_crash_dump_bitmap_pages;
+   char exclude_zero_pages;/* only for tracking dump level */
+   char exclude_cache_pages;
+   char exclude_cache_private_pages;
+   char exclude_user_pages;
+   char exclude_free_pages;
+};
+
+struct crash_dump_bitmap_info {
+   unsigned int version;
+   phys_addr_t bitmap;
+   phys_addr_t bitmap_size;
+   unsigned long cache_pages;
+   unsigned long cache_

[RFC PATCH 1/5] crash dump bitmap: add a kernel config and help document

2013-03-07 Thread Jingbai Ma

Add a kernel config and help document for CRASH_DUMP_BITMAP.

Signed-off-by: Jingbai Ma 
---
 Documentation/kdump/crash_dump_bitmap.txt |  378 +
 arch/x86/Kconfig  |   16 +
 2 files changed, 394 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kdump/crash_dump_bitmap.txt

diff --git a/Documentation/kdump/crash_dump_bitmap.txt 
b/Documentation/kdump/crash_dump_bitmap.txt
new file mode 100644
index 000..468cdf2
--- /dev/null
+++ b/Documentation/kdump/crash_dump_bitmap.txt
@@ -0,0 +1,378 @@
+
+Documentation for Crash Dump Bitmap
+
+
+This document includes overview, setup and installation, and analysis
+information.
+
+Overview
+
+
+Traditionally, to reduce the size of dump file, dumper scans all memory
+pages to exclude the unnecessary memory pages after capture kernel
+booted, and scan it in userspace code (makedumpfile).
+
+It introduces several problems:
+
+1. Requires more memory to store memory bitmap on systems with large
+amount of memory installed. And in capture kernel there is only a few
+free memory available, it will cause an out of memory error and fail.
+(Non-cyclic mode)
+
+2. Scans all memory pages in makedumpfile is a very slow process. On
+system with 1TB or more memory installed, the scanning process is very
+long. Typically on 1TB idle system, it takes about 19 minutes. On system
+with 4TB or more memory installed, it even doesn't work. To address the
+out of memory issue on system with big memory (4TB or more memory
+installed),  makedumpfile v1.5.1 introduces a new cyclic mode. It only
+scans a piece of memory pages each time, and do it cyclically to scan
+all memory pages. But it runs more slowly, on 1TB system, takes about 33
+minutes.
+
+3. Scans memory pages code in makedumpfile is very complicated, without
+kernel memory management related data structure, makedumpfile has to
+build up its on data structure, and will not able to use some macros
+that only be available in kernel (e.g. page_to_pfn), and has to use some
+slow lookup algorithm instead.
+
+This patch introduces a new way to scan memory pages. It reserves a piece of
+memory (1 bit for each page, 32MB per TB memory on x86 systems) in the first
+kernel. During the kernel panic process, it scans all memory pages, clear the
+bit for all excluded memory pages in the reserved memory.
+
+We have several benefits by this new approach:
+
+1. It's extremely fast, on 1TB system only takes about 17.5 seconds to
+scan all memory pages!
+
+2. Reduces the memory requirement of makedumpfile by putting the
+reserved memory in the first kernel memory space.
+
+3. Simplifies the complexity of existing memory pages scanning code in
+userspace.
+
+
+Usage
+=====
+
+1) Enable "kernel crash dump bitmap" in "Processor type and features", under
+"kernel crash dumps".
+
+CONFIG_CRASH_DUMP_BITMAP=y
+
+It depends on "kexec system call" and "kernel crash dumps", so these features
+must also be enabled.
+
+CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
+
+2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo filesystems".
+
+   CONFIG_SYSFS=y
+
+3) Compile and install the new kernel.
+
+4) Check the new kernel.
+Once the new kernel has booted, there will be a new folder,
+/proc/crash_dump_bitmap.
+Check the current dump level:
+cat /proc/crash_dump_bitmap/dump_level
+
+Set the dump level:
+echo "dump level" > /proc/crash_dump_bitmap/dump_level
+
+The dump level is the same as the dump_level parameter of makedumpfile -d.
+
+Run a page scan and check the page status:
+cat /proc/crash_dump_bitmap/page_status
+
+5) Download makedumpfile v1.5.3 or later from sourceforge:
+http://sourceforge.net/projects/makedumpfile/
+
+6) Patch it with the patch at the end of this file.
+
+7) Compile it and copy the patched makedumpfile into the right folder
+(/sbin or /usr/sbin).
+
+8) Edit /etc/kdump.conf and add "-q" to the makedumpfile parameter
+line. This tells makedumpfile to use the crash dump bitmap in the kernel.
+core_collector makedumpfile --non-cyclic -q -c -d 31 --message-level 23
+
+9) Regenerate the initramfs to make sure the patched makedumpfile and config
+have been included in it.
+
+
+To Do
+=====
+
+Only the x86-64 architecture is supported currently; support for other
+architectures needs to be added.
+
+
+Contact
+===
+
+Jingbai Ma (jingbai...@hp.com)
+
+
+Patch (for makedumpfile v1.5.3)
+
+Because of some formatting issues with the makedumpfile source, each
+line of this patch is wrapped in '#'. Please use this sed command to
+extract the patch for makedumpfile:
+
+sed -n -e "s/^#\(.*\)#$/\1/p" crash_dump_bitmap.txt > makedumpfile.patch
+
+=
+#diff --git a/makedumpfile.c b/makedumpfile.c#
+#index acb1b21..f29b6a5 100644#
+#--- a/makedumpfile.c#
+#+++ b/m

[RFC PATCH 0/5] crash dump bitmap: scan memory pages in kernel to speedup kernel dump process

2013-03-07 Thread Jingbai Ma

This patch set intends to speed up the memory page scanning process in
selective dump mode.

Test result (On HP ProLiant DL980 G7 with 1TB RAM, makedumpfile
v1.5.3):

Configuration                                          Total scan time
Original kernel + makedumpfile v1.5.3 cyclic mode      1958.05 seconds
Original kernel + makedumpfile v1.5.3 non-cyclic mode  1151.50 seconds
Patched kernel + patched makedumpfile v1.5.3             17.50 seconds

Traditionally, to reduce the size of the dump file, the dumper scans all
memory pages after the capture kernel has booted and excludes the
unnecessary ones. The scan runs in userspace code (makedumpfile).

This approach has several problems:

1. In non-cyclic mode, it requires more memory to store the memory
bitmap on systems with a large amount of memory installed. Because the
capture kernel has only a little free memory available, this causes an
out of memory error and the dump fails.

2. Scanning all memory pages in makedumpfile is a very slow process. On
a system with 1TB or more of memory installed, the scan takes a very
long time: typically about 19 minutes on an idle 1TB system. On a system
with 4TB or more of memory installed, it does not work at all. To
address the out-of-memory issue on such large systems, makedumpfile
v1.5.1 introduced a cyclic mode, which scans only part of the memory
pages at a time and repeats until all pages have been scanned. Cyclic
mode is slower still: on a 1TB system it takes about 33 minutes.

3. The page scanning code in makedumpfile is very complicated. Without
the kernel's memory management data structures, makedumpfile has to
build up its own, cannot use macros that are only available in the
kernel (e.g. page_to_pfn), and has to use slower lookup algorithms
instead.

This patch introduces a new way to scan memory pages. It reserves a
piece of memory (1 bit for each page, 32MB per TB of memory on x86
systems) in the first kernel. During the kernel crash process, it scans
all memory pages and clears the bit for every excluded memory page in
the reserved memory.
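
As a rough check of the sizing above, the arithmetic can be sketched in a
few lines of userspace C. This is only an illustration, not code from the
patch: the helper name bitmap_bytes() is hypothetical, and a 4KB page size
is assumed for x86.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for a bitmap that holds one bit per page.
 * Hypothetical helper for illustration only; not part of the patch.
 */
static uint64_t bitmap_bytes(uint64_t ram_bytes, uint64_t page_size)
{
	uint64_t pages;

	assert(page_size > 0);

	/* Number of pages, rounding a partial page up. */
	pages = (ram_bytes + page_size - 1) / page_size;

	/* Eight pages are tracked per byte, rounding a partial byte up. */
	return (pages + 7) / 8;
}
```

With 4KB pages, 1TB of RAM is 2^28 pages, which at 8 pages per byte comes
to 2^25 bytes, i.e. the 32MB quoted above. By the same arithmetic a 4TB
system would need 128MB.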

This new approach has several benefits:

1. It is extremely fast: on a 1TB system it takes only about 17.5 seconds
to scan all memory pages.

2. It reduces the memory requirement of makedumpfile by putting the
reserved memory in the first kernel's memory space.

3. It simplifies the existing memory page scanning code in userspace.

To do:
1. It has only been verified on the x86 64-bit platform and needs to be
modified for other platforms (ARM, XEN, PPC, etc.).

---

Jingbai Ma (5):
  crash dump bitmap: add a kernel config and help document
  crash dump bitmap: init crash dump bitmap in kernel booting process
  crash dump bitmap: scan memory pages in kernel crash process
  crash dump bitmap: add a proc interface for crash dump bitmap
  crash dump bitmap: workaround for kernel 3.9-rc1 kdump issue


 Documentation/kdump/crash_dump_bitmap.txt |  378 +
 arch/x86/Kconfig                          |   16 +
 arch/x86/kernel/setup.c                   |   62 +
 fs/proc/Makefile                          |    1
 fs/proc/crash_dump_bitmap.c               |  221 +
 include/linux/crash_dump_bitmap.h         |   59 +
 kernel/Makefile                           |    1
 kernel/crash_dump_bitmap.c                |  201 +++
 kernel/kexec.c                            |    5
 9 files changed, 943 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/kdump/crash_dump_bitmap.txt
 create mode 100644 fs/proc/crash_dump_bitmap.c
 create mode 100644 include/linux/crash_dump_bitmap.h
 create mode 100644 kernel/crash_dump_bitmap.c

--
Jingbai Ma 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH 2/5] crash dump bitmap: init crash dump bitmap in kernel booting process

2013-03-07 Thread Jingbai Ma

Reserve a memory block for crash_dump_bitmap in kernel booting process.

Signed-off-by: Jingbai Ma <jingbai...@hp.com>
---
 arch/x86/kernel/setup.c           |   59 +
 include/linux/crash_dump_bitmap.h |   59 +
 kernel/Makefile                   |    1 +
 kernel/crash_dump_bitmap.c        |   45
 4 files changed, 164 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/crash_dump_bitmap.h
 create mode 100644 kernel/crash_dump_bitmap.c

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 84d3285..165c831 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -67,6 +67,7 @@
 
 #include <linux/percpu.h>
 #include <linux/crash_dump.h>
+#include <linux/crash_dump_bitmap.h>
 #include <linux/tboot.h>
 #include <linux/jiffies.h>
 
@@ -601,6 +602,62 @@ static void __init reserve_crashkernel(void)
 }
 #endif
 
+#ifdef CONFIG_CRASH_DUMP_BITMAP
+static void __init crash_dump_bitmap_init(void)
+{
+	static unsigned long BITSPERBYTE = 8;
+
+	unsigned long long mem_start;
+	unsigned long long mem_size;
+
+	if (is_kdump_kernel())
+		return;
+
+	mem_start = (1ULL << 24); /* 16MB */
+	mem_size = roundup((roundup(max_pfn, BITSPERBYTE) / BITSPERBYTE),
+		PAGE_SIZE);
+
+	crash_dump_bitmap_mem = memblock_find_in_range(mem_start,
+		MEMBLOCK_ALLOC_ACCESSIBLE, mem_size, PAGE_SIZE);
+
+	if (!crash_dump_bitmap_mem) {
+		pr_err(
+		"crash_dump_bitmap: allocate error! size=%lldkB, from=%lldMB\n",
+			mem_size >> 10, mem_start >> 20);
+
+		return;
+	}
+
+	crash_dump_bitmap_mem_size = mem_size;
+	memblock_reserve(crash_dump_bitmap_mem, crash_dump_bitmap_mem_size);
+	pr_info("crash_dump_bitmap: bitmap_mem=%lldMB. size=%lldkB\n",
+		(unsigned long long)crash_dump_bitmap_mem >> 20,
+		mem_size >> 10);
+
+	crash_dump_bitmap_res.start = crash_dump_bitmap_mem;
+	crash_dump_bitmap_res.end   = crash_dump_bitmap_mem + mem_size - 1;
+	insert_resource(&iomem_resource, &crash_dump_bitmap_res);
+
+	crash_dump_bitmap_info.version = CRASH_DUMP_BITMAP_VERSION;
+
+	crash_dump_bitmap_info.bitmap = crash_dump_bitmap_mem;
+	crash_dump_bitmap_info.bitmap_size = crash_dump_bitmap_mem_size;
+
+	crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages = 1;
+	crash_dump_bitmap_ctrl.exclude_zero_pages = 1;
+	crash_dump_bitmap_ctrl.exclude_cache_pages = 1;
+	crash_dump_bitmap_ctrl.exclude_cache_private_pages = 1;
+	crash_dump_bitmap_ctrl.exclude_user_pages = 1;
+	crash_dump_bitmap_ctrl.exclude_free_pages = 1;
+
+	pr_info("crash_dump_bitmap: Initialized!\n");
+}
+#else
+static void __init crash_dump_bitmap_init(void)
+{
+}
+#endif
+
 static struct resource standard_io_resources[] = {
 	{ .name = "dma1", .start = 0x00, .end = 0x1f,
 		.flags = IORESOURCE_BUSY | IORESOURCE_IO },
@@ -1094,6 +1151,8 @@ void __init setup_arch(char **cmdline_p)
 
reserve_crashkernel();
 
+   crash_dump_bitmap_init();
+
vsmp_init();
 
io_delay_init();
diff --git a/include/linux/crash_dump_bitmap.h 
b/include/linux/crash_dump_bitmap.h
new file mode 100644
index 000..63b1264
--- /dev/null
+++ b/include/linux/crash_dump_bitmap.h
@@ -0,0 +1,59 @@
+/*
+ *include/linux/crash_dump_bitmap.h
+ *Declaration of crash dump bitmap functions and data structures.
+ *
+ *(C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *Author: Jingbai Ma jingbai...@hp.com
+ *
+ *This program is free software; you can redistribute it and/or modify
+ *it under the terms of version 2 of the GNU General Public License as
+ *published by the Free Software Foundation.
+ *
+ *This program is distributed in the hope that it will be useful,
+ *but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ *General Public License for more details.
+ */
+
+#ifndef _LINUX_CRASH_DUMP_BITMAP_H
+#define _LINUX_CRASH_DUMP_BITMAP_H
+
+#define CRASH_DUMP_BITMAP_VERSION 1
+
+enum {
+   CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES = 1,
+   CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES = 2,
+   CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES = 4,
+   CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES = 8,
+   CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES = 16
+};
+
+struct crash_dump_bitmap_ctrl {
+   char exclude_crash_dump_bitmap_pages;
+   char exclude_zero_pages;/* only for tracking dump level */
+   char exclude_cache_pages;
+   char exclude_cache_private_pages;
+   char exclude_user_pages;
+   char exclude_free_pages;
+};
+
+struct crash_dump_bitmap_info {
+   unsigned int version;
+   phys_addr_t bitmap;
+   phys_addr_t bitmap_size;
+   unsigned long

[RFC PATCH 3/5] crash dump bitmap: scan memory pages in kernel crash process

2013-03-07 Thread Jingbai Ma

In the kernel crash process, call generate_crash_dump_bitmap() to scan
all memory pages and clear the bit for every excluded memory page in the
reserved memory.
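
For readers who want to see the bitmap convention in isolation, here is a
minimal userspace sketch of it. It assumes the layout as I read it from
this patch: one bit per page frame, byte index pfn >> 3, bit index
pfn & 7, all bits set initially and then cleared for excluded pages. The
names bitmap_set() and bitmap_test() are made up for this example, and the
kernel code works on the reserved physical memory rather than a plain
array.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Set (val != 0) or clear (val == 0) the bit that tracks page frame pfn.
 * Hypothetical userspace model of the bitmap; not the kernel function.
 */
static void bitmap_set(unsigned char *bitmap, size_t size,
		       unsigned long pfn, int val)
{
	size_t byte = pfn >> 3;      /* 8 page frames per byte */
	unsigned int bit = pfn & 7;  /* position inside that byte */

	assert(byte < size);         /* pfn must fall inside the bitmap */

	if (val)
		bitmap[byte] |= (unsigned char)(1U << bit);
	else
		bitmap[byte] &= (unsigned char)~(1U << bit);
}

/* Return 1 if the page frame is still marked for dumping, else 0. */
static int bitmap_test(const unsigned char *bitmap, size_t size,
		       unsigned long pfn)
{
	size_t byte = pfn >> 3;

	assert(byte < size);

	return (bitmap[byte] >> (pfn & 7)) & 1;
}
```

A caller would first fill the array with 0xff (dump everything) and then
call bitmap_set(bitmap, size, pfn, 0) for each page it wants to leave out
of the dump.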

Signed-off-by: Jingbai Ma <jingbai...@hp.com>
---
 kernel/crash_dump_bitmap.c |  156
 kernel/kexec.c             |    5 +
 2 files changed, 161 insertions(+), 0 deletions(-)

diff --git a/kernel/crash_dump_bitmap.c b/kernel/crash_dump_bitmap.c
index e743cdd..eed13ca 100644
--- a/kernel/crash_dump_bitmap.c
+++ b/kernel/crash_dump_bitmap.c
@@ -23,6 +23,8 @@
 
 #ifdef CONFIG_CRASH_DUMP_BITMAP
 
+#define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
+
 phys_addr_t crash_dump_bitmap_mem;
 EXPORT_SYMBOL(crash_dump_bitmap_mem);
 
@@ -35,6 +37,7 @@ EXPORT_SYMBOL(crash_dump_bitmap_ctrl);
 struct crash_dump_bitmap_info crash_dump_bitmap_info;
 EXPORT_SYMBOL(crash_dump_bitmap_info);
 
+
 /* Location of the reserved area for the crash_dump_bitmap */
 struct resource crash_dump_bitmap_res = {
 	.name  = "Crash dump bitmap",
@@ -42,4 +45,157 @@ struct resource crash_dump_bitmap_res = {
.end   = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
 };
+
+inline void set_crash_dump_bitmap(unsigned long pfn, int val)
+{
+	phys_addr_t paddr = crash_dump_bitmap_info.bitmap + (pfn >> 3);
+	unsigned char *vaddr;
+	unsigned char bit = (pfn & 7);
+
+	if (unlikely(paddr > (crash_dump_bitmap_mem
+		+ crash_dump_bitmap_mem_size))) {
+		pr_err(
+		"crash_dump_bitmap: pfn exceed limit. pfn=%ld, addr=0x%llX\n",
+			pfn, paddr);
+		return;
+	}
+
+	vaddr = (unsigned char *)__va(paddr);
+
+	if (val)
+		*vaddr |= (1U << bit);
+	else
+		*vaddr &= (~(1U << bit));
+}
+
+void generate_crash_dump_bitmap(void)
+{
+	pg_data_t *pgdat;
+	struct zone *zone;
+	unsigned long flags;
+	int order, t;
+	struct list_head *curr;
+	unsigned long zone_free_pages;
+	phys_addr_t addr;
+
+	if (!crash_dump_bitmap_mem) {
+		pr_info("crash_dump_bitmap: no crash_dump_bitmap memory.\n");
+		return;
+	}
+
+	pr_info(
+	"Excluding pages: bitmap=%d, cache=%d, private=%d, user=%d, free=%d\n",
+		crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages,
+		crash_dump_bitmap_ctrl.exclude_cache_pages,
+		crash_dump_bitmap_ctrl.exclude_cache_private_pages,
+		crash_dump_bitmap_ctrl.exclude_user_pages,
+		crash_dump_bitmap_ctrl.exclude_free_pages);
+
+	crash_dump_bitmap_info.free_pages = 0;
+	crash_dump_bitmap_info.cache_pages = 0;
+	crash_dump_bitmap_info.cache_private_pages = 0;
+	crash_dump_bitmap_info.user_pages = 0;
+	crash_dump_bitmap_info.hwpoison_pages = 0;
+
+	/* Set all bits on bitmap */
+	memset(__va(crash_dump_bitmap_info.bitmap), 0xff,
+		crash_dump_bitmap_info.bitmap_size);
+
+	/* Exclude all crash_dump_bitmap pages */
+	if (crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages) {
+		for (addr = crash_dump_bitmap_mem; addr <
+			crash_dump_bitmap_mem + crash_dump_bitmap_mem_size;
+			addr += PAGE_SIZE)
+			set_crash_dump_bitmap(
+				virt_to_pfn(__va(addr)), 0);
+	}
+
+	/* Exclude unnecessary pages */
+	for_each_online_pgdat(pgdat) {
+		unsigned long i;
+		unsigned long flags;
+
+		pgdat_resize_lock(pgdat, &flags);
+		for (i = 0; i < pgdat->node_spanned_pages; i++) {
+			struct page *page;
+			unsigned long pfn = pgdat->node_start_pfn + i;
+
+			if (!pfn_valid(pfn))
+				continue;
+
+			page = pfn_to_page(pfn);
+
+			/* Exclude the cache pages without the private page */
+			if (crash_dump_bitmap_ctrl.exclude_cache_pages
+				&& (PageLRU(page) || PageSwapCache(page))
+				&& !page_has_private(page) && !PageAnon(page)) {
+				set_crash_dump_bitmap(pfn, 0);
+				crash_dump_bitmap_info.cache_pages++;
+			}
+			/* Exclude the cache pages with private page */
+			else if (
+			crash_dump_bitmap_ctrl.exclude_cache_private_pages
+				&& (PageLRU(page) || PageSwapCache(page))
+				&& !PageAnon(page)) {
+				set_crash_dump_bitmap(pfn, 0);
+				crash_dump_bitmap_info.cache_private_pages++;
+			}
+			/* Exclude the pages used by user process

[RFC PATCH 4/5] crash dump bitmap: add a proc interface for crash dump bitmap

2013-03-07 Thread Jingbai Ma

Add a procfs driver for selecting, from userspace, which pages to
exclude. The interface lives under /proc/crash_dump_bitmap/.
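
The value written to dump_level is a bitmask of the CRASH_DUMP_LEVEL_*
values from patch 2/5 (1, 2, 4, 8, 16). The conversion in both directions
can be sketched as below. This is a standalone illustration with
hypothetical names (struct exclude_flags, level_to_flags(),
flags_to_level()), not the driver code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Bit values as listed in include/linux/crash_dump_bitmap.h (patch 2/5). */
enum {
	EXCLUDE_ZERO_PAGES          = 1,
	EXCLUDE_CACHE_PAGES         = 2,
	EXCLUDE_CACHE_PRIVATE_PAGES = 4,
	EXCLUDE_USER_PAGES          = 8,
	EXCLUDE_FREE_PAGES          = 16
};

/* Hypothetical stand-in for the exclude flags kept by the kernel. */
struct exclude_flags {
	char zero_pages;
	char cache_pages;
	char cache_private_pages;
	char user_pages;
	char free_pages;
};

/* Split a dump level into one flag per page type. */
static struct exclude_flags level_to_flags(unsigned int level)
{
	struct exclude_flags f;

	f.zero_pages          = (level & EXCLUDE_ZERO_PAGES) ? 1 : 0;
	f.cache_pages         = (level & EXCLUDE_CACHE_PAGES) ? 1 : 0;
	f.cache_private_pages = (level & EXCLUDE_CACHE_PRIVATE_PAGES) ? 1 : 0;
	f.user_pages          = (level & EXCLUDE_USER_PAGES) ? 1 : 0;
	f.free_pages          = (level & EXCLUDE_FREE_PAGES) ? 1 : 0;

	return f;
}

/* Combine the flags back into a dump level. */
static unsigned int flags_to_level(const struct exclude_flags *f)
{
	unsigned int level = 0;

	assert(f != NULL);

	level |= f->zero_pages          ? EXCLUDE_ZERO_PAGES : 0;
	level |= f->cache_pages         ? EXCLUDE_CACHE_PAGES : 0;
	level |= f->cache_private_pages ? EXCLUDE_CACHE_PRIVATE_PAGES : 0;
	level |= f->user_pages          ? EXCLUDE_USER_PAGES : 0;
	level |= f->free_pages          ? EXCLUDE_FREE_PAGES : 0;

	return level;
}
```

For example, level 31 sets all five flags, and level 17 sets only the
zero-page and free-page flags.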

Signed-off-by: Jingbai Ma <jingbai...@hp.com>
---
 fs/proc/Makefile            |    1
 fs/proc/crash_dump_bitmap.c |  221 +++
 2 files changed, 222 insertions(+), 0 deletions(-)
 create mode 100644 fs/proc/crash_dump_bitmap.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 712f24d..2dfcff1 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -27,6 +27,7 @@ proc-$(CONFIG_PROC_SYSCTL)+= proc_sysctl.o
 proc-$(CONFIG_NET) += proc_net.o
 proc-$(CONFIG_PROC_KCORE)  += kcore.o
 proc-$(CONFIG_PROC_VMCORE) += vmcore.o
+proc-$(CONFIG_CRASH_DUMP_BITMAP)   += crash_dump_bitmap.o
 proc-$(CONFIG_PROC_DEVICETREE) += proc_devtree.o
 proc-$(CONFIG_PRINTK)  += kmsg.o
 proc-$(CONFIG_PROC_PAGE_MONITOR)   += page.o
diff --git a/fs/proc/crash_dump_bitmap.c b/fs/proc/crash_dump_bitmap.c
new file mode 100644
index 000..77ecaae
--- /dev/null
+++ b/fs/proc/crash_dump_bitmap.c
@@ -0,0 +1,221 @@
+/*
+ *fs/proc/crash_dump_bitmap.c
+ *Interface for controlling the crash dump bitmap from user space.
+ *
+ *(C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *Author: Jingbai Ma jingbai...@hp.com
+ *
+ *This program is free software; you can redistribute it and/or modify
+ *it under the terms of version 2 of the GNU General Public License as
+ *published by the Free Software Foundation.
+ *
+ *This program is distributed in the hope that it will be useful,
+ *but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ *General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/jiffies.h>
+#include <linux/crash_dump.h>
+#include <linux/crash_dump_bitmap.h>
+
+#ifdef CONFIG_CRASH_DUMP_BITMAP
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jingbai Ma <jingbai...@hp.com>");
+MODULE_DESCRIPTION("Crash dump bitmap support driver");
+
+static const char *proc_dir_name = "crash_dump_bitmap";
+static const char *proc_page_status_name = "page_status";
+static const char *proc_dump_level_name = "dump_level";
+
+static struct proc_dir_entry *proc_dir, *proc_page_status, *proc_dump_level;
+
+static unsigned int get_dump_level(void)
+{
+	unsigned int dump_level;
+
+	dump_level = crash_dump_bitmap_ctrl.exclude_zero_pages
+		? CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES : 0;
+	dump_level |= crash_dump_bitmap_ctrl.exclude_cache_pages
+		? CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES : 0;
+	dump_level |= crash_dump_bitmap_ctrl.exclude_cache_private_pages
+		? CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES : 0;
+	dump_level |= crash_dump_bitmap_ctrl.exclude_user_pages
+		? CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES : 0;
+	dump_level |= crash_dump_bitmap_ctrl.exclude_free_pages
+		? CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES : 0;
+
+	return dump_level;
+}
+
+static void set_dump_level(unsigned int dump_level)
+{
+	crash_dump_bitmap_ctrl.exclude_zero_pages =
+		(dump_level & CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES) ? 1 : 0;
+	crash_dump_bitmap_ctrl.exclude_cache_pages =
+		(dump_level & CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES) ? 1 : 0;
+	crash_dump_bitmap_ctrl.exclude_cache_private_pages =
+		(dump_level & CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES)
+		? 1 : 0;
+	crash_dump_bitmap_ctrl.exclude_user_pages =
+		(dump_level & CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES) ? 1 : 0;
+	crash_dump_bitmap_ctrl.exclude_free_pages =
+		(dump_level & CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES) ? 1 : 0;
+}
+
+static int proc_page_status_show(struct seq_file *m, void *v)
+{
+	u64 start, duration;
+
+	if (!crash_dump_bitmap_mem) {
+		seq_printf(m,
+			"crash_dump_bitmap: crash_dump_bitmap_mem not found!\n");
+
+		return -EINVAL;
+	}
+
+	seq_printf(m, "Exclude page flag status:\n");
+	seq_printf(m, "exclude_dump_bitmap_pages=%d\n",
+		crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages);
+	seq_printf(m, "exclude_zero_pages=%d\n",
+		crash_dump_bitmap_ctrl.exclude_zero_pages);
+	seq_printf(m, "exclude_cache_pages=%d\n",
+		crash_dump_bitmap_ctrl.exclude_cache_pages);
+	seq_printf(m, "exclude_cache_private_pages=%d\n",
+		crash_dump_bitmap_ctrl.exclude_cache_private_pages);
+	seq_printf(m, "exclude_user_pages=%d\n",
+		crash_dump_bitmap_ctrl.exclude_user_pages);
+	seq_printf(m, "exclude_free_pages=%d\n",
+		crash_dump_bitmap_ctrl.exclude_free_pages);
+
+	seq_printf(m, "Scanning all memory pages:\n");
+	start = get_jiffies_64

[RFC PATCH 5/5] crash dump bitmap: workaround for kernel 3.9-rc1 kdump issue

2013-03-07 Thread Jingbai Ma

Linux kernel 3.9-rc1 allows crashkernel above 4GB, but the current
kexec-tools does not support it yet.
This patch is only a workaround to make kdump work again.
It should be removed after the kexec-tools 2.0.4 release.

Signed-off-by: Jingbai Ma <jingbai...@hp.com>
---
 arch/x86/kernel/setup.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 165c831..15321d6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -506,7 +506,8 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 #ifdef CONFIG_X86_32
 # define CRASH_KERNEL_ADDR_MAX	(512 << 20)
 #else
-# define CRASH_KERNEL_ADDR_MAX	MAXMEM
+/* # define CRASH_KERNEL_ADDR_MAX  MAXMEM */
+# define CRASH_KERNEL_ADDR_MAX	(896 << 20)
 #endif
 #endif
 
 static void __init reserve_crashkernel_low(void)

[RFC PATCH 1/5] crash dump bitmap: add a kernel config and help document

2013-03-07 Thread Jingbai Ma

Add a kernel config and help document for CRASH_DUMP_BITMAP.

Signed-off-by: Jingbai Ma jingbai...@hp.com
---
 Documentation/kdump/crash_dump_bitmap.txt |  378 +
 arch/x86/Kconfig                          |   16 +
 2 files changed, 394 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kdump/crash_dump_bitmap.txt

diff --git a/Documentation/kdump/crash_dump_bitmap.txt b/Documentation/kdump/crash_dump_bitmap.txt
new file mode 100644
index 0000000..468cdf2
--- /dev/null
+++ b/Documentation/kdump/crash_dump_bitmap.txt
@@ -0,0 +1,378 @@
+====================================
+Documentation for Crash Dump Bitmap
+====================================
+
+This document includes overview, setup and installation, and analysis
+information.
+
+Overview
+========
+
+Traditionally, to reduce the size of the dump file, the dumper scans all
+memory pages to exclude the unnecessary ones after the capture kernel
+has booted, and the scan runs in userspace code (makedumpfile).
+
+This introduces several problems:
+
+1. It requires more memory to store the memory bitmap on systems with a
+large amount of memory installed. Because the capture kernel has only a
+little free memory available, this can cause an out-of-memory error and
+the dump fails. (Non-cyclic mode)
+
+2. Scanning all memory pages in makedumpfile is a very slow process. On
+systems with 1TB or more memory installed, the scan takes very long:
+typically about 19 minutes on an idle 1TB system. On systems with 4TB
+or more memory installed, it does not work at all. To address the
+out-of-memory issue on systems with big memory (4TB or more memory
+installed), makedumpfile v1.5.1 introduces a new cyclic mode. It scans
+only a piece of memory at a time, and repeats this cyclically to scan
+all memory pages. But it runs more slowly: on a 1TB system it takes
+about 33 minutes.
+
+3. The page-scanning code in makedumpfile is very complicated. Without
+access to the kernel's memory management data structures, makedumpfile
+has to build its own, cannot use macros that are only available in the
+kernel (e.g. page_to_pfn), and has to fall back on slower lookup
+algorithms.
+
+This patch introduces a new way to scan memory pages. It reserves a piece of
+memory (1 bit for each page, 32MB per TB of memory on x86 systems) in the
+first kernel. During the kernel panic process, it scans all memory pages and
+clears the bit of every excluded page in the reserved memory.
+
+This new approach has several benefits:
+
+1. It's extremely fast: on a 1TB system it takes only about 17.5 seconds
+to scan all memory pages.
+
+2. It reduces the memory requirement of makedumpfile by putting the
+reserved memory in the first kernel's memory space.
+
+3. It simplifies the existing memory page scanning code in
+userspace.
+
+
+Usage
+=====
+
+1) Enable "kernel crash dump bitmap" in "Processor type and features", under
+"kernel crash dumps".
+
+CONFIG_CRASH_DUMP_BITMAP=y
+
+It depends on the kexec system call and kernel crash dumps, so these
+features must be enabled as well.
+
+CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
+
+2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo filesystems".
+
+   CONFIG_SYSFS=y
+
+3) Compile and install the new kernel.
+
+4) Check the new kernel.
+Once the new kernel has booted, there will be a new folder
+/proc/crash_dump_bitmap.
+Check the current dump level:
+cat /proc/crash_dump_bitmap/dump_level
+
+Set the dump level:
+echo <dump level> > /proc/crash_dump_bitmap/dump_level
+
+The dump level is the same as the parameter of makedumpfile -d dump_level.
+
+Run a page scan and check the page status:
+cat /proc/crash_dump_bitmap/page_status
+
+5) Download makedumpfile v1.5.3 or later from sourceforge:
+http://sourceforge.net/projects/makedumpfile/
+
+6) Patch it with the patch at the end of this file.
+
+7) Compile it and copy the patched makedumpfile into the right folder
+(/sbin or /usr/sbin)
+
+8) Change /etc/kdump.conf and add -q to the makedumpfile parameter
+line. It tells makedumpfile to use the crash dump bitmap in the kernel.
+core_collector makedumpfile --non-cyclic -q -c -d 31 --message-level 23
+
+9) Regenerate the initramfs to make sure the patched makedumpfile and
+config have been included in it.
+
+
+To Do
+=====
+
+Only the x86-64 architecture is supported currently; support for other
+architectures still needs to be added.
+
+
+Contact
+=======
+
+Jingbai Ma (jingbai...@hp.com)
+
+
+Patch (for makedumpfile v1.5.3)
+
+Please forgive me: because of some format issues with the makedumpfile
+source, I have to wrap this patch with '#'. Please use this sed command
+to get the patch for makedumpfile:
+
+sed -n -e "s/^#\(.*\)#$/\1/p" crash_dump_bitmap.txt > makedumpfile.patch
+
+=
+#diff --git a/makedumpfile.c b/makedumpfile.c#
+#index acb1b21..f29b6a5 100644#
+#--- a/makedumpfile.c#
+#+++ b/makedumpfile.c#
+#@@ -34,6 +34,10 @@ struct srcfile_table   srcfile_table;#
+# struct vm_table  vt = { 0 };#
+# struct

[RFC PATCH 2/5] crash dump bitmap: init crash dump bitmap in kernel booting process

2013-03-07 Thread Jingbai Ma

Reserve a memory block for crash_dump_bitmap during the kernel boot process.

Signed-off-by: Jingbai Ma jingbai...@hp.com
---
 arch/x86/kernel/setup.c           |   59 +
 include/linux/crash_dump_bitmap.h |   59 +
 kernel/Makefile                   |    1 +
 kernel/crash_dump_bitmap.c        |   45 
 4 files changed, 164 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/crash_dump_bitmap.h
 create mode 100644 kernel/crash_dump_bitmap.c

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 84d3285..165c831 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -67,6 +67,7 @@
 
 #include <linux/percpu.h>
 #include <linux/crash_dump.h>
+#include <linux/crash_dump_bitmap.h>
 #include <linux/tboot.h>
 #include <linux/jiffies.h>
 
@@ -601,6 +602,62 @@ static void __init reserve_crashkernel(void)
 }
 #endif
 
+#ifdef CONFIG_CRASH_DUMP_BITMAP
+static void __init crash_dump_bitmap_init(void)
+{
+   static unsigned long BITSPERBYTE = 8;
+
+   unsigned long long mem_start;
+   unsigned long long mem_size;
+
+   if (is_kdump_kernel())
+      return;
+
+   mem_start = (1ULL << 24); /* 16MB */
+   mem_size = roundup((roundup(max_pfn, BITSPERBYTE) / BITSPERBYTE),
+      PAGE_SIZE);
+
+   crash_dump_bitmap_mem = memblock_find_in_range(mem_start,
+      MEMBLOCK_ALLOC_ACCESSIBLE, mem_size, PAGE_SIZE);
+
+   if (!crash_dump_bitmap_mem) {
+      pr_err(
+      "crash_dump_bitmap: allocate error! size=%lldkB, from=%lldMB\n",
+         mem_size >> 10, mem_start >> 20);
+
+      return;
+   }
+
+   crash_dump_bitmap_mem_size = mem_size;
+   memblock_reserve(crash_dump_bitmap_mem, crash_dump_bitmap_mem_size);
+   pr_info("crash_dump_bitmap: bitmap_mem=%lldMB. size=%lldkB\n",
+      (unsigned long long)crash_dump_bitmap_mem >> 20,
+      mem_size >> 10);
+
+   crash_dump_bitmap_res.start = crash_dump_bitmap_mem;
+   crash_dump_bitmap_res.end   = crash_dump_bitmap_mem + mem_size - 1;
+   insert_resource(&iomem_resource, &crash_dump_bitmap_res);
+
+   crash_dump_bitmap_info.version = CRASH_DUMP_BITMAP_VERSION;
+
+   crash_dump_bitmap_info.bitmap = crash_dump_bitmap_mem;
+   crash_dump_bitmap_info.bitmap_size = crash_dump_bitmap_mem_size;
+
+   crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_zero_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_cache_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_cache_private_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_user_pages = 1;
+   crash_dump_bitmap_ctrl.exclude_free_pages = 1;
+
+   pr_info("crash_dump_bitmap: Initialized!\n");
+}
+#else
+static void __init crash_dump_bitmap_init(void)
+{
+}
+#endif
+
 static struct resource standard_io_resources[] = {
    { .name = "dma1", .start = 0x00, .end = 0x1f,
       .flags = IORESOURCE_BUSY | IORESOURCE_IO },
@@ -1094,6 +1151,8 @@ void __init setup_arch(char **cmdline_p)
 
    reserve_crashkernel();
 
+   crash_dump_bitmap_init();
+
    vsmp_init();
 
    io_delay_init();
diff --git a/include/linux/crash_dump_bitmap.h b/include/linux/crash_dump_bitmap.h
new file mode 100644
index 0000000..63b1264
--- /dev/null
+++ b/include/linux/crash_dump_bitmap.h
@@ -0,0 +1,59 @@
+/*
+ *include/linux/crash_dump_bitmap.h
+ *Declaration of crash dump bitmap functions and data structures.
+ *
+ *(C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *Author: Jingbai Ma jingbai...@hp.com
+ *
+ *This program is free software; you can redistribute it and/or modify
+ *it under the terms of version 2 of the GNU General Public License as
+ *published by the Free Software Foundation.
+ *
+ *This program is distributed in the hope that it will be useful,
+ *but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ *General Public License for more details.
+ */
+
+#ifndef _LINUX_CRASH_DUMP_BITMAP_H
+#define _LINUX_CRASH_DUMP_BITMAP_H
+
+#define CRASH_DUMP_BITMAP_VERSION 1
+
+enum {
+   CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES = 1,
+   CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES = 2,
+   CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES = 4,
+   CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES = 8,
+   CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES = 16
+};
+
+struct crash_dump_bitmap_ctrl {
+   char exclude_crash_dump_bitmap_pages;
+   char exclude_zero_pages;/* only for tracking dump level */
+   char exclude_cache_pages;
+   char exclude_cache_private_pages;
+   char exclude_user_pages;
+   char exclude_free_pages;
+};
+
+struct crash_dump_bitmap_info {
+   unsigned int version;
+   phys_addr_t bitmap;
+   phys_addr_t bitmap_size;
+   unsigned long

[RFC PATCH 4/5] crash dump bitmap: add a proc interface for crash dump bitmap

2013-03-07 Thread Jingbai Ma

Add a procfs driver that lets userspace select which pages to exclude.
/proc/crash_dump_bitmap/

Signed-off-by: Jingbai Ma jingbai...@hp.com
---
 fs/proc/Makefile            |    1 
 fs/proc/crash_dump_bitmap.c |  221 +++
 2 files changed, 222 insertions(+), 0 deletions(-)
 create mode 100644 fs/proc/crash_dump_bitmap.c
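
The dump_level value exposed by this interface is a bitmask of the five
exclude switches, so level 31 selects all of them. The sketch below is a
standalone userspace restatement of that mapping, not code from the
patch. The enum values are the ones listed in crash_dump_bitmap.h in
this series; struct exclude_flags, flags_to_level(), level_to_flags()
and check_round_trip() are hypothetical names used only for this
illustration.

```c
#include <assert.h>

/* Bit values as listed in the crash_dump_bitmap.h enum of this series. */
enum {
	CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES = 1,
	CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES = 2,
	CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES = 4,
	CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES = 8,
	CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES = 16
};

/* Hypothetical userspace mirror of the per-type exclude switches. */
struct exclude_flags {
	int zero, cache, cache_private, user, free_pages;
};

/* Pack the switches into a dump level. */
unsigned int flags_to_level(const struct exclude_flags *f)
{
	unsigned int level = 0;

	if (f->zero)
		level |= CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES;
	if (f->cache)
		level |= CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES;
	if (f->cache_private)
		level |= CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES;
	if (f->user)
		level |= CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES;
	if (f->free_pages)
		level |= CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES;
	return level;
}

/* Unpack a dump level into the individual switches. */
struct exclude_flags level_to_flags(unsigned int level)
{
	struct exclude_flags f;

	f.zero = (level & CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES) ? 1 : 0;
	f.cache = (level & CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES) ? 1 : 0;
	f.cache_private =
		(level & CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES) ? 1 : 0;
	f.user = (level & CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES) ? 1 : 0;
	f.free_pages = (level & CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES) ? 1 : 0;
	return f;
}

/* Every level from 0 to 31 should survive a decode and re-encode. */
void check_round_trip(void)
{
	unsigned int level;

	for (level = 0; level <= 31; level++) {
		struct exclude_flags f = level_to_flags(level);

		assert(flags_to_level(&f) == level);
	}
}
```

For instance, level 17 (1 + 16) would exclude only zero pages and free
pages.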

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 712f24d..2dfcff1 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -27,6 +27,7 @@ proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
 proc-$(CONFIG_NET)                 += proc_net.o
 proc-$(CONFIG_PROC_KCORE)          += kcore.o
 proc-$(CONFIG_PROC_VMCORE)         += vmcore.o
+proc-$(CONFIG_CRASH_DUMP_BITMAP)   += crash_dump_bitmap.o
 proc-$(CONFIG_PROC_DEVICETREE)     += proc_devtree.o
 proc-$(CONFIG_PRINTK)              += kmsg.o
 proc-$(CONFIG_PROC_PAGE_MONITOR)   += page.o
diff --git a/fs/proc/crash_dump_bitmap.c b/fs/proc/crash_dump_bitmap.c
new file mode 100644
index 0000000..77ecaae
--- /dev/null
+++ b/fs/proc/crash_dump_bitmap.c
@@ -0,0 +1,221 @@
+/*
+ *fs/proc/crash_dump_bitmap.c
+ *Interface for controlling the crash dump bitmap from user space.
+ *
+ *(C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *Author: Jingbai Ma jingbai...@hp.com
+ *
+ *This program is free software; you can redistribute it and/or modify
+ *it under the terms of version 2 of the GNU General Public License as
+ *published by the Free Software Foundation.
+ *
+ *This program is distributed in the hope that it will be useful,
+ *but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ *General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/jiffies.h>
+#include <linux/crash_dump.h>
+#include <linux/crash_dump_bitmap.h>
+
+#ifdef CONFIG_CRASH_DUMP_BITMAP
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jingbai Ma <jingbai...@hp.com>");
+MODULE_DESCRIPTION("Crash dump bitmap support driver");
+
+static const char *proc_dir_name = "crash_dump_bitmap";
+static const char *proc_page_status_name = "page_status";
+static const char *proc_dump_level_name = "dump_level";
+
+static struct proc_dir_entry *proc_dir, *proc_page_status, *proc_dump_level;
+
+static unsigned int get_dump_level(void)
+{
+   unsigned int dump_level;
+
+   dump_level = crash_dump_bitmap_ctrl.exclude_zero_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES : 0;
+   dump_level |= crash_dump_bitmap_ctrl.exclude_cache_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES : 0;
+   dump_level |= crash_dump_bitmap_ctrl.exclude_cache_private_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES : 0;
+   dump_level |= crash_dump_bitmap_ctrl.exclude_user_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES : 0;
+   dump_level |= crash_dump_bitmap_ctrl.exclude_free_pages
+   ? CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES : 0;
+
+   return dump_level;
+}
+
+static void set_dump_level(unsigned int dump_level)
+{
+   crash_dump_bitmap_ctrl.exclude_zero_pages =
+      (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_ZERO_PAGES) ? 1 : 0;
+   crash_dump_bitmap_ctrl.exclude_cache_pages =
+      (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PAGES) ? 1 : 0;
+   crash_dump_bitmap_ctrl.exclude_cache_private_pages =
+      (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_CACHE_PRIVATE_PAGES)
+         ? 1 : 0;
+   crash_dump_bitmap_ctrl.exclude_user_pages =
+      (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_USER_PAGES) ? 1 : 0;
+   crash_dump_bitmap_ctrl.exclude_free_pages =
+      (dump_level & CRASH_DUMP_LEVEL_EXCLUDE_FREE_PAGES) ? 1 : 0;
+}
+
+static int proc_page_status_show(struct seq_file *m, void *v)
+{
+   u64 start, duration;
+
+   if (!crash_dump_bitmap_mem) {
+      seq_printf(m,
+         "crash_dump_bitmap: crash_dump_bitmap_mem not found!\n");
+
+      return -EINVAL;
+   }
+
+   seq_printf(m, "Exclude page flag status:\n");
+   seq_printf(m, "exclude_dump_bitmap_pages=%d\n",
+      crash_dump_bitmap_ctrl.exclude_crash_dump_bitmap_pages);
+   seq_printf(m, "exclude_zero_pages=%d\n",
+      crash_dump_bitmap_ctrl.exclude_zero_pages);
+   seq_printf(m, "exclude_cache_pages=%d\n",
+      crash_dump_bitmap_ctrl.exclude_cache_pages);
+   seq_printf(m, "exclude_cache_private_pages=%d\n",
+      crash_dump_bitmap_ctrl.exclude_cache_private_pages);
+   seq_printf(m, "exclude_user_pages=%d\n",
+      crash_dump_bitmap_ctrl.exclude_user_pages);
+   seq_printf(m, "exclude_free_pages=%d\n",
+      crash_dump_bitmap_ctrl.exclude_free_pages);
+
+   seq_printf(m, "Scanning all memory pages:\n");
+   start = get_jiffies_64

[RFC PATCH 5/5] crash dump bitmap: workaround for kernel 3.9-rc1 kdump issue

2013-03-07 Thread Jingbai Ma

Linux kernel 3.9-rc1 allows the crashkernel region to be placed above
4GB, but the current kexec-tools does not support that yet.
This patch is only a workaround to make kdump work again.
It should be removed after the kexec-tools 2.0.4 release.

Signed-off-by: Jingbai Ma jingbai...@hp.com
---
 arch/x86/kernel/setup.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)
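
For reference, (896 << 20) is 939524096 bytes, i.e. 896MB. Assuming the
constant acts as the upper bound on where the crash kernel region may
end (an assumption of this note, not something the patch states), the
effect of the workaround can be sketched in userspace C. ADDR_MAX_SKETCH
and crash_region_fits() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the workaround's limit: 896MB. */
#define ADDR_MAX_SKETCH (896ULL << 20)

/*
 * Return 1 if the region [base, base + size) ends at or below the
 * 896MB limit, 0 otherwise. Written to avoid overflow in base + size.
 */
int crash_region_fits(uint64_t base, uint64_t size)
{
	assert(size > 0);
	return base <= ADDR_MAX_SKETCH && size <= ADDR_MAX_SKETCH - base;
}
```

Under this assumption a 128MB region starting at 768MB would still fit,
while the same region starting at 800MB would not.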

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 165c831..15321d6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -506,7 +506,8 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 #ifdef CONFIG_X86_32
 # define CRASH_KERNEL_ADDR_MAX (512 << 20)
 #else
-# define CRASH_KERNEL_ADDR_MAX MAXMEM
+/* # define CRASH_KERNEL_ADDR_MAX  MAXMEM */
+# define CRASH_KERNEL_ADDR_MAX (896 << 20)
 #endif
 #endif
 
 static void __init reserve_crashkernel_low(void)

