Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-12-03 Thread Atsushi Kumagai
On 2013/12/03 18:06:13, kexec  wrote:
> >> This is a suggestion from a different point of view...
> >>
> >> In general, data in a crash dump can be corrupted. Thus, the order
> >> contained in a page descriptor can also be corrupted. For example, if the
> >> corrupted value were a huge number, a wide range of pages after the buddy
> >> page would be falsely filtered.
> >>
> >> So we actually should sanity-check data in the crash dump before using it
> >> for an application-level feature. I've picked up the order contained in
> >> the page descriptor, but there would be other data used in makedumpfile
> >> that is not checked.
> > 
> > What you said is reasonable, but how would you do such a sanity check?
> > Certain reference values are necessary for a sanity check, so how would
> > you prepare them?
> > (Get them from the kernel source and hard-code them in makedumpfile?)
> > 
> >> Unlike diskdump, we no longer need to care about kernel/hardware-level
> >> data integrity outside of user land, but we still need to care about the
> >> data's own integrity.
> >>
> >> On the other hand, if we do it, we might face some difficulties, for
> >> example, a maintenance burden or performance bottlenecks; that might be
> >> the reason why we don't see sanity checks in makedumpfile now.
> > 
> > There are many values which should be checked, e.g. page.flags, page._count,
> > page.mapping, list_head.next and so on.
> > If we introduce sanity checks for them, the issues you mentioned will appear
> > distinctly.
> >
> > So I think makedumpfile has to trust the crash dump in practice.
> > 
> 
> Yes, I don't mean such drastic checking; I understand the difficulty
> because I often handle/write this kind of code, and I don't want to fight
> tremendously many dependencies...
>
> So we need to concentrate on things that can affect makedumpfile's behavior
> significantly, e.g. an infinite loop caused by broken linked-list objects,
> a buffer overrun caused by large values from broken data, etc. We should be
> able to deal with them by carefully checking dump data against
> makedumpfile's runtime data structures, e.g. buffer sizes.
>
> Is it OK to consider this as makedumpfile's policy for data corruption?

Right.
Of course, if there is a very simple and effective check for dump data,
then we can take it.
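
A minimal sketch of such a check (illustrative only, not actual makedumpfile
code; the bound reuses ARRAY_LENGTH(zone.free_area), which the thread already
relies on for CONFIG_FORCE_MAX_ZONEORDER):

        /*
         * Reject an implausible order read from a possibly corrupted
         * page descriptor before it can drive the filtering loop.
         */
        if (compound_order >= ARRAY_LENGTH(zone.free_area))
                compound_order = 0;     /* treat it as a normal page */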


Thanks
Atsushi Kumagai



Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-12-03 Thread HATAYAMA Daisuke
(2013/12/03 17:05), Atsushi Kumagai wrote:
> On 2013/11/29 13:57:21, kexec  wrote:
>> (2013/11/29 13:23), Atsushi Kumagai wrote:
>>> On 2013/11/29 12:24:45, kexec  wrote:
 (2013/11/29 12:02), Atsushi Kumagai wrote:
> On 2013/11/28 16:50:21, kexec  wrote:
 ping, in case you overlooked this...
>>>
>>> Sorry for the delayed response; I'm prioritizing the release of v1.5.5 now.
>>>
>>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
>>> as you said. In addition, I'm considering another way to address such
>>> cases, that is, to bring the number of "overflowed pages" into the next
>>> cycle and exclude them at the top of __exclude_unnecessary_pages(),
>>> like below:
>>>
>>>         /*
>>>          * The pages which should be excluded still remain.
>>>          */
>>>         if (remainder >= 1) {
>>>                 int i;
>>>                 unsigned long tmp = 0;
>>>                 for (i = 0; i < remainder; ++i) {
>>>                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
>>>                                 pfn_user++;
>>>                                 tmp++;
>>>                         }
>>>                 }
>>>                 pfn += tmp;
>>>                 remainder -= tmp;
>>>                 mem_map += (tmp - 1) * SIZE(page);
>>>                 continue;
>>>         }
>>>
>>> If this way works well, then aligning info->buf_size_cyclic will be
>>> unnecessary.
>>>
>>
>> I selected the current implementation of changing the cyclic buffer size
>> because I thought it was simpler than carrying remaining filtered pages
>> over to the next cycle, in that there was no need to add extra code to the
>> filtering processing.
>>
>> I guess the reason you now think this is better is that detecting the
>> maximum order of huge pages is hard in some way, right?
>
> The maximum order can be obtained from HUGETLB_PAGE_ORDER or
> HPAGE_PMD_ORDER, so I wouldn't say it's hard. However, the carry-over
> method doesn't depend on such kernel symbols, so I think it's more robust.
>

 Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part
 of the free page filtering in __exclude_unnecessary_pages(). Could you do
 that too?
>>>
>>> Sure, I'll modify it too.
>>>
>>
>> This is a suggestion from a different point of view...
>>
>> In general, data in a crash dump can be corrupted. Thus, the order
>> contained in a page descriptor can also be corrupted. For example, if the
>> corrupted value were a huge number, a wide range of pages after the buddy
>> page would be falsely filtered.
>>
>> So we actually should sanity-check data in the crash dump before using it
>> for an application-level feature. I've picked up the order contained in
>> the page descriptor, but there would be other data used in makedumpfile
>> that is not checked.
> 
> What you said is reasonable, but how would you do such a sanity check?
> Certain reference values are necessary for a sanity check, so how would
> you prepare them?
> (Get them from the kernel source and hard-code them in makedumpfile?)
> 
>> Unlike diskdump, we no longer need to care about kernel/hardware-level
>> data integrity outside of user land, but we still need to care about the
>> data's own integrity.
>>
>> On the other hand, if we do it, we might face some difficulties, for
>> example, a maintenance burden or performance bottlenecks; that might be
>> the reason why we don't see sanity checks in makedumpfile now.
> 
> There are many values which should be checked, e.g. page.flags, page._count,
> page.mapping, list_head.next and so on.
> If we introduce sanity checks for them, the issues you mentioned will appear
> distinctly.
>
> So I think makedumpfile has to trust the crash dump in practice.
> 

Yes, I don't mean such drastic checking; I understand the difficulty
because I often handle/write this kind of code, and I don't want to fight
tremendously many dependencies...

So we need to concentrate on things that can affect makedumpfile's behavior
significantly, e.g. an infinite loop caused by broken linked-list objects,
a buffer overrun caused by large values from broken data, etc. We should be
able to deal with them by carefully checking dump data against
makedumpfile's runtime data structures, e.g. buffer sizes.

Is it OK to consider this as makedumpfile's policy for data corruption?
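
A rough sketch of that defensive style (hypothetical names such as
MAX_LIST_WALK, head_addr and head_next; readmem() and OFFSET(list_head.next)
follow makedumpfile's conventions): cap every list walk so a corrupted next
pointer read from the dump cannot loop forever.

        #define MAX_LIST_WALK (1ULL << 24)      /* assumed sane upper bound */

        unsigned long long walked = 0;
        unsigned long long entry = head_next;   /* pointer read from the dump */

        while (entry && entry != head_addr) {
                if (++walked > MAX_LIST_WALK)
                        return FALSE;           /* broken list: bail out safely */
                /* ... process the entry here ... */
                if (!readmem(VADDR, entry + OFFSET(list_head.next),
                             &entry, sizeof(entry)))
                        return FALSE;           /* unreadable memory: also bail */
        }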

-- 
Thanks.
HATAYAMA, Daisuke



Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-12-03 Thread Atsushi Kumagai
On 2013/11/29 13:57:21, kexec  wrote:
> (2013/11/29 13:23), Atsushi Kumagai wrote:
> > On 2013/11/29 12:24:45, kexec  wrote:
> >> (2013/11/29 12:02), Atsushi Kumagai wrote:
> >>> On 2013/11/28 16:50:21, kexec  wrote:
> >> ping, in case you overlooked this...
> >
> > Sorry for the delayed response; I'm prioritizing the release of v1.5.5 now.
> >
> > Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> > as you said. In addition, I'm considering another way to address such
> > cases, that is, to bring the number of "overflowed pages" into the next
> > cycle and exclude them at the top of __exclude_unnecessary_pages(),
> > like below:
> >
> >         /*
> >          * The pages which should be excluded still remain.
> >          */
> >         if (remainder >= 1) {
> >                 int i;
> >                 unsigned long tmp = 0;
> >                 for (i = 0; i < remainder; ++i) {
> >                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
> >                                 pfn_user++;
> >                                 tmp++;
> >                         }
> >                 }
> >                 pfn += tmp;
> >                 remainder -= tmp;
> >                 mem_map += (tmp - 1) * SIZE(page);
> >                 continue;
> >         }
> >
> > If this way works well, then aligning info->buf_size_cyclic will be
> > unnecessary.
> >
> 
>  I selected the current implementation of changing the cyclic buffer size
>  because I thought it was simpler than carrying remaining filtered pages
>  over to the next cycle, in that there was no need to add extra code to
>  the filtering processing.
> 
>  I guess the reason you now think this is better is that detecting the
>  maximum order of huge pages is hard in some way, right?
> >>>
> >>> The maximum order can be obtained from HUGETLB_PAGE_ORDER or
> >>> HPAGE_PMD_ORDER, so I wouldn't say it's hard. However, the carry-over
> >>> method doesn't depend on such kernel symbols, so I think it's more
> >>> robust.
> >>>
> >>
> >> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite
> >> part of the free page filtering in __exclude_unnecessary_pages(). Could
> >> you do that too?
> >
> > Sure, I'll modify it too.
> >
>
> This is a suggestion from a different point of view...
>
> In general, data in a crash dump can be corrupted. Thus, the order
> contained in a page descriptor can also be corrupted. For example, if the
> corrupted value were a huge number, a wide range of pages after the buddy
> page would be falsely filtered.
>
> So we actually should sanity-check data in the crash dump before using it
> for an application-level feature. I've picked up the order contained in
> the page descriptor, but there would be other data used in makedumpfile
> that is not checked.

What you said is reasonable, but how would you do such a sanity check?
Certain reference values are necessary for a sanity check, so how would
you prepare them?
(Get them from the kernel source and hard-code them in makedumpfile?)

> Unlike diskdump, we no longer need to care about kernel/hardware-level
> data integrity outside of user land, but we still need to care about the
> data's own integrity.
>
> On the other hand, if we do it, we might face some difficulties, for
> example, a maintenance burden or performance bottlenecks; that might be
> the reason why we don't see sanity checks in makedumpfile now.

There are many values which should be checked, e.g. page.flags, page._count,
page.mapping, list_head.next and so on.
If we introduce sanity checks for them, the issues you mentioned will appear
distinctly.

So I think makedumpfile has to trust the crash dump in practice.


Thanks
Atsushi Kumagai



Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-28 Thread HATAYAMA Daisuke
(2013/11/29 13:23), Atsushi Kumagai wrote:
> On 2013/11/29 12:24:45, kexec  wrote:
>> (2013/11/29 12:02), Atsushi Kumagai wrote:
>>> On 2013/11/28 16:50:21, kexec  wrote:
>> ping, in case you overlooked this...
>
> Sorry for the delayed response; I'm prioritizing the release of v1.5.5 now.
>
> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> as you said. In addition, I'm considering another way to address such
> cases, that is, to bring the number of "overflowed pages" into the next
> cycle and exclude them at the top of __exclude_unnecessary_pages(),
> like below:
>
>         /*
>          * The pages which should be excluded still remain.
>          */
>         if (remainder >= 1) {
>                 int i;
>                 unsigned long tmp = 0;
>                 for (i = 0; i < remainder; ++i) {
>                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
>                                 pfn_user++;
>                                 tmp++;
>                         }
>                 }
>                 pfn += tmp;
>                 remainder -= tmp;
>                 mem_map += (tmp - 1) * SIZE(page);
>                 continue;
>         }
>
> If this way works well, then aligning info->buf_size_cyclic will be
> unnecessary.
>

 I selected the current implementation of changing the cyclic buffer size
 because I thought it was simpler than carrying remaining filtered pages
 over to the next cycle, in that there was no need to add extra code to the
 filtering processing.

 I guess the reason you now think this is better is that detecting the
 maximum order of huge pages is hard in some way, right?
>>>
> The maximum order can be obtained from HUGETLB_PAGE_ORDER or
> HPAGE_PMD_ORDER, so I wouldn't say it's hard. However, the carry-over
> method doesn't depend on such kernel symbols, so I think it's more robust.
>>
>> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part 
>> of free page
>> filtering in __exclude_unnecessary_pages(). Could you do that too?
> 
> Sure, I'll modify it too.
> 

This is a suggestion from a different point of view...

In general, data in a crash dump can be corrupted. Thus, the order contained
in a page descriptor can also be corrupted. For example, if the corrupted
value were a huge number, a wide range of pages after the buddy page would be
falsely filtered.

So we actually should sanity-check data in the crash dump before using it for
an application-level feature. I've picked up the order contained in the page
descriptor, but there would be other data used in makedumpfile that is not
checked.

Unlike diskdump, we no longer need to care about kernel/hardware-level data
integrity outside of user land, but we still need to care about the data's
own integrity.

On the other hand, if we do it, we might face some difficulties, for example,
a maintenance burden or performance bottlenecks; that might be the reason why
we don't see sanity checks in makedumpfile now.
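
For a concrete sense of scale (an illustrative calculation assuming a 4 KiB
page size): if the order field were corrupted to, say, 40, the filtering
loop would wrongly exclude an enormous range.

        unsigned long long nr_pages = 1ULL << 40;       /* ~1.1e12 pages  */
        unsigned long long bytes    = nr_pages << 12;   /* 4 KiB per page */
        /* bytes == 2^52, i.e. 4 PiB of the dump falsely filtered */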

-- 
Thanks.
HATAYAMA, Daisuke



Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-28 Thread Atsushi Kumagai
On 2013/11/29 12:24:45, kexec  wrote:
> (2013/11/29 12:02), Atsushi Kumagai wrote:
> > On 2013/11/28 16:50:21, kexec  wrote:
>  ping, in case you overlooked this...
> >>>
> >>> Sorry for the delayed response; I'm prioritizing the release of v1.5.5 now.
> >>>
> >>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> >>> as you said. In addition, I'm considering another way to address such
> >>> cases, that is, to bring the number of "overflowed pages" into the next
> >>> cycle and exclude them at the top of __exclude_unnecessary_pages(),
> >>> like below:
> >>>
> >>>         /*
> >>>          * The pages which should be excluded still remain.
> >>>          */
> >>>         if (remainder >= 1) {
> >>>                 int i;
> >>>                 unsigned long tmp = 0;
> >>>                 for (i = 0; i < remainder; ++i) {
> >>>                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
> >>>                                 pfn_user++;
> >>>                                 tmp++;
> >>>                         }
> >>>                 }
> >>>                 pfn += tmp;
> >>>                 remainder -= tmp;
> >>>                 mem_map += (tmp - 1) * SIZE(page);
> >>>                 continue;
> >>>         }
> >>>
> >>> If this way works well, then aligning info->buf_size_cyclic will be
> >>> unnecessary.
> >>>
> >>
> >> I selected the current implementation of changing the cyclic buffer size
> >> because I thought it was simpler than carrying remaining filtered pages
> >> over to the next cycle, in that there was no need to add extra code to
> >> the filtering processing.
> >>
> >> I guess the reason you now think this is better is that detecting the
> >> maximum order of huge pages is hard in some way, right?
> > 
> > The maximum order can be obtained from HUGETLB_PAGE_ORDER or
> > HPAGE_PMD_ORDER, so I wouldn't say it's hard. However, the carry-over
> > method doesn't depend on such kernel symbols, so I think it's more robust.
> > 
> 
> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part
> of the free page filtering in __exclude_unnecessary_pages(). Could you do
> that too?

Sure, I'll modify it too.


Thanks
Atsushi Kumagai


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-28 Thread HATAYAMA Daisuke
(2013/11/29 12:02), Atsushi Kumagai wrote:
> On 2013/11/28 16:50:21, kexec  wrote:
 ping, in case you overlooked this...
>>>
>>> Sorry for the delayed response; I'm prioritizing the release of v1.5.5 now.
>>>
>>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
>>> as you said. In addition, I'm considering another way to address such
>>> cases, that is, to bring the number of "overflowed pages" into the next
>>> cycle and exclude them at the top of __exclude_unnecessary_pages(),
>>> like below:
>>>
>>>         /*
>>>          * The pages which should be excluded still remain.
>>>          */
>>>         if (remainder >= 1) {
>>>                 int i;
>>>                 unsigned long tmp = 0;
>>>                 for (i = 0; i < remainder; ++i) {
>>>                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
>>>                                 pfn_user++;
>>>                                 tmp++;
>>>                         }
>>>                 }
>>>                 pfn += tmp;
>>>                 remainder -= tmp;
>>>                 mem_map += (tmp - 1) * SIZE(page);
>>>                 continue;
>>>         }
>>>
>>> If this way works well, then aligning info->buf_size_cyclic will be
>>> unnecessary.
>>>
>>
>> I selected the current implementation of changing the cyclic buffer size
>> because I thought it was simpler than carrying remaining filtered pages
>> over to the next cycle, in that there was no need to add extra code to the
>> filtering processing.
>>
>> I guess the reason you now think this is better is that detecting the
>> maximum order of huge pages is hard in some way, right?
> 
> The maximum order can be obtained from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER,
> so I wouldn't say it's hard. However, the carry-over method doesn't depend
> on such kernel symbols, so I think it's more robust.
> 

Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part of
the free page filtering in __exclude_unnecessary_pages(). Could you do that
too?

-- 
Thanks.
HATAYAMA, Daisuke



Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-28 Thread Atsushi Kumagai
On 2013/11/28 16:50:21, kexec  wrote:
> >> ping, in case you overlooked this...
> >
> > Sorry for the delayed response; I'm prioritizing the release of v1.5.5 now.
> >
> > Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> > as you said. In addition, I'm considering another way to address such
> > cases, that is, to bring the number of "overflowed pages" into the next
> > cycle and exclude them at the top of __exclude_unnecessary_pages(),
> > like below:
> >
> >         /*
> >          * The pages which should be excluded still remain.
> >          */
> >         if (remainder >= 1) {
> >                 int i;
> >                 unsigned long tmp = 0;
> >                 for (i = 0; i < remainder; ++i) {
> >                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
> >                                 pfn_user++;
> >                                 tmp++;
> >                         }
> >                 }
> >                 pfn += tmp;
> >                 remainder -= tmp;
> >                 mem_map += (tmp - 1) * SIZE(page);
> >                 continue;
> >         }
> >
> > If this way works well, then aligning info->buf_size_cyclic will be
> > unnecessary.
> >
>
> I selected the current implementation of changing the cyclic buffer size
> because I thought it was simpler than carrying remaining filtered pages
> over to the next cycle, in that there was no need to add extra code to the
> filtering processing.
>
> I guess the reason you now think this is better is that detecting the
> maximum order of huge pages is hard in some way, right?

The maximum order can be obtained from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER,
so I wouldn't say it's hard. However, the carry-over method doesn't depend on
such kernel symbols, so I think it's more robust.
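
For comparison, the alignment that the carry-over method avoids would look
roughly like this (a sketch with hypothetical names, not the actual
check_cyclic_buffer_overrun() code):

        /*
         * One bitmap bit per pfn: a cycle of buf_size_cyclic bytes covers
         * buf_size_cyclic * 8 pfns.  Rounding that span down to a multiple
         * of the largest compound page keeps every compound page entirely
         * inside a single cycle.
         */
        unsigned long
        aligned_pfn_span(unsigned long buf_size_cyclic, int max_order)
        {
                unsigned long span = buf_size_cyclic * 8;
                unsigned long unit = 1UL << max_order;

                return (span / unit) * unit;
        }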


Thanks
Atsushi Kumagai



Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-27 Thread HATAYAMA Daisuke
(2013/11/28 16:08), Atsushi Kumagai wrote:
> On 2013/11/22 16:18:20, kexec  wrote:
>> (2013/11/07 9:54), HATAYAMA Daisuke wrote:
>>> (2013/11/06 11:21), Atsushi Kumagai wrote:
 (2013/11/06 5:27), Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>> This patch set intend to exclude unnecessary hugepages from vmcore dump 
>> file.
>>
>> This patch requires the kernel patch to export necessary data structures 
>> into
>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>
>> This patch introduce two new dump levels 32 and 64 to exclude all unused 
>> and
>> active hugepages. The level to exclude all unnecessary pages will be 127 
>> now.
>
> Interesting. Why should hugepages be treated any differently than normal
> pages?
>
> If the user asked to filter out free pages, then they should be filtered,
> and it should not matter whether it is a huge page or not?

 I'm making an RFC patch of hugepage filtering based on such a policy.

 I attach the prototype version.
 It's able to filter out THPs as well, and it is suitable for cyclic
 processing because it depends on mem_map, and looking it up can be divided
 into cycles. This is the same idea as page_is_buddy().

 So I think it's better.

>>>
@@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map,
                     && !isAnon(mapping)) {
                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
                                 pfn_cache_private++;
+                        /*
+                         * NOTE: If THP for cache is introduced, the check for
+                         *       compound pages is needed here.
+                         */
                 }
                 /*
                  * Exclude the data page of the user process.
                  */
-                else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
-                    && isAnon(mapping)) {
-                        if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
-                                pfn_user++;
+                else if (info->dump_level & DL_EXCLUDE_USER_DATA) {
+                        /*
+                         * Exclude the anonymous pages as user pages.
+                         */
+                        if (isAnon(mapping)) {
+                                if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+                                        pfn_user++;
+
+                                /*
+                                 * Check the compound page
+                                 */
+                                if (page_is_hugepage(flags) && compound_order > 0) {
+                                        int i, nr_pages = 1 << compound_order;
+
+                                        for (i = 1; i < nr_pages; ++i) {
+                                                if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
+                                                        pfn_user++;
+                                        }
+                                        pfn += nr_pages - 2;
+                                        mem_map += (nr_pages - 1) * SIZE(page);
+                                }
+                        }
+                        /*
+                         * Exclude the hugetlbfs pages as user pages.
+                         */
+                        else if (hugetlb_dtor == SYMBOL(free_huge_page)) {
+                                int i, nr_pages = 1 << compound_order;
+
+                                for (i = 0; i < nr_pages; ++i) {
+                                        if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
+                                                pfn_user++;
+                                }
+                                pfn += nr_pages - 1;
+                                mem_map += (nr_pages - 1) * SIZE(page);
+                        }
                 }
                 /*
                  * Exclude the hwpoison page.
>>>
>>> I'm concerned about the case where filtering is not performed on the part
>>> of the mem_map entries that does not belong to the current cyclic range.
>>>
>>> If the maximum value of compound_order is larger than the maximum value
>>> of CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains via
>>> ARRAY_LENGTH(zone.free_area), it's necessary to align
>>> info->bufsize_cyclic with the larger one in check_cyclic_buffer_overrun().
>>>
>>
>> ping, in case you overlooked this...
> 
> Sorry for the delayed response; I'm prioritizing the release of v1.5.5 now.
>
> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> as you said. In addition, I'm considering another way to address such
> cases, that is, to bring the number of "overflowed pages" into the next
> cycle and exclude them at the top of __exclude_unnecessary_pages(),
> like below:
>
>         /*
>          * The pages which should be excluded still remain.
>          */
>         if (remainder >= 1) {
>                 int i;
>                 unsigned long tmp = 0;
>                 for (i = 0; i < remainder; ++i) {
>                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
>                                 pfn_user++;
>                                 tmp++;
>                         }
>                 }
>                 pfn += tmp;
>                 remainder -= tmp;
>                 mem_map += (tmp - 1) * SIZE(page);
>                 continue;
>         }
>
> If this way works well, then aligning info->buf_size_cyclic will be
> unnecessary.

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-27 Thread Atsushi Kumagai
On 2013/11/22 16:18:20, kexec  wrote:
> (2013/11/07 9:54), HATAYAMA Daisuke wrote:
> > (2013/11/06 11:21), Atsushi Kumagai wrote:
> >> (2013/11/06 5:27), Vivek Goyal wrote:
> >>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>  This patch set intends to exclude unnecessary hugepages from the vmcore
>  dump file.
> 
>  This patch requires the kernel patch to export the necessary data
>  structures into vmcore: "kexec: export hugepage data structure into
>  vmcoreinfo"
>  http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
> 
>  This patch introduces two new dump levels, 32 and 64, to exclude all
>  unused and active hugepages. The level to exclude all unnecessary pages
>  will now be 127.
> >>>
> >>> Interesting. Why should hugepages be treated any differently than
> >>> normal pages?
> >>>
> >>> If the user asked to filter out free pages, then they should be
> >>> filtered, and it should not matter whether it is a huge page or not?
> >>
> >> I'm making an RFC patch of hugepage filtering based on such a policy.
> >>
> >> I attach the prototype version.
> >> It's able to filter out THPs as well, and it is suitable for cyclic
> >> processing because it depends on mem_map, and looking it up can be
> >> divided into cycles. This is the same idea as page_is_buddy().
> >>
> >> So I think it's better.
> >>
> >
> >> @@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map,
> >>                     && !isAnon(mapping)) {
> >>                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
> >>                                 pfn_cache_private++;
> >> +                        /*
> >> +                         * NOTE: If THP for cache is introduced, the check for
> >> +                         *       compound pages is needed here.
> >> +                         */
> >>                 }
> >>                 /*
> >>                  * Exclude the data page of the user process.
> >>                  */
> >> -                else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
> >> -                    && isAnon(mapping)) {
> >> -                        if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
> >> -                                pfn_user++;
> >> +                else if (info->dump_level & DL_EXCLUDE_USER_DATA) {
> >> +                        /*
> >> +                         * Exclude the anonymous pages as user pages.
> >> +                         */
> >> +                        if (isAnon(mapping)) {
> >> +                                if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
> >> +                                        pfn_user++;
> >> +
> >> +                                /*
> >> +                                 * Check the compound page
> >> +                                 */
> >> +                                if (page_is_hugepage(flags) && compound_order > 0) {
> >> +                                        int i, nr_pages = 1 << compound_order;
> >> +
> >> +                                        for (i = 1; i < nr_pages; ++i) {
> >> +                                                if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
> >> +                                                        pfn_user++;
> >> +                                        }
> >> +                                        pfn += nr_pages - 2;
> >> +                                        mem_map += (nr_pages - 1) * SIZE(page);
> >> +                                }
> >> +                        }
> >> +                        /*
> >> +                         * Exclude the hugetlbfs pages as user pages.
> >> +                         */
> >> +                        else if (hugetlb_dtor == SYMBOL(free_huge_page)) {
> >> +                                int i, nr_pages = 1 << compound_order;
> >> +
> >> +                                for (i = 0; i < nr_pages; ++i) {
> >> +                                        if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
> >> +                                                pfn_user++;
> >> +                                }
> >> +                                pfn += nr_pages - 1;
> >> +                                mem_map += (nr_pages - 1) * SIZE(page);
> >> +                        }
> >>                 }
> >>                 /*
> >>                  * Exclude the hwpoison page.
> >
> > I'm concerned about the case where filtering is not performed on the
> > part of the mem_map entries that does not belong to the current cyclic
> > range.
> >
> > If the maximum value of compound_order is larger than the maximum value
> > of CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains via
> > ARRAY_LENGTH(zone.free_area), it's necessary to align
> > info->bufsize_cyclic with the larger one in check_cyclic_buffer_overrun().
> >
> 
> ping, in case you overlooked this...

Sorry for the delayed response; I'm prioritizing the release of v1.5.5 now.

Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
as you said. In addition, I'm considering another way to address such cases,
that is, to bring the number of "overflowed pages" into the next cycle and
exclude them at the top of __exclude_unnecessary_pages(), like below:

        /*
         * The pages which should be excluded still remain.
         */
        if (remainder >= 1) {
                int i;
                unsigned long tmp = 0;
                for (i = 0; i < remainder; ++i) {
                        if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
                                pfn_user++;
                                tmp++;
                        }
                }
                pfn += tmp;
                remainder -= tmp;
                mem_map += (tmp - 1) * SIZE(page);
                continue;
        }

If this way works well, then aligning info->buf_size_cyclic will be
unnecessary.
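
The snippet above only consumes the carry; the counterpart that would produce
it (a sketch under the same assumptions, where cycle_end_pfn is a hypothetical
name for the first pfn of the next cycle) is the compound-page branch noticing
that its tail crosses the cycle boundary:

        unsigned long nr_pages = 1UL << compound_order;

        if (pfn + nr_pages > cycle_end_pfn)
                /* tail pages past the boundary get excluded next cycle */
                remainder = (pfn + nr_pages) - cycle_end_pfn;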


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-21 Thread HATAYAMA Daisuke

(2013/11/07 9:54), HATAYAMA Daisuke wrote:

(2013/11/06 11:21), Atsushi Kumagai wrote:

(2013/11/06 5:27), Vivek Goyal wrote:

On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:

This patch set intends to exclude unnecessary hugepages from the vmcore dump file.

This patch requires the kernel patch to export necessary data structures into
vmcore: "kexec: export hugepage data structure into vmcoreinfo"
http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

This patch introduces two new dump levels, 32 and 64, to exclude all unused and
active hugepages. The level to exclude all unnecessary pages will be 127 now.


Interesting. Why should hugepages be treated any differently than normal
pages?

If the user asked to filter out free pages, then they should be filtered and
it should not matter whether a page is a huge page or not?


I'm making an RFC patch of hugepage filtering based on such a policy.

I attach the prototype version.
It's able to filter out THPs as well, and it suits cyclic processing
because it depends on mem_map, and looking that up can be divided into
cycles. This is the same idea as page_is_buddy().

So I think it's better.




@@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map,
   && !isAnon(mapping)) {
   if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
   pfn_cache_private++;
+/*
+ * NOTE: If THP for cache is introduced, the check for
+ *   compound pages is needed here.
+ */
   }
   /*
* Exclude the data page of the user process.
*/
-else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
-&& isAnon(mapping)) {
-if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
-pfn_user++;
+else if (info->dump_level & DL_EXCLUDE_USER_DATA) {
+/*
+ * Exclude the anonnymous pages as user pages.
+ */
+if (isAnon(mapping)) {
+if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+pfn_user++;
+
+/*
+ * Check the compound page
+ */
+if (page_is_hugepage(flags) && compound_order > 0) {
+int i, nr_pages = 1 << compound_order;
+
+for (i = 1; i < nr_pages; ++i) {
+if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
+pfn_user++;
+}
+pfn += nr_pages - 2;
+mem_map += (nr_pages - 1) * SIZE(page);
+}
+}
+/*
+ * Exclude the hugetlbfs pages as user pages.
+ */
+else if (hugetlb_dtor == SYMBOL(free_huge_page)) {
+int i, nr_pages = 1 << compound_order;
+
+for (i = 0; i < nr_pages; ++i) {
+if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
+pfn_user++;
+}
+pfn += nr_pages - 1;
+mem_map += (nr_pages - 1) * SIZE(page);
+}
   }
   /*
* Exclude the hwpoison page.


I'm concerned about the case where filtering is not performed on the part of mem_map
entries not belonging to the current cyclic range.

If the maximum value of compound_order is larger than the maximum value of
CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains by
ARRAY_LENGTH(zone.free_area),
it's necessary to align info->bufsize_cyclic with the larger one in
check_cyclic_buffer_overrun().



ping, in case you overlooked this...

--
Thanks.
HATAYAMA, Daisuke

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
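
One way to do the bufsize alignment raised in the message above is sketched below. This is a hedged illustration, not makedumpfile code: align_bufsize_cyclic(), buddy_order, and huge_order are invented names; only ARRAY_LENGTH(zone.free_area) and info->bufsize_cyclic come from the discussion itself.

	/*
	 * Sketch only: shrink the cyclic buffer so it covers a multiple of
	 * the largest possible compound page, ensuring no compound page
	 * straddles two cycles.  buddy_order would be taken from
	 * ARRAY_LENGTH(zone.free_area), huge_order from the largest
	 * hugepage order the kernel supports.
	 */
	static unsigned long
	align_bufsize_cyclic(unsigned long bufsize, int buddy_order, int huge_order)
	{
		int max_order = (huge_order > buddy_order) ? huge_order : buddy_order;
		unsigned long pfns_per_buf = bufsize * 8;  /* one bit per pfn */
		unsigned long chunk = 1UL << max_order;    /* pfns per compound page */

		if (pfns_per_buf % chunk)
			bufsize = (pfns_per_buf - pfns_per_buf % chunk) / 8;

		return bufsize;
	}

The caller would then assign the result back to info->bufsize_cyclic before the bitmaps are allocated.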


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-11 Thread Petr Tesarik
On Fri, 08 Nov 2013 13:27:05 +0800
Jingbai Ma  wrote:

> On 11/08/2013 01:21 PM, HATAYAMA Daisuke wrote:
> > (2013/11/08 14:12), Atsushi Kumagai wrote:
> >> Hello Jingbai,
> >>
> >> (2013/11/07 17:58), Jingbai Ma wrote:
> >>> On 11/06/2013 10:23 PM, Vivek Goyal wrote:
>  On Wed, Nov 06, 2013 at 02:21:39AM +, Atsushi Kumagai wrote:
> > (2013/11/06 5:27), Vivek Goyal wrote:
> >> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
> >>> This patch set intends to exclude unnecessary hugepages from the vmcore
> >>> dump file.
> >>>
> >>> This patch requires the kernel patch to export necessary data 
> >>> structures into
> >>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
> >>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
> >>>
> >>> This patch introduces two new dump levels, 32 and 64, to exclude all
> >>> unused and
> >>> active hugepages. The level to exclude all unnecessary pages will be
> >>> 127 now.
> >>
> >> Interesting. Why should hugepages be treated any differently than
> >> normal
> >> pages?
> >>
> >> If the user asked to filter out free pages, then they should be filtered and
> >> it should not matter whether a page is a huge page or not?
> >
> > I'm making an RFC patch of hugepage filtering based on such a policy.
> >
> > I attach the prototype version.
> > It's able to filter out THPs as well, and it suits cyclic processing
> > because it depends on mem_map, and looking that up can be divided into
> > cycles. This is the same idea as page_is_buddy().
> >
> > So I think it's better.
> 
>  Agreed. Being able to treat hugepages in the same manner as other pages
>  sounds good.
> 
>  Jingbai, looks good to you?
> >>>
> >>> It looks good to me.
> >>>
> >>> My only concern is that this way we can only exclude all hugepages
> >>> together, but can't exclude just the free hugepages. I'm not sure if users
> >>> need to dump out only the active hugepages.
> >>>
> >>> Kumagai-san, please correct me, if I'm wrong.
> >>
> >> Yes, my patch treats all allocated hugetlbfs pages as user pages,
> >> and doesn't distinguish whether the pages are actually used or not.
> >> I made it so because I guess it's enough for almost all users.
> >>
> >> We can introduce a new dump level once it's actually needed,
> >> but I don't think now is the time. Introducing it without
> >> demand would just make this tool more complex.
> >>
> > 
> > Typically, users would allocate only as many huge pages as they
> > actually use,
> > in order not to waste system memory. So, this design seems reasonable.
> > 
> 
> OK, it looks reasonable.

Agreed. Whether a page is a huge page or not is an implementation
detail (and with THP even more so). Makedumpfile users should only be
concerned about the _meaning_ of what gets filtered, not about
implementation details.

If we expose too much of the implementation, it may become hard to
maintain backward compatibility one day...

Thank you very much for all the work!

Petr T
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-07 Thread Jingbai Ma
On 11/08/2013 01:21 PM, HATAYAMA Daisuke wrote:
> (2013/11/08 14:12), Atsushi Kumagai wrote:
>> Hello Jingbai,
>>
>> (2013/11/07 17:58), Jingbai Ma wrote:
>>> On 11/06/2013 10:23 PM, Vivek Goyal wrote:
 On Wed, Nov 06, 2013 at 02:21:39AM +, Atsushi Kumagai wrote:
> (2013/11/06 5:27), Vivek Goyal wrote:
>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>>> This patch set intend to exclude unnecessary hugepages from vmcore dump 
>>> file.
>>>
>>> This patch requires the kernel patch to export necessary data 
>>> structures into
>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>>
>>> This patch introduce two new dump levels 32 and 64 to exclude all 
>>> unused and
>>> active hugepages. The level to exclude all unnecessary pages will be 
>>> 127 now.
>>
>> Interesting. Why hugepages should be treated any differentely than normal
>> pages?
>>
>> If user asked to filter out free page, then it should be filtered and
>> it should not matter whether it is a huge page or not?
>
> I'm making a RFC patch of hugepages filtering based on such policy.
>
> I attach the prototype version.
> It's able to filter out also THPs, and suitable for cyclic processing
> because it depends on mem_map and looking up it can be divided into
> cycles. This is the same idea as page_is_buddy().
>
> So I think it's better.

 Agreed. Being able to treat hugepages in same manner as other pages
 sounds good.

 Jingbai, looks good to you?
>>>
>>> It looks good to me.
>>>
>>> My only concern is by this way, we only can exclude all hugepage together, 
>>> but can't exclude the free hugepages only. I'm not sure if user need to 
>>> dump out the activated hugepage only.
>>>
>>> Kumagai-san, please correct me, if I'm wrong.
>>
>> Yes, my patch treats all allocated hugetlbfs pages as user pages,
>> doesn't distinguish whether the pages are actually used or not.
>> I made so because I guess it's enough for almost all users.
>>
>> We can introduce new dump level after it's needed actually,
>> but I don't think now is the time. To introduce it without
>> demand will make this tool just more complex.
>>
> 
> Typically, users would allocate huge pages as much as actually they use only,
> in order not to waste system memory. So, this design seems reasonable.
> 

OK, it looks reasonable.
Thanks!

-- 
Thanks,
Jingbai Ma
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-07 Thread Atsushi Kumagai
Hello Jingbai,

(2013/11/07 17:58), Jingbai Ma wrote:
> On 11/06/2013 10:23 PM, Vivek Goyal wrote:
>> On Wed, Nov 06, 2013 at 02:21:39AM +, Atsushi Kumagai wrote:
>>> (2013/11/06 5:27), Vivek Goyal wrote:
 On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
> This patch set intends to exclude unnecessary hugepages from the vmcore dump
> file.
>
> This patch requires the kernel patch to export necessary data structures 
> into
> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>
> This patch introduces two new dump levels, 32 and 64, to exclude all unused
> and
> active hugepages. The level to exclude all unnecessary pages will be 127
> now.

 Interesting. Why should hugepages be treated any differently than normal
 pages?

 If the user asked to filter out free pages, then they should be filtered and
 it should not matter whether a page is a huge page or not?
>>>
>>> I'm making an RFC patch of hugepage filtering based on such a policy.
>>>
>>> I attach the prototype version.
>>> It's able to filter out THPs as well, and it suits cyclic processing
>>> because it depends on mem_map, and looking that up can be divided into
>>> cycles. This is the same idea as page_is_buddy().
>>>
>>> So I think it's better.
>>
>> Agreed. Being able to treat hugepages in the same manner as other pages
>> sounds good.
>>
>> Jingbai, looks good to you?
>
> It looks good to me.
>
> My only concern is that this way we can only exclude all hugepages together,
> but can't exclude just the free hugepages. I'm not sure if users need to dump
> out only the active hugepages.
>
> Kumagai-san, please correct me, if I'm wrong.

Yes, my patch treats all allocated hugetlbfs pages as user pages,
and doesn't distinguish whether the pages are actually used or not.
I made it so because I guess it's enough for almost all users.

We can introduce a new dump level once it's actually needed,
but I don't think now is the time. Introducing it without
demand would just make this tool more complex.


Thanks
Atsushi Kumagai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-07 Thread HATAYAMA Daisuke
(2013/11/08 14:12), Atsushi Kumagai wrote:
> Hello Jingbai,
> 
> (2013/11/07 17:58), Jingbai Ma wrote:
>> On 11/06/2013 10:23 PM, Vivek Goyal wrote:
>>> On Wed, Nov 06, 2013 at 02:21:39AM +, Atsushi Kumagai wrote:
 (2013/11/06 5:27), Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>> This patch set intends to exclude unnecessary hugepages from the vmcore dump
>> file.
>>
>> This patch requires the kernel patch to export necessary data structures 
>> into
>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>
>> This patch introduces two new dump levels, 32 and 64, to exclude all unused
>> and
>> active hugepages. The level to exclude all unnecessary pages will be 127
>> now.
>
> Interesting. Why should hugepages be treated any differently than normal
> pages?
>
> If the user asked to filter out free pages, then they should be filtered and
> it should not matter whether a page is a huge page or not?

 I'm making an RFC patch of hugepage filtering based on such a policy.

 I attach the prototype version.
 It's able to filter out THPs as well, and it suits cyclic processing
 because it depends on mem_map, and looking that up can be divided into
 cycles. This is the same idea as page_is_buddy().

 So I think it's better.
>>>
>>> Agreed. Being able to treat hugepages in the same manner as other pages
>>> sounds good.
>>>
>>> Jingbai, looks good to you?
>>
>> It looks good to me.
>>
>> My only concern is that this way we can only exclude all hugepages together,
>> but can't exclude just the free hugepages. I'm not sure if users need to dump
>> out only the active hugepages.
>>
>> Kumagai-san, please correct me, if I'm wrong.
> 
> Yes, my patch treats all allocated hugetlbfs pages as user pages,
> and doesn't distinguish whether the pages are actually used or not.
> I made it so because I guess it's enough for almost all users.
> 
> We can introduce a new dump level once it's actually needed,
> but I don't think now is the time. Introducing it without
> demand would just make this tool more complex.
> 

Typically, users would allocate only as many huge pages as they actually use,
in order not to waste system memory. So, this design seems reasonable.

-- 
Thanks.
HATAYAMA, Daisuke

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-07 Thread Jingbai Ma

On 11/06/2013 10:23 PM, Vivek Goyal wrote:

On Wed, Nov 06, 2013 at 02:21:39AM +, Atsushi Kumagai wrote:

(2013/11/06 5:27), Vivek Goyal wrote:

On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:

This patch set intends to exclude unnecessary hugepages from the vmcore dump file.

This patch requires the kernel patch to export necessary data structures into
vmcore: "kexec: export hugepage data structure into vmcoreinfo"
http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

This patch introduces two new dump levels, 32 and 64, to exclude all unused and
active hugepages. The level to exclude all unnecessary pages will be 127 now.


Interesting. Why should hugepages be treated any differently than normal
pages?

If the user asked to filter out free pages, then they should be filtered and
it should not matter whether a page is a huge page or not?


I'm making an RFC patch of hugepage filtering based on such a policy.

I attach the prototype version.
It's able to filter out THPs as well, and it suits cyclic processing
because it depends on mem_map, and looking that up can be divided into
cycles. This is the same idea as page_is_buddy().

So I think it's better.


Agreed. Being able to treat hugepages in the same manner as other pages
sounds good.

Jingbai, looks good to you?


It looks good to me.

My only concern is that this way we can only exclude all hugepages
together, but can't exclude just the free hugepages. I'm not sure if
users need to dump out only the active hugepages.


Kumagai-san, please correct me, if I'm wrong.





Thanks
Vivek



--
Thanks
Atsushi Kumagai



--
Thanks,
Jingbai Ma
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-06 Thread HATAYAMA Daisuke

(2013/11/06 11:21), Atsushi Kumagai wrote:

(2013/11/06 5:27), Vivek Goyal wrote:

On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:

This patch set intends to exclude unnecessary hugepages from the vmcore dump file.

This patch requires the kernel patch to export necessary data structures into
vmcore: "kexec: export hugepage data structure into vmcoreinfo"
http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

This patch introduces two new dump levels, 32 and 64, to exclude all unused and
active hugepages. The level to exclude all unnecessary pages will be 127 now.


Interesting. Why should hugepages be treated any differently than normal
pages?

If the user asked to filter out free pages, then they should be filtered and
it should not matter whether a page is a huge page or not?


I'm making an RFC patch of hugepage filtering based on such a policy.

I attach the prototype version.
It's able to filter out THPs as well, and it suits cyclic processing
because it depends on mem_map, and looking that up can be divided into
cycles. This is the same idea as page_is_buddy().

So I think it's better.




@@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map,
&& !isAnon(mapping)) {
if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
pfn_cache_private++;
+   /*
+* NOTE: If THP for cache is introduced, the check for
+*   compound pages is needed here.
+*/
}
/*
 * Exclude the data page of the user process.
 */
-   else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
-   && isAnon(mapping)) {
-   if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
-   pfn_user++;
+   else if (info->dump_level & DL_EXCLUDE_USER_DATA) {
+   /*
+* Exclude the anonnymous pages as user pages.
+*/
+   if (isAnon(mapping)) {
+   if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+   pfn_user++;
+
+   /*
+* Check the compound page
+*/
+   if (page_is_hugepage(flags) && compound_order > 
0) {
+   int i, nr_pages = 1 << compound_order;
+
+   for (i = 1; i < nr_pages; ++i) {
+   if 
(clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
+   pfn_user++;
+   }
+   pfn += nr_pages - 2;
+   mem_map += (nr_pages - 1) * SIZE(page);
+   }
+   }
+   /*
+* Exclude the hugetlbfs pages as user pages.
+*/
+   else if (hugetlb_dtor == SYMBOL(free_huge_page)) {
+   int i, nr_pages = 1 << compound_order;
+
+   for (i = 0; i < nr_pages; ++i) {
+   if 
(clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
+   pfn_user++;
+   }
+   pfn += nr_pages - 1;
+   mem_map += (nr_pages - 1) * SIZE(page);
+   }
}
/*
 * Exclude the hwpoison page.


I'm concerned about the case where filtering is not performed on the part of mem_map
entries not belonging to the current cyclic range.

If the maximum value of compound_order is larger than the maximum value of
CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains by
ARRAY_LENGTH(zone.free_area),
it's necessary to align info->bufsize_cyclic with the larger one in
check_cyclic_buffer_overrun().

--
Thanks.
HATAYAMA, Daisuke

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
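
A note on the hugetlb_dtor == SYMBOL(free_huge_page) test in the hunk above: it implies reading the compound destructor of each compound page out of the dump. That read is not shown anywhere in the thread; the following is a sketch of how it could look, assuming (as kernels of this era did in set_compound_page_dtor()) that the destructor pointer is stored in the second page descriptor's lru.next. readmem(), SIZE(), and OFFSET() are makedumpfile's usual dump accessors.

	/*
	 * Sketch only: fetch the compound destructor of the compound page
	 * whose head descriptor sits at mem_map.  The page[1].lru.next
	 * placement of the destructor is an assumption about the kernel,
	 * not something stated in this thread.
	 */
	static unsigned long long
	read_hugetlb_dtor(unsigned long mem_map)
	{
		unsigned long long dtor = 0;

		if (!readmem(VADDR, mem_map + SIZE(page) + OFFSET(page.lru)
				+ OFFSET(list_head.next), &dtor, sizeof(dtor)))
			return 0;

		return dtor;
	}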


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-06 Thread Vivek Goyal
On Wed, Nov 06, 2013 at 02:21:39AM +, Atsushi Kumagai wrote:
> (2013/11/06 5:27), Vivek Goyal wrote:
> > On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
> >> This patch set intends to exclude unnecessary hugepages from the vmcore dump
> >> file.
> >>
> >> This patch requires the kernel patch to export necessary data structures 
> >> into
> >> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
> >> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
> >>
> >> This patch introduces two new dump levels, 32 and 64, to exclude all unused
> >> and
> >> active hugepages. The level to exclude all unnecessary pages will be 127
> >> now.
> >
> > Interesting. Why should hugepages be treated any differently than normal
> > pages?
> >
> > If the user asked to filter out free pages, then they should be filtered and
> > it should not matter whether a page is a huge page or not?
> 
> I'm making an RFC patch of hugepage filtering based on such a policy.
> 
> I attach the prototype version.
> It's able to filter out THPs as well, and it suits cyclic processing
> because it depends on mem_map, and looking that up can be divided into
> cycles. This is the same idea as page_is_buddy().
> 
> So I think it's better.

Agreed. Being able to treat hugepages in the same manner as other pages
sounds good.

Jingbai, looks good to you?

Thanks
Vivek

> 
> -- 
> Thanks
> Atsushi Kumagai
> 
> 
> From: Atsushi Kumagai 
> Date: Wed, 6 Nov 2013 10:10:43 +0900
> Subject: [PATCH] [RFC] Exclude hugepages.
> 
> Signed-off-by: Atsushi Kumagai 
> ---
>makedumpfile.c | 122 
> ++---
>makedumpfile.h |   8 
>2 files changed, 125 insertions(+), 5 deletions(-)
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 428c53e..75b7123 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -63,6 +63,7 @@ do { \
>
>static void check_cyclic_buffer_overrun(void);
>static void setup_page_is_buddy(void);
> +static void setup_page_is_hugepage(void);
>
>void
>initialize_tables(void)
> @@ -270,6 +271,18 @@ update_mmap_range(off_t offset, int initial) {
>}
>
>static int
> +page_is_hugepage(unsigned long flags) {
> + if (NUMBER(PG_head) != NOT_FOUND_NUMBER) {
> + return isHead(flags);
> + } else if (NUMBER(PG_tail) != NOT_FOUND_NUMBER) {
> + return isTail(flags);
> + }if (NUMBER(PG_compound) != NOT_FOUND_NUMBER) {
> + return isCompound(flags);
> + }
> + return 0;
> +}
> +
> +static int
>is_mapped_with_mmap(off_t offset) {
>
>   if (info->flag_usemmap
> @@ -1107,6 +1120,8 @@ get_symbol_info(void)
>   SYMBOL_ARRAY_LENGTH_INIT(node_remap_start_pfn,
>   "node_remap_start_pfn");
>
> + SYMBOL_INIT(free_huge_page, "free_huge_page");
> +
>   return TRUE;
>}
>
> @@ -1214,11 +1229,19 @@ get_structure_info(void)
>
>   ENUM_NUMBER_INIT(PG_lru, "PG_lru");
>   ENUM_NUMBER_INIT(PG_private, "PG_private");
> + ENUM_NUMBER_INIT(PG_head, "PG_head");
> + ENUM_NUMBER_INIT(PG_tail, "PG_tail");
> + ENUM_NUMBER_INIT(PG_compound, "PG_compound");
>   ENUM_NUMBER_INIT(PG_swapcache, "PG_swapcache");
>   ENUM_NUMBER_INIT(PG_buddy, "PG_buddy");
>   ENUM_NUMBER_INIT(PG_slab, "PG_slab");
>   ENUM_NUMBER_INIT(PG_hwpoison, "PG_hwpoison");
>
> + if (NUMBER(PG_head) == NOT_FOUND_NUMBER &&
> + NUMBER(PG_compound) == NOT_FOUND_NUMBER)
> + /* Pre-2.6.26 kernels did not have pageflags */
> + NUMBER(PG_compound) = PG_compound_ORIGINAL;
> +
>   ENUM_TYPE_SIZE_INIT(pageflags, "pageflags");
>
>   TYPEDEF_SIZE_INIT(nodemask_t, "nodemask_t");
> @@ -1603,6 +1626,7 @@ write_vmcoreinfo_data(void)
>   WRITE_SYMBOL("node_remap_start_vaddr", node_remap_start_vaddr);
>   WRITE_SYMBOL("node_remap_end_vaddr", node_remap_end_vaddr);
>   WRITE_SYMBOL("node_remap_start_pfn", node_remap_start_pfn);
> + WRITE_SYMBOL("free_huge_page", free_huge_page);
>
>   /*
>* write the structure size of 1st kernel
> @@ -1685,6 +1709,9 @@ write_vmcoreinfo_data(void)
>
>   WRITE_NUMBER("PG_lru", PG_lru);
>   WRITE_NUMBER("PG_private", PG_private);
> + WRITE_NUMBER("PG_head", PG_head);
> + WRITE_NUMBER("PG_tail", PG_tail);
> + WRITE_NUMBER("PG_compound", PG_compound);
>   WRITE_NUMBER("PG_swapcache", PG_swapcache);
>   WRITE_NUMBER("PG_buddy", PG_buddy);
>   WRITE_NUMBER("PG_slab", PG_slab);
> @@ -1932,6 +1959,7 @@ read_vmcoreinfo(void)
>   READ_SYMBOL("node_remap_start_vaddr", node_remap_start_vaddr);
>   READ_SYMBOL("node_remap_end_vaddr", node_remap_end_vaddr);
>   READ_SYMBOL("node_remap_start_pfn", node_remap_start_pfn);
> + READ_SYMBOL("free_huge_page", free_huge_page);
>
>   READ_STRUCTURE_SIZE("page", page);
>   READ_STRUCTURE_SIZE("mem_section", mem_section);
> @@ -2000,6 +2028,9 @@ 
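
The page_is_hugepage() helper quoted in the patch above relies on isHead(), isTail(), and isCompound(), which never appear in the excerpt. They would be simple one-bit tests next to the existing isLRU()-style macros in makedumpfile.h; the following is a sketch of plausible definitions, not the posted code. Note that page_is_hugepage() only calls each macro after checking that the corresponding flag number was found in vmcoreinfo.

	/*
	 * Sketch only: test a single page flag bit against the bit
	 * numbers read from vmcoreinfo via NUMBER().
	 */
	#define isHead(flags)     (((flags) >> NUMBER(PG_head)) & 1)
	#define isTail(flags)     (((flags) >> NUMBER(PG_tail)) & 1)
	#define isCompound(flags) (((flags) >> NUMBER(PG_compound)) & 1)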


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Atsushi Kumagai
(2013/11/06 5:27), Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>> This patch set intends to exclude unnecessary hugepages from the vmcore dump file.
>>
>> This patch requires the kernel patch to export necessary data structures into
>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>
>> This patch introduces two new dump levels, 32 and 64, to exclude all unused and
>> active hugepages. The level to exclude all unnecessary pages will be 127 now.
>
> Interesting. Why should hugepages be treated any differently than normal
> pages?
>
> If the user asked to filter out free pages, then they should be filtered and
> it should not matter whether a page is a huge page or not?

I'm making an RFC patch of hugepage filtering based on such a policy.

I attach the prototype version.
It's able to filter out THPs as well, and it suits cyclic processing
because it depends on mem_map, and looking that up can be divided into
cycles. This is the same idea as page_is_buddy().

So I think it's better.

-- 
Thanks
Atsushi Kumagai


From: Atsushi Kumagai 
Date: Wed, 6 Nov 2013 10:10:43 +0900
Subject: [PATCH] [RFC] Exclude hugepages.

Signed-off-by: Atsushi Kumagai 
---
   makedumpfile.c | 122 
++---
   makedumpfile.h |   8 
   2 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index 428c53e..75b7123 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -63,6 +63,7 @@ do { \
   
   static void check_cyclic_buffer_overrun(void);
   static void setup_page_is_buddy(void);
+static void setup_page_is_hugepage(void);
   
   void
   initialize_tables(void)
@@ -270,6 +271,18 @@ update_mmap_range(off_t offset, int initial) {
   }
   
   static int
+page_is_hugepage(unsigned long flags) {
+   if (NUMBER(PG_head) != NOT_FOUND_NUMBER) {
+   return isHead(flags);
+   } else if (NUMBER(PG_tail) != NOT_FOUND_NUMBER) {
+   return isTail(flags);
+   }if (NUMBER(PG_compound) != NOT_FOUND_NUMBER) {
+   return isCompound(flags);
+   }
+   return 0;
+}
+
+static int
   is_mapped_with_mmap(off_t offset) {
   
if (info->flag_usemmap
@@ -1107,6 +1120,8 @@ get_symbol_info(void)
SYMBOL_ARRAY_LENGTH_INIT(node_remap_start_pfn,
"node_remap_start_pfn");
   
+   SYMBOL_INIT(free_huge_page, "free_huge_page");
+
return TRUE;
   }
   
@@ -1214,11 +1229,19 @@ get_structure_info(void)
   
ENUM_NUMBER_INIT(PG_lru, "PG_lru");
ENUM_NUMBER_INIT(PG_private, "PG_private");
+   ENUM_NUMBER_INIT(PG_head, "PG_head");
+   ENUM_NUMBER_INIT(PG_tail, "PG_tail");
+   ENUM_NUMBER_INIT(PG_compound, "PG_compound");
ENUM_NUMBER_INIT(PG_swapcache, "PG_swapcache");
ENUM_NUMBER_INIT(PG_buddy, "PG_buddy");
ENUM_NUMBER_INIT(PG_slab, "PG_slab");
ENUM_NUMBER_INIT(PG_hwpoison, "PG_hwpoison");
   
+   if (NUMBER(PG_head) == NOT_FOUND_NUMBER &&
+   NUMBER(PG_compound) == NOT_FOUND_NUMBER)
+   /* Pre-2.6.26 kernels did not have pageflags */
+   NUMBER(PG_compound) = PG_compound_ORIGINAL;
+
ENUM_TYPE_SIZE_INIT(pageflags, "pageflags");
   
TYPEDEF_SIZE_INIT(nodemask_t, "nodemask_t");
@@ -1603,6 +1626,7 @@ write_vmcoreinfo_data(void)
WRITE_SYMBOL("node_remap_start_vaddr", node_remap_start_vaddr);
WRITE_SYMBOL("node_remap_end_vaddr", node_remap_end_vaddr);
WRITE_SYMBOL("node_remap_start_pfn", node_remap_start_pfn);
+   WRITE_SYMBOL("free_huge_page", free_huge_page);
   
/*
 * write the structure size of 1st kernel
@@ -1685,6 +1709,9 @@ write_vmcoreinfo_data(void)
   
WRITE_NUMBER("PG_lru", PG_lru);
WRITE_NUMBER("PG_private", PG_private);
+   WRITE_NUMBER("PG_head", PG_head);
+   WRITE_NUMBER("PG_tail", PG_tail);
+   WRITE_NUMBER("PG_compound", PG_compound);
WRITE_NUMBER("PG_swapcache", PG_swapcache);
WRITE_NUMBER("PG_buddy", PG_buddy);
WRITE_NUMBER("PG_slab", PG_slab);
@@ -1932,6 +1959,7 @@ read_vmcoreinfo(void)
READ_SYMBOL("node_remap_start_vaddr", node_remap_start_vaddr);
READ_SYMBOL("node_remap_end_vaddr", node_remap_end_vaddr);
READ_SYMBOL("node_remap_start_pfn", node_remap_start_pfn);
+   READ_SYMBOL("free_huge_page", free_huge_page);
   
READ_STRUCTURE_SIZE("page", page);
READ_STRUCTURE_SIZE("mem_section", mem_section);
@@ -2000,6 +2028,9 @@ read_vmcoreinfo(void)
   
READ_NUMBER("PG_lru", PG_lru);
READ_NUMBER("PG_private", PG_private);
+   READ_NUMBER("PG_head", PG_head);
+   READ_NUMBER("PG_tail", PG_tail);
+   READ_NUMBER("PG_compound", PG_compound);
READ_NUMBER("PG_swapcache", PG_swapcache);
READ_NUMBER("PG_slab", PG_slab);
READ_NUMBER("PG_buddy", PG_buddy);

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Vivek Goyal
On Wed, Nov 06, 2013 at 09:47:49AM +0800, Jingbai Ma wrote:
> On 11/06/2013 04:26 AM, Vivek Goyal wrote:
> >On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
> >>This patch set intends to exclude unnecessary hugepages from the vmcore dump
> >>file.
> >>
> >>This patch requires the kernel patch to export necessary data structures 
> >>into
> >>vmcore: "kexec: export hugepage data structure into vmcoreinfo"
> >>http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
> >>
> >>This patch introduces two new dump levels, 32 and 64, to exclude all unused and
> >>active hugepages. The level to exclude all unnecessary pages will be 127
> >>now.
> >
> >Interesting. Why should hugepages be treated any differently than normal
> >pages?
> >
> >If the user asked to filter out free pages, then they should be filtered and
> >it should not matter whether a page is a huge page or not?
> 
> Yes, free hugepages should be filtered out with other free pages. It
> sounds reasonable.
> 
> But for active hugepages, I would offer users more
> choices/flexibility (maybe a bad idea).
> I'm OK with filtering active hugepages together with other user data pages.
> 
> Any other comments?

I really can't see why hugepages are different from regular pages when
it comes to filtering. IMO, we really should not create filtering
options/levels only for huge pages, unless and until there is a strong
use case.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Jingbai Ma

On 11/06/2013 04:26 AM, Vivek Goyal wrote:

On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:

This patch set intends to exclude unnecessary hugepages from the vmcore dump file.

This patch requires the kernel patch to export necessary data structures into
vmcore: "kexec: export hugepage data structure into vmcoreinfo"
http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

This patch introduces two new dump levels, 32 and 64, to exclude all unused and
active hugepages. The level to exclude all unnecessary pages will be 127 now.


Interesting. Why should hugepages be treated any differently than normal
pages?

If the user asked to filter out free pages, then they should be filtered and
it should not matter whether a page is a huge page or not?


Yes, free hugepages should be filtered out with other free pages. It 
sounds reasonable.


But for active hugepages, I would offer users more choices/flexibility
(maybe a bad idea).

I'm OK with filtering active hugepages together with other user data pages.

Any other comments?




Thanks
Vivek



--
Thanks,
Jingbai Ma
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Vivek Goyal
On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
> This patch set intends to exclude unnecessary hugepages from the vmcore dump file.
> 
> This patch requires the kernel patch to export necessary data structures into
> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
> 
> This patch introduces two new dump levels, 32 and 64, to exclude all unused and
> active hugepages. The level to exclude all unnecessary pages will be 127 now.

Interesting. Why should hugepages be treated any differently than normal
pages?

If the user asked to filter out free pages, then they should be filtered and
it should not matter whether a page is a huge page or not?

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Jingbai Ma
This patch set intends to exclude unnecessary hugepages from the vmcore dump file.

This patch requires the kernel patch to export necessary data structures into
vmcore: "kexec: export hugepage data structure into vmcoreinfo"
http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

This patch introduces two new dump levels, 32 and 64, to exclude all unused and
active hugepages. The level to exclude all unnecessary pages is now 127.

        |        cache    cache                  free    active
  Dump  |  zero  without  with     user   free   huge    huge
  Level |  page  private  private  data   page   page    page
 -------+------------------------------------------------------
     0  |
     1  |   X
     2  |          X
     4  |          X        X
     8  |                            X
    16  |                                   X
    32  |                                          X
    64  |                                          X       X
   127  |   X      X        X        X      X      X       X

Examples:
To exclude all unnecessary pages:
makedumpfile -c --message-level 23 -d 127 /proc/vmcore /var/crash/kdump

To exclude all unnecessary pages but keep active hugepages:
makedumpfile -c --message-level 23 -d 63 /proc/vmcore /var/crash/kdump
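
The dump level is a bitmask: each bit enables one exclusion class from the
table above, so -d 127 sets all seven bits, while -d 63 clears only the
active-hugepage bit (64). A minimal sketch in C of how the levels combine
(the enum names below are illustrative, not makedumpfile's actual
identifiers):

	#include <stdio.h>

	/* One bit per exclusion class, matching the table above. */
	enum dump_level_bits {
		DL_EXCLUDE_ZERO        = 1,   /* zero pages                    */
		DL_EXCLUDE_CACHE       = 2,   /* cache pages without private   */
		DL_EXCLUDE_CACHE_PRI   = 4,   /* cache pages with private      */
		DL_EXCLUDE_USER_DATA   = 8,   /* user data pages               */
		DL_EXCLUDE_FREE        = 16,  /* free pages                    */
		DL_EXCLUDE_FREE_HUGE   = 32,  /* free hugepages (this patch)   */
		DL_EXCLUDE_ACTIVE_HUGE = 64,  /* active hugepages (this patch) */
	};

	int main(void)
	{
		int all = DL_EXCLUDE_ZERO | DL_EXCLUDE_CACHE |
			  DL_EXCLUDE_CACHE_PRI | DL_EXCLUDE_USER_DATA |
			  DL_EXCLUDE_FREE | DL_EXCLUDE_FREE_HUGE |
			  DL_EXCLUDE_ACTIVE_HUGE;

		printf("-d %d excludes everything\n", all);   /* prints 127 */
		printf("-d %d keeps active hugepages\n",
		       all & ~DL_EXCLUDE_ACTIVE_HUGE);       /* prints 63  */
		return 0;
	}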

---

Jingbai Ma (3):
  makedumpfile: hugepage filtering: add hugepage filtering functions
  makedumpfile: hugepage filtering: add excluding hugepage messages
  makedumpfile: hugepage filtering: add new dump levels for manual page


 makedumpfile.8 |  170 +++
 makedumpfile.c |  272 
 makedumpfile.h |   19 
 print_info.c   |   12 +-
 print_info.h   |2 
 5 files changed, 431 insertions(+), 44 deletions(-)

--



Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Atsushi Kumagai
(2013/11/06 5:27), Vivek Goyal wrote:
 On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
 This patch set intends to exclude unnecessary hugepages from the vmcore dump file.

 This patch requires the kernel patch to export necessary data structures into
 vmcore: kexec: export hugepage data structure into vmcoreinfo
 http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

 This patch introduces two new dump levels, 32 and 64, to exclude all unused and
 active hugepages. The level to exclude all unnecessary pages is now 127.

 Interesting. Why should hugepages be treated any differently than normal
 pages?

 If the user asked to filter out free pages, then they should be filtered, and
 it should not matter whether a page is a huge page or not?

I'm making an RFC patch for hugepage filtering based on that policy.

I've attached the prototype version.
It can also filter out THPs, and it is suitable for cyclic processing
because it depends only on mem_map, so the lookup can be divided into
cycles. This is the same idea as page_is_buddy().

So I think this approach is better.

-- 
Thanks
Atsushi Kumagai
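
As a minimal standalone sketch of the flag test the patch below introduces
(illustrative only: it mirrors the structure of page_is_hugepage() in the
patch, but the bit numbers are passed in as plain parameters here, whereas
the patch reads them from VMCOREINFO via the NUMBER() macro; -1 stands for
"this kernel did not export that flag"):

	/*
	 * Return nonzero if the page flags mark a compound (huge) page.
	 * pg_head/pg_tail/pg_compound are page-flag bit numbers, or -1
	 * when the flag is not available on this kernel.
	 */
	int flags_mark_hugepage(unsigned long flags, long pg_head,
				long pg_tail, long pg_compound)
	{
		if (pg_head != -1)              /* kernels with PG_head   */
			return !!(flags & (1UL << pg_head));
		else if (pg_tail != -1)         /* ... or PG_tail only    */
			return !!(flags & (1UL << pg_tail));
		else if (pg_compound != -1)     /* pre-PG_head kernels    */
			return !!(flags & (1UL << pg_compound));
		return 0;                       /* cannot tell; treat as normal */
	}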


From: Atsushi Kumagai kumagai-atsu...@mxc.nes.nec.co.jp
Date: Wed, 6 Nov 2013 10:10:43 +0900
Subject: [PATCH] [RFC] Exclude hugepages.

Signed-off-by: Atsushi Kumagai kumagai-atsu...@mxc.nes.nec.co.jp
---
   makedumpfile.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++---
   makedumpfile.h |   8 ++++++++
   2 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index 428c53e..75b7123 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -63,6 +63,7 @@ do { \
   
   static void check_cyclic_buffer_overrun(void);
   static void setup_page_is_buddy(void);
+static void setup_page_is_hugepage(void);
   
   void
   initialize_tables(void)
@@ -270,6 +271,18 @@ update_mmap_range(off_t offset, int initial) {
   }
   
   static int
+page_is_hugepage(unsigned long flags) {
+   if (NUMBER(PG_head) != NOT_FOUND_NUMBER) {
+   return isHead(flags);
+   } else if (NUMBER(PG_tail) != NOT_FOUND_NUMBER) {
+   return isTail(flags);
+   } else if (NUMBER(PG_compound) != NOT_FOUND_NUMBER) {
+   return isCompound(flags);
+   }
+   return 0;
+}
+
+static int
   is_mapped_with_mmap(off_t offset) {
   
	if (info->flag_usemmap
@@ -1107,6 +1120,8 @@ get_symbol_info(void)
SYMBOL_ARRAY_LENGTH_INIT(node_remap_start_pfn,
node_remap_start_pfn);
   
+   SYMBOL_INIT(free_huge_page, free_huge_page);
+
return TRUE;
   }
   
@@ -1214,11 +1229,19 @@ get_structure_info(void)
   
ENUM_NUMBER_INIT(PG_lru, PG_lru);
ENUM_NUMBER_INIT(PG_private, PG_private);
+   ENUM_NUMBER_INIT(PG_head, PG_head);
+   ENUM_NUMBER_INIT(PG_tail, PG_tail);
+   ENUM_NUMBER_INIT(PG_compound, PG_compound);
ENUM_NUMBER_INIT(PG_swapcache, PG_swapcache);
ENUM_NUMBER_INIT(PG_buddy, PG_buddy);
ENUM_NUMBER_INIT(PG_slab, PG_slab);
ENUM_NUMBER_INIT(PG_hwpoison, PG_hwpoison);
   
+   if (NUMBER(PG_head) == NOT_FOUND_NUMBER &&
+       NUMBER(PG_compound) == NOT_FOUND_NUMBER)
+   /* Pre-2.6.26 kernels did not have pageflags */
+   NUMBER(PG_compound) = PG_compound_ORIGINAL;
+
ENUM_TYPE_SIZE_INIT(pageflags, pageflags);
   
TYPEDEF_SIZE_INIT(nodemask_t, nodemask_t);
@@ -1603,6 +1626,7 @@ write_vmcoreinfo_data(void)
WRITE_SYMBOL(node_remap_start_vaddr, node_remap_start_vaddr);
WRITE_SYMBOL(node_remap_end_vaddr, node_remap_end_vaddr);
WRITE_SYMBOL(node_remap_start_pfn, node_remap_start_pfn);
+   WRITE_SYMBOL(free_huge_page, free_huge_page);
   
/*
 * write the structure size of 1st kernel
@@ -1685,6 +1709,9 @@ write_vmcoreinfo_data(void)
   
WRITE_NUMBER(PG_lru, PG_lru);
WRITE_NUMBER(PG_private, PG_private);
+   WRITE_NUMBER(PG_head, PG_head);
+   WRITE_NUMBER(PG_tail, PG_tail);
+   WRITE_NUMBER(PG_compound, PG_compound);
WRITE_NUMBER(PG_swapcache, PG_swapcache);
WRITE_NUMBER(PG_buddy, PG_buddy);
WRITE_NUMBER(PG_slab, PG_slab);
@@ -1932,6 +1959,7 @@ read_vmcoreinfo(void)
READ_SYMBOL(node_remap_start_vaddr, node_remap_start_vaddr);
READ_SYMBOL(node_remap_end_vaddr, node_remap_end_vaddr);
READ_SYMBOL(node_remap_start_pfn, node_remap_start_pfn);
+   READ_SYMBOL(free_huge_page, free_huge_page);
   
READ_STRUCTURE_SIZE(page, page);
READ_STRUCTURE_SIZE(mem_section, mem_section);
@@ -2000,6 +2028,9 @@ read_vmcoreinfo(void)
   
READ_NUMBER(PG_lru, PG_lru);
READ_NUMBER(PG_private, PG_private);
+   READ_NUMBER(PG_head, PG_head);
+   READ_NUMBER(PG_tail, PG_tail);
+   READ_NUMBER(PG_compound, PG_compound);
READ_NUMBER(PG_swapcache, PG_swapcache);
READ_NUMBER(PG_slab, PG_slab);
READ_NUMBER(PG_buddy, PG_buddy);
@@ -3126,6 +3157,9 @@ out:
if