Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
On Thu 30-11-23 20:04:59, Baoquan He wrote: > On 11/30/23 at 11:16am, Michal Hocko wrote: > > On Thu 30-11-23 11:00:48, Baoquan He wrote: > > [...] > > > Now, we are worried if there's risk if the CMA area is retaken into kdump > > > kernel as system RAM. E.g is it possible that 1st kernel's

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
On Thu 30-11-23 11:00:48, Baoquan He wrote: [...] > Now, we are worried if there's risk if the CMA area is retaken into kdump > kernel as system RAM. E.g is it possible that 1st kernel's ongoing RDMA > or DMA will interfere with kdump kernel's normal memory accessing? > Because kdump kernel

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
On Fri 08-12-23 09:55:39, Baoquan He wrote: > On 12/07/23 at 12:52pm, Michal Hocko wrote: > > On Thu 07-12-23 12:13:14, Philipp Rudo wrote: [...] > > > Thing is that users don't only want to reduce the memory usage but also > > > the downtime of kdump. In the end I'm afraid that "simply waiting"

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
On Thu 07-12-23 12:23:13, Baoquan He wrote: [...] > We can't guarantee how swift the DMA transfer could be in the cma, case, > it will be a venture. We can't guarantee this of course but AFAIK the DMA shouldn't take minutes, right? While not perfect, waiting for some time before jumping into the

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
On Wed 06-12-23 14:49:51, Michal Hocko wrote: > On Wed 06-12-23 12:08:05, Philipp Rudo wrote: [...] > > If I understand Documentation/core-api/pin_user_pages.rst correctly you > > missed case 1 Direct IO. In that case "short term" DMA is allowed for > > pages without FOLL_LONGTERM. Meaning that

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-07 Thread Baoquan He
On 12/07/23 at 09:55am, Michal Hocko wrote: > On Thu 07-12-23 12:23:13, Baoquan He wrote: > [...] > > We can't guarantee how swift the DMA transfer could be in the cma, case, > > it will be a venture. > > We can't guarantee this of course but AFAIK the DMA shouldn't take > minutes, right? While

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-07 Thread Baoquan He
On 12/07/23 at 12:52pm, Michal Hocko wrote: > On Thu 07-12-23 12:13:14, Philipp Rudo wrote: > > On Thu, 7 Dec 2023 09:55:20 +0100 > > Michal Hocko wrote: > > > > > On Thu 07-12-23 12:23:13, Baoquan He wrote: > > > [...] > > > > We can't guarantee how swift the DMA transfer could be in the cma,

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-07 Thread Michal Hocko
On Thu 07-12-23 12:13:14, Philipp Rudo wrote: > On Thu, 7 Dec 2023 09:55:20 +0100 > Michal Hocko wrote: > > > On Thu 07-12-23 12:23:13, Baoquan He wrote: > > [...] > > > We can't guarantee how swift the DMA transfer could be in the cma, case, > > > it will be a venture. > > > > We can't

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-07 Thread Philipp Rudo
On Wed, 6 Dec 2023 16:19:51 +0100 Michal Hocko wrote: > On Wed 06-12-23 14:49:51, Michal Hocko wrote: > > On Wed 06-12-23 12:08:05, Philipp Rudo wrote: > [...] > > > If I understand Documentation/core-api/pin_user_pages.rst correctly you > > > missed case 1 Direct IO. In that case "short term"

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-07 Thread Philipp Rudo
On Thu, 7 Dec 2023 09:55:20 +0100 Michal Hocko wrote: > On Thu 07-12-23 12:23:13, Baoquan He wrote: > [...] > > We can't guarantee how swift the DMA transfer could be in the cma, case, > > it will be a venture. > > We can't guarantee this of course but AFAIK the DMA shouldn't take > minutes,

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-06 Thread Baoquan He
On 12/06/23 at 04:19pm, Michal Hocko wrote: > On Wed 06-12-23 14:49:51, Michal Hocko wrote: > > On Wed 06-12-23 12:08:05, Philipp Rudo wrote: > [...] > > > If I understand Documentation/core-api/pin_user_pages.rst correctly you > > > missed case 1 Direct IO. In that case "short term" DMA is

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-06 Thread Michal Hocko
On Wed 06-12-23 12:08:05, Philipp Rudo wrote: > On Fri, 1 Dec 2023 17:59:02 +0100 > Michal Hocko wrote: > > > On Fri 01-12-23 16:51:13, Philipp Rudo wrote: > > > On Fri, 1 Dec 2023 12:55:52 +0100 > > > Michal Hocko wrote: > > > > > > > On Fri 01-12-23 12:33:53, Philipp Rudo wrote: > > > >

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-06 Thread David Hildenbrand
On 06.12.23 12:08, Philipp Rudo wrote: On Fri, 1 Dec 2023 17:59:02 +0100 Michal Hocko wrote: On Fri 01-12-23 16:51:13, Philipp Rudo wrote: On Fri, 1 Dec 2023 12:55:52 +0100 Michal Hocko wrote: On Fri 01-12-23 12:33:53, Philipp Rudo wrote: [...] And yes, those are all what-if concerns

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-06 Thread Philipp Rudo
On Fri, 1 Dec 2023 17:59:02 +0100 Michal Hocko wrote: > On Fri 01-12-23 16:51:13, Philipp Rudo wrote: > > On Fri, 1 Dec 2023 12:55:52 +0100 > > Michal Hocko wrote: > > > > > On Fri 01-12-23 12:33:53, Philipp Rudo wrote: > > > [...] > > > > And yes, those are all what-if concerns but

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Michal Hocko
On Fri 01-12-23 16:51:13, Philipp Rudo wrote: > On Fri, 1 Dec 2023 12:55:52 +0100 > Michal Hocko wrote: > > > On Fri 01-12-23 12:33:53, Philipp Rudo wrote: > > [...] > > > And yes, those are all what-if concerns but unfortunately that is all > > > we have right now. > > > > Should theoretical

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Philipp Rudo
On Fri, 1 Dec 2023 12:55:52 +0100 Michal Hocko wrote: > On Fri 01-12-23 12:33:53, Philipp Rudo wrote: > [...] > > And yes, those are all what-if concerns but unfortunately that is all > > we have right now. > > Should theoretical concerns without an actual evidence (e.g. multiple > drivers

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Jiri Bohac
On Thu, Nov 30, 2023 at 12:01:36PM +0800, Baoquan He wrote: > On 11/29/23 at 11:51am, Jiri Bohac wrote: > > We get a lot of problems reported by partners testing kdump on > > their setups prior to release. But even if we tune the reserved > > size up, OOM is still the most common reason for kdump

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Michal Hocko
On Fri 01-12-23 12:33:53, Philipp Rudo wrote: [...] > And yes, those are all what-if concerns but unfortunately that is all > we have right now. Should theoretical concerns without an actual evidence (e.g. multiple drivers known to be broken) become a roadblock for this otherwise useful feature?

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Philipp Rudo
Hi Jiri, I'd really love to see something like this to work. Although I also share the concerns about shitty device drivers corrupting the CMA. Please see my other mail for that. Anyway, one more comment below. On Fri, 24 Nov 2023 20:54:36 +0100 Jiri Bohac wrote: [...] > Now, specifying >

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Philipp Rudo
Hi Michal, On Thu, 30 Nov 2023 14:41:12 +0100 Michal Hocko wrote: > On Thu 30-11-23 20:31:44, Baoquan He wrote: > [...] > > > > which doesn't use the proper pinning API (which would migrate away from > > > > the CMA) then what is the worst case? We will get crash kernel corrupted > > > >

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Michal Hocko
On Fri 01-12-23 08:54:20, Pingfan Liu wrote: [...] > > I am not aware of any method to detect a driver is going to configure a > > RDMA. > > > > If there is a pattern, scripts/coccinelle may give some help. But I am > not sure about that. I am not aware of any pattern. > > > If this can be

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-30 Thread Pingfan Liu
On Thu, Nov 30, 2023 at 9:43 PM Michal Hocko wrote: > > On Thu 30-11-23 21:33:04, Pingfan Liu wrote: > > On Thu, Nov 30, 2023 at 9:29 PM Michal Hocko wrote: > > > > > > On Thu 30-11-23 20:04:59, Baoquan He wrote: > > > > On 11/30/23 at 11:16am, Michal Hocko wrote: > > > > > On Thu 30-11-23

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-30 Thread Michal Hocko
On Thu 30-11-23 20:31:44, Baoquan He wrote: [...] > > > which doesn't use the proper pinning API (which would migrate away from > > > the CMA) then what is the worst case? We will get crash kernel corrupted > > > potentially and fail to take a proper kernel crash, right? Is this > > > worrisome?

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-30 Thread Michal Hocko
On Thu 30-11-23 21:33:04, Pingfan Liu wrote: > On Thu, Nov 30, 2023 at 9:29 PM Michal Hocko wrote: > > > > On Thu 30-11-23 20:04:59, Baoquan He wrote: > > > On 11/30/23 at 11:16am, Michal Hocko wrote: > > > > On Thu 30-11-23 11:00:48, Baoquan He wrote: > > > > [...] > > > > > Now, we are worried

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-30 Thread Pingfan Liu
On Thu, Nov 30, 2023 at 9:29 PM Michal Hocko wrote: > > On Thu 30-11-23 20:04:59, Baoquan He wrote: > > On 11/30/23 at 11:16am, Michal Hocko wrote: > > > On Thu 30-11-23 11:00:48, Baoquan He wrote: > > > [...] > > > > Now, we are worried if there's risk if the CMA area is retaken into > > > >

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-30 Thread Baoquan He
Hi Michal, On 11/30/23 at 08:04pm, Baoquan He wrote: > On 11/30/23 at 11:16am, Michal Hocko wrote: > > On Thu 30-11-23 11:00:48, Baoquan He wrote: > > [...] > > > Now, we are worried if there's risk if the CMA area is retaken into kdump > > > kernel as system RAM. E.g is it possible that 1st

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-30 Thread Baoquan He
On 11/30/23 at 11:16am, Michal Hocko wrote: > On Thu 30-11-23 11:00:48, Baoquan He wrote: > [...] > > Now, we are worried if there's risk if the CMA area is retaken into kdump > > kernel as system RAM. E.g is it possible that 1st kernel's ongoing RDMA > > or DMA will interfere with kdump kernel's

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Baoquan He
On 11/29/23 at 11:51am, Jiri Bohac wrote: > Hi Baoquan, > > thanks for your interest... > > On Wed, Nov 29, 2023 at 03:57:59PM +0800, Baoquan He wrote: > > On 11/28/23 at 10:08am, Michal Hocko wrote: > > > On Tue 28-11-23 10:11:31, Baoquan He wrote: > > > > On 11/28/23 at 09:12am, Tao Liu wrote:

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Baoquan He
On 11/29/23 at 10:03am, Donald Dutile wrote: > Baoquan, > hi! > > On 11/29/23 3:10 AM, Baoquan He wrote: > > On 11/28/23 at 10:08am, Michal Hocko wrote: > > > On Tue 28-11-23 10:11:31, Baoquan He wrote: > > > > On 11/28/23 at 09:12am, Tao Liu wrote: > > > [...] > > > > Thanks for the effort to

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Baoquan He
On 11/29/23 at 10:25am, Michal Hocko wrote: > On Wed 29-11-23 15:57:59, Baoquan He wrote: > [...] > > Hmm, Redhat could go in a different way. We have been trying to: > > 1) customize initrd for kdump kernel specifically, e.g exclude unneeded > > devices's driver to save memory; > > 2) monitor

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Donald Dutile
Baoquan, hi! On 11/29/23 3:10 AM, Baoquan He wrote: On 11/28/23 at 10:08am, Michal Hocko wrote: On Tue 28-11-23 10:11:31, Baoquan He wrote: On 11/28/23 at 09:12am, Tao Liu wrote: [...] Thanks for the effort to bring this up, Jiri. I am wondering how you will use this crashkernel=,cma

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Jiri Bohac
Hi Baoquan, thanks for your interest... On Wed, Nov 29, 2023 at 03:57:59PM +0800, Baoquan He wrote: > On 11/28/23 at 10:08am, Michal Hocko wrote: > > On Tue 28-11-23 10:11:31, Baoquan He wrote: > > > On 11/28/23 at 09:12am, Tao Liu wrote: > > [...] > > > Thanks for the effort to bring this up,

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Michal Hocko
On Wed 29-11-23 15:57:59, Baoquan He wrote: [...] > Hmm, Redhat could go in a different way. We have been trying to: > 1) customize initrd for kdump kernel specifically, e.g exclude unneeded > devices's driver to save memory; > 2) monitor device and kernel memory usage if they begin to consume

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Baoquan He
On 11/28/23 at 10:08am, Michal Hocko wrote: > On Tue 28-11-23 10:11:31, Baoquan He wrote: > > On 11/28/23 at 09:12am, Tao Liu wrote: > [...] > > Thanks for the effort to bring this up, Jiri. > > > > I am wondering how you will use this crashkernel=,cma parameter. I mean > > the scenario of

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Baoquan He
On 11/28/23 at 10:08am, Michal Hocko wrote: > On Tue 28-11-23 10:11:31, Baoquan He wrote: > > On 11/28/23 at 09:12am, Tao Liu wrote: > [...] > > Thanks for the effort to bring this up, Jiri. > > > > I am wondering how you will use this crashkernel=,cma parameter. I mean > > the scenario of

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-28 Thread Michal Hocko
On Tue 28-11-23 10:11:31, Baoquan He wrote: > On 11/28/23 at 09:12am, Tao Liu wrote: [...] > Thanks for the effort to bring this up, Jiri. > > I am wondering how you will use this crashkernel=,cma parameter. I mean > the scenario of crashkernel=,cma. Asking this because I don't know how > SUSE

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-28 Thread Michal Hocko
On Tue 28-11-23 10:07:08, Pingfan Liu wrote: > On Sun, Nov 26, 2023 at 5:24 AM Jiri Bohac wrote: > > > > Hi Tao, > > > > On Sat, Nov 25, 2023 at 09:51:54AM +0800, Tao Liu wrote: > > > Thanks for the idea of using CMA as part of memory for the 2nd kernel. > > > However I have a question: > > > > >

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-27 Thread Baoquan He
On 11/28/23 at 09:12am, Tao Liu wrote: > Hi Jiri, > > On Sun, Nov 26, 2023 at 5:22 AM Jiri Bohac wrote: > > > > Hi Tao, > > > > On Sat, Nov 25, 2023 at 09:51:54AM +0800, Tao Liu wrote: > > > Thanks for the idea of using CMA as part of memory for the 2nd kernel. > > > However I have a question: >

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-27 Thread Pingfan Liu
On Sun, Nov 26, 2023 at 5:24 AM Jiri Bohac wrote: > > Hi Tao, > > On Sat, Nov 25, 2023 at 09:51:54AM +0800, Tao Liu wrote: > > Thanks for the idea of using CMA as part of memory for the 2nd kernel. > > However I have a question: > > > > What if there is on-going DMA/RDMA access on the CMA range

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-27 Thread Tao Liu
Hi Jiri, On Sun, Nov 26, 2023 at 5:22 AM Jiri Bohac wrote: > > Hi Tao, > > On Sat, Nov 25, 2023 at 09:51:54AM +0800, Tao Liu wrote: > > Thanks for the idea of using CMA as part of memory for the 2nd kernel. > > However I have a question: > > > > What if there is on-going DMA/RDMA access on the

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-25 Thread Jiri Bohac
Hi Tao, On Sat, Nov 25, 2023 at 09:51:54AM +0800, Tao Liu wrote: > Thanks for the idea of using CMA as part of memory for the 2nd kernel. > However I have a question: > > What if there is on-going DMA/RDMA access on the CMA range when 1st > kernel crash? There might be data corruption when 2nd

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-24 Thread Tao Liu
Hi Jiri, On Sat, Nov 25, 2023 at 3:55 AM Jiri Bohac wrote: > > Hi, > > this series implements a new way to reserve additional crash kernel > memory using CMA. > > Currently, all the memory for the crash kernel is not usable by > the 1st (production) kernel. It is also unmapped so that it can't >

[PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-24 Thread Jiri Bohac
Hi, this series implements a new way to reserve additional crash kernel memory using CMA. Currently, all the memory for the crash kernel is not usable by the 1st (production) kernel. It is also unmapped so that it can't be corrupted by the fault that will eventually trigger the crash. This makes