> For performance reasons, we do not erase memory content during system
> updates/reboots. Memory content is erased only on a power cycle, which
> we do not do in production.
>
> Once we hot-remove the memory, we convert it back into a DAXFS PMEM
> device, format it with EXT4, and mount it as a DAX file
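
As a rough sketch of that last step, the final mount comes down to a
mount(2) call with the "dax" option (the device path and mount point
below are illustrative assumptions, not taken from the thread; the
namespace conversion after hot-remove would be done with the
ndctl/daxctl tooling, and the device must already carry an EXT4
filesystem):

/* Hedged sketch: mount a pmem namespace as EXT4 with DAX enabled.
 * /dev/pmem0 and /mnt/pmem are placeholders; run as root, with the
 * namespace already reconfigured and formatted via mkfs.ext4.
 */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	if (mount("/dev/pmem0", "/mnt/pmem", "ext4", 0, "dax") != 0) {
		perror("mount");
		return 1;
	}
	printf("mounted /dev/pmem0 with -o dax\n");
	return 0;
}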
On Wed 02-09-20 08:53:49, Pavel Tatashin wrote:
> On Wed, Sep 2, 2020 at 7:32 AM Michal Hocko wrote:
> >
> > On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> > > >> > > Thread #2: css killer kthread
> > > >> > > css_killed_work_fn
> > > >> > > cgroup_mutex <- grabs this mutex
> > > >> >
> > This is how we are using it at Microsoft: there is a very large
> > number of small-memory machines (8G each) with low downtime
> > requirements (a reboot must take under a second). There is also a
> > large state, ~2G of memory, that we need to transfer during reboot;
> > otherwise it is very
On Wed 02-09-20 08:51:06, Pavel Tatashin wrote:
> > > > Thread #1: memory hot-remove systemd service
> > > > It loops indefinitely: as long as something still has to be
> > > > migrated, the loop never terminates. However, it can be
> > > > terminated via a signal from systemd after
On Wed 02-09-20 08:42:13, Pavel Tatashin wrote:
> > > On 02.09.2020 at 11:53, Vlastimil Babka wrote:
> > >
> > > On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> > >> There appears to be another problem that is related to the
> > >> cgroup_mutex -> mem_hotplug_lock deadlock described above.
> > >>
>
On Wed, Sep 2, 2020 at 7:32 AM Michal Hocko wrote:
>
> On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> > >> > > Thread #2: css killer kthread
> > >> > > css_killed_work_fn
> > >> > > cgroup_mutex <- grabs this mutex
> > >> > > mem_cgroup_css_offline
> > >> > >
> > > Thread #1: memory hot-remove systemd service
> > > It loops indefinitely: as long as something still has to be
> > > migrated, the loop never terminates. However, it can be terminated
> > > via a signal from systemd after a timeout.
> > > __offline_pages()
> > > do {
> > >
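
For reference, the shape of that loop in kernels of this era,
paraphrased (not quoted verbatim) from mm/memory_hotplug.c: it keeps
rescanning for movable pages and retrying migration, and bails out only
when signal_pending() observes the signal systemd sends on timeout.

/* Paraphrased sketch of the __offline_pages() retry loop; simplified,
 * not the verbatim source. Migration failures just cause another pass,
 * so only a pending signal (e.g. systemd killing the service after a
 * timeout) ends the loop early.
 */
do {
	if (signal_pending(current)) {
		ret = -EINTR;			/* terminated by signal */
		goto failed_removal_isolated;
	}
	cond_resched();
	lru_add_drain_all();

	ret = scan_movable_pages(pfn, end_pfn, &pfn);
	if (!ret)
		/* migrate whatever is still movable, then rescan */
		do_migrate_range(pfn, end_pfn);
} while (!ret);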
> > On 02.09.2020 at 11:53, Vlastimil Babka wrote:
> >
> > On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> >> There appears to be another problem that is related to the
> >> cgroup_mutex -> mem_hotplug_lock deadlock described above.
> >>
> >> In the original deadlock that I described, the
> I am on an old codebase that already has the fix that you are
> proposing, so I might be seeing some other issue, which I will debug
> further.
>
> So it looks like the loop in __offline_pages() had a call to
> drain_all_pages() before it was removed by
>
> c52e75935f8d: mm: remove extra drain pages on
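
On such older kernels, in other words, each pass of the loop sketched
above also drained the per-cpu free lists, roughly (again paraphrased,
not verbatim):

	cond_resched();
	lru_add_drain_all();
	drain_all_pages(zone);	/* the call later removed by c52e75935f8d */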
On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> >> > > Thread #2: css killer kthread
> >> > > css_killed_work_fn
> >> > > cgroup_mutex <- grabs this mutex
> >> > > mem_cgroup_css_offline
> >> > > memcg_offline_kmem.part
> >> > > memcg_deactivate_kmem_caches
> >> >
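
The inversion between the two threads is the classic ABBA pattern. A
self-contained userspace illustration, with ordinary pthread mutexes
standing in for cgroup_mutex and mem_hotplug_lock (all names, and the
sleeps that widen the race window, are invented for the demo; the real
mem_hotplug_lock is a percpu rw_semaphore, not a plain mutex):

/* Build with: cc abba.c -pthread
 * Thread 1 mimics the hot-remove path (hotplug lock first), thread 2
 * the css killer work item (cgroup lock first). Each then blocks on
 * the lock the other holds, so the join below never returns.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t cgroup_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mem_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

static void *hot_remove(void *arg)	/* like the __offline_pages() side */
{
	(void)arg;
	pthread_mutex_lock(&mem_hotplug_lock);
	sleep(1);				/* let the other thread run */
	pthread_mutex_lock(&cgroup_mutex);	/* blocks: held by css_killer */
	pthread_mutex_unlock(&cgroup_mutex);
	pthread_mutex_unlock(&mem_hotplug_lock);
	return NULL;
}

static void *css_killer(void *arg)	/* like css_killed_work_fn() */
{
	(void)arg;
	pthread_mutex_lock(&cgroup_mutex);
	sleep(1);
	pthread_mutex_lock(&mem_hotplug_lock);	/* blocks: held by hot_remove */
	pthread_mutex_unlock(&mem_hotplug_lock);
	pthread_mutex_unlock(&cgroup_mutex);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, hot_remove, NULL);
	pthread_create(&t2, NULL, css_killer, NULL);
	pthread_join(t1, NULL);		/* never returns: ABBA deadlock */
	pthread_join(t2, NULL);
	puts("unreachable");
	return 0;
}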
On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> > There appears to be another problem that is related to the
> > cgroup_mutex -> mem_hotplug_lock deadlock described above.
> >
> > In the original deadlock that I described, the workaround is to
> >
> On 02.09.2020 at 11:53, Vlastimil Babka wrote:
>
> On 8/28/20 6:47 PM, Pavel Tatashin wrote:
>> There appears to be another problem that is related to the
>> cgroup_mutex -> mem_hotplug_lock deadlock described above.
>>
>> In the original deadlock that I described, the workaround is to
>>
On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> There appears to be another problem that is related to the
> cgroup_mutex -> mem_hotplug_lock deadlock described above.
>
> In the original deadlock that I described, the workaround is to
> switch the crash dump from piping to the traditional Linux save-to
On Tue, Sep 01, 2020 at 08:52:05AM -0400, Pavel Tatashin wrote:
> On Tue, Sep 1, 2020 at 1:28 AM Bharata B Rao wrote:
> >
> > On Fri, Aug 28, 2020 at 12:47:03PM -0400, Pavel Tatashin wrote:
> > > There appears to be another problem that is related to the
> > > cgroup_mutex -> mem_hotplug_lock
On Tue, Sep 1, 2020 at 1:28 AM Bharata B Rao wrote:
>
> On Fri, Aug 28, 2020 at 12:47:03PM -0400, Pavel Tatashin wrote:
> > There appears to be another problem that is related to the
> > cgroup_mutex -> mem_hotplug_lock deadlock described above.
> >
> > In the original deadlock that I described,
On Fri, Aug 28, 2020 at 12:47:03PM -0400, Pavel Tatashin wrote:
> There appears to be another problem that is related to the
> cgroup_mutex -> mem_hotplug_lock deadlock described above.
>
> In the original deadlock that I described, the workaround is to
> switch the crash dump from piping to the traditional Linux
There appears to be another problem that is related to the
cgroup_mutex -> mem_hotplug_lock deadlock described above.
In the original deadlock that I described, the workaround is to
switch the crash dump from piping to the traditional Linux
save-to-files method. However, after trying this workaround, I
On Wed, Aug 12, 2020 at 8:04 PM Roman Gushchin wrote:
>
> On Wed, Aug 12, 2020 at 07:16:08PM -0400, Pavel Tatashin wrote:
> > Guys,
> >
> > There is a convoluted deadlock that I just root-caused, and that is
> > fixed by this work (at least based on my code inspection, it appears
> > to be fixed);
On Wed, Aug 12, 2020 at 07:16:08PM -0400, Pavel Tatashin wrote:
> Guys,
>
> There is a convoluted deadlock that I just root-caused, and that is
> fixed by this work (at least based on my code inspection, it appears
> to be fixed); but the deadlock exists in older and stable kernels, and
> I am not
BTW, I replied to the wrong version of this work. I intended to reply
to version 7:
https://lore.kernel.org/lkml/20200623174037.3951353-1-g...@fb.com/
Nevertheless, the problem is the same.
Thank you,
Pasha
On Wed, Aug 12, 2020 at 7:16 PM Pavel Tatashin wrote:
>
> Guys,
>
> There is a convoluted
Guys,
There is a convoluted deadlock that I just root-caused, and that is
fixed by this work (at least based on my code inspection, it appears to
be fixed); but the deadlock exists in older and stable kernels, and I
am not sure whether to create a separate patch for it, or backport
this whole