Re: [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script

2021-03-16 Thread Maxim Levitsky
On Tue, 2021-03-16 at 14:38 +0100, Jan Kiszka wrote:
> On 15.03.21 23:10, Maxim Levitsky wrote:
> > Fix several issues that are present in lx-symbols script:
> > 
> > * Track module unloads by placing another software breakpoint at 
> > 'free_module'
> >   (force uninline this symbol just in case), and use remove-symbol-file
> >   gdb command to unload the symobls of the module that is unloading.
> > 
> >   That gives the gdb a chance to mark all software breakpoints from
> >   this module as pending again.
> >   Also remove the module from the 'known' module list once it is unloaded.
> > 
> > * Since we now track module unload, we don't need to reload all
> >   symbols anymore when 'known' module loaded again (that can't happen 
> > anymore).
> >   This allows reloading a module in the debugged kernel to finish much 
> > faster,
> >   while lx-symbols tracks module loads and unloads.
> > 
> > * Disable/enable all gdb breakpoints on both module load and unload 
> > breakpoint
> >   hits, and not only in 'load_all_symbols' as was done before.
> >   (load_all_symbols is no longer called on breakpoint hit)
> >   That allows gdb to avoid getting confused about the state of the (now two)
> >   internal breakpoints we place.
> > 
> >   Otherwise it will leave them in the kernel code segment, when continuing
> >   which triggers a guest kernel panic as soon as it skips over the 'int3'
> >   instruction and executes the garbage tail of the optcode on which
> >   the breakpoint was placed.
> > 
> > Signed-off-by: Maxim Levitsky 
> > ---
> >  kernel/module.c  |   8 ++-
> >  scripts/gdb/linux/symbols.py | 106 +--
> >  2 files changed, 83 insertions(+), 31 deletions(-)
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 30479355ab850..ea81fc06ea1f5 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -901,8 +901,12 @@ int module_refcount(struct module *mod)
> >  }
> >  EXPORT_SYMBOL(module_refcount);
> >  
> > -/* This exists whether we can unload or not */
> > -static void free_module(struct module *mod);
> > +/* This exists whether we can unload or not
> > + * Keep it uninlined to provide a reliable breakpoint target,
> > + * e.g. for the gdb helper command 'lx-symbols'.
> > + */
> > +
> > +static noinline void free_module(struct module *mod);
> >  
> >  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
> > unsigned int, flags)
> > diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
> > index 1be9763cf8bb2..4ce879548a1ae 100644
> > --- a/scripts/gdb/linux/symbols.py
> > +++ b/scripts/gdb/linux/symbols.py
> > @@ -17,6 +17,24 @@ import re
> >  
> >  from linux import modules, utils
> >  
> > +def save_state():
> 
> Naming is a bit too generic. And it's not only saving the state, it's
> also disabling things.
> 
> > +breakpoints = []
> > +if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
> > +for bp in gdb.breakpoints():
> > +breakpoints.append({'breakpoint': bp, 'enabled': 
> > bp.enabled})
> > +bp.enabled = False
> > +
> > +show_pagination = gdb.execute("show pagination", to_string=True)
> > +pagination = show_pagination.endswith("on.\n")
> > +gdb.execute("set pagination off")
> > +
> > +return {"breakpoints":breakpoints, "show_pagination": 
> > show_pagination}
> > +
> > +def load_state(state):
> 
> Maybe rather something with "restore", to make naming balanced. Or is
> there a use case where "state" is not coming from the function above?

I didn't put much thought into naming these functions. 
I'll think of something better.

> 
> > +for breakpoint in state["breakpoints"]:
> > +breakpoint['breakpoint'].enabled = breakpoint['enabled']
> > +gdb.execute("set pagination %s" % ("on" if state["show_pagination"] 
> > else "off"))
> > +
> >  
> >  if hasattr(gdb, 'Breakpoint'):
> >  class LoadModuleBreakpoint(gdb.Breakpoint):
> > @@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
> >  module_name = module['name'].string()
> >  cmd = self.gdb_command
> >  
> > +# module already loaded, false alarm
> > +if module_name in cmd.loaded_modules:
> > +return False
> 
> Possibly at all, now that we track unloading? Can our state tracking
> become out-of-sync?

Sadly yes and that happens a lot unless kvm is patched to
avoid injecting interrupts on a single step.
 
What is happening is that this breakpoint is hit, then symbols
are loaded which is a relatively slow process, and by the time
the gdb resumes the guest, a timer interrupt is already pending
(kernel local apic timer is running while vcpus are stopped),
which makes the guest kernel take the interrupt (interrupts are
not disabled in these two intercepted functions) and eventually
return to the breakpoint, and trigger its python handler again.

This happens so often that 

Re: [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script

2021-03-16 Thread Jan Kiszka
On 15.03.21 23:10, Maxim Levitsky wrote:
> Fix several issues that are present in lx-symbols script:
> 
> * Track module unloads by placing another software breakpoint at 'free_module'
>   (force uninline this symbol just in case), and use remove-symbol-file
>   gdb command to unload the symobls of the module that is unloading.
> 
>   That gives the gdb a chance to mark all software breakpoints from
>   this module as pending again.
>   Also remove the module from the 'known' module list once it is unloaded.
> 
> * Since we now track module unload, we don't need to reload all
>   symbols anymore when 'known' module loaded again (that can't happen 
> anymore).
>   This allows reloading a module in the debugged kernel to finish much faster,
>   while lx-symbols tracks module loads and unloads.
> 
> * Disable/enable all gdb breakpoints on both module load and unload breakpoint
>   hits, and not only in 'load_all_symbols' as was done before.
>   (load_all_symbols is no longer called on breakpoint hit)
>   That allows gdb to avoid getting confused about the state of the (now two)
>   internal breakpoints we place.
> 
>   Otherwise it will leave them in the kernel code segment, when continuing
>   which triggers a guest kernel panic as soon as it skips over the 'int3'
>   instruction and executes the garbage tail of the optcode on which
>   the breakpoint was placed.
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  kernel/module.c  |   8 ++-
>  scripts/gdb/linux/symbols.py | 106 +--
>  2 files changed, 83 insertions(+), 31 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 30479355ab850..ea81fc06ea1f5 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -901,8 +901,12 @@ int module_refcount(struct module *mod)
>  }
>  EXPORT_SYMBOL(module_refcount);
>  
> -/* This exists whether we can unload or not */
> -static void free_module(struct module *mod);
> +/* This exists whether we can unload or not
> + * Keep it uninlined to provide a reliable breakpoint target,
> + * e.g. for the gdb helper command 'lx-symbols'.
> + */
> +
> +static noinline void free_module(struct module *mod);
>  
>  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>   unsigned int, flags)
> diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
> index 1be9763cf8bb2..4ce879548a1ae 100644
> --- a/scripts/gdb/linux/symbols.py
> +++ b/scripts/gdb/linux/symbols.py
> @@ -17,6 +17,24 @@ import re
>  
>  from linux import modules, utils
>  
> +def save_state():

Naming is a bit too generic. And it's not only saving the state, it's
also disabling things.

> +breakpoints = []
> +if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
> +for bp in gdb.breakpoints():
> +breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
> +bp.enabled = False
> +
> +show_pagination = gdb.execute("show pagination", to_string=True)
> +pagination = show_pagination.endswith("on.\n")
> +gdb.execute("set pagination off")
> +
> +return {"breakpoints":breakpoints, "show_pagination": 
> show_pagination}
> +
> +def load_state(state):

Maybe rather something with "restore", to make naming balanced. Or is
there a use case where "state" is not coming from the function above?

> +for breakpoint in state["breakpoints"]:
> +breakpoint['breakpoint'].enabled = breakpoint['enabled']
> +gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else 
> "off"))
> +
>  
>  if hasattr(gdb, 'Breakpoint'):
>  class LoadModuleBreakpoint(gdb.Breakpoint):
> @@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
>  module_name = module['name'].string()
>  cmd = self.gdb_command
>  
> +# module already loaded, false alarm
> +if module_name in cmd.loaded_modules:
> +return False

Possibly at all, now that we track unloading? Can our state tracking
become out-of-sync?

> +
>  # enforce update if object file is not found
>  cmd.module_files_updated = False
>  
>  # Disable pagination while reporting symbol (re-)loading.
>  # The console input is blocked in this context so that we would
>  # get stuck waiting for the user to acknowledge paged output.
> -show_pagination = gdb.execute("show pagination", to_string=True)
> -pagination = show_pagination.endswith("on.\n")
> -gdb.execute("set pagination off")
> +state = save_state()
> +cmd.load_module_symbols(module)
> +load_state(state)
> +return False
>  
> -if module_name in cmd.loaded_modules:
> -gdb.write("refreshing all symbols to reload module "
> -  "'{0}'\n".format(module_name))
> -cmd.load_all_symbols()
> -else:
> - 

[PATCH 1/3] scripts/gdb: rework lx-symbols gdb script

2021-03-15 Thread Maxim Levitsky
Fix several issues that are present in lx-symbols script:

* Track module unloads by placing another software breakpoint at 'free_module'
  (force uninline this symbol just in case), and use remove-symbol-file
  gdb command to unload the symobls of the module that is unloading.

  That gives the gdb a chance to mark all software breakpoints from
  this module as pending again.
  Also remove the module from the 'known' module list once it is unloaded.

* Since we now track module unload, we don't need to reload all
  symbols anymore when 'known' module loaded again (that can't happen anymore).
  This allows reloading a module in the debugged kernel to finish much faster,
  while lx-symbols tracks module loads and unloads.

* Disable/enable all gdb breakpoints on both module load and unload breakpoint
  hits, and not only in 'load_all_symbols' as was done before.
  (load_all_symbols is no longer called on breakpoint hit)
  That allows gdb to avoid getting confused about the state of the (now two)
  internal breakpoints we place.

  Otherwise it will leave them in the kernel code segment, when continuing
  which triggers a guest kernel panic as soon as it skips over the 'int3'
  instruction and executes the garbage tail of the optcode on which
  the breakpoint was placed.

Signed-off-by: Maxim Levitsky 
---
 kernel/module.c  |   8 ++-
 scripts/gdb/linux/symbols.py | 106 +--
 2 files changed, 83 insertions(+), 31 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 30479355ab850..ea81fc06ea1f5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -901,8 +901,12 @@ int module_refcount(struct module *mod)
 }
 EXPORT_SYMBOL(module_refcount);
 
-/* This exists whether we can unload or not */
-static void free_module(struct module *mod);
+/* This exists whether we can unload or not
+ * Keep it uninlined to provide a reliable breakpoint target,
+ * e.g. for the gdb helper command 'lx-symbols'.
+ */
+
+static noinline void free_module(struct module *mod);
 
 SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
unsigned int, flags)
diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 1be9763cf8bb2..4ce879548a1ae 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -17,6 +17,24 @@ import re
 
 from linux import modules, utils
 
+def save_state():
+breakpoints = []
+if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
+for bp in gdb.breakpoints():
+breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
+bp.enabled = False
+
+show_pagination = gdb.execute("show pagination", to_string=True)
+pagination = show_pagination.endswith("on.\n")
+gdb.execute("set pagination off")
+
+return {"breakpoints":breakpoints, "show_pagination": show_pagination}
+
+def load_state(state):
+for breakpoint in state["breakpoints"]:
+breakpoint['breakpoint'].enabled = breakpoint['enabled']
+gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else 
"off"))
+
 
 if hasattr(gdb, 'Breakpoint'):
 class LoadModuleBreakpoint(gdb.Breakpoint):
@@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
 module_name = module['name'].string()
 cmd = self.gdb_command
 
+# module already loaded, false alarm
+if module_name in cmd.loaded_modules:
+return False
+
 # enforce update if object file is not found
 cmd.module_files_updated = False
 
 # Disable pagination while reporting symbol (re-)loading.
 # The console input is blocked in this context so that we would
 # get stuck waiting for the user to acknowledge paged output.
-show_pagination = gdb.execute("show pagination", to_string=True)
-pagination = show_pagination.endswith("on.\n")
-gdb.execute("set pagination off")
+state = save_state()
+cmd.load_module_symbols(module)
+load_state(state)
+return False
 
-if module_name in cmd.loaded_modules:
-gdb.write("refreshing all symbols to reload module "
-  "'{0}'\n".format(module_name))
-cmd.load_all_symbols()
-else:
-cmd.load_module_symbols(module)
+class UnLoadModuleBreakpoint(gdb.Breakpoint):
+def __init__(self, spec, gdb_command):
+super(UnLoadModuleBreakpoint, self).__init__(spec, internal=True)
+self.silent = True
+self.gdb_command = gdb_command
+
+def stop(self):
+module = gdb.parse_and_eval("mod")
+module_name = module['name'].string()
+cmd = self.gdb_command
 
-# restore pagination state
-gdb.execute("set pagination %s" % ("on" if pagination else "off"))
+