Re: [Crash-utility] [PATCH 0/5] crash: Support zram on x86_64 for recent fedora kernel

2020-05-28 Thread Dave Anderson



- Original Message -
> This patch set implements zram support on x86_64 for recent Fedora
> kernels, tested on kernel-5.6.7-200.fc31.x86_64.
> 
> I also have another patch set for RHEL7 and RHEL8 kernels, but I want
> to post it later because the change is quite large and I think it is
> necessary to refactor the zram code first, e.g. by moving the zram code
> in diskdump.c into zram.c.
> 
> HATAYAMA Daisuke (5):
>   mm/zram: introduce MAX_POSSIBLE_PHYSMEM_BITS
>   zram/swap cache: treat xarray case
>   zram: fix wrongly recognizing lzo-rle as lzo
>   zram: try loading debuginfo for zram when needed
>   zram: fix failure invalid structure member offset: zram_table_flag
> 
>  defs.h |  7 ++-
>  diskdump.c | 23 +--
>  memory.c   |  4 
>  3 files changed, 27 insertions(+), 7 deletions(-)

Hi Daisuke,

This all looks good -- queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/3b0ab4634de89ff23ace1d4172a20d429be606b0

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Crash issues with ARM aarch64 on a live system

2020-05-21 Thread Dave Anderson



- Original Message -
> On Wed, May 20, 2020 at 10:19 PM Siamak Nazari wrote:
> >
> > Apologies if this is the wrong place to ask this question. If it is,
> > I would appreciate a pointer in the right direction. We have been
> > trying to get crash working on a live kernel (aarch64) but have not
> > been able to.
> >
> > Based on the crash help text:
> >
> > --machdep option=value
> > Pass an option and value pair to machine-dependent code.  These
> > architecture-specific option/pairs should only be required in
> > very rare circumstances:
> >
> > ARM64:
> >   phys_offset=
> >   kimage_voffset=
> >   max_physmem_bits=
> >   vabits_actual=
> >
> > We probably need to specify the proper values for phys_offset and
> > kimage_voffset. kimage_voffset is in /proc/kallsyms, but phys_offset
> > is not. I tried using the value of kimage_voffset found in
> > /proc/kallsyms, but that did not help. It is not clear where the
> > values should come from (and in fact, from a casual reading, the crash
> > source seems to try to compute them automatically).
> >
> > Here is some additional information:
> >
> > # uname -a
> > Linux bringup 5.1.0 #1 SMP Fri Oct 11 17:01:55 UTC 2019 aarch64 GNU/Linux
> >
> > # crash -s
> > crash: /proc/kcore: No such file or directory
> 
> Seems like your kernel is not configured with CONFIG_PROC_KCORE
> (/proc/kcore support).
> With /proc/kcore you don't need to supply any additional options or
> values to crash.

Santosh is correct, with /proc/kcore things should just work.

On your system, it then tries /dev/mem, which needs help, including phys_offset.
You might try getting it from /proc/iomem, where on my arm64 system, phys_offset
is 0x40:

  $ cat /proc/iomem
  ...
  7e89-7e890fff : APMC0D60:02
  7e8d-7e8d0fff : APMC0D60:03
  7e94-7e940fff : APMC0D5E:00
  40-40001f : reserved
  400020-43fa59 : System RAM
    400028-40012e : Kernel code
    40012f-40017f : reserved
    400180-4002cc : Kernel data
    40dfe0-40e3df : reserved
    40e3e0-40ffdf : Crash kernel
    40-40 : reserved
    41909d-41919c : reserved
    43909d-4393ff : reserved
    43f875-43f875 : reserved
  ...

Try adding that on the command line.
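
A minimal sketch of that lookup, assuming that phys_offset corresponds to the lowest start address among the top-level "System RAM" and "reserved" ranges (as in the listing above) and that /proc/iomem is read as root, since unprivileged reads show zeroed addresses. The function name and the sample ranges are hypothetical and are not part of crash:

```python
import re

# Matches one /proc/iomem line: "<start>-<end> : <name>", keeping the indent.
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def guess_phys_offset(iomem_text, names=("System RAM", "reserved")):
    """Return the lowest start address of the top-level RAM-like ranges.

    Assumption: nested entries (Kernel code, Crash kernel, ...) are indented
    more than their parent, so only lines without indentation are considered.
    """
    starts = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and m.group(1) == "" and m.group(4).strip() in names:
            starts.append(int(m.group(2), 16))
    return min(starts) if starts else None


# Hypothetical ranges, not taken from the listing above.
sample = """\
09000000-09000fff : pl011@9000000
80000000-801fffff : reserved
80200000-bfffffff : System RAM
  80280000-8123ffff : Kernel code
  81240000-8199ffff : reserved
"""

offset = guess_phys_offset(sample)
print("crash --machdep phys_offset=%#x" % offset)
```

The printed line uses the --machdep option=value form quoted from the crash help text; treat the value as a starting guess rather than a guaranteed match.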

Dave


> 
> > WARNING: could not find MAGIC_START!
> > WARNING: cannot read linux_banner string
> > crash: /boot/vmlinux-5.1.0 and /dev/mem do not match!
> >
> > This is with additional debugging turned on:
> >
> > # ./crash --version
> >
> > crash 7.2.8++
> > Copyright (C) 2002-2020  Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> > Copyright (C) 1999-2006  Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> > Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011  NEC Corporation
> > Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions.  Enter "help copying" to see the conditions.
> > This program has absolutely no warranty.  Enter "help warranty" for
> > details.
> >
> > GNU gdb (GDB) 7.6
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > 
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "aarch64-unknown-linux-gnu".
> >
> > Here is crash running with -d flag:
> >
> > find_booted_kernel: check: /boot/vmlinux-5.1.0
> > find_booted_kernel: found: /boot/vmlinux-5.1.0
> > get_live_memory_source: /dev/mem
> > /proc/version:
> > Linux version 5.1.0 (oe-user@oe-host) (gcc version 7.3.0 (GCC)) #1 SMP Fri
> > Oct 11 17:01:55 UTC 2019
> > /boot/vmlinux-5.1.0:
> > Linux version 5.1.0 (oe-user@oe-host) (gcc version 7.3.0 (GCC)) #1 SMP Fri
> > Oct 11 17:01:55 UTC 2019
> > crash: /proc/kcore: No such file or directory
> > readmem: read_dev_mem() -> /dev/mem
> > VA_BITS: 48
> > kimage_voffset: fffe8800
> > phys_offset: 0
> > physvirt_offset: 8000
> > gdb --quiet /boot/vmlinux-5.1.0
> > GETBUF(328 -> 0)fffeffc00fe0
> >   GETBUF(1500 -> 1)
> >   FREEBUF(1)
> > FREEBUF(0)
> >  > d82d4630>
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >  > (ROE), e6a6d5a0>
> > 
> > WARNING: could not find MAGIC_START!
> > GETBUF(328 -> 0)
> > FREEBUF(0)
> > GETBUF(8 -> 0)
> > 
> > 
> > cpu_possible_mask: cpus: (none)
> > 
> > 
> >  cpu_present_mask: cpus: (none)
> > 
> > 
> >   cpu_online_mask: cpus: (none)
> > 
> > 
> >   cpu_active_mask: cpus: (none)
> 

Re: [Crash-utility] log -T option should not be support after system suspend

2020-05-07 Thread Dave Anderson


- Original Message -
> 赵乾利 wrote on Thu, May 07, 2020:
> > This change does not take system sleep into account. Once the system
> > has been suspended, this translation gives wrong results, because the
> > printk timestamp and jiffies are not updated during suspend. System
> > suspend is a common feature, so I think this change is a bug.
> 
> This is how the regular unix command `dmesg -T` works, so I think it's
> worth having as is: timestamp will be mostly correct until the first
> sleep and then off by sleep amount.
> 
> This option isn't reliable anyway (drift depends on system but it's not
> unusual to be off by a few minutes on most systems with more than a week
> of uptime -- it drifts faster when cpu clock varies often), and it's not
> like there's any harm in this. At most, print a warning that times after
> sleep are wrong if you want to.
> 
> This is obviously just my opinion but I think for tools like crash, if a
> user wants to shoot themselves in the foot, we should let them... I'm
> always annoyed when system tools know better and I need to waste time
> patching them to bypass checks...

I agree completely with Dominique.  Your patch displays "log: -T option not
supported" and bails out, which is arguably more misleading than just showing
the timestamps.

The dmesg man page shows this:

   -T, --ctime
          Print human readable timestamps.  The timestamp could be
          inaccurate!

          The time source used for the logs is not updated after system
          SUSPEND/RESUME.

This kind of warning could be added to the log command's help page, or could
be displayed as a preface to the dump of the log.  Or things could be just left
as they are.
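
To make the size of the error concrete, here is a small sketch of the translation a -T style option performs, assuming it simply adds the printk timestamp to the boot time. The dates, the one-hour suspend and the function name are made up for illustration:

```python
from datetime import datetime, timedelta, timezone


def log_wallclock(boot_time, printk_seconds):
    """Translate a printk timestamp the way a -T style option does:
    wall-clock time = boot time + seconds since boot."""
    return boot_time + timedelta(seconds=printk_seconds)


# Hypothetical timeline: boot at 08:00 UTC, one hour spent suspended.
boot = datetime(2020, 5, 7, 8, 0, 0, tzinfo=timezone.utc)
suspended = timedelta(hours=1)

# A message stamped 7200s by the printk clock.  That clock stood still
# during the suspend, so the message was really logged 3 hours after
# boot, not 2.
shown = log_wallclock(boot, 7200.0)
actual = shown + suspended

print("shown :", shown.isoformat())
print("actual:", actual.isoformat())
print("error :", actual - shown)
```

Every message logged after the resume is off by the same amount, which is the behaviour the dmesg man page warns about.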

> 
> PS. Didn't want to send a mail "just" for it but thank you for all these
> years Dave, hope you keep in touch a bit when you feel bored :)
> And congratulation? to new maintainers!
> --
> Dominique

Thanks Dominique, I will still be watching the list from my personal email.

Dave



Re: [Crash-utility] [ANNOUNCE] My retirement, and crash utility maintainership changes

2020-05-07 Thread Dave Anderson



- Original Message -
> Dave,
> 
> > Initially Kazuhito will primarily be handling upstream github duties,
> > while Lianbo and Bhupesh will be handling Fedora, CentOS stream, and
> > RHEL maintenance.  All three will be involved in the acceptance of
> > patches posted on this mailing list.  Please welcome them in their
> > new roles; I am confident they will do a terrific job.
> 
> Would it be better to send patch sets via GitHub as PRs from now on? I'm
> now writing a zram patch set for x86-64 support.

Hi Daisuke,

Good question -- and one whose answer I shall defer to the new maintainers.

Personally, I never accepted git pull requests because I always felt that
it was more valuable to expose proposed patches to the larger audience
that make up this mailing list.  So when PRs came in, I coerced the
submitter to use the list.

> 
> > Since the https://people.redhat.com/anderson web page will be
> > decommissioned after my departure, its contents have been moved
> > to be co-located with the crash-utility github site:
> > 
> >   https://crash-utility.github.io
> 
> Do you plan to set up a redirect from the current page? RPM packages for
> crash's extension modules carry the URL in their package information, as
> shown below, and it will need to be modified accordingly:
> 
> # yum info crash-gcore-command
> Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
> Available Packages
> Name        : crash-gcore-command
> Arch        : x86_64
> Version     : 1.3.1
> Release     : 0.el7
> Size        : 41 k
> Repo        : rhel-7-server-rpms/7Server/x86_64
> Summary     : Gcore extension module for the crash utility
> URL         : http://people.redhat.com/anderson/extensions/crash-gcore-command-1.3.1.tar.gz
> License     : GPLv2
> Description : Command for creating a core dump file of a user-space task that
>             : was running in a kernel dumpfile.

Yes, when the packages are updated, the URLs will have to be changed.  During
the build procedure, the package verification will fail if the upstream URL
is defunct.

> 
> Also, when I want to release a new version of a crash extension module, I
> send it to Hagio-san via this mailing list as in the past, and then
> Hagio-san modifies the "crash extension modules" page.
> Is this understanding correct?

Yes, and then Kazu, Lianbo or Bhupesh will be able to update that page.

> 
> > I want to express my appreciation to all of you who have contributed
> > patches, both bug fixes and new features, and most importantly, to the
> > support from users who have kept the crash utility alive for over 20 years
> > now.  It has been my great pleasure to have had the chance to work with
> > such an extraordinary international cast of characters.
> > 
> > I will still be lurking as a regular subscriber to this list, at my
> > home address: ander...@prospeed.net
> > 
> > It will be fun to watch what happens...
> > 
> > Best wishes to all of you in these trying times,
> 
> Thanks for your work on the crash utility. It is essential for our daily
> support jobs.
> 
> Personally, I met the crash utility relatively early when I entered this
> field, and I have learnt a lot through using, reading and writing the
> crash utility, and from your many reviews over these 10 years.

I can't thank you enough Daisuke.  Over the years you have been one of my most
dependable and valued contributors.  Your fingerprints are all over the crash
utility!

Best Regards,
  Dave




> 
> Thanks.
> HATAYAMA, Daisuke
> 
> 
> 
> 
> 




Re: [Crash-utility] [ANNOUNCE] My retirement, and crash utility maintainership changes

2020-05-04 Thread Dave Anderson



- Original Message -
> On Mon, 4 May 2020 10:10:52 -0400 (EDT)
> Dave Anderson  wrote:
> 
> > I'd like to take the opportunity to announce my retirement from
> > Red Hat, and from the workforce in general.  My last day at Red Hat
> > will be May 29th, 2020.
> 
> First, Dave, thank you very much for all the years maintaining this
> extremely useful tool. You will be missed!
> 
> > Accordingly, I will be relinquishing my role as maintainer of the
> > crash utility.  My replacement will be made up of three
> > co-maintainers:
> > 
> >   Kazuhito Hagio   k-hagio...@nec.com
> >   Lianbo Jiang liji...@redhat.com
> >   Bhupesh Sharma   bhsha...@redhat.com
> 
> Good luck to you.
> 
> Just curious, is there some sort of hierarchy, or is this a virtual
> team of people with equal rights and duties?

They will have equal rights.

But duty-wise, as I indicated, Kazu will handle the upstream github 
chores, while Lianbo and Bhupesh will take care of Fedora, CentOS stream
and RHEL maintenance.  But with that being said, Kazu will rely upon
Lianbo and Bhupesh for multiple-architecture access at Red Hat that he
does not have available at NEC.  And while it's still under discussion,
they have so far agreed that patches posted to the mailing list will
require at least 2 ACKs from the 3 of them.

Thanks,
  Dave 


> 
> Petr Tesarik
> SUSE HW Enablement Team
> 




[Crash-utility] [ANNOUNCE] My retirement, and crash utility maintainership changes

2020-05-04 Thread Dave Anderson


I'd like to take the opportunity to announce my retirement from
Red Hat, and from the workforce in general.  My last day at Red Hat 
will be May 29th, 2020.

Accordingly, I will be relinquishing my role as maintainer of the crash
utility.  My replacement will be made up of three co-maintainers:

  Kazuhito Hagio   k-hagio...@nec.com
  Lianbo Jiang liji...@redhat.com
  Bhupesh Sharma   bhsha...@redhat.com

As you may be aware, Kazuhito will be performing double-duty, as he
is also the upstream maintainer of the makedumpfile facility; he has
been a significant contributor to the crash utility.  Lianbo and Bhupesh
have both been working extensively in the kexec/kdump area, both in 
the kernel and the user-space utilities.

Initially Kazuhito will primarily be handling upstream github duties, 
while Lianbo and Bhupesh will be handling Fedora, CentOS stream, and
RHEL maintenance.  All three will be involved in the acceptance of
patches posted on this mailing list.  Please welcome them in their
new roles; I am confident they will do a terrific job.

Since the https://people.redhat.com/anderson web page will be 
decommissioned after my departure, its contents have been moved 
to be co-located with the crash-utility github site:

  https://crash-utility.github.io

I want to express my appreciation to all of you who have contributed
patches, both bug fixes and new features, and most importantly, to the
support from users who have kept the crash utility alive for over 20 years
now.  It has been my great pleasure to have had the chance to work with
such an extraordinary international cast of characters.

I will still be lurking as a regular subscriber to this list, at my
home address: ander...@prospeed.net 

It will be fun to watch what happens...

Best wishes to all of you in these trying times,

  Dave Anderson





Re: [Crash-utility] p_* commands

2020-04-30 Thread Dave Anderson



- Original Message -
> Dear Dave,
> 
> RH knowledge base article shows nice p_* commands
> https://access.redhat.com/solutions/4490051
> 
> crash> p_super_block 9ad87724f000
> (struct super_block *)0x9ad87724f000  dm-15
> .s_type - (struct file_system_type *)0xc1050460 - 'xfs'
> .s_flags = 0x30810001 - RDONLY | POSIXACL | I_VERSION | [NOSEC] |
> [BORN]
> .s_blocksize - 4096
> .s_root - (struct dentry *)0x
> .s_bdev - (struct block_device *)0x9ab8431a8680
> .bd_inode - (struct inode *)0x9ab8431a8770
> 
> .bd_super - (struct super_block *)0x
> .bd_disk - (struct gendisk *)0x9ab878ace400  dm-15
> .bd_queue - (struct request_queue *)0x9ab779be1330
> .bd_mutex - (struct mutex *)0x0001 - not implemente
> 
> Do you know anything about the extension that implements this nice command?
> Could you please find its author and ask him to publish it?
> 
> Thank you,
>   Vasily Averin

I don't have a subscription to see that page, so I don't know for sure, 
but I think that it's a pykdump script running with the mpykdump 
extension module.

Dave  
 




Re: [Crash-utility] crash help: using list for traversing pages through page.next

2020-04-29 Thread Dave Anderson



- Original Message -
> Hi all,
> 
> Recently I found a slub-related problem (kmalloc-1024 objects eat tons of
> memory):
> # ./slabinfo -l | grep "t-0001024"
> :t-000102446448117102447.5G 4074/505/116080974 0   0
> 99 *
> 
> Then I took a memory coredump to find out why. It seems something is
> wrong with the slub's cpu_slab (kmem_cache_cpu) list. But when I try to
> walk through the page list as follows, crash cannot resolve the argument:
> 
> crash> struct page.next,pages,pobjects 0xea0049e85140
>next = 0xea0014533c40
>pages = 13511054
>pobjects = -545576266
> crash> list page.next 0xea0049e85140
> list: invalid argument: page.next
> 
> While my usage case is similar to the case in the 'help list' man page:
>  crash> p file_systems
>  file_systems = $1 = (struct file_system_type *) 0xc03adc90
>  crash> list file_system_type.next -s file_system_type.name,fs_flags
>  c03adc90
> 
> 
> So what is wrong, is there a way that I can walk through the page list?

I'm not sure -- I can't reproduce the problem:
  
  crash> kmem -p | grep mapped
  fc97c41cc840 107321000 9d5d65a12c88   52  1 17c0020036 referenced,uptodate,lru,active,mappedtodisk
  fc97c41ce4c0 107393000 9d5d65a12c880  1 17c0020036 referenced,uptodate,lru,active,mappedtodisk
  ^C
  crash> list page.next fc97c41cc840
  fc97c41cc840
  fc97d090fac8
  fc97c41cc848
  fc97d092a988
  fc97d08f6bc8
  fc97d090e608
  fc97d0923308
  fc97d09fd548
  fc97d090cd48
  fc97d09bc448
  fc97d0994208
  fc97d0994348
  ...

For some reason, the "page.next" argument is not being resolved by
arg_to_datatype(), here in tools.c, line 3454:

   3452         while (args[optind]) {
   3453                 if (strstr(args[optind], ".") &&
   3454                     arg_to_datatype(args[optind], sm, RETURN_ON_ERROR) > 1) {
   3455                         if (ld->flags & LIST_OFFSET_ENTERED)
   3456                                 error(FATAL,
   3457                                     "offset value %ld (0x%lx) already entered\n",
   3458                                         ld->member_offset, ld->member_offset);
   3459                         ld->member_offset = sm->member_offset;
   3460                         ld->flags |= LIST_OFFSET_ENTERED;
   3461                 } else {

What happens if you just enter this:

  crash> page.next
  struct page {
 [8] struct page *next;
  }
  crash>

Or for that matter, what does "page -o" show?

You can always use the "-o offset" argument as a replacement for "page.next":

  crash> list -o 8 fc97c41cc840
  fc97c41cc840
  fc97d090fac8
  fc97c41cc848
  fc97d092a988
  fc97d08f6bc8
  fc97d090e608
  fc97d0923308
  fc97d09fd548
  fc97d090cd48
  fc97d09bc448
  fc97d0994208
  ...
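
As a simplified model of what a traversal such as "list -o 8" does, the sketch below follows the pointer stored at a fixed offset inside each structure. It is not crash's implementation: the read_ulong callback, the limit parameter and the three-entry memory image are hypothetical.

```python
def walk_list(read_ulong, start, offset, limit=100000):
    """Follow a singly linked list of structures.

    read_ulong(addr) is a caller-supplied function returning the
    pointer-sized value stored at addr.  The traversal stops at a NULL
    pointer or when it arrives back at an address already visited.
    """
    seen = set()
    entries = []
    addr = start
    while addr and addr not in seen and len(entries) < limit:
        seen.add(addr)
        entries.append(addr)
        addr = read_ulong(addr + offset)  # next = *(addr + offset)
    return entries


# Hypothetical memory image: three structures whose "next" member sits at
# offset 8; the last one points back to the first.
memory = {0x1000 + 8: 0x2000, 0x2000 + 8: 0x3000, 0x3000 + 8: 0x1000}

for entry in walk_list(memory.__getitem__, 0x1000, 8):
    print("%x" % entry)
```

The offset argument plays the role of the value given to -o, which is why a numeric offset can stand in for a "struct.member" argument that fails to resolve.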

Dave


 
> 
> thanks in advance~
> linfeng
> 
> 
> 




Re: [Crash-utility] [External Mail][营销邮件] Re: ramdump support for va_bits_actual

2020-04-24 Thread Dave Anderson



- Original Message -
> On Fri, Apr 24, 2020 at 12:59 AM Dave Anderson  wrote:
> >
> >
> > Vinayak?
> >
> > - Original Message -
> > > Hi, Vinayak,
> > >
> > > I don't think it's necessary to check whether physvirt_offset is empty
> > > in arm64_VTOP, because physvirt_offset is always initialized by
> > > arm64_calc_physvirt_offset, so machdep->machspec->physvirt_offset is
> > > always set, even on older kernels.
> > >
> 
> Ya, that check is not really required. I have removed it. v3 attached.

Queued for crash-7.2.9:
  
  
https://github.com/crash-utility/crash/commit/339ddcd6f26fbd3519f50e96689645da867f6e0f

Thanks,
  Dave


> Thanks,
> Vinayak
> 




Re: [Crash-utility] [PATCH v3] Determine the ARM64 kernel's Pointer Authentication mask value by reading the new KERNELPACMASK vmcoreinfo entry.

2020-04-24 Thread Dave Anderson



- Original Message -
> Pointer authentication support has been added in recent versions of the
> arm64 kernel. It adds PAC bits to the unused top bits of the lr register
> saved on the stack, to prevent ROP-style attacks.
> 
> However, with the PAC bits present, the saved address fails to match the
> correct symbol name. Hence a KERNELPACMASK field has been added to the
> vmcoreinfo to help mask out these PAC bits.
> 
> This patch fetches the KERNELPACMASK info and uses it to mask the PAC bits
> and generate the correct backtrace and symbol names.
> (amit.kach...@arm.com)
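
The masking step described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: the mask and the signed link-register value are hypothetical, chosen for a 48-bit kernel address space, and strip_pac is a made-up name.

```python
U64 = (1 << 64) - 1


def strip_pac(lr, kernel_pac_mask):
    """Undo pointer authentication on a saved kernel return address.

    Kernel text addresses have all of their upper bits set, so OR-ing the
    mask back in overwrites the PAC bits with the ones they replaced.
    """
    return (lr | kernel_pac_mask) & U64


# Hypothetical values: the mask covers the top 16 bits, and the signed
# link register carries a PAC there instead of 0xffff.
mask = 0xFFFF000000000000
signed_lr = 0x12AB800010081234

print("%016x" % strip_pac(signed_lr, mask))
```

With a mask of zero the value is returned unchanged, which corresponds to a kernel that does not export KERNELPACMASK. The patch below additionally checks is_kernel_text() on the OR-ed value before rewriting frame->pc.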

Queued for crash-7.2.9:
   
  
https://github.com/crash-utility/crash/commit/41d61189d60e0fdd6509b96dc8160795263f3229

Thanks,
  Dave


> ---
> 
> Changes since v2:
> * Removed PAC mask check from arm64_is_kernel_exception_frame function
> * More details in commit.
> 
> Changes since v1:
> * Moved PAC mask code from arm64_print_stackframe_entry to
>  arm64_unwind_frame.
> * PAC mask check on all kernel text during complete stack parsing
>   with bt -t  command.
> * dump_machdep_table now prints CONFIG_ARM64_KERNELPACMASK.
> 
> The kernel version for the corresponding vmcoreinfo entry is posted here[1].
> 
> [1]: https://lore.kernel.org/patchwork/patch/1211981/
> 
> 
>  arm64.c | 50 +-
>  defs.h  |  1 +
>  2 files changed, 42 insertions(+), 9 deletions(-)
> 
> diff --git a/arm64.c b/arm64.c
> index 7662d71..e0a5cf2 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -84,6 +84,7 @@ static int arm64_get_kvaddr_ranges(struct vaddr_range *);
>  static void arm64_get_crash_notes(void);
>  static void arm64_calc_VA_BITS(void);
>  static int arm64_is_uvaddr(ulong, struct task_context *);
> +static void arm64_calc_KERNELPACMASK(void);
>  
>  
>  /*
> @@ -213,6 +214,7 @@ arm64_init(int when)
>   machdep->pagemask = ~((ulonglong)machdep->pageoffset);
>  
>   arm64_calc_VA_BITS();
> + arm64_calc_KERNELPACMASK();
>   ms = machdep->machspec;
>   if (ms->VA_BITS_ACTUAL) {
>   ms->page_offset = ARM64_PAGE_OFFSET_ACTUAL;
> @@ -472,6 +474,7 @@ arm64_init(int when)
>   case LOG_ONLY:
>   machdep->machspec = &arm64_machine_specific;
>   arm64_calc_VA_BITS();
> + arm64_calc_KERNELPACMASK();
>   arm64_calc_phys_offset();
>   machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
>   break;
> @@ -659,6 +662,11 @@ arm64_dump_machdep_table(ulong arg)
>   fprintf(fp, "%ld\n", ms->VA_BITS_ACTUAL);
>   else
>   fprintf(fp, "(unused)\n");
> + fprintf(fp, "CONFIG_ARM64_KERNELPACMASK: ");
> + if (ms->CONFIG_ARM64_KERNELPACMASK)
> + fprintf(fp, "%lx\n", ms->CONFIG_ARM64_KERNELPACMASK);
> + else
> + fprintf(fp, "(unused)\n");
>   fprintf(fp, " userspace_top: %016lx\n", ms->userspace_top);
>   fprintf(fp, "   page_offset: %016lx\n", ms->page_offset);
>   fprintf(fp, "vmalloc_start_addr: %016lx\n", ms->vmalloc_start_addr);
> @@ -1774,13 +1782,14 @@ static int
>  arm64_is_kernel_exception_frame(struct bt_info *bt, ulong stkptr)
>  {
>  struct arm64_pt_regs *regs;
> + struct machine_specific *ms = machdep->machspec;
>  
>  regs = (struct arm64_pt_regs
>  *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(stkptr))];
>  
>   if (INSTACK(regs->sp, bt) && INSTACK(regs->regs[29], bt) &&
>   !(regs->pstate & (0xULL | PSR_MODE32_BIT)) &&
>   is_kernel_text(regs->pc) &&
> - is_kernel_text(regs->regs[30])) {
> + is_kernel_text(regs->regs[30] | ms->CONFIG_ARM64_KERNELPACMASK)) {
>   switch (regs->pstate & PSR_MODE_MASK)
>   {
>   case PSR_MODE_EL1t:
> @@ -1924,6 +1933,7 @@ arm64_print_stackframe_entry(struct bt_info *bt, int level, struct arm64_stackfr
>   * See, for example, "bl schedule" before ret_to_user().
>   */
>   branch_pc = frame->pc - 4;
> +
>  name = closest_symbol(branch_pc);
>  name_plus_offset = NULL;
>  
> @@ -2135,7 +2145,7 @@ arm64_unwind_frame(struct bt_info *bt, struct arm64_stackframe *frame)
>   unsigned long stack_mask;
>   unsigned long irq_stack_ptr, orig_sp;
>   struct arm64_pt_regs *ptregs;
> - struct machine_specific *ms;
> + struct machine_specific *ms = machdep->machspec;
>  
>   stack_mask = (unsigned long)(ARM64_STACK_SIZE) - 1;
>   fp = frame->fp;
> @@ -2149,6 +2159,8 @@ arm64_unwind_frame(struct bt_info *bt, struct arm64_stackframe *frame)
>   frame->sp = fp + 0x10;
>   frame->fp = GET_STACK_ULONG(fp);
>   frame->pc = GET_STACK_ULONG(fp + 8);
> + if (is_kernel_text(frame->pc | ms->CONFIG_ARM64_KERNELPACMASK))
> + frame->pc |= ms->CONFIG_ARM64_KERNELPACMASK;
>  
>   if ((frame->fp == 0) && (frame->pc == 0))
>   return FALSE;
> @@ -2200,7 +2212,6 @@ 

Re: [Crash-utility] new printk ringbuffer interface

2020-04-24 Thread Dave Anderson


- Original Message -
> On 2020-04-23, HAGIO KAZUHITO(萩尾 一仁)  wrote:
> >> Should all struct sizes and field offsets be exported? It
> >> would look something like this:
> >>
> >> VMCOREINFO_SYMBOL(prb);
> >>
> >> VMCOREINFO_STRUCT_SIZE(printk_ringbuffer);
> >> VMCOREINFO_OFFSET(printk_ringbuffer, desc_ring);
> >> VMCOREINFO_OFFSET(printk_ringbuffer, text_data_ring);
> >> VMCOREINFO_OFFSET(printk_ringbuffer, dict_data_ring);
> >> VMCOREINFO_OFFSET(printk_ringbuffer, fail);
> >>
> >> VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
> >> VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
> >> VMCOREINFO_OFFSET(prb_desc_ring, descs);
> >> VMCOREINFO_OFFSET(prb_desc_ring, head_id);
> >> VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
> >>
> >> VMCOREINFO_STRUCT_SIZE(prb_desc);
> >> VMCOREINFO_OFFSET(prb_desc, info);
> >> VMCOREINFO_OFFSET(prb_desc, state_var);
> >> VMCOREINFO_OFFSET(prb_desc, text_blk_lpos);
> >> VMCOREINFO_OFFSET(prb_desc, dict_blk_lpos);
> >>
> >> VMCOREINFO_STRUCT_SIZE(prb_data_blk_lpos);
> >> VMCOREINFO_OFFSET(prb_data_blk_lpos, begin);
> >> VMCOREINFO_OFFSET(prb_data_blk_lpos, next);
> >>
> >> VMCOREINFO_STRUCT_SIZE(printk_info);
> >> VMCOREINFO_OFFSET(printk_info, seq);
> >> VMCOREINFO_OFFSET(printk_info, ts_nsec);
> >> VMCOREINFO_OFFSET(printk_info, text_len);
> >> VMCOREINFO_OFFSET(printk_info, dict_len);
> >> VMCOREINFO_OFFSET(printk_info, caller_id);
> >>
> >> VMCOREINFO_STRUCT_SIZE(prb_data_ring);
> >> VMCOREINFO_OFFSET(prb_data_ring, size_bits);
> >> VMCOREINFO_OFFSET(prb_data_ring, data);
> >> VMCOREINFO_OFFSET(prb_data_ring, head_id);
> >> VMCOREINFO_OFFSET(prb_data_ring, tail_id);
> >
> > If there is no efficient way, we will need all of the entries in
> > VMCOREINFO.
> 
> It seems like a lot to export everything, but I don't have a problem
> with it. If we decide to export everything (which I expect we will need
> to do), then I would change my crash(8) implementation to also rely only
> on the VMCOREINFO. I see no point in having some implementations using
> debug data and other implementations using VMCOREINFO data, if
> VMCOREINFO has everything that is needed.

Please don't -- the crash utility supports ~15 different dumpfile
formats, the majority of which do *not* contain VMCOREINFO data.

For that reason, I try to avoid using VMCOREINFO data whenever possible,
precisely because the relevant data can be gathered from the vmlinux symbol
table and debuginfo data.
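
A rough sketch of that preference order, assuming member offsets have already been collected into dictionaries keyed by (struct, member). The helper names and the offset values are hypothetical; only the "OFFSET(struct.member)=value" line shape follows what VMCOREINFO_OFFSET() emits.

```python
import re

# VMCOREINFO offset entries look like "OFFSET(struct.member)=<decimal>".
OFFSET_ENTRY = re.compile(r"^OFFSET\((\w+)\.(\w+)\)=(\d+)$")


def parse_vmcoreinfo_offsets(text):
    """Collect OFFSET() entries into a {(struct, member): offset} dict."""
    offsets = {}
    for line in text.splitlines():
        m = OFFSET_ENTRY.match(line.strip())
        if m:
            offsets[(m.group(1), m.group(2))] = int(m.group(3))
    return offsets


def member_offset(debuginfo, vmcoreinfo, struct, member):
    """Prefer debuginfo; consult VMCOREINFO only if debuginfo lacks the
    member.  Returns None when neither source knows it."""
    key = (struct, member)
    if key in debuginfo:
        return debuginfo[key]
    return vmcoreinfo.get(key)


# Hypothetical values for illustration only.
vmcoreinfo_text = """\
OFFSET(printk_ringbuffer.desc_ring)=0
OFFSET(printk_ringbuffer.text_data_ring)=48
"""
from_vmcoreinfo = parse_vmcoreinfo_offsets(vmcoreinfo_text)
from_debuginfo = {("printk_ringbuffer", "desc_ring"): 0}

# Works for dumpfile formats that carry no VMCOREINFO at all:
print(member_offset(from_debuginfo, {}, "printk_ringbuffer", "desc_ring"))
```

Because the debuginfo lookup comes first, the same code path serves the dumpfile formats that have no VMCOREINFO data.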

Dave



Re: [Crash-utility] [External Mail][营销邮件] Re: ramdump support for va_bits_actual

2020-04-23 Thread Dave Anderson

Vinayak?

- Original Message -
> Hi, Vinayak,
> 
> I don't think it's necessary to check whether physvirt_offset is empty
> in arm64_VTOP, because physvirt_offset is always initialized by
> arm64_calc_physvirt_offset, so machdep->machspec->physvirt_offset is
> always set, even on older kernels.
> 
> So, how about the following change?
> 
> @@ -1148,8 +1155,7 @@ arm64_VTOP(ulong addr)
> }
> 
> if (addr >= machdep->machspec->page_offset)
> -   return machdep->machspec->phys_offset
> -   + (addr - machdep->machspec->page_offset);
> +   return (addr + machdep->machspec->physvirt_offset);
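
The two return expressions in this hunk give the same result when physvirt_offset equals phys_offset minus page_offset in 64-bit arithmetic. That relationship is an assumption of this sketch, as are the function names and the example addresses:

```python
U64 = (1 << 64) - 1


def calc_physvirt_offset(phys_offset, page_offset):
    # Assumed definition: phys_offset - page_offset, wrapped to 64 bits.
    return (phys_offset - page_offset) & U64


def vtop_old(addr, phys_offset, page_offset):
    # The expression removed by the hunk.
    return (phys_offset + (addr - page_offset)) & U64


def vtop_new(addr, physvirt_offset):
    # The expression added by the hunk.
    return (addr + physvirt_offset) & U64


# Hypothetical layout.
page_offset = 0xFFFF800000000000
phys_offset = 0x80000000
addr = page_offset + 0x1234000

pv = calc_physvirt_offset(phys_offset, page_offset)
print("%x %x" % (vtop_old(addr, phys_offset, page_offset), vtop_new(addr, pv)))
```

Under that assumption the single addition replaces the two-step calculation without changing the translated address.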
> 
> 
> From: crash-utility-boun...@redhat.com  on
> behalf of vinayak menon 
> Sent: Tuesday, April 21, 2020 18:01
> To: Discussion list for crash utility usage,maintenance and development
> Subject: [External Mail][营销邮件] Re: [Crash-utility] ramdump support for
> va_bits_actual
> 
> Hi Dave, zhaoqianli
> 
> > Yeah, that looks reasonable.  But what about the parallel discussion re:
> > vmemmap_start?
> >
> >   https://www.redhat.com/archives/crash-utility/2020-April/msg00064.html
> 
> I have picked up the vmemmap_start change as the 4th patch. The
> physvirt_offset-based VTOP is already part of patchset 1. I hope I have
> not missed any of the changes recommended by Zhaoqianli.
> 
> >
> > Can you send in an updated patch set with all fixes applied?
> 
> PFA.
> 
> Thanks,
> Vinayak
> 


Re: [Crash-utility] new printk ringbuffer interface

2020-04-23 Thread Dave Anderson



- Original Message -
> ccing kexec list, vmcore-dmesg also uses vmcoreinfo related to printk..
> 
> > -Original Message-
> > 
> > - Original Message -
> > > Hello Dave,
> > >
> > > You may or may not be aware that we are working on replacing [0] the
> > > Linux printk ringbuffer. Rather than a single buffer containing a single
> > > struct type, the new ringbuffer makes use of several different structs.
> > 
> > Yes, I am most definitely aware...
> > 
> > >
> > > I am writing to ask your advice about how this should be exported for
> > > the crash utility. Should all struct sizes and field offsets be
> > > exported? It would look something like this:
> > >
> > > VMCOREINFO_SYMBOL(prb);
> > >
> > > VMCOREINFO_STRUCT_SIZE(printk_ringbuffer);
> > > VMCOREINFO_OFFSET(printk_ringbuffer, desc_ring);
> > > VMCOREINFO_OFFSET(printk_ringbuffer, text_data_ring);
> > > VMCOREINFO_OFFSET(printk_ringbuffer, dict_data_ring);
> > > VMCOREINFO_OFFSET(printk_ringbuffer, fail);
> > >
> > > VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
> > > VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
> > > VMCOREINFO_OFFSET(prb_desc_ring, descs);
> > > VMCOREINFO_OFFSET(prb_desc_ring, head_id);
> > > VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
> > >
> > > VMCOREINFO_STRUCT_SIZE(prb_desc);
> > > VMCOREINFO_OFFSET(prb_desc, info);
> > > VMCOREINFO_OFFSET(prb_desc, state_var);
> > > VMCOREINFO_OFFSET(prb_desc, text_blk_lpos);
> > > VMCOREINFO_OFFSET(prb_desc, dict_blk_lpos);
> > >
> > > VMCOREINFO_STRUCT_SIZE(prb_data_blk_lpos);
> > > VMCOREINFO_OFFSET(prb_data_blk_lpos, begin);
> > > VMCOREINFO_OFFSET(prb_data_blk_lpos, next);
> > >
> > > VMCOREINFO_STRUCT_SIZE(printk_info);
> > > VMCOREINFO_OFFSET(printk_info, seq);
> > > VMCOREINFO_OFFSET(printk_info, ts_nsec);
> > > VMCOREINFO_OFFSET(printk_info, text_len);
> > > VMCOREINFO_OFFSET(printk_info, dict_len);
> > > VMCOREINFO_OFFSET(printk_info, caller_id);
> > >
> > > VMCOREINFO_STRUCT_SIZE(prb_data_ring);
> > > VMCOREINFO_OFFSET(prb_data_ring, size_bits);
> > > VMCOREINFO_OFFSET(prb_data_ring, data);
> > > VMCOREINFO_OFFSET(prb_data_ring, head_id);
> > > VMCOREINFO_OFFSET(prb_data_ring, tail_id);
> > >
> > > Or would it be enough to just recognize the new "prb" symbol and have
> > > all the structures defined in the crash utility? If the latter is
> > > preferred, should some sort of version number be exported? Or is the
> > > kernel version number enough?
> 
> first I don't think we can depend on the kernel version because distribution
> kernels backport upstream patches.  So we will need a version number of the
> ringbuffer if we choose that way.

With respect to the kernel version, you are absolutely correct, as we've been
burnt by that before with backports to distribution and the upstream longterm
kernels.  But I think John was talking about exporting a printk-structure-set
version number, so I think we're on the same page.

Also, if we go the version-number route, there would still not be a requirement
for the crash utility to duplicate the kernel data structures in its sources.
As John's proof-of-concept patch showed, it can still use the traditional
manner of getting structure sizes and member offsets.  With the version number
exported, there may have to be a few small adjustments in the MEMBER_OFFSET_INIT()
calls, but it would be fairly straight-forward to maintain.

But of course makedumpfile would have to replicate the kernel data structures.

Thanks,
  Dave




Re: [Crash-utility] [PATCH v2] Determine the ARM64 kernel's Pointer Authentication mask value by reading the new KERNELPACMASK vmcoreinfo entry.

2020-04-23 Thread Dave Anderson


- Original Message -
...
>
> A small correction, top bytes are included in KERNELPACMASK but that is
> configurable. Anyway so when autiasp(authentication) instruction fails
> then all obfuscated value is cleared and a error bit pattern is added
> only in top byte.
> As mentioned earlier armv8.6 enhanced PAC will not add bit pattern to
> denote failure but will cause illegal instruction fault with an
> exception class and hence pc will not have extra details. This is work
> in progress so the current crash utility changes should work fine.
 
Just to be clear then, your v2 patch set should be OK to check in -- except 
for this call to is_kernel_text():
   
> And then when trying to determine whether the current stack pointer is
> pointing to an in-kernel exception frame, the possible regs->pc and regs[30]
> values are both transformed with the mask, so it seems that both of them
> will have been obfuscated by the processor when creating the frame on
> the stack:
>
>static int
>arm64_is_kernel_exception_frame(struct bt_info *bt, ulong stkptr)
>{
>struct arm64_pt_regs *regs;
>struct machine_specific *ms = machdep->machspec;
>
>regs = (struct arm64_pt_regs *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(stkptr))];
>
>if (INSTACK(regs->sp, bt) && INSTACK(regs->regs[29], bt) &&
>!(regs->pstate & (0xULL | PSR_MODE32_BIT)) &&
> > is_kernel_text(regs->pc | ms->CONFIG_ARM64_KERNELPACMASK) &&

Yes good catch. Masking can be removed from here.

Can you please confirm?

Thanks,
  Dave




Re: [Crash-utility] [PATCH v2] Determine the ARM64 kernel's Pointer Authentication mask value by reading the new KERNELPACMASK vmcoreinfo entry.

2020-04-22 Thread Dave Anderson
Hi Amit,

Two more questions below...

- Original Message -

> > But here's where I'm confused: when an in-kernel exception frame occurs, and the
> > processor lays down the full register set on the stack, are both the PC and LR (regs[30])
> > text values written on the stack as obfuscated values?
> > 
> 
> In arm64 case arch/arm64/include/asm/kexec.h + crash_setup_regs()
> function sets up the kernel exception frame. As can be seen PC does not
> have obfuscated (PAC) values but LR can be obfuscated.

Ok, so that's when it's setting up the registers for a kexec/kdump operation.

But what about exceptions that occur during the normal course of events, such as
when an interrupt or page fault occurs?

> > ...
> >
> > When it gathers the starting hooks for non-active tasks, it does this:
> > 
> >static int
> >arm64_get_stackframe(struct bt_info *bt, struct arm64_stackframe *frame)
> >{
> >if (!fill_task_struct(bt->task))
> >return FALSE;
> >
> >frame->sp = ULONG(tt->task_struct + OFFSET(task_struct_thread_context_sp));
> >frame->pc = ULONG(tt->task_struct + OFFSET(task_struct_thread_context_pc));
> >frame->fp = ULONG(tt->task_struct + OFFSET(task_struct_thread_context_fp));
> >
> >return TRUE;
> >}
> >
> > When a task is put to sleep, is the PC text address in the task's
> > thread_struct.cpu_context obfuscated?

And again, what happens in this case?

Thanks,
  Dave




Re: [Crash-utility] [RFC PATCH 0/1] support lockless printk ringbuffer

2020-04-21 Thread Dave Anderson



- Original Message -
> Hi Dave,
> 
> I created a proof-of-concept patch to work with the new printk
> ringbuffer (as it is currently being proposed). I create a separate
> source file (printk.c) because of all the helper functions.

That's fine.

> 
> The code doesn't do much error checking if symbols were missing,
> and it probably doesn't work unless the machine running crash(8)
> has the same endian and pointer-size as the crashed machine. But
> otherwise, it does work correctly.

Not a problem.  With respect to both endian-ness and pointer-size, the
crash utility binary must match that of the dumpfile's kernel. 

> The most important part I wanted to have implemented was the new
> logic for record traversal and printing. Being one of the authors
> for the new printk ringbuffer, implementing this was far easier
> for me than for someone unfamiliar with the ringbuffer internals.
> 
> It is using the new "prb" symbol. I did not add VMCOREINFO
> support.
> 
> Note that this is based on the PATCHv2 that I have queued for
> posting to LKML, but as of right now have not yet posted.
> Basically I am waiting for feedback from Kazuhito regarding my
> VMCOREINFO query. (It will not work with previous iterations
> of the new ringbuffer because the struct names have changed.)

Right -- I'm interested in what he has to say.

> 
> I don't expect you to take the patch as-is, but I hope it can
> provide some positive ground work for moving forward.

It looks pretty good for starters.  Damn good!

And I have to say how much I appreciate the initiative you've
taken to help us out here.  Usually kernel developers are either 
(1) unaware of how their changes affect the crash utility 
and/or makedumpfile, or (2) don't give a shit. 

Thanks,
  Dave
  

 
> John Ogness (1):
>   crash: printk: add support for lockless ringbuffer
> 
>  Makefile |   5 +
>  defs.h   |  24 +
>  kernel.c |   8 +-
>  printk.c | 298 +++
>  4 files changed, 334 insertions(+), 1 deletion(-)
>  create mode 100644 printk.c
> 
> --
> 2.20.1
> 
> 




Re: [Crash-utility] [PATCH v2] Determine the ARM64 kernel's Pointer Authentication mask value by reading the new KERNELPACMASK vmcoreinfo entry.

2020-04-21 Thread Dave Anderson


- Original Message -
> This value is used to mask the PAC bits and generate correct backtrace
> and symbol name.
> (amit.kach...@arm.com)
> ---
> Changes since v1:
> * Moved PAC mask code from arm64_print_stackframe_entry to
>  arm64_unwind_frame.
> * PAC mask check on all kernel text during complete stack parsing
>   with bt -t  command.
> * dump_machdep_table now prints CONFIG_ARM64_KERNELPACMASK.
> 
> The kernel version for the corresponding vmcoreinfo entry is posted here[1].
> 
> [1]: https://lore.kernel.org/patchwork/patch/1211981/

Hi Amit,

I'm still a bit confused here -- please help me out...  

For the lack of a better term, in the following discussion, I'm going to refer
to the text values without the KERNELPACMASK applied as "obfuscated".

Now, as I understand it, when a running function calls another function and 
leaves its text return address on the stack, the processor obfuscates the text
return value before it pushes it on the stack.  
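
To make the terminology concrete, here is a minimal sketch of what "applying the
mask" means in this discussion.  It assumes that the upper bits of a kernel text
address are all ones, so OR-ing the PAC bit positions back in recovers the plain
address, which is the same operation as the "LR |= ms->CONFIG_ARM64_KERNELPACMASK"
statement in the patch.  The mask value and the helper name are made up for
illustration; a real mask comes from the KERNELPACMASK vmcoreinfo entry.

```c
#include <stdint.h>

/*
 * Illustrative value only (an assumption, not read from any kernel):
 * PAC bit positions as they might look with 48-bit kernel addresses.
 */
#define EXAMPLE_KERNELPACMASK UINT64_C(0xff7f000000000000)

/*
 * Hypothetical helper: turn an "obfuscated" kernel text value back into
 * a plain kernel virtual address by setting every PAC bit position.
 */
uint64_t restore_kernel_text(uint64_t value, uint64_t pacmask)
{
	return value | pacmask;
}
```

An address that carries no PAC bits passes through unchanged, which is why a
caller can apply the mask first and validate the result afterwards.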

But here's where I'm confused: when an in-kernel exception frame occurs, and the
processor lays down the full register set on the stack, are both the PC and LR
(regs[30]) text values written on the stack as obfuscated values?

Here's why I'm asking...

When looking for the starting stack hooks in a dumpfile, your patch takes the
PC from the in-kernel exception frame unmodified:
  
  static int
  arm64_get_dumpfile_stackframe(struct bt_info *bt, struct arm64_stackframe *frame)
  {
          struct machine_specific *ms = machdep->machspec;
          struct arm64_pt_regs *ptregs;

          if (!ms->panic_task_regs ||
              (!ms->panic_task_regs[bt->tc->processor].sp &&
               !ms->panic_task_regs[bt->tc->processor].pc)) {
                  bt->flags |= BT_REGS_NOT_FOUND;
                  return FALSE;
          }

          ptregs = &ms->panic_task_regs[bt->tc->processor];
===>      frame->pc = ptregs->pc;
          ...

That PC value comes from an exception frame.  Will that ptregs->pc value 
be obfuscated?

When it gathers the starting hooks for non-active tasks, it does this:

  static int
  arm64_get_stackframe(struct bt_info *bt, struct arm64_stackframe *frame)
  {
          if (!fill_task_struct(bt->task))
                  return FALSE;

          frame->sp = ULONG(tt->task_struct + OFFSET(task_struct_thread_context_sp));
          frame->pc = ULONG(tt->task_struct + OFFSET(task_struct_thread_context_pc));
          frame->fp = ULONG(tt->task_struct + OFFSET(task_struct_thread_context_fp));

          return TRUE;
  }
  
When a task is put to sleep, is the PC text address in the task's
thread_struct.cpu_context obfuscated?

And then when trying to determine whether the current stack pointer is
pointing to an in-kernel exception frame, the possible regs->pc and regs[30]
values are both transformed with the mask, so it seems that both of them
will have been obfuscated by the processor when creating the frame on
the stack:
  
  static int
  arm64_is_kernel_exception_frame(struct bt_info *bt, ulong stkptr)
  {
  struct arm64_pt_regs *regs;
  struct machine_specific *ms = machdep->machspec;
  
  regs = (struct arm64_pt_regs *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(stkptr))];
  
  if (INSTACK(regs->sp, bt) && INSTACK(regs->regs[29], bt) &&
  !(regs->pstate & (0xULL | PSR_MODE32_BIT)) &&
> is_kernel_text(regs->pc | ms->CONFIG_ARM64_KERNELPACMASK) &&
> is_kernel_text(regs->regs[30] | ms->CONFIG_ARM64_KERNELPACMASK)) {
  switch (regs->pstate & PSR_MODE_MASK)
  {
  case PSR_MODE_EL1t:
  case PSR_MODE_EL1h:
  case PSR_MODE_EL2t:
  case PSR_MODE_EL2h:
  return TRUE;
  }
  }
  
  return FALSE;
  }
  

But here, when displaying an exception frame, the LR is masked
if it "makes sense", but the unmodified PC value is checked without
transforming it:

  static void
  arm64_print_exception_frame(struct bt_info *bt, ulong pt_regs, int mode, FILE *ofp)
  {
  ...
  LR = regs->regs[30];
===>  if (is_kernel_text (LR | ms->CONFIG_ARM64_KERNELPACMASK))
  LR |= ms->CONFIG_ARM64_KERNELPACMASK;
  ...

  case KERNEL_MODE:
  fprintf(ofp, " PC: %016lx  ", (ulong)regs->pc);
===>  if (is_kernel_text(regs->pc) &&
  (sp = value_search(regs->pc, &offset))) {
  fprintf(ofp, "[%s", sp->name);
  if (offset)
  fprintf(ofp, (*gdb_output_radix == 16) ?
  "+0x%lx" : "+%ld",
  offset);
  fprintf(ofp, "]\n");
  } else
  fprintf(ofp, "[unknown or invalid address]\n");
  
  fprintf(ofp, "

Re: [Crash-utility] Strip llvm text symbol name ending

2020-04-21 Thread Dave Anderson



- Original Message -
> 
> 
> Hi
> 
> I found an issue when the kernel is built by LLVM: many symbols are missing, such as
> irq_desc_tree, which causes the irq -s command to fail, so I added llvm to
> strip_symbol_end:

Looks good -- queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/bb9feba5fae788508152ab10aff094179b6b660f
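
For readers unfamiliar with the problem, the idea can be sketched as follows.
This is not the committed change: it assumes the LLVM-generated names carry a
".llvm.<hash>" ending, and strip_llvm_suffix() is a hypothetical stand-alone
helper rather than part of crash's strip_symbol_end().

```c
#include <string.h>

/*
 * Hypothetical helper: truncate a symbol name in place at the first
 * ".llvm." marker.  Returns 1 if the name was shortened, 0 otherwise.
 */
int strip_llvm_suffix(char *name)
{
	char *p = strstr(name, ".llvm.");

	if (p == NULL)
		return 0;

	*p = '\0';
	return 1;
}
```

With the ending removed, a lookup of a name such as irq_desc_tree can match the
symbol again.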

BTW, sorry, I mistakenly left an HTML character reference in the git commit
message.

Thanks,
  Dave




Re: [Crash-utility] [PATCH v2] add log -T option to display the message text with human readable timestamp

2020-04-21 Thread Dave Anderson



- Original Message -
> Sometimes, we need to know the accurate time of the log, which
> helps us analyze the problem.
> 
> add -T option(like dmesg -T command) for log command to display
> the message text with human readable timestamp.
> 
> Signed-off-by: Wang Long 

Hi Wang, 

This is a nice feature.  Note that I did add a stanza for backwards-compatibility
with Linux 3.4 and earlier kernels that didn't have variable length records:

@@ -4962,6 +4974,8 @@ dump_log(int msg_flags)
return;
}
 
+   if (msg_flags & SHOW_LOG_CTIME)
+   option_not_supported('T');
if (msg_flags & SHOW_LOG_DICT)
option_not_supported('d');
if ((msg_flags & SHOW_LOG_TEXT) && STREQ(pc->curcmd, "log"))

Queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/c86250bce29f17610647772f838e1bb9d622ea8c
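
For reference, the conversion behind a "-T" style timestamp can be sketched like
this: add the record's nanosecond timestamp to the boot time and format the
result.  This is only an illustration under stated assumptions, not the code in
the commit: format_log_ctime() is a hypothetical name, and it formats in UTC so
the result does not depend on the local timezone.

```c
#include <stdio.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: format "boot time + record timestamp" as
 * "[Day Mon dd hh:mm:ss yyyy]".  Returns 0 on success, -1 on failure.
 */
int format_log_ctime(time_t boot_sec, uint64_t ts_nsec, char *buf, size_t len)
{
	time_t t = boot_sec + (time_t)(ts_nsec / UINT64_C(1000000000));
	struct tm *tm = gmtime(&t);	/* UTC, an assumption for this sketch */
	char stamp[32];
	int n;

	if (tm == NULL)
		return -1;
	if (strftime(stamp, sizeof(stamp), "%a %b %e %H:%M:%S %Y", tm) == 0)
		return -1;

	n = snprintf(buf, len, "[%s]", stamp);
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Sub-second precision is dropped here, which matches the one-second granularity
of the sample output in the help text.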

Thanks,
  Dave











> ---
>  defs.h   |  2 ++
>  help.c   | 28 +++-
>  kernel.c | 23 +--
>  3 files changed, 50 insertions(+), 3 deletions(-)
> 
> diff --git a/defs.h b/defs.h
> index d8eda5e..4e57a56 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -763,6 +763,7 @@ struct kernel_table {   /* kernel data */
>   } vmcoreinfo;
>   ulonglong flags2;
>   char *source_tree;
> + struct timespec boot_date;
>  };
>  
>  /*
> @@ -5577,6 +5578,7 @@ void dump_log(int);
>  #define SHOW_LOG_DICT  (0x2)
>  #define SHOW_LOG_TEXT  (0x4)
>  #define SHOW_LOG_AUDIT (0x8)
> +#define SHOW_LOG_CTIME (0x10)
>  void set_cpu(int);
>  void clear_machdep_cache(void);
>  struct stack_hook *gather_text_list(struct bt_info *);
> diff --git a/help.c b/help.c
> index c443cad..1ee70f7 100644
> --- a/help.c
> +++ b/help.c
> @@ -3892,12 +3892,13 @@ NULL
>  char *help_log[] = {
>  "log",
>  "dump system message buffer",
> -"[-tdma]",
> +"[-Ttdma]",
>  "  This command dumps the kernel log_buf contents in chronological order.
>  The",
>  "  command supports the older log_buf formats, which may or may not contain
>  a",
>  "  timestamp inserted prior to each message, as well as the newer
>  variable-length",
>  "  record format, where the timestamp is contained in each log entry's
>  header.",
>  "  ",
> +"-T  Display the message text with human readable timestamp.",
>  "-t  Display the message text without the timestamp; only applicable to
>  the",
>  "variable-length record format.",
>  "-d  Display the dictionary of key/value pair properties that are
>  optionally",
> @@ -4031,6 +4032,31 @@ char *help_log[] = {
>  "type=1307 audit(1489384479.809:4346):  cwd=\"/proc\"",
>  "...",
>  " ",
> +"  Display the message text with human readable timestamp.\n"
> +"%s> log -T",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0x-0x0009fbff] usable",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0x0009fc00-0x0009] reserved",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0x000f-0x000f] reserved",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0x0010-0xdffe] usable",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0xdfff-0xdfff] ACPI data",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0xfec0-0xfec00fff] reserved",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0xfee0-0xfee00fff] reserved",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0xfffc-0x] reserved",
> +"[Sat Apr  4 07:41:09 2020] BIOS-e820: [mem
> 0x0001-0x00011fff] usable",
> +"[Sat Apr  4 07:41:09 2020] NX (Execute Disable) protection: active",
> +"[Sat Apr  4 07:41:09 2020] SMBIOS 2.5 present.",
> +"[Sat Apr  4 07:41:09 2020] DMI: innotek GmbH VirtualBox/VirtualBox,
> BIOS VirtualBox 12/01/2006",
> +"[Sat Apr  4 07:41:09 2020] Hypervisor detected: KVM",
> +"[Sat Apr  4 07:41:09 2020] kvm-clock: Using msrs 4b564d01 and
> 4b564d00",
> +"[Sat Apr  4 07:41:09 2020] kvm-clock: cpu 0, msr 6de01001, primary cpu
> clock",
> +"[Sat Apr  4 07:41:09 2020] kvm-clock: using sched offset of 11838753697
> cycles",
> +"[Sat Apr  4 07:41:09 2020] clocksource: kvm-clock: mask:
> 0x max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns",
> +"[Sat Apr  4 07:41:09 2020] e820: update [mem 0x-0x0fff]
> usable ==> reserved",
> +"[Sat Apr  4 07:41:09 2020] e820: remove [mem 0x000a-0x000f]
> usable",
> +"[Sat Apr  4 07:41:09 2020] last_pfn = 0x12 max_arch_pfn =
> 0x4",
> +"[Sat Apr  4 07:41:09 2020] MTRR default type: uncachable",
> +"[Sat Apr  4 07:41:09 2020] MTRR variable ranges disabled:",
> +"...",
>  NULL
>  };
>  
> diff --git a/kernel.c b/kernel.c
> index 7604fac..7e68e6d 100644
> --- a/kernel.c
> +++ b/kernel.c
> @@ -4912,9 +4912,12 @@ cmd_log(void)
>  
>   msg_flags = 0;
>  

Re: [Crash-utility] [Marketing Mail] Re: [Marketing Mail] Re: [Marketing Mail] Re: [Marketing Mail] Re: [External Mail][????] Re: ramdump support for va_bits_actual

2020-04-20 Thread Dave Anderson


- Original Message -
> It's just a risk; I found it when I tried to fix the crash-utility launch
> error with arm64 on 5.4.
> My fix patch is almost the same as Vinayak's. As a supplement, I make
> these two suggestions (vmemmap_start _offset).
> If the advice is reasonable, you can take it.

Let's wait for Vinayak to consolidate everything in his v2 patch update.  When
he posts it, please review and give your ACK/NAK.

Thanks,
  Dave


> 
> 
> From: crash-utility-boun...@redhat.com  on
> behalf of Dave Anderson 
> Sent: Monday, April 20, 2020 22:54
> To: Discussion list for crash utility usage,maintenance and development
> Subject: [Marketing Mail] Re: [Crash-utility] [Marketing Mail] Re:  [Marketing Mail] Re:  [Marketing Mail] Re:
> [External Mail][] Re: ramdump support for   va_bits_actual
> 
> - Original Message -
> > In fact, vmemmap is not easy to calculate in crash-utility: if
> > CONFIG_RANDOMIZE_BASE is configured, memstart_addr will be changed by the
> > code below:
> > [arm64_memblock_init]
> > vmemmap = ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT));
> > ...
> > if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
> >         extern u16 memstart_offset_seed;
> >         u64 range = linear_region_size -
> >                     (memblock_end_of_DRAM() - memblock_start_of_DRAM());
> >
> >         /*
> >          * If the size of the linear region exceeds, by a sufficient
> >          * margin, the size of the region that the available physical
> >          * memory spans, randomize the linear region as well.
> >          */
> >         if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) {
> >                 range /= ARM64_MEMSTART_ALIGN;
> >                 memstart_addr -= ARM64_MEMSTART_ALIGN *
> >                                  ((range * memstart_offset_seed) >> 16);
> >         }
> > }
> 
> OK.
> 
> >
> > the reason i  showed the "address_markers " is just to prove vmemmap and
> > ms->vmemmap_start is wrong.we'd better to do below change.
> > -   vmemmap_start = (-vmemmap_size);
> > +   vmemmap_start = (-vmemmap_size - MEGABYTES(2));
> 

> 
> 
> > 
> >$ crash vmlinux vmcore
> > 
> >crash 7.2.9rc13
> >Copyright (C) 2002-2020  Red Hat, Inc.
> >Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> >Copyright (C) 1999-2006  Hewlett-Packard Co
> >Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> >Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> >Copyright (C) 2005, 2011  NEC Corporation
> >Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> >Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> >This program is free software, covered by the GNU General Public
> >License,
> >and you are welcome to change it and/or distribute copies of it under
> >certain conditions.  Enter "help copying" to see the conditions.
> >This program has absolutely no warranty.  Enter "help warranty" for
> >details.
> >   
> >GNU gdb (GDB) 7.6
> >Copyright (C) 2013 Free Software Foundation, Inc.
> >License GPLv3+: GNU GPL version 3 or later
> ><http://gnu.org/licenses/gpl.html>
> >This is free software: you are free to change and redistribute it.
> >There is NO WARRANTY, to the extent permitted by law.  Type "show
> >copying"
> >and "show warranty" for details.
> >This GDB was configured as "x86_64-unknown-linux-gnu"...
> > 
> >WARNING: kernel relocated [950MB]: patching 94929 gdb minimal_symbol
> >values
> > 
> >crash: start patch...
> > 
> > 
> > 
> > - Original Message -
> >>
> >>
> >> - Original Message -
> >>> Sometimes, we need to know the accurate time of the log, which
> >>> helps us analyze the problem.
> >>>
> >>> add -T option(like dmesg -T command) for log command to display
> >>> the message text with human readable timestamp.
> >>>
> >>> Signed-off-by: Wang Long 
> >>
> >> Did you attempt this patch on a live system?  Because your patch to
> >> kernel

Re: [Crash-utility] [PATCH] add log -T option to display the message text with human readable timestamp

2020-04-20 Thread Dave Anderson



- Original Message -
> 
> 
> On 20/4/2020 3:48 am, Dave Anderson wrote:
> > 
> > FWIW, I tried it on another RHEL7 machine running live,
> > but then also on a RHEL8 kernel dumpfile, and they all hang:
> 
> I applied this patch on a RHEL7 virtual machine (VirtualBox) and it worked OK,
> but on a RHEL7 kernel dumpfile I found that it hangs, after I had sent the patch.
> 
> 
> I debugged it and found that machdep->hz == 0 in the following:
> 
>get_uptime(NULL, &uptime_jiffies);
>uptime_sec = (uptime_jiffies)/(ulonglong)machdep->hz;
>kt->boot_date.tv_sec = kt->date.tv_sec - uptime_sec;
>kt->boot_date.tv_nsec = 0;
> 
> because machdep->hz has not been initialized here.  The divide by zero makes
> the cpu spin at 100%.
> 
> I thought two solutions:
> 
> (1) add misc_init function after machdep_init(POST_INIT) call, and
> calculate the value of kt-> boot_date in it.
>  read_in_kernel_config(IKCFG_INIT);
>  kernel_init();
>  machdep_init(POST_GDB);
>  vm_init();
>  machdep_init(POST_VM);
>  module_init();
>  help_init();
>  task_init();
>  vfs_init();
>  net_init();
>  dev_init();
>  machdep_init(POST_INIT);
> +   misc_init();
> 
> (2) calculate the value of kt-> boot_date on cmd_log function, when we
> call log command.
> 
> 
> Dave, Which one do you like?

Definitely option #2.  Since it's not required unless your new command option
is run, you can simply check whether the new boot_date structure is still
zero-filled, and do your initialization at that time.
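
Roughly, the check could look like the sketch below.  The structure and function
names are placeholders for illustration only (the real fields live in the
kernel_table), and the explicit hz test is an extra guard added here against the
divide-by-zero described above.

```c
#include <stdint.h>
#include <time.h>

/* Placeholder for the two kernel_table members involved (an assumption). */
struct example_kernel_table {
	struct timespec date;		/* time of the dump */
	struct timespec boot_date;	/* zero-filled until first needed */
};

/*
 * Hypothetical helper: compute boot_date on first use only.
 * Returns 0 when boot_date is valid, -1 if hz is not yet known.
 */
int ensure_boot_date(struct example_kernel_table *kt,
		     uint64_t uptime_jiffies, unsigned int hz)
{
	/* Already initialized by an earlier "log -T" invocation. */
	if (kt->boot_date.tv_sec != 0 || kt->boot_date.tv_nsec != 0)
		return 0;

	/* Refuse to divide by an uninitialized clock rate. */
	if (hz == 0)
		return -1;

	kt->boot_date.tv_sec = kt->date.tv_sec - (time_t)(uptime_jiffies / hz);
	kt->boot_date.tv_nsec = 0;
	return 0;
}
```

Because the work happens only when the option is used, the normal startup path
is unaffected and nothing is computed before machdep->hz has been set.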

And BTW, please move the boot_date structure to the end of the kernel_table
to prevent any possible breakage of previously-compiled extension modules
that use the kernel_table.  And also can you please display the new structure's
contents in dump_kernel_table()?  You can put the display under the current
"date" display.

Thanks,
  Dave




Re: [Crash-utility] ramdump support for va_bits_actual

2020-04-20 Thread Dave Anderson



- Original Message -
> Hi Dave,
> 
> On Sat, Apr 18, 2020 at 2:08 PM Dave Anderson  wrote:
> >
> >
> >
> > - Original Message -
> > > Hi Dave,
> > >
> > > Noticed that raw ramdumps of 5.4 kernel aren't working with crash tip.
> > > With the patches attached, I could get it working. Please take a look.
> > >
> > > Thanks,
> > > Vinayak
> > >
> >
> > Hi Vinayak,
> >
> > A couple quick questions come to mind...
> >
> > First, I haven't checked all possible READMEM plugins, but for example, if
> > this function is run on a live system, the -1 file descriptor would cause the
> > READMEM() call to fail:
> 
> 
> I changed it like this and it works for ramdump. I don't actually have
> a live setup to try this. Let me try
> to set up one.
> 
> diff --git a/arm64.c b/arm64.c
> index 04efc13..fce3f8e 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -981,7 +981,7 @@ arm64_calc_physvirt_offset(void)
> 
> if ((sp = kernel_symbol_search("physvirt_offset")) &&
> machdep->machspec->kimage_voffset) {
> -   if (READMEM(-1, &physvirt_offset, sizeof(physvirt_offset),
> +   if (READMEM(pc->mfd, &physvirt_offset, sizeof(physvirt_offset),
> sp->value, sp->value -
> machdep->machspec->kimage_voffset) > 0) {
> ms->physvirt_offset = physvirt_offset;
> 
> 
> >
> >  static void
> > +arm64_calc_physvirt_offset(void)
> > +{
> > +   struct machine_specific *ms = machdep->machspec;
> > +   ulong physvirt_offset;
> > +   struct syment *sp;
> > +
> > +   ms->physvirt_offset = ms->phys_offset - ms->page_offset;
> > +
> > +   if ((sp = kernel_symbol_search("physvirt_offset")) &&
> > +   machdep->machspec->kimage_voffset) {
> > +   if (READMEM(-1, &physvirt_offset, sizeof(physvirt_offset),
> > +   sp->value, sp->value -
> > +   machdep->machspec->kimage_voffset) > 0) {
> > +   ms->physvirt_offset = physvirt_offset;
> > +   }
> > +   }
> > +
> > +   if (CRASHDEBUG(1))
> > +   fprintf(fp, "using %lx as physvirt_offset\n", ms->physvirt_offset);
> > +}
> >
> > And here -- are you missing some brackets?  (run "make warn")
> >
> 
> I did try "make warn" and it does not show any issues. Am I missing something?

I saw it on a system provisioned with Fedora's latest and greatest gcc version.
I don't have the system available any more, but the warning message picked up
on the fact that your second if statement "was not guarded" by the if statement
above it.

> 
> > But regardless of that, why are you setting it back to 48 if it's greater
> > than 48?
> >
> 
> 
> I did that because machspec->CONFIG_ARM64_VA_BITS is used for calculation of
> vmemmap size. In kernel vmemmap size is calculated using VA_BITS_MIN and it is
> defined like this
> 
> #if VA_BITS > 48
> #define VA_BITS_MIN (48)
> #else
> #define VA_BITS_MIN (VA_BITS)
> #endif
> 
> But I realize now that it's not the right thing to do, because
> machspec->CONFIG_ARM64_VA_BITS is later used in arm64_calc_VA_BITS to verify
> machspec->VA_BITS. So what about this?
> 
> diff --git a/arm64.c b/arm64.c
> index 04efc13..a35a30e 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -4023,8 +4023,6 @@ arm64_calc_virtual_memory_ranges(void)
> if ((ret = get_kernel_config("CONFIG_ARM64_VA_BITS",
> &string)) == IKCONFIG_STR)
> machdep->machspec->CONFIG_ARM64_VA_BITS = atol(string);
> -   if (machdep->machspec->CONFIG_ARM64_VA_BITS > 48)
> -   machdep->machspec->CONFIG_ARM64_VA_BITS = 48;
> }
> }
> 
> @@ -4049,7 +4047,12 @@ arm64_calc_virtual_memory_ranges(void)
>  #define STRUCT_PAGE_MAX_SHIFT   6
> 
> if (ms->VA_BITS_ACTUAL) {
> -   vmemmap_size = (1UL) << (ms->CONFIG_ARM64_VA_BITS - machdep->pageshift - 1 + STRUCT_PAGE_MAX_SHIFT);
> +   ulong va_bits_min = 48;
> +
> +   if (machdep->machspec->CONFIG_ARM64_VA_BITS < 48)
> +   va_bits_min = ms->CONFIG_ARM64_VA_BITS;
> +
> +   vmemmap_size = (1UL) << (va_bits_min - machdep->pageshift - 1 + STRUCT_PAGE_MAX_SHIFT);
> vmalloc_end = (- PUD_SIZE - vmemmap_size - KILOBYTES(64));
> vmemmap_start = (-vmemmap_size);
> ms->vmalloc_end = vmalloc_end - 1;
> 

Yeah, that looks reasonable.  But what about the parallel discussion re:
vmemmap_start?

  https://www.redhat.com/archives/crash-utility/2020-April/msg00064.html

Can you send in an updated patch set with all fixes applied?
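
To spell out the arithmetic in the proposed hunk, here is a small stand-alone
sketch of the VA_BITS_MIN clamp and the resulting vmemmap size and start.  The
function names are invented for the example, and it deliberately leaves out the
2MB adjustment from the parallel thread, so treat it as an illustration of the
formula rather than the final calculation.

```c
#include <stdint.h>

#define STRUCT_PAGE_MAX_SHIFT 6

/* Mirrors the kernel's VA_BITS_MIN: never more than 48. */
unsigned int example_va_bits_min(unsigned int config_va_bits)
{
	return config_va_bits > 48 ? 48 : config_va_bits;
}

/* vmemmap size as in the hunk above, using the clamped VA bits. */
uint64_t example_vmemmap_size(unsigned int config_va_bits, unsigned int pageshift)
{
	return UINT64_C(1) << (example_va_bits_min(config_va_bits) - pageshift
			       - 1 + STRUCT_PAGE_MAX_SHIFT);
}

/* vmemmap start as "-vmemmap_size", i.e. counted back from the top. */
uint64_t example_vmemmap_start(unsigned int config_va_bits, unsigned int pageshift)
{
	return UINT64_C(0) - example_vmemmap_size(config_va_bits, pageshift);
}
```

For CONFIG_ARM64_VA_BITS=48 with 4K pages (pageshift 12) this gives a size of
1 << 41; a 52-bit configuration with 64K pages is clamped to 48 and gives 1 << 37.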

Thanks,
  Dave



 


Shouldn't it be 




Re: [Crash-utility] [Marketing Mail] Re: [Marketing Mail] Re: [Marketing Mail] Re: [External Mail][????] Re: ramdump support for va_bits_actual

2020-04-20 Thread Dave Anderson



- Original Message -
> In fact, vmemmap is not easy to calculate in crash-utility: if
> CONFIG_RANDOMIZE_BASE is configured, memstart_addr will be changed by the
> code below:
> [arm64_memblock_init]
> vmemmap = ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT));
> ...
> if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
>         extern u16 memstart_offset_seed;
>         u64 range = linear_region_size -
>                     (memblock_end_of_DRAM() - memblock_start_of_DRAM());
>
>         /*
>          * If the size of the linear region exceeds, by a sufficient
>          * margin, the size of the region that the available physical
>          * memory spans, randomize the linear region as well.
>          */
>         if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) {
>                 range /= ARM64_MEMSTART_ALIGN;
>                 memstart_addr -= ARM64_MEMSTART_ALIGN *
>                                  ((range * memstart_offset_seed) >> 16);
>         }
> }

OK.

> 
> The reason I showed "address_markers" is just to prove that vmemmap and
> ms->vmemmap_start are wrong. We'd better make the change below.
> -   vmemmap_start = (-vmemmap_size);
> +   vmemmap_start = (-vmemmap_size - MEGABYTES(2));

This looks correct, although I've never seen a problem using the current
setting on 5.4 and later kernels.  What happens on your system?  Is your
system's memstart_addr within that low 2MB? 

Thanks,
  Dave




Re: [Crash-utility] new printk ringbuffer interface

2020-04-20 Thread Dave Anderson



- Original Message -
> Hello Dave,
> 
> You may or may not be aware that we are working on replacing [0] the
> Linux printk ringbuffer. Rather than a single buffer containing a single
> struct type, the new ringbuffer makes use of several different structs.

Yes, I am most definitely aware...

> 
> I am writing to ask your advice about how this should be exported for
> the crash utility. Should all struct sizes and field offsets be
> exported? It would look something like this:
> 
> VMCOREINFO_SYMBOL(prb);
> 
> VMCOREINFO_STRUCT_SIZE(printk_ringbuffer);
> VMCOREINFO_OFFSET(printk_ringbuffer, desc_ring);
> VMCOREINFO_OFFSET(printk_ringbuffer, text_data_ring);
> VMCOREINFO_OFFSET(printk_ringbuffer, dict_data_ring);
> VMCOREINFO_OFFSET(printk_ringbuffer, fail);
> 
> VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
> VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
> VMCOREINFO_OFFSET(prb_desc_ring, descs);
> VMCOREINFO_OFFSET(prb_desc_ring, head_id);
> VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
> 
> VMCOREINFO_STRUCT_SIZE(prb_desc);
> VMCOREINFO_OFFSET(prb_desc, info);
> VMCOREINFO_OFFSET(prb_desc, state_var);
> VMCOREINFO_OFFSET(prb_desc, text_blk_lpos);
> VMCOREINFO_OFFSET(prb_desc, dict_blk_lpos);
> 
> VMCOREINFO_STRUCT_SIZE(prb_data_blk_lpos);
> VMCOREINFO_OFFSET(prb_data_blk_lpos, begin);
> VMCOREINFO_OFFSET(prb_data_blk_lpos, next);
> 
> VMCOREINFO_STRUCT_SIZE(printk_info);
> VMCOREINFO_OFFSET(printk_info, seq);
> VMCOREINFO_OFFSET(printk_info, ts_nsec);
> VMCOREINFO_OFFSET(printk_info, text_len);
> VMCOREINFO_OFFSET(printk_info, dict_len);
> VMCOREINFO_OFFSET(printk_info, caller_id);
> 
> VMCOREINFO_STRUCT_SIZE(prb_data_ring);
> VMCOREINFO_OFFSET(prb_data_ring, size_bits);
> VMCOREINFO_OFFSET(prb_data_ring, data);
> VMCOREINFO_OFFSET(prb_data_ring, head_id);
> VMCOREINFO_OFFSET(prb_data_ring, tail_id);
> 
> Or would it be enough to just recognize the new "prb" symbol and have
> all the structures defined in the crash utility? If the latter is
> preferred, should some sort of version number be exported? Or is the
> kernel version number enough?
> 
> I appreciate your feedback.
> 
> John Ogness

With respect to the crash utility, there are two answers. 

When running a crash session normally, i.e. running "crash vmlinux vmcore", the
runtime "log" command does not use any VMCOREINFO entries that happen to be
attached to a dumpfile.  Since crash has the vmlinux debuginfo data available,
it uses its own interfaces to get all kernel symbol and structure related
information.

But there is a little-used capability where the vmlinux file is not required,
but rather just the vmcore, in its "crash --log vmcore" feature.  That
functionality does require the VMCOREINFO entries to extract/dump the log, and
exit.  Honestly I wish I had never even introduced that feature.  And I wonder
if it were deprecated, would anybody care?

However, your question is highly relevant to the makedumpfile(8) facility
for its "makedumpfile --dump-dmesg" option.  Since it doesn't have the
luxury of a vmlinux file, it needs all of the VMCOREINFO_xxx items.  Kazuhito
Hagio is the makedumpfile maintainer, and since he is the primary customer
of the VMCOREINFO entries, he would be a better person to answer your
question.  
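
To illustrate why a vmlinux-less consumer depends on those entries, here is
a minimal sketch of looking up one entry in a VMCOREINFO note, which is plain
text with one KEY=VALUE pair per line (for example OFFSET(struct.member)=N
and SIZE(struct)=N).  The helper name vmcoreinfo_lookup() is hypothetical,
and this is not makedumpfile's actual parser.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "key=" at the start of a line in a
 * NUL-terminated VMCOREINFO text blob and return its numeric value,
 * or -1 if the key is absent.
 */
long vmcoreinfo_lookup(const char *note, const char *key)
{
	size_t klen = strlen(key);
	const char *p = note;

	while (p && *p) {
		if (strncmp(p, key, klen) == 0 && p[klen] == '=')
			return strtol(p + klen + 1, NULL, 0);
		p = strchr(p, '\n');
		if (p)
			p++;	/* step past the newline to the next line */
	}
	return -1;
}
```

Every structure size and member offset a log dumper needs would have to be
resolvable this way, which is why the list of exported entries grows so long.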

That being said, due to the sheer number of VMCOREINFO entries required, I like
your idea of providing a single version number.  But I defer to Kazu for
his preference.

Thanks,
  Dave



 

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] [Marketing Mail] Re: [Marketing Mail] Re: [External Mail][????] Re: ramdump support for va_bits_actual

2020-04-20 Thread Dave Anderson



- Original Message -
> Hi
> vmemmap and VMEMMAP_START are different: vmemmap corresponds to the page at
> physical address 0x0, but VMEMMAP_START corresponds to memstart_addr.
> 
> [mm/init.c]
> arm64_memblock_init
> 348 vmemmap = ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT));

Right, so VMEMMAP_START can be calculated by reading vmemmap and memstart_addr.
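
A minimal sketch of that calculation, assuming the values of vmemmap and
memstart_addr have already been read from the dump.  The function name and
the explicit struct page size parameter are hypothetical.  The subtraction
in the kernel assignment is pointer arithmetic, so each page frame number
counts for sizeof(struct page) bytes:

```c
#include <stdint.h>

/*
 * Invert the kernel's assignment
 *   vmemmap = (struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT);
 * to recover VMEMMAP_START from the two values read out of the dump.
 */
uint64_t calc_vmemmap_start(uint64_t vmemmap, uint64_t memstart_addr,
			    unsigned int page_shift, uint64_t page_struct_size)
{
	return vmemmap + (memstart_addr >> page_shift) * page_struct_size;
}
```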

> 
> We can look at address_markers; this symbol describes the entire memory map
> layout.  Comparing vmemmap with address_markers, we can see that vmemmap
> falls outside the vmemmap range.

It would be ideal if address_markers was guaranteed to be there, but it 
only exists if CONFIG_PTDUMP_CORE was configured.

> crash> p -x vmemmap
> vmemmap = $1 = (struct page *) 0xfffefde0
> 
> address_markers = $2 =
>  {{
>   start_address = 0xff80,
> name = 0xffd44b1fed8f "Linear Mapping start"
> ...
>   }, {
> start_address = 0xfffeffe0,
> name = 0xffd44b2c5beb "vmemmap start"
>   }, {
> start_address = 0xffe0,
> name = 0xffd44b1eb00c "vmemmap end"
>   }, {
> start_address = 0x,
> name = 0x0
>   }}
> 
> 
> > When the readmem() of symbol_value("physvirt_offset") is made, arm64_VTOP() 
> > will
> > be called with its virtual address, right?
> Yes, arm64_VTOP() needs the virtual address of physvirt_offset, and
> physvirt_offset is a kimage symbol, so kimage_voffset alone is enough to
> translate it to a physical address and then read the value of physvirt_offset.

Ah, you're right, it's a mapped kernel symbol and doesn't use it.  

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Crash-utility Digest, Vol 175, Issue 27

2020-04-20 Thread Dave Anderson



- Original Message -
> Would you please take me off your list totally ridiculous

Perhaps you should investigate who explicitly sent your email address
for a subscription.  You're removed now.  

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] [PATCH] add log -T option to display the message text with human readable timestamp

2020-04-19 Thread Dave Anderson


FWIW, I tried it on another RHEL7 machine running live,
but then also on a RHEL8 kernel dumpfile, and they all hang:

  $ crash vmlinux vmcore

  crash 7.2.9rc13
  Copyright (C) 2002-2020  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
 
  GNU gdb (GDB) 7.6 
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later 
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...

  WARNING: kernel relocated [950MB]: patching 94929 gdb minimal_symbol values

  crash: start patch...  



- Original Message -
> 
> 
> - Original Message -
> > Sometimes, we need to know the accurate time of the log, which
> > helps us analyze the problem.
> > 
> > add -T option(like dmesg -T command) for log command to display
> > the message text with human readable timestamp.
> > 
> > Signed-off-by: Wang Long 
> 
> Did you attempt this patch on a live system?  Because your patch to
> kernel_init() hangs the session.  I didn't bother to investigate beyond
> adding these two debug statements around your addition to kernel_init():
> 
>   error(INFO, "start patch...\n");
>   get_uptime(NULL, &uptime_jiffies);
>   uptime_sec = (uptime_jiffies)/(ulonglong)machdep->hz;
>   kt->boot_date.tv_sec = kt->date.tv_sec - uptime_sec;
>   kt->boot_date.tv_nsec = 0;
>   error(INFO, "end patch...\n");
>   
> And that's where it hangs:
>   
>   $ ./crash
>   
>   crash 7.2.9rc13
>   Copyright (C) 2002-2020  Red Hat, Inc.
>   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>   Copyright (C) 1999-2006  Hewlett-Packard Co
>   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>   Copyright (C) 2005, 2011  NEC Corporation
>   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>   This program is free software, covered by the GNU General Public License,
>   and you are welcome to change it and/or distribute copies of it under
>   certain conditions.  Enter "help copying" to see the conditions.
>   This program has absolutely no warranty.  Enter "help warranty" for
>   details.
>
>   GNU gdb (GDB) 7.6
>   Copyright (C) 2013 Free Software Foundation, Inc.
>   License GPLv3+: GNU GPL version 3 or later
>   
>   This is free software: you are free to change and redistribute it.
>   There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>   and "show warranty" for details.
>   This GDB was configured as "x86_64-unknown-linux-gnu"...
>   
>   WARNING: kernel relocated [796MB]: patching 85687 gdb minimal_symbol values
>   
>   crash: start patch...
>   
>   
> 
> And it shows a cpu spinning at 100%:
> 
>   $ top
>   top - 15:26:43 up 38 days,  3:41,  5 users,  load average: 1.00, 0.89, 0.65
>   Tasks: 280 total,   2 running, 278 sleeping,   0 stopped,   0 zombie
>   %Cpu(s):  3.9 us,  8.7 sy,  0.0 ni, 87.3 id,  0.0 wa,  0.0 hi,  0.0 si,
>   0.0 st
>   KiB Mem : 15907600 total,   455876 free,  1232832 used, 14218892 buff/cache
>   KiB Swap:  8060924 total,  7395580 free,   665344 used. 14176220 avail Mem
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
>   26668 root  20   0  350268 213688   5680 R 100.0  1.3   5:42.70 crash
>1707 root  20   0  115692   1184688 S   0.3  0.0   2:16.52
>ksmtuned
>   12852 anderson  20   0 4235240 274608  20320 S   0.3  1.7 601:46.85
>   gnome-shell
>   13060 anderson  20   0  804924  14100   3744 S   0.3  0.1 118:44.59
>   gsd-color
>   27045 anderson  20   0  172452   2532   1648 R   0.3  0.0   0:00.08 top
>   1 root  20   0  210504   5592   3224 S   0.0  0.0  18:14.19 systemd
>   ...
>  
> I'll let you figure it out...
> 
> Dave
> 
> 
> 
> 
> 
> 
> > ---
> >  defs.h   |  2 ++
> >  help.c   | 28 +++-
> >  kernel.c | 22 --
> >  3 files changed, 49 insertions(+), 3 deletions(-)
> > 
> > diff --git a/defs.h b/defs.h
> > index 

Re: [Crash-utility] [PATCH] add log -T option to display the message text with human readable timestamp

2020-04-19 Thread Dave Anderson



- Original Message -
> Sometimes, we need to know the accurate time of the log, which
> helps us analyze the problem.
> 
> add -T option(like dmesg -T command) for log command to display
> the message text with human readable timestamp.
> 
> Signed-off-by: Wang Long 

Did you attempt this patch on a live system?  Because your patch to 
kernel_init() hangs the session.  I didn't bother to investigate beyond
adding these two debug statements around your addition to kernel_init():

  error(INFO, "start patch...\n");
  get_uptime(NULL, &uptime_jiffies);
  uptime_sec = (uptime_jiffies)/(ulonglong)machdep->hz;
  kt->boot_date.tv_sec = kt->date.tv_sec - uptime_sec;
  kt->boot_date.tv_nsec = 0;
  error(INFO, "end patch...\n");
  
And that's where it hangs:
  
  $ ./crash
  
  crash 7.2.9rc13
  Copyright (C) 2002-2020  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
   
  GNU gdb (GDB) 7.6
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later 
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...
  
  WARNING: kernel relocated [796MB]: patching 85687 gdb minimal_symbol values
  
  crash: start patch...  
  
  

And it shows a cpu spinning at 100%:

  $ top
  top - 15:26:43 up 38 days,  3:41,  5 users,  load average: 1.00, 0.89, 0.65
  Tasks: 280 total,   2 running, 278 sleeping,   0 stopped,   0 zombie
  %Cpu(s):  3.9 us,  8.7 sy,  0.0 ni, 87.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  KiB Mem : 15907600 total,   455876 free,  1232832 used, 14218892 buff/cache
  KiB Swap:  8060924 total,  7395580 free,   665344 used. 14176220 avail Mem 
PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
  26668 root  20   0  350268 213688   5680 R 100.0  1.3   5:42.70 crash
   1707 root  20   0  115692   1184688 S   0.3  0.0   2:16.52 ksmtuned
  12852 anderson  20   0 4235240 274608  20320 S   0.3  1.7 601:46.85 gnome-shell
  13060 anderson  20   0  804924  14100   3744 S   0.3  0.1 118:44.59 gsd-color
  27045 anderson  20   0  172452   2532   1648 R   0.3  0.0   0:00.08 top
  1 root  20   0  210504   5592   3224 S   0.0  0.0  18:14.19 systemd
  ...
 
I'll let you figure it out...
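
For reference, the boot-time arithmetic in the snippet above can be checked
in isolation.  This is a minimal sketch, assuming the uptime in jiffies and
the tick rate are already known; sketch_boot_sec() is a hypothetical name,
and the sketch says nothing about the hang itself:

```c
#include <stdint.h>

/*
 * Boot time in seconds since the epoch, derived from the dump's
 * timestamp and the uptime expressed in jiffies.  hz is the kernel
 * tick rate; returns 0 if hz is 0 to avoid dividing by zero.
 */
int64_t sketch_boot_sec(int64_t date_sec, uint64_t uptime_jiffies,
			unsigned int hz)
{
	if (hz == 0)
		return 0;
	return date_sec - (int64_t)(uptime_jiffies / hz);
}
```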

Dave






> ---
>  defs.h   |  2 ++
>  help.c   | 28 +++-
>  kernel.c | 22 --
>  3 files changed, 49 insertions(+), 3 deletions(-)
> 
> diff --git a/defs.h b/defs.h
> index d8eda5e..1644dbd 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -689,6 +689,7 @@ struct kernel_table {   /* kernel data */
>  ulong kernel_module;
>   int mods_installed;
>   struct timespec date;
> + struct timespec boot_date;
>   char proc_version[BUFSIZE];
>   struct new_utsname utsname;
>   uint kernel_version[3];
> @@ -5577,6 +5578,7 @@ void dump_log(int);
>  #define SHOW_LOG_DICT  (0x2)
>  #define SHOW_LOG_TEXT  (0x4)
>  #define SHOW_LOG_AUDIT (0x8)
> +#define SHOW_LOG_CTIME (0x10)
>  void set_cpu(int);
>  void clear_machdep_cache(void);
>  struct stack_hook *gather_text_list(struct bt_info *);
> diff --git a/help.c b/help.c
> index c443cad..1ee70f7 100644
> --- a/help.c
> +++ b/help.c
> @@ -3892,12 +3892,13 @@ NULL
>  char *help_log[] = {
>  "log",
>  "dump system message buffer",
> -"[-tdma]",
> +"[-Ttdma]",
>  "  This command dumps the kernel log_buf contents in chronological order. The",
>  "  command supports the older log_buf formats, which may or may not contain a",
>  "  timestamp inserted prior to each message, as well as the newer variable-length",
>  "  record format, where the timestamp is contained in each log entry's header.",
>  "  ",
> +"-T  Display the message text with human readable timestamp.",
>  "-t  Display the message text without the timestamp; only applicable to the",
>  "variable-length record format.",
>  "-d  Display the dictionary of key/value pair properties that are optionally",
> @@ -4031,6 +4032,31 @@ char *help_log[] = {
>  "type=1307 audit(1489384479.809:4346):  cwd=\"/proc\"",
>  "...",
>  

Re: [Crash-utility] [Marketing Mail] Re: [External Mail][????] Re: ramdump support for va_bits_actual

2020-04-19 Thread Dave Anderson


- Original Message -
> Hi,Dave
> 
> I don't quite understand what you mean by reading vmemmap.  VMEMMAP_START is
> a macro definition, i.e. a constant.  In crash-utility, vmemmap_start has a
> wrong value, and this may cause arm64_IS_VMALLOC_ADDR to return the wrong status.

I mean the kernel symbol "vmemmap", here in "arch/arm64/mm/init.c":

  struct page *vmemmap __ro_after_init;
  EXPORT_SYMBOL(vmemmap);

Doesn't it contain the resolved starting address?

> We can get physvirt_offset earlier.  In my patch, physvirt_offset can be
> initialized right after calling arm64_calc_phys_offset; at that point
> kimage_offset, page_offset and phys_offset are ready, for example:
> 
> @@ -391,6 +391,13 @@ arm64_init(int when)
> 
> ms = machdep->machspec;
> 
> +   if (kernel_symbol_exists("physvirt_offset") &&
> +   readmem(symbol_value("physvirt_offset"), KVADDR,
> +   &value, sizeof(ulong), "physvirt_offset", QUIET|RETURN_ON_ERROR))
> +   ms->physvirt_offset = value;
> +   else
> +   ms->physvirt_offset = ms->phys_offset - ms->page_offset;

When the readmem() of symbol_value("physvirt_offset") is made, arm64_VTOP() will
be called with its virtual address, right?

Dave
  

 
> 
> From: crash-utility-boun...@redhat.com  on
> behalf of Dave Anderson 
> Sent: Sunday, April 19, 2020 1:03
> To: Discussion list for crash utility usage,maintenance and development
> Subject: [Marketing Mail] Re: [Crash-utility] [External Mail][] Re: ramdump support
> for   va_bits_actual
> 
> - Original Message -
> > Hi
> > I made almost the same patch to fix the problem with arm64 in version
> > 5.4...
> >
> > One very small change can merged together,vmemmap_start has a little error:
> > [arch/arm64/include/asm/memory.h]
> >  56 #define VMEMMAP_START   (-VMEMMAP_SIZE - SZ_2M)
> > in crash arm64.c
> > -   vmemmap_start = (-vmemmap_size);
> > +   vmemmap_start = (-vmemmap_size - MEGABYTES(2));
> 
> Can't we just read the value of "vmemmap"?  If not, what is the difference
> between the calculated value above and the value of vmemmap?
> 
> >
> >
> > BTW,in arm64_VTOP,it's easier to use the physvirt_offset directly
> > @@ -1148,8 +1155,7 @@ arm64_VTOP(ulong addr)
> >
> > }
> >
> > if (addr >= machdep->machspec->page_offset)
> > -   return machdep->machspec->phys_offset
> > -   + (addr - machdep->machspec->page_offset);
> > +   return (addr + machdep->machspec->physvirt_offset);
> 
> Unfortunately that's not possible, because there is at least one arm64_VTOP()
> call *before* the new machdep->machspec->physvirt_offset gets initialized,
> which I presume is why Vinayak's patch checks whether it's non-zero first.
> 
> Dave
> 
> 
> 
> > 
> > From: crash-utility-boun...@redhat.com 
> > on
> > behalf of Dave Anderson 
> > Sent: Saturday, April 18, 2020 0:32
> > To: Discussion list for crash utility usage,maintenance and development
> > Subject: [External Mail][] Re: [Crash-utility] ramdump support for
> > va_bits_actual
> >
> > - Original Message -
> > > Hi Dave,
> > >
> > > Noticed that raw ramdumps of 5.4 kernel aren't working with crash tip.
> > > With the patches attached, I could get it working. Please take a look.
> > >
> > > Thanks,
> > > Vinayak
> > >
> >
> > Hi Vinayak,
> >
> > A couple quick questions come to mind...
> >
> > First, I haven't checked all possible READMEM plugins, but for example, if
> > this
> > function is run on a live system, the -1 file descriptor would cause the
> > READMEM()
> > call to fail:
> >
> >  static void
> > +arm64_calc_physvirt_offset(void)
> > +{
> > +   struct machine_specific *ms = machdep->machspec;
> > +   ulong physvirt_offset;
> > +   struct syment *sp;
> > +
> > +   ms->physvirt_offset = ms->phys_offset - ms->page_offset;
> > +
> > +   if ((sp = kernel_symbol_search("physvirt_offset")) &&
> > +   machdep->machspec->kimage_voffset) {
> > +   if (READMEM(-1, &physvirt_offset, sizeof(physvirt_offset),
> > +   sp->value, sp->v

Re: [Crash-utility] [External Mail][????] Re: ramdump support for va_bits_actual

2020-04-18 Thread Dave Anderson


- Original Message -
> Hi
> I made almost the same patch to fix the problem with arm64 in version 5.4...
> 
> One very small change can be merged together with it; vmemmap_start has a small error:
> [arch/arm64/include/asm/memory.h]
>  56 #define VMEMMAP_START   (-VMEMMAP_SIZE - SZ_2M)
> in crash arm64.c
> -   vmemmap_start = (-vmemmap_size);
> +   vmemmap_start = (-vmemmap_size - MEGABYTES(2));

Can't we just read the value of "vmemmap"?  If not, what is the difference
between the calculated value above and the value of vmemmap?

> 
> 
> BTW, in arm64_VTOP(), it's easier to use physvirt_offset directly:
> @@ -1148,8 +1155,7 @@ arm64_VTOP(ulong addr)
>
> }
> 
> if (addr >= machdep->machspec->page_offset)
> -   return machdep->machspec->phys_offset
> -   + (addr - machdep->machspec->page_offset);
> +   return (addr + machdep->machspec->physvirt_offset);

Unfortunately that's not possible, because there is at least one arm64_VTOP()
call *before* the new machdep->machspec->physvirt_offset gets initialized,
which I presume is why Vinayak's patch checks whether it's non-zero first.
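
A minimal sketch of that ordering constraint, assuming a trimmed-down
stand-in for the machine-specific data.  The struct and function names are
hypothetical, and only the linear-map case is shown.  Until physvirt_offset
has been read, the translation falls back to the older
phys_offset/page_offset form:

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for crash's machine_specific. */
struct ms_sketch {
	uint64_t page_offset;
	uint64_t phys_offset;
	uint64_t physvirt_offset;	/* 0 until initialized */
};

/* Linear-map virtual-to-physical translation only. */
uint64_t linear_vtop(const struct ms_sketch *ms, uint64_t addr)
{
	if (ms->physvirt_offset)
		return addr + ms->physvirt_offset;
	return ms->phys_offset + (addr - ms->page_offset);
}
```

When physvirt_offset equals phys_offset - page_offset, both branches give
the same result, so the fallback only matters for calls made before
initialization.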

Dave



> ____
> From: crash-utility-boun...@redhat.com  on
> behalf of Dave Anderson 
> Sent: Saturday, April 18, 2020 0:32
> To: Discussion list for crash utility usage,maintenance and development
> Subject: [External Mail][] Re: [Crash-utility] ramdump support for
> va_bits_actual
> 
> - Original Message -
> > Hi Dave,
> >
> > Noticed that raw ramdumps of 5.4 kernel aren't working with crash tip.
> > With the patches attached, I could get it working. Please take a look.
> >
> > Thanks,
> > Vinayak
> >
> 
> Hi Vinayak,
> 
> A couple quick questions come to mind...
> 
> First, I haven't checked all possible READMEM plugins, but for example, if
> this
> function is run on a live system, the -1 file descriptor would cause the
> READMEM()
> call to fail:
> 
>  static void
> +arm64_calc_physvirt_offset(void)
> +{
> +   struct machine_specific *ms = machdep->machspec;
> +   ulong physvirt_offset;
> +   struct syment *sp;
> +
> +   ms->physvirt_offset = ms->phys_offset - ms->page_offset;
> +
> +   if ((sp = kernel_symbol_search("physvirt_offset")) &&
> +   machdep->machspec->kimage_voffset) {
> +   if (READMEM(-1, &physvirt_offset, sizeof(physvirt_offset),
> +   sp->value, sp->value -
> +   machdep->machspec->kimage_voffset) > 0) {
> +   ms->physvirt_offset = physvirt_offset;
> +   }
> +   }
> +
> +   if (CRASHDEBUG(1))
> +   fprintf(fp, "using %lx as physvirt_offset\n",
> ms->physvirt_offset);
> +}
> 
> And here -- are you missing some brackets?  (run "make warn")
> 
> But regardless of that, why are you setting it back to 48 if it's greater
> than 48?
> 
> diff --git a/arm64.c b/arm64.c
> index 31d6e90..04efc13 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -4011,6 +4011,7 @@ arm64_calc_virtual_memory_ranges(void)
> struct machine_specific *ms = machdep->machspec;
> ulong value, vmemmap_start, vmemmap_end, vmemmap_size, vmalloc_end;
> char *string;
> +   int ret;
> ulong PUD_SIZE = UNINITIALIZED;
> 
> if (!machdep->machspec->CONFIG_ARM64_VA_BITS) {
> @@ -4018,6 +4019,12 @@ arm64_calc_virtual_memory_ranges(void)
> value = atol(string);
> free(string);
> machdep->machspec->CONFIG_ARM64_VA_BITS = value;
> +   } else if (kt->ikconfig_flags & IKCONFIG_AVAIL) {
> +   if ((ret = get_kernel_config("CONFIG_ARM64_VA_BITS",
> +   &string)) == IKCONFIG_STR)
> +   machdep->machspec->CONFIG_ARM64_VA_BITS = atol(string);
> +   if (machdep->machspec->CONFIG_ARM64_VA_BITS > 48)
> +   machdep->machspec->CONFIG_ARM64_VA_BITS = 48;
> }
> }
> 
> Thanks,
>   Dave
> 
> --
> Crash-utility mailing list
> Crash-utility@redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
> 

Re: [Crash-utility] ramdump support for va_bits_actual

2020-04-17 Thread Dave Anderson



- Original Message -
> Hi Dave,
> 
> Noticed that raw ramdumps of 5.4 kernel aren't working with crash tip.
> With the patches attached, I could get it working. Please take a look.
> 
> Thanks,
> Vinayak
> 

Hi Vinayak,

A couple quick questions come to mind...  

First, I haven't checked all possible READMEM plugins, but for example, if
this function is run on a live system, the -1 file descriptor would cause
the READMEM() call to fail:

 static void
+arm64_calc_physvirt_offset(void)
+{
+   struct machine_specific *ms = machdep->machspec;
+   ulong physvirt_offset;
+   struct syment *sp;
+
+   ms->physvirt_offset = ms->phys_offset - ms->page_offset;
+
+   if ((sp = kernel_symbol_search("physvirt_offset")) &&
+   machdep->machspec->kimage_voffset) {
+   if (READMEM(-1, &physvirt_offset, sizeof(physvirt_offset),
+   sp->value, sp->value -
+   machdep->machspec->kimage_voffset) > 0) {
+   ms->physvirt_offset = physvirt_offset;
+   }
+   }
+
+   if (CRASHDEBUG(1))
+   fprintf(fp, "using %lx as physvirt_offset\n", ms->physvirt_offset);
+}

And here -- are you missing some brackets?  (run "make warn")

But regardless of that, why are you setting it back to 48 if it's greater
than 48?

diff --git a/arm64.c b/arm64.c
index 31d6e90..04efc13 100644
--- a/arm64.c
+++ b/arm64.c
@@ -4011,6 +4011,7 @@ arm64_calc_virtual_memory_ranges(void)
struct machine_specific *ms = machdep->machspec;
ulong value, vmemmap_start, vmemmap_end, vmemmap_size, vmalloc_end;
char *string;
+   int ret;
ulong PUD_SIZE = UNINITIALIZED;
 
if (!machdep->machspec->CONFIG_ARM64_VA_BITS) {
@@ -4018,6 +4019,12 @@ arm64_calc_virtual_memory_ranges(void)
value = atol(string);
free(string);
machdep->machspec->CONFIG_ARM64_VA_BITS = value;
+   } else if (kt->ikconfig_flags & IKCONFIG_AVAIL) {
+   if ((ret = get_kernel_config("CONFIG_ARM64_VA_BITS",
+   &string)) == IKCONFIG_STR)
+   machdep->machspec->CONFIG_ARM64_VA_BITS = atol(string);
+   if (machdep->machspec->CONFIG_ARM64_VA_BITS > 48)
+   machdep->machspec->CONFIG_ARM64_VA_BITS = 48;
}
}
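
One way to read the brackets question is sketched below: without braces,
only the first statement belongs to the outer "if", so a clamp that was
meant to apply only after a successful config lookup runs unconditionally.
The function names and parameters are hypothetical and only model the
control flow, not the crash sources:

```c
/*
 * Unbraced form: the clamp is not guarded by have_config, whatever
 * the indentation of the original source may have suggested.
 */
int sketch_va_bits_unbraced(int have_config, int config_value, int current)
{
	if (have_config)
		current = config_value;
	if (current > 48)
		current = 48;
	return current;
}

/* Braced form: the clamp only applies to a freshly looked-up value. */
int sketch_va_bits_braced(int have_config, int config_value, int current)
{
	if (have_config) {
		current = config_value;
		if (current > 48)
			current = 48;
	}
	return current;
}
```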
 
Thanks,
  Dave 

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] [PATCH] Add REFRESH_TASK_TABLE flag to mount command

2020-04-15 Thread Dave Anderson



- Original Message -
> when we launch a crash first and then create a docker container.
> 
> now we run mount command to see the mount namespaces info, we get
> the error:
> 
> crash> mount -n 2020
> mount: invalid task or pid value: 2020
> 
> This patch fix it by add REFRESH_TASK_TABLE flag to mount command.
> 
> Signed-off-by: Wang Long 

Thanks Wang -- queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/601bccedc3300b16f7ff074ba651876af106ffdc

Dave


  
> 
>  global_data.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/global_data.c b/global_data.c
> index cbacc42..a316d1c 100644
> --- a/global_data.c
> +++ b/global_data.c
> @@ -93,7 +93,7 @@ struct command_table_entry linux_command_table[] = {
>   {"mach",cmd_mach,help_mach,0},
>   {"map", cmd_map, help_map, HIDDEN_COMMAND},
>   {"mod", cmd_mod, help_mod, 0},
> - {"mount",   cmd_mount,   help_mount,   0},
> + {"mount",   cmd_mount,   help_mount,   REFRESH_TASK_TABLE},
>   {"net", cmd_net,help_net,  REFRESH_TASK_TABLE},
>   {"p",   cmd_p,   help_p,   0},
>   {"ps",  cmd_ps,  help_ps,  REFRESH_TASK_TABLE},
> --
> 1.8.3.1
> 
> 
> 
> 

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Reply: [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-13 Thread Dave Anderson


- Original Message -
> 
> The mistake was caused by a patch update;
> please re-check the new patch.

In the interest of expediency, I went ahead and made a few cosmetic changes to
the comments and error message strings, and queued the patch for crash-7.2.9:
  
  
https://github.com/crash-utility/crash/commit/b12bdd36cf7caad24957c0b8c030001321ab2df4

Thanks,
  Dave


> ____
> From: Dave Anderson 
> Sent: Friday, April 10, 2020 22:57
> To: 赵乾利
> Cc: d hatayama; Discussion list for crash utility usage, maintenance and
> development
> Subject: Re: Reply: [External Mail]Re: [Crash-utility] zram decompress support
> for gcore/crash-utility
> 
> - Original Message -
> > I got little problem to compile 32-bit on my x86-64 host..
> >  96 /usr/bin/ld: skipping incompatible
> >  /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc.a when searching for -lgcc
> >  97 /usr/bin/ld: cannot find -lgcc
> >  98 /usr/bin/ld: skipping incompatible
> >  /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc_s.so when searching for -lgcc_s
> >  99 /usr/bin/ld: cannot find -lgcc_s
> >
> > I think i have fixed the build warning,but failed rebuild in 32-bit since
> > above error,please help confirm,and move log to try_zram_decompress,please
> > check the attachment.
> 
> The patch now compiles OK, but my first simple test shows that something
> is obviously wrong with the patch.
> 
> Here are a set of user-space addresses that have been swapped out to disk:
> 
>   crash> set 1
>   PID: 1
>   COMMAND: "systemd"
>  TASK: 92a13a1e8000  [THREAD_INFO: 92a13a26]
>   CPU: 2
> STATE: TASK_INTERRUPTIBLE
>   crash> vm -p | grep SWAP
>   55d917fb5000  SWAP: /dev/dm-2  OFFSET: 55827
>   55d917fb7000  SWAP: /dev/dm-2  OFFSET: 55828
>   55d917fc2000  SWAP: /dev/dm-2  OFFSET: 121359
>   55d917fc6000  SWAP: /dev/dm-2  OFFSET: 88579
>   55d917fcb000  SWAP: /dev/dm-2  OFFSET: 88581
>   55d917fcc000  SWAP: /dev/dm-2  OFFSET: 88582
>   55d917fcd000  SWAP: /dev/dm-2  OFFSET: 88583
>   55d917fce000  SWAP: /dev/dm-2  OFFSET: 104963
>   55d917fcf000  SWAP: /dev/dm-2  OFFSET: 104964
>   ...
> 
> Obviously any read of the addresses above should fail, but each
> read returns successfully, and each read is screwing up the internal
> buffering scheme:
> 
>   crash> rd -u 55d917fb5000
>   55d917fb5000:  
>   WARNING: malloc/free mismatch (53/54)
>   crash> rd -u 55d917fb7000
>   55d917fb7000:  
>   WARNING: malloc/free mismatch (53/55)
>   crash> rd -u 55d917fc2000
>   55d917fc2000:  
>   WARNING: malloc/free mismatch (53/56)
>   crash> rd -u 55d917fc6000
>   55d917fc6000:  
>   WARNING: malloc/free mismatch (53/57)
>   crash>
> 
> Dave
> 
> 
> 

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Re: [Crash-utility] Reply: [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-10 Thread Dave Anderson



- Original Message -
> I ran into a small problem compiling for 32-bit on my x86-64 host:
>  96 /usr/bin/ld: skipping incompatible
>  /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc.a when searching for -lgcc
>  97 /usr/bin/ld: cannot find -lgcc
>  98 /usr/bin/ld: skipping incompatible
>  /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc_s.so when searching for -lgcc_s
>  99 /usr/bin/ld: cannot find -lgcc_s
> 
> I think I have fixed the build warning, but the 32-bit rebuild failed because
> of the above error; please help confirm.  I also moved the log message to
> try_zram_decompress(); please check the attachment.

The patch now compiles OK, but my first simple test shows that something
is obviously wrong with the patch.

Here is a set of user-space addresses that have been swapped out to disk:

  crash> set 1
  PID: 1
  COMMAND: "systemd"
 TASK: 92a13a1e8000  [THREAD_INFO: 92a13a26]
  CPU: 2
STATE: TASK_INTERRUPTIBLE
  crash> vm -p | grep SWAP
  55d917fb5000  SWAP: /dev/dm-2  OFFSET: 55827
  55d917fb7000  SWAP: /dev/dm-2  OFFSET: 55828
  55d917fc2000  SWAP: /dev/dm-2  OFFSET: 121359
  55d917fc6000  SWAP: /dev/dm-2  OFFSET: 88579
  55d917fcb000  SWAP: /dev/dm-2  OFFSET: 88581
  55d917fcc000  SWAP: /dev/dm-2  OFFSET: 88582
  55d917fcd000  SWAP: /dev/dm-2  OFFSET: 88583
  55d917fce000  SWAP: /dev/dm-2  OFFSET: 104963
  55d917fcf000  SWAP: /dev/dm-2  OFFSET: 104964
  ...

Obviously any read of the addresses above should fail, but each
read returns successfully, and each read is screwing up the internal
buffering scheme:

  crash> rd -u 55d917fb5000
  55d917fb5000:  
  WARNING: malloc/free mismatch (53/54)
  crash> rd -u 55d917fb7000
  55d917fb7000:  
  WARNING: malloc/free mismatch (53/55)
  crash> rd -u 55d917fc2000
  55d917fc2000:  
  WARNING: malloc/free mismatch (53/56)
  crash> rd -u 55d917fc6000
  55d917fc6000:  
  WARNING: malloc/free mismatch (53/57)
  crash>
 
Dave


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Reply: [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-09 Thread Dave Anderson




- Original Message -
> Hi,Dave
> 
> I changed RETURN_ON_ERROR to FAULT_ON_ERROR in all readmem() calls; the patch
> is attached.
> The reason for adding "decompress success" is that I want to indicate that a
> zram address is different from a normal address, and that calling
> zram_decompress can fail not only because of readmem() failures but also
> because of decompression failures, etc.  I added "This page has swaped to
> zram"/"zram decompress success" as signature log messages, and they are only
> printed when pc->debug >= 2.
> 
> When gcore reads a UVA address, many page faults may occur; we can only
> handle zram swap, while "not mapped" and other types of swap are not
> processed.  For debugging, we may need the signature log for zram,
> so I think it's better to keep this log.

OK fine, but please move the message out of readmem(), and put it at the end of 
try_zram_decompress() where it really belongs.  You'll have to replace the 
"PAGEOFFSET(addr)" argument with "ulonglong addr", and then do the PAGEOFFSET()
in try_zram_decompress().
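
A minimal sketch of that interface change, assuming a fixed 4096-byte page
purely for illustration.  All names here are hypothetical, and this is not
the actual try_zram_decompress() code.  The callee receives the full address
and derives the in-page offset itself, which also leaves the address
available for its own debug message:

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed; crash derives this at runtime */

/* Offset of an address within its page. */
static uint64_t sketch_pageoffset(uint64_t addr)
{
	return addr & (SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical callee shape: takes the full address rather than a
 * precomputed offset, and returns how many of the requested bytes
 * can be copied without crossing the page boundary.
 */
uint64_t sketch_copy_len(uint64_t addr, uint64_t len)
{
	uint64_t off = sketch_pageoffset(addr);
	uint64_t room = SKETCH_PAGE_SIZE - off;

	return len < room ? len : room;
}
```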

Also, this doesn't compile cleanly on 32-bit architectures:

$ make warn
TARGET: X86
 CRASH: 7.2.9rc10
   GDB: 7.6

...
cc -c -g -DX86 -m32 -D_FILE_OFFSET_BITS=64 -DLZO -DSNAPPY -DGDB_7_6  diskdump.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security
diskdump.c: In function 'lookup_swap_cache':
diskdump.c:2657:2: warning: right shift count >= width of type [enabled by default]
  swp_type = __swp_type(pte_val);
  ^
diskdump.c:2659:3: warning: right shift count >= width of type [enabled by default]
   swp_offset = (ulonglong)__swp_offset(pte_val);
   ^
diskdump.c: In function 'try_zram_decompress':
diskdump.c:2709:3: warning: right shift count >= width of type [enabled by default]
   swap_info += (__swp_type(pte_val) * sizeof(void *));
   ^
diskdump.c:2713:3: warning: right shift count >= width of type [enabled by default]
   swap_info += (SIZE(swap_info_struct) * __swp_type(pte_val));
   ^
diskdump.c:2745:4: warning: right shift count >= width of type [enabled by default]
    swp_offset = (ulonglong)__swp_offset(pte_val);
...
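
For illustration only, here is a minimal sketch of one way to avoid this class of warning. It is not the crash code: the shift and mask values and the demo_* names are hypothetical. The assumption is that the warning appears because a 32-bit ulong is shifted by a count of 32 or more; doing the extraction in a 64-bit type keeps the shift count below the width of the type on every architecture.

```c
#include <stdint.h>

/* Hypothetical swap-entry layout, for illustration only:
 * the type field is assumed to start at bit 58 and be 5 bits wide. */
#define DEMO_SWP_TYPE_SHIFT 58
#define DEMO_SWP_TYPE_MASK  0x1fULL

/* Extract the type field using a 64-bit operand, so the shift count (58)
 * is always smaller than the width of the shifted type (64). */
static unsigned int demo_swp_type(uint64_t pte_val)
{
    return (unsigned int)((pte_val >> DEMO_SWP_TYPE_SHIFT) & DEMO_SWP_TYPE_MASK);
}
```

With this layout, demo_swp_type(((uint64_t)3 << 58) | 0x1234) evaluates to 3 on both 32-bit and 64-bit builds.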

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] [PATCH] fix "kmem -[sS]" for caches created during SLUB bootstrap

2020-04-09 Thread Dave Anderson


- Original Message -
> Fix for "kmem -[sS]" options on Linux 4.14 and later kernels built
> with CONFIG_SLAB_FREELIST_HARDENED enabled. Without the patch, there
> will be error messages of the type "kmem:  slab: 
> invalid freepointer: " for caches created during
> SLUB bootstrap, as they are likely to have s->random == 0.
> 
> Signed-off-by: Hari Bathini 

Hi Hari,

Queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/1ad5a3622f32387b271584d2fe26c07530bcddc9
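
As a rough sketch of why testing the random value itself is not enough, assuming the XOR scheme shown in the patch below (ptr ^ random ^ ptr_addr): when hardening is enabled but the per-cache random value happens to be 0, the stored pointer is still XORed with its own address, so decoding must be keyed on whether hardening is enabled. The demo_* names are hypothetical and this is not the memory.c code.

```c
#include <stdint.h>

/* Decode a stored freelist pointer.  'hardened' stands in for the check that
 * the kernel was built with freelist hardening; it is a hypothetical
 * parameter used only in this sketch. */
static uint64_t demo_freelist_ptr(int hardened, uint64_t stored,
                                  uint64_t random, uint64_t ptr_addr)
{
    if (hardened)
        return stored ^ random ^ ptr_addr;
    return stored;
}
```

With random == 0, a real pointer 0xffff888012345600 stored at address 0xffff888012345000 is kept as 0x600; only the hardened path recovers the real pointer.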

Thanks,
  Dave


> ---
>  memory.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/memory.c b/memory.c
> index ccc2944..c2433eb 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -19244,7 +19244,7 @@ count_free_objects(struct meminfo *si, ulong
> freelist)
>  static ulong
>  freelist_ptr(struct meminfo *si, ulong ptr, ulong ptr_addr)
>  {
> - if (si->random)
> + if (VALID_MEMBER(kmem_cache_random))
>   /* CONFIG_SLAB_FREELIST_HARDENED */
>   return (ptr ^ si->random ^ ptr_addr);
>   else
> 
> 




Re: [Crash-utility] Reply: [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-07 Thread Dave Anderson


- Original Message -
> Hi,Dave
> 
> The latest patch has been attached; please check.
> 
> Thanks.


In diskdump.c, I should have noticed this before, but there is a fundamental 
problem with *all* of your readmem() calls, because you qualify each of them
with RETURN_ON_ERROR -- and then just continue on presuming that they were 
successful.  If RETURN_ON_ERROR is specified, you need to check the return
value of readmem(), and then proceed appropriately.

However, as far as I can tell, all of your readmem() calls should be 
FAULT_ON_ERROR.  That way, if the readmem() call fails, you will see the
error message using your supplied "type" string, and the command will be
aborted immediately.
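
To make the first point concrete, here is a minimal self-contained sketch of the "check the return value" pattern.  The demo_* names and the flag are stand-ins written for this example, not crash's readmem() interface; the stand-in reader simply fails for address 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag and reader used only in this sketch. */
#define DEMO_RETURN_ON_ERROR 0x1

/* Returns 1 on success and 0 on failure; on failure the buffer is untouched. */
static int demo_readmem(unsigned long addr, void *buf, size_t len,
                        const char *type, int flags)
{
    (void)type;
    (void)flags;
    if (addr == 0)
        return 0;
    memset(buf, 0xab, len);
    return 1;
}

/* A caller that asks for "return on error" must test the result before
 * using the buffer, otherwise it continues with stale data. */
static int demo_read_page_private(unsigned long page, unsigned long *out)
{
    if (!demo_readmem(page, out, sizeof(*out), "page_private",
                      DEMO_RETURN_ON_ERROR))
        return 0;   /* propagate the failure to the caller */
    return 1;
}
```

A caller that ignores the result, as in the patch, would go on to use whatever happened to be in the output variable.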

And speaking of your readmem() error message type strings, please remove
the "readmem " part of the strings.  For example,

+   readmem(page + OFFSET(page_private), KVADDR, &zspage,
+   sizeof(void *), "readmem page_private", RETURN_ON_ERROR);
+   readmem(zspage, KVADDR, &zspage_s, sizeof(struct zspage), "readmem zspage", RETURN_ON_ERROR);

Just make them "page_private", "zspage", etc.

And re-thinking memory.c, there really is no point in displaying the 
"zram decompress success" message.  If try_zram_decompress() fails,
you will see the reason why it failed based upon the error message
generated by the failing embedded/secondary readmem() call.  If it 
works OK, you will see the requested data.  Why bother to show "success"?

+   cnt = try_zram_decompress(paddr, (unsigned char *)bufptr, cnt, PAGEOFFSET(addr));
+   if (cnt) {
+       if (CRASHDEBUG(2))
+           error(INFO, "0x%lx zram decompress success\n", addr);
+       bufptr += cnt;
+       addr += cnt;
+       size -= cnt;
+       continue;
+   }

Thanks,
  Dave









> 
> From: Dave Anderson 
> Sent: Monday, April 6, 2020 22:38
> To: 赵乾利
> Cc: d hatayama; Discussion list for crash utility usage, maintenance and
> development
> Subject: Re: Reply: [External Mail]Re: [Crash-utility] zram decompress support
> for gcore/crash-utility
> 
> - Original Message -
> > patch update
> > Found a better way to translate a pfn to a page: PTOB.
> > Also fixed an issue with a low probability of decompression failure.
> >
> 
> The changes to defs.h and memory.c look good.  My comments for diskdump.c
> are interspersed below, where many of them are redundant:
> 
> +#ifdef LZO
> +static unsigned char *zram_object_addr(ulong pool, ulong handle, unsigned
> char *zram_buf)
> +{
> +   ulong obj, off, class, page, zspage;
> +   struct zspage zspage_s;
> +   physaddr_t paddr;
> +   unsigned int obj_idx, class_idx, size;
> +   ulong pages[2], sizes[2];
> +
> +   readmem(handle, KVADDR, &obj, sizeof(void *), "readmem address",
> RETURN_ON_ERROR);
> 
> Can you make the error message type string something more helpful than
> "readmem address"?
> 
> +   obj >>= OBJ_TAG_BITS;
> +   phys_to_page(PTOB(obj >> OBJ_INDEX_BITS), &page);
> +   obj_idx = (obj & OBJ_INDEX_MASK);
> +
> +   readmem(page + OFFSET(page_private), KVADDR, &zspage,
> +   sizeof(void *), "readmem address", RETURN_ON_ERROR);
> 
> Can you make the error message type string something more helpful than
> "readmem address"?
> 
> +   readmem(zspage, KVADDR, &zspage_s, sizeof(struct zspage), "readmem
> address",RETURN_ON_ERROR);
> 
> Can you make the error message type string something more helpful than
> "readmem address"?
> 
> +
> +   class_idx = zspage_s.class;
> +   if (zspage_s.magic != ZSPAGE_MAGIC)
> +   error(FATAL, "zspage magic incorrect:0x%x\n",
> zspage_s.magic);
> +
> +   class = pool + OFFSET(zspoll_size_class);
> +   class += (class_idx * sizeof(void *));
> +   readmem(class, KVADDR, &class, sizeof(void *), "readmem
> address",RETURN_ON_ERROR);
> 
> Can you make the error message type string something more helpful than
> "readmem address"?
> 
> +   readmem(class + OFFSET(size_class_size), KVADDR,
> +   &size, sizeof(unsigned int), "readmem address",
> RETURN_ON_ERROR);
> 
> Can you make the error message type string something more helpful than
> "readmem address"?
> 
> +

Re: [Crash-utility] [PATCH] Determine the ARM64 kernel's Pointer Authentication mask value by reading the new KERNELPACMASK vmcoreinfo entry.

2020-04-07 Thread Dave Anderson


Hi Amit,

I have a few suggestions and a question regarding your patch.

First, these two warnings need to be addressed:

$ make warn
...
cc -c -g -DARM64 -DLZO -DSNAPPY -DGDB_7_6  arm64.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security 
arm64.c: In function ‘arm64_calc_KERNELPACMASK’:
arm64.c:4090:2: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘ulong’ [-Wformat=]
  fprintf(fp, " got NUMBER(KERNELPACMASK) =%llx\n", value);
  ^
arm64.c:4090:9: warning: ‘value’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  fprintf(fp, " got NUMBER(KERNELPACMASK) =%llx\n", value);
 ^
...

The message should be moved inside the if statement, and also should be gated 
with CRASHDEBUG(1) to prevent it from being displayed unconditionally:
  
  static void 
  arm64_calc_KERNELPACMASK(void)
  {
          ulong value;
          char *string;
  
          if ((string = pc->read_vmcoreinfo("NUMBER(KERNELPACMASK)"))) {
                  value = htol(string, QUIET, NULL);
                  free(string);
                  machdep->machspec->CONFIG_ARM64_KERNELPACMASK = value;
                  if (CRASHDEBUG(1))
                          fprintf(fp, "CONFIG_ARM64_KERNELPACMASK: %lx\n", value);
          }
  }
  
And the CONFIG_ARM64_KERNELPACMASK value should be displayed in 
arm64_dump_machdep_table().

But given that the only place where the patch modifies text return addresses
on the kernel stack is here:

@@ -1932,6 +1936,9 @@ arm64_print_stackframe_entry(struct bt_info *bt, int level, struct arm64_stackfr
  * See, for example, "bl schedule" before ret_to_user().
  */

branch_pc = frame->pc - 4;
+   if (ms->CONFIG_ARM64_KERNELPACMASK)
+   branch_pc |= ms->CONFIG_ARM64_KERNELPACMASK;
+
 name = closest_symbol(branch_pc);
 name_plus_offset = NULL;

I'm wondering how all of the other places that check addresses found
on the kernel stack will work?  For example, all of these places 
check whether an address found on the stack is a kernel text address:

  $ grep is_kernel_text arm64.c
  is_kernel_text(regs->pc) &&
  is_kernel_text(regs->regs[30])) {
  if (is_kernel_text(*up)) {   <= from arm64_print_text_symbols()
  if (is_kernel_text(frame->pc) ||
  if (is_kernel_text(frame->pc) && 
  if (is_kernel_text(regs->pc) &&
  if (is_kernel_text(LR) &&
  if (is_kernel_text(regs->pc) && (bt->flags & BT_LINE_NUMBERS)) {
  $

Except for the call from arm64_print_text_symbols(), these are checking register 
values found in exception frames.  Can you confirm that they will still be
unmodified kernel text addresses?

The call from arm64_print_text_symbols() is for "bt -[tT]", which just
scours a kernel stack for text addresses and dumps them.  So presumably
that needs to apply the mask to each stack value as you've done in 
arm64_print_stackframe_entry()?  (while still recognizing unmodified text
addresses found in exception frames)
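
As a rough sketch of what applying the mask to each stack value could look like, under the assumption that the PAC bits of a kernel text address are all ones once the signature is stripped: ORing the mask in restores a signed pointer and leaves an unmodified kernel address unchanged.  The mask value and the demo_* name are hypothetical; the real mask would come from vmcoreinfo.

```c
#include <stdint.h>

/* OR the PAC mask back into a candidate kernel address.  A zero mask means
 * the vmcoreinfo entry was not present, so the value is returned as-is. */
static uint64_t demo_strip_kernel_pac(uint64_t addr, uint64_t pac_mask)
{
    return pac_mask ? (addr | pac_mask) : addr;
}
```

Because the operation is idempotent for addresses whose PAC bits are already all ones, the same helper could be applied both to signed return addresses and to unmodified text addresses before they are passed to a text-address check.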

Thanks,
  Dave


- Original Message -
> This value is used to mask the PAC bits and generate correct backtrace.
> (amit.kach...@arm.com)
> ---
> The kernel version for the corresponding vmcoreinfo entry is posted here[1].
> 
> [1]: https://lore.kernel.org/patchwork/patch/1211981/
> 
>  arm64.c | 20 
>  defs.h  |  1 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/arm64.c b/arm64.c
> index 09b1b76..55e084f 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -84,6 +84,7 @@ static int arm64_get_kvaddr_ranges(struct vaddr_range *);
>  static void arm64_get_crash_notes(void);
>  static void arm64_calc_VA_BITS(void);
>  static int arm64_is_uvaddr(ulong, struct task_context *);
> +static void arm64_calc_KERNELPACMASK(void);
>  
>  
>  /*
> @@ -213,6 +214,7 @@ arm64_init(int when)
>   machdep->pagemask = ~((ulonglong)machdep->pageoffset);
>  
>   arm64_calc_VA_BITS();
> + arm64_calc_KERNELPACMASK();
>   ms = machdep->machspec;
>   if (ms->VA_BITS_ACTUAL) {
>   ms->page_offset = ARM64_PAGE_OFFSET_ACTUAL;
> @@ -472,6 +474,7 @@ arm64_init(int when)
>   case LOG_ONLY:
>   machdep->machspec = &arm64_machine_specific;
>   arm64_calc_VA_BITS();
> + arm64_calc_KERNELPACMASK();
>   arm64_calc_phys_offset();
>   machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
>   break;
> @@ -1925,6 +1928,7 @@ arm64_print_stackframe_entry(struct bt_info *bt, int
> level, struct arm64_stackfr
>   struct syment *sp;
>   struct load_module *lm;
>   char buf[BUFSIZE];
> + struct machine_specific *ms = machdep->machspec;
>  
>  /*
>   * if pc comes from a saved lr, it actually points to an instruction
> @@ -1932,6 +1936,9 @@ arm64_print_stackframe_entry(struct 

Re: [Crash-utility] Reply: [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-06 Thread Dave Anderson



- Original Message -
> patch update
> Found a better way to translate a pfn to a page: PTOB.
> Also fixed an issue with a low probability of decompression failure.
> 

The changes to defs.h and memory.c look good.  My comments for diskdump.c
are interspersed below, where many of them are redundant:

+#ifdef LZO
+static unsigned char *zram_object_addr(ulong pool, ulong handle, unsigned char *zram_buf)
+{
+   ulong obj, off, class, page, zspage;
+   struct zspage zspage_s;
+   physaddr_t paddr;
+   unsigned int obj_idx, class_idx, size;
+   ulong pages[2], sizes[2];
+
+   readmem(handle, KVADDR, &obj, sizeof(void *), "readmem address", RETURN_ON_ERROR);

Can you make the error message type string something more helpful than
"readmem address"?

+   obj >>= OBJ_TAG_BITS;
+   phys_to_page(PTOB(obj >> OBJ_INDEX_BITS), &page);
+   obj_idx = (obj & OBJ_INDEX_MASK);
+
+   readmem(page + OFFSET(page_private), KVADDR, &zspage,
+   sizeof(void *), "readmem address", RETURN_ON_ERROR);

Can you make the error message type string something more helpful than
"readmem address"?

+   readmem(zspage, KVADDR, &zspage_s, sizeof(struct zspage), "readmem address",RETURN_ON_ERROR);

Can you make the error message type string something more helpful than
"readmem address"?

+
+   class_idx = zspage_s.class;
+   if (zspage_s.magic != ZSPAGE_MAGIC)
+   error(FATAL, "zspage magic incorrect:0x%x\n", zspage_s.magic);
+
+   class = pool + OFFSET(zspoll_size_class);
+   class += (class_idx * sizeof(void *));
+   readmem(class, KVADDR, &class, sizeof(void *), "readmem address",RETURN_ON_ERROR);

Can you make the error message type string something more helpful than
"readmem address"?

+   readmem(class + OFFSET(size_class_size), KVADDR,
+   &size, sizeof(unsigned int), "readmem address", RETURN_ON_ERROR);

Can you make the error message type string something more helpful than
"readmem address"?

+   off = (size * obj_idx) & (~machdep->pagemask);
+   if (off + size <= PAGESIZE()) {
+   if (!is_page_ptr(page, &paddr)) {
+   error(WARNING, "zspage not a page pointer:%lx\n", page);
+   return NULL;
+   }
+   readmem(paddr + off, PHYSADDR, zram_buf, size, "readmem zram buffer", RETURN_ON_ERROR);
+   goto out;
+   }
+
+   pages[0] = page;
+   readmem(page + OFFSET(page_freelist), KVADDR, &pages[1],
+   sizeof(void *), "readmem address",RETURN_ON_ERROR);

Can you make the error message type string something more helpful than
"readmem address"?

+   sizes[0] = PAGESIZE() - off;
+   sizes[1] = size - sizes[0];
+   if (!is_page_ptr(pages[0], &paddr)) {
+   error(WARNING, "pages[0] not a page pointer\n");

Maybe display the bogus value in pages[0] in the message?

+   return NULL;
+   }
+
+   readmem(paddr + off, PHYSADDR, zram_buf, sizes[0], "readmem zram buffer", RETURN_ON_ERROR);
+   if (!is_page_ptr(pages[1], &paddr)) {
+   error(WARNING, "pages[1] not a page pointer\n");

Maybe display the bogus value in pages[1] in the message?

+   return NULL;
+   }
+
+   readmem(paddr, PHYSADDR, zram_buf + sizes[0], sizes[1], "readmem zram buffer", RETURN_ON_ERROR);
+
+out:
+   readmem(page, KVADDR, &obj, sizeof(void *), "readmem address",RETURN_ON_ERROR);

Can you make the error message type string something more helpful than
"readmem address"?

+   if (!(obj & (1<<10))) { //PG_OwnerPriv1 flag
+   return (zram_buf + ZS_HANDLE_SIZE);
+   }
+
+   return zram_buf;
+}
+
+static unsigned char *lookup_swap_cache(ulong pte_val, unsigned char *zram_buf)
+{
+   ulong swp_type, swp_offset, swp_space;
+   struct list_pair lp;
+   physaddr_t paddr;
+   swp_type = __swp_type(pte_val);
+   if (THIS_KERNEL_VERSION >= LINUX(2,6,0)) {
+   swp_offset = (ulonglong)__swp_offset(pte_val);
+   } else {
+   swp_offset = (ulonglong)SWP_OFFSET(pte_val);
+   }
+
+   if (!symbol_exists("swapper_spaces"))
+   return NULL;
+   swp_space = symbol_value("swapper_spaces");
+   swp_space += swp_type * sizeof(void *);
+
+   readmem(swp_space, KVADDR, &swp_space, sizeof(void *),
+   "readmem address",RETURN_ON_ERROR);

Can you make the error message type string something more helpful than
"readmem address"?

+   swp_space += (swp_offset >> SWAP_ADDRESS_SPACE_SHIFT) * SIZE(address_space);
+
+   lp.index = swp_offset;
+   if (do_radix_tree(swp_space, RADIX_TREE_SEARCH, &lp)){
+   fprintf(fp, "Find page in swap cache\n");

I don't think you meant to leave this message, right?

+   if (!is_page_ptr((ulong)lp.value, &paddr)) {
+   error(WARNING, "radix page not a page pointer\n");
+   return NULL;
+   }
+  

Re: [Crash-utility] Reply: [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-03 Thread Dave Anderson



- Original Message -
> Hi,Dave
> 
> As per your suggestion, I updated the patch, and it compiled successfully on
> the arm64, x86 and ppc64 architectures.
> 

The patch does compile cleanly, although I haven't tried compiling it on an
s390x, but I'll do that soon.

A couple issues with the patch:

  +   if (!strncmp(name, "lzo", strlen("lzo"))) {
  +   lzo_init();
  +   decompressor = (void *)lzo1x_decompress_safe;
  +   } else {//todo,support more compressor
  +   error(WARNING, "Only support lzo compressor\n");
  +   return 0;
  +   }

lzo_init() will have already been called by is_diskdump() here during session
initialization if the dumpfile is a compressed kdump:

  #ifdef LZO
  if (lzo_init() == LZO_E_OK)
  dd->flags |= LZO_SUPPORTED;
  #endif

So try_zram_decompress() should check if (dd->flags & LZO_SUPPORTED) has been
set before calling lzo_init() again.
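
A minimal sketch of that guard, using stand-in names so that it is self-contained (demo_flags, DEMO_LZO_SUPPORTED and demo_lzo_init() are hypothetical and only mimic the shape of the flag test above):

```c
/* Stand-ins for the session flags and the library initializer. */
#define DEMO_LZO_SUPPORTED 0x1

static unsigned long demo_flags;
static int demo_init_calls;

/* Pretend initializer: counts calls and returns 0 for success. */
static int demo_lzo_init(void)
{
    demo_init_calls++;
    return 0;
}

/* Initialize at most once: skip the call when the flag is already set,
 * and record success by setting the flag. */
static int demo_ensure_lzo(void)
{
    if (demo_flags & DEMO_LZO_SUPPORTED)
        return 1;
    if (demo_lzo_init() != 0)
        return 0;
    demo_flags |= DEMO_LZO_SUPPORTED;
    return 1;
}
```

Calling demo_ensure_lzo() repeatedly runs the initializer only on the first call.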

And while I don't object to exporting swap_info_init(), I do have a problem
with pfn_to_map():

  +void swap_info_init(void);
  +ulong pfn_to_map(ulong);

  ...

  +   obj >>= OBJ_TAG_BITS;
  +   page = pfn_to_map(obj >> OBJ_INDEX_BITS);
  +   obj_idx = (obj & OBJ_INDEX_MASK);

The pfn_to_map() function is only relevant if the kernel is configured with
SPARSEMEM.  I don't see why the exported phys_to_page() function could not be
used here?

Thanks,
  Dave




Re: [Crash-utility] [PATCH] raw_data_dump: display only 8/16/32 bits if requested

2020-04-02 Thread Dave Anderson



- Original Message -
> Previously, calling raw_data_dump() with e.g. len 4 on 64bit systems
> would dump 8 bytes anyway, making it hard to tell what one wants to see.
> 
> For example, with task_struct.rt_priority a uint32.
> before patch:
> crash> struct -r task_struct.rt_priority 8d9b36186180
> 8d9b361861dc:  9741dec00063c.A.
> 
> after patch:
> crash-patched> struct -r task_struct.rt_priority 8d9b36186180
> 8d9b361861dc:  0063  c...
> ---
> 
> Here's the promised follow-up.
> 
> Two remarks:
>  - I wasn't sure about an explicit DISPLAY_64 flag, but if we're 32bit
> and want to print 8 bytes it is just as likely to be two entities than
> a single one so it makes more sense to leave default to me.
>  - I wasn't sure on what to do if someone wants to print some odd size,
> e.g. 6 bits? Should that be DISPLAY_8 anyway?
> I tried on some bitmap and it looks like raw_data_dump is called with 8
> anyway even if the bitmap part is less than 8, I'm not sure this can
> ever be called with weird values, so probably best left as is.

Yes, let's do that -- queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/8c28b5625505241d80ec5162f58ccc563e5e59f9
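
For readers skimming the thread, the selection logic in the patch below boils down to something like this sketch.  The enum values and the demo_* names are made up for the example; they are not the DISPLAY_* flags from defs.h.

```c
/* Hypothetical display-width selector, for illustration only. */
enum demo_width {
    DEMO_DISPLAY_DEFAULT = 0,
    DEMO_DISPLAY_8 = 1,
    DEMO_DISPLAY_16 = 2,
    DEMO_DISPLAY_32 = 4
};

/* Pick a display width from the requested byte count; any other size,
 * including odd ones, falls back to the default word width. */
static enum demo_width demo_pick_width(long count)
{
    switch (count) {
    case 1:
        return DEMO_DISPLAY_8;
    case 2:
        return DEMO_DISPLAY_16;
    case 4:
        return DEMO_DISPLAY_32;
    default:
        return DEMO_DISPLAY_DEFAULT;
    }
}
```

An 8-byte or 6-byte request takes the default branch, which matches the decision to leave odd sizes alone.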

Thanks,
  Dave



> Thanks!
> 
>  memory.c | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/memory.c b/memory.c
> index 4f7b6a0..ccc2944 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -2113,6 +2113,7 @@ raw_data_dump(ulong addr, long count, int symbolic)
>   long wordcnt;
>   ulonglong address;
>   int memtype;
> + ulong flags = HEXADECIMAL;
>  
>   switch (sizeof(long))
>   {
> @@ -2132,6 +2133,22 @@ raw_data_dump(ulong addr, long count, int symbolic)
>   break;
>   }
>  
> + switch (count)
> + {
> + case SIZEOF_8BIT:
> + flags |= DISPLAY_8;
> + break;
> + case SIZEOF_16BIT:
> + flags |= DISPLAY_16;
> + break;
> + case SIZEOF_32BIT:
> + flags |= DISPLAY_32;
> + break;
> + default:
> + flags |= DISPLAY_DEFAULT;
> + break;
> + }
> +
>   if (pc->curcmd_flags & MEMTYPE_FILEADDR) {
>   address = pc->curcmd_private;
>   memtype = FILEADDR;
> @@ -2144,7 +2161,7 @@ raw_data_dump(ulong addr, long count, int symbolic)
>   }
>  
>   display_memory(address, wordcnt,
> - HEXADECIMAL|DISPLAY_DEFAULT|(symbolic ? SYMBOLIC : ASCII_ENDLINE),
> + flags|(symbolic ? SYMBOLIC : ASCII_ENDLINE),
>   memtype, NULL);
>  }
>  
> --
> 2.26.0
> 
> 
> 




Re: [Crash-utility] Reply: [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-02 Thread Dave Anderson


- Original Message -
> Hi,Dave & hatayama
> 
> I made two patches, in crash-utility and gcore, to support zram decompression.
> 1. In crash-utility, I added a patch to readmem to support zram decompression;
> the readmem interface automatically recognizes and decompresses zram data.
> There are some limitations to zram support: only lzo decompression is supported.
> The kernel supports lzo, lz4, lz4hc, 842 and zstd, but lzo is the default.
> 
> With the patch, the "rd" command can also read data that is mapped to zram:
> [without patch]
> crash> rd 144fc000 2
> rd: invalid user virtual address: 144fc000  type: "64-bit UVADDR"
> [with patch]
> crash> rd 144fc000 2
> 144fc000:  06ecdc6b06ecb280 06f027f906eebebe   k'..

With respect to the crash utility patch:

Apparently you wrote this patch to only support ARM64?  Here's what happens on
an x86_64:
  
  $ patch -p1 < $bos/0001-support-zram-decompress-in-readmem.patch
  patching file defs.h
  Hunk #5 succeeded at 5304 (offset 2 lines).
  patching file memory.c
  $ make warn
  ... [ cut ] ...
  cc -c -g -DX86_64 -DLZO -DSNAPPY -DGDB_7_6  memory.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security 
  In file included from memory.c:19:0:
  memory.c: In function 'zram_object_addr':
  defs.h:5310:27: error: 'PHYS_MASK_SHIFT' undeclared (first use in this function)
   #define _PFN_BITS (PHYS_MASK_SHIFT - PAGESHIFT())
 ^
  defs.h:5311:43: note: in expansion of macro '_PFN_BITS'
   #define OBJ_INDEX_BITS   (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 ^
  memory.c:19838:27: note: in expansion of macro 'OBJ_INDEX_BITS'
page = pfn_to_map(obj >> OBJ_INDEX_BITS);
 ^
  defs.h:5310:27: note: each undeclared identifier is reported only once for each function it appears in
   #define _PFN_BITS (PHYS_MASK_SHIFT - PAGESHIFT())
 ^
  defs.h:5311:43: note: in expansion of macro '_PFN_BITS'
   #define OBJ_INDEX_BITS   (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 ^
  memory.c:19838:27: note: in expansion of macro 'OBJ_INDEX_BITS'
page = pfn_to_map(obj >> OBJ_INDEX_BITS);
 ^
  memory.c: In function 'try_zram_decompress':
  memory.c:19940:16: error: 'PTE_VALID' undeclared (first use in this function)
if (pte_val & PTE_VALID)
  ^
  memory.c:19932:8: warning: unused variable 'ret' [-Wunused-variable]
ulong ret = 0;
  ^
  make[4]: *** [memory.o] Error 1
  make[3]: *** [gdb] Error 2
  make[2]: *** [rebuild] Error 2
  make[1]: *** [gdb_merge] Error 2
  make: *** [warn] Error 2
  $

So that's a non-starter.  If it can't be made architecture-neutral, then at 
least the other major architectures need to be supported.  At a minimum all 
architectures need to be able to be compiled with LZO enabled.

If you can do that, other suggestions I have for the patch are:

 (1) Move all the new offset_table entries to the end of the structure to
     prevent the breakage of previously-compiled extension modules that use
     OFFSET().

 (2) Move the new LZO specific functions to diskdump.c, which is the only C file
 that is set up to deal with LZO being #define'd on the fly with "make lzo".

 (3) Create a dummy try_zram_decompress() function in diskdump.c that just 
 returns 0.  Put it outside of the LZO function block, e.g.:

#ifdef LZO
zram_object_addr(args... )
...
lookup_swap_cache(args...)
...
try_zram_decompress(args...)
...
#else
try_zram_decompress(args...) { return 0; }
#endif

     Alternatively, you could create a try_zram_decompress() macro in defs.h
     the same way.

 (4) Remove the #ifdef/#endif LZO section of readmem().

 (5) PLEASE do not make all the white-space changes in memory.c.  It's annoying 
 to have to review the patch when it's cluttered with changes that are 
 irrelevant to the task at hand.
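
To illustrate suggestion (1) with a sketch: a module compiled against an older table layout has the position of each existing member baked in, so appending a new member keeps those positions valid while inserting one earlier shifts them.  The structures and names below are hypothetical and much smaller than the real offset_table.

```c
#include <stddef.h>

/* Layout that an already-compiled extension module was built against. */
struct demo_table_v1 {
    long task_struct_pid;
    long mm_struct_pgd;
};

/* New member appended at the end: existing members keep their positions. */
struct demo_table_appended {
    long task_struct_pid;
    long mm_struct_pgd;
    long demo_new_member;
};

/* New member inserted at the front: every existing member moves. */
struct demo_table_inserted {
    long demo_new_member;
    long task_struct_pid;
    long mm_struct_pgd;
};

/* Returns 1 if an old module would still find mm_struct_pgd where it expects. */
static int demo_preserved_when_appended(void)
{
    return offsetof(struct demo_table_appended, mm_struct_pgd) ==
           offsetof(struct demo_table_v1, mm_struct_pgd);
}

static int demo_preserved_when_inserted(void)
{
    return offsetof(struct demo_table_inserted, mm_struct_pgd) ==
           offsetof(struct demo_table_v1, mm_struct_pgd);
}
```

Only the appended layout leaves the old module reading the member it intended.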

Thanks,
  Dave






> 
> 2. In gcore, I have to make a small change: change the readmem parameter from
> PHYSADDR to UVADDR; the other work will be done by crash.
> 
> Please help review.
> Thanks
> 
> -Original Message-
> From: Dave Anderson 
> Sent: April 2, 2020 0:29
> To: 赵乾利 
> Cc: d hatayama ; Discussion list for crash utility
> usage, maintenance and development 
> Subject: Re: [External Mail]Re: [Crash-utility] zram decompress support for
> gcore/crash-utility
> 
> 
> 
> - Original Message -
> > Hi,Dave
> > Zram is a virtual device that is simulated as a block device. It's part of
> > memory/the ramdump; just enable CONFIG_ZRAM, no other settings are needed.
> > You can refer to drivers/block/zram/zram_drv.c: the driver calls
> > zram_meta_alloc to allocate memory from RAM.
&

Re: [Crash-utility] [PATCH v2] struct: Allow -r with a single member-specific output

2020-04-02 Thread Dave Anderson



- Original Message -
> Hi Dave,
> 
> Dave Anderson wrote on Wed, Apr 01, 2020:
> > > I didn't post that v2 back in Feb because I wasn't totally happy with
> > > it; I can't say I now am but might as well get your take on it...
> > 
> > What part of this patch aren't you happy about?
> 
> It's mostly style really - I don't like that we're calling in twice in
> datatype_info(), because member_to_datatype() doesn't really fill in the
> datatype_member struct and only fills in dm->member and offset.
> At a naive read, I would expect member_to_datatype to fill in the whole dm...
> 
> Functionally I tested it, it's a bit slower than my original version but
> not enough to be a valid argument here; it's much better than nothing so
> if you're happy with this let's go with it :)

Works for me -- queued for crash-7.2.9:
  
  
https://github.com/crash-utility/crash/commit/42fba6524ce01b6cecb4cd2cac8f0a50d79b1420

Thanks,
  Dave


> 
> I will probably want to follow up with a second patch for raw_data_dump
> to add DISPLAY_32/16 flags if the len requested is < word size but it's
> not directly related to this patch...
> 
> 
> Cheers,
> --
> Dominique
> 
> 
> 




Re: [Crash-utility] [PATCH v2] struct: Allow -r with a single member-specific output

2020-04-01 Thread Dave Anderson



- Original Message -
> Using struct -r allows much faster execution when iterating through a file;
> on 3909 elements struct struct.member takes 192s vs. 3s with -r
> ---
> Sorry for the delay, here's another take at allowing struct -r with a
> specific field specified.
> 
> I didn't post that v2 back in Feb because I wasn't totally happy with
> it; I can't say I now am but might as well get your take on it...

What part of this patch aren't you happy about?

Dave




> Thanks!
> 
>  defs.h|  2 ++
>  symbols.c | 69 ---
>  2 files changed, 63 insertions(+), 8 deletions(-)
> 
> diff --git a/defs.h b/defs.h
> index e852ddf..a3f828d 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -2283,6 +2283,7 @@ struct array_table {
>  #define MEMBER_TYPE_REQUEST ((struct datatype_member *)(-3))
>  #define STRUCT_SIZE_REQUEST ((struct datatype_member *)(-4))
>  #define MEMBER_TYPE_NAME_REQUEST ((struct datatype_member *)(-5))
> +#define ANON_MEMBER_SIZE_REQUEST ((struct datatype_member *)(-6))
>  
>  #define STRUCT_SIZE(X)  datatype_info((X), NULL, STRUCT_SIZE_REQUEST)
>  #define UNION_SIZE(X)   datatype_info((X), NULL, STRUCT_SIZE_REQUEST)
> @@ -2294,6 +2295,7 @@ struct array_table {
>  #define MEMBER_TYPE(X,Y)datatype_info((X), (Y), MEMBER_TYPE_REQUEST)
>  #define MEMBER_TYPE_NAME(X,Y)((char *)datatype_info((X), (Y),
>  MEMBER_TYPE_NAME_REQUEST))
>  #define ANON_MEMBER_OFFSET(X,Y)datatype_info((X), (Y),
>  ANON_MEMBER_OFFSET_REQUEST)
> +#define ANON_MEMBER_SIZE(X,Y)datatype_info((X), (Y),
> ANON_MEMBER_SIZE_REQUEST)
>  
>  /*
>   *  The following set of macros can only be used with pre-intialized fields
> diff --git a/symbols.c b/symbols.c
> index 9c3032d..b32e480 100644
> --- a/symbols.c
> +++ b/symbols.c
> @@ -145,6 +145,7 @@ static void print_union(char *, ulong);
>  static void dump_datatype_member(FILE *, struct datatype_member *);
>  static void dump_datatype_flags(ulong, FILE *);
>  static long anon_member_offset(char *, char *);
> +static long anon_member_size(char *, char *);
>  static int gdb_whatis(char *);
>  static void do_datatype_declaration(struct datatype_member *, ulong);
>  static int member_to_datatype(char *, struct datatype_member *, ulong);
> @@ -5604,14 +5605,17 @@ long
>  datatype_info(char *name, char *member, struct datatype_member *dm)
>  {
>   struct gnu_request *req;
> -long offset, size, member_size;
> + long offset, size, member_size;
>   int member_typecode;
> -ulong type_found;
> + ulong type_found;
>   char buf[BUFSIZE];
>  
> -if (dm == ANON_MEMBER_OFFSET_REQUEST)
> + if (dm == ANON_MEMBER_OFFSET_REQUEST)
>   return anon_member_offset(name, member);
>  
> + if (dm == ANON_MEMBER_SIZE_REQUEST)
> + return anon_member_size(name, member);
> +
>   strcpy(buf, name);
>  
>   req = (struct gnu_request *)GETBUF(sizeof(struct gnu_request));
> @@ -5828,6 +5832,46 @@ retry:
>   return value;
>  }
>  
> +/*
> + *  Determine the size of a member in an anonymous union
> + *  in a structure or union.
> + */
> +static long
> +anon_member_size(char *name, char *member)
> +{
> + char buf[BUFSIZE];
> + ulong value;
> + int type;
> +
> + value = -1;
> + type = STRUCT_REQUEST;
> + sprintf(buf, "printf \"%%ld\", (u64)(&((struct %s*)0)->%s + 1) -
> (u64)&((struct %s*)0)->%s",
> + name, member, name, member);
> + open_tmpfile2();
> +retry:
> + if (gdb_pass_through(buf, pc->tmpfile2, GNU_RETURN_ON_ERROR)) {
> + rewind(pc->tmpfile2);
> + if (fgets(buf, BUFSIZE, pc->tmpfile2)) {
> + if (hexadecimal(buf, 0))
> + value = htol(buf, RETURN_ON_ERROR|QUIET, NULL);
> + else if (STRNEQ(buf, "(nil)"))
> + value = 0;
> + }
> + }
> +
> + if ((value == -1) && (type == STRUCT_REQUEST)) {
> + type = UNION_REQUEST;
> + sprintf(buf, "printf \"%%ld\", (u64)(&((union %s*)0)->%s + 1) -
> (u64)&((union %s*)0)->%s",
> + name, member, name, member);
> + rewind(pc->tmpfile2);
> + goto retry;
> + }
> +
> + close_tmpfile2();
> +
> + return value;
> +}
> +
>  /*
>   *  Get the basic type info for a symbol.  Let the caller pass in the
>   *  gnu_request structure to have access to the full response; in either
> @@ -6617,6 +6661,9 @@ do_datatype_addr(struct datatype_member *dm, ulong
> addr, int count,
>   i = 0;
>   do {
>   if (argc_members) {
> + if (argc_members > 1 && flags & SHOW_RAW_DATA)
> + error(FATAL,
> +   "only up to one member-specific 
> output allowed with -r\n");
>   /* This call works fine with fields
>   

Re: [Crash-utility] [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-01 Thread Dave Anderson


- Original Message -
> Hi,Dave
> Zram is a virtual device that is simulated as a block device. It's part of
> memory/the ramdump; just enable CONFIG_ZRAM, no other settings are needed.
> You can refer to drivers/block/zram/zram_drv.c: the
> driver calls zram_meta_alloc to allocate memory from RAM.
> 
> We want to be able to access these zram pages like normal pages.

I understand all that.  I'm just curious how makedumpfile will handle/filter
the physical RAM pages that make up the zram block device.

Anyway, send a patch and I'll take a look.

Dave


> 
> ____
> From: Dave Anderson 
> Sent: Wednesday, April 1, 2020 23:24
> To: 赵乾利
> Cc: d hatayama; Discussion list for crash utility usage, maintenance and
> development
> Subject: Re: [External Mail]Re: [Crash-utility] zram decompress support for
> gcore/crash-utility
> 
> - Original Message -
> > Hi,Dave
> > zram is the same as other swap devices, but every swapped page is
> > compressed and then saved to another memory address.
> > The process is the same as with a common swap device; a non-swapped page is just
> > a normal user address, which the pgd and mmu translate to a physical address.
> >
> > please refer to below information:
> > crash> vm -p
> > PID: 1565   TASK: ffe1fce32d00  CPU: 7   COMMAND: "system_server"
> >MM   PGD  RSSTOTAL_VM
> > ffe264431c00  ffe1f54ad000  528472k  9780384k
> >   VMA   START   END FLAGS FILE
> > ffe0ea401300   12c0   12e0 100073
> > VIRTUAL PHYSICAL
> > ...
> > 144fc000SWAP: /dev/block/zram0  OFFSET: 236750
> > ...
> > 1738e000SWAP: /dev/block/zram0  OFFSET: 73426
> > 1738f000   21aa2c000
> > 1739   1c3308000
> > 17391000SWAP: /dev/block/zram0  OFFSET: 73431
> > 17392000   19c162000
> > 17393000   19c132000
> > 17394000SWAP: /dev/block/zram0  OFFSET: 234576
> > 17395000   19c369000
> > 17396000   20b35c000
> > 17397000   18011e000
> > 17398000SWAP: /dev/block/zram0  OFFSET: 73433
> > 17399000   1dc3d2000
> > 1739a000   1bc59f000
> > 1739b000SWAP: /dev/block/zram0  OFFSET: 73437
> >
> >
> > crash> vtop -c 1565 144fc000
> > VIRTUAL PHYSICAL
> > 144fc000(not mapped)
> >
> > PAGE DIRECTORY: ffe1f54ad000
> >PGD: ffe1f54ad000 => 1f54ab003
> >PMD: ffe1f54ab510 => 1f43b8003
> >PTE: ffe1f43b87e0 => 39cce00
> >
> >   PTE  SWAPOFFSET
> > 39cce00  /dev/block/zram0  236750
> >
> >   VMA   START   END FLAGS FILE
> > ffe148bafe40   144c   1454 100073
> >
> > SWAP: /dev/block/zram0  OFFSET: 236750
> 
> Ok, so with respect to user-space virtual addresses, there is nothing
> other than handling zram swap-backed memory.
> 
> So what you're proposing is that when reading user-space memory
> that happens to be backed-up on a zram swap device, then the user
> data could alternatively be read from the zram swap device, and
> presented as if it were present in physical memory?
> 
> Are the physical RAM pages that make up the contents of a zram
> device collected with a typical filtered compressed kdump?  If not,
> what makedumpfile -d flag is required for them to be captured?
> 
> Dave
> 
> 
> 


Re: [Crash-utility] [External Mail]Re: zram decompress support for gcore/crash-utility

2020-04-01 Thread Dave Anderson



- Original Message -
> Hi,Dave
> zram is the same as any other swap device, but every swapped page is
> compressed and then saved to another memory address.
> The process is the same as for a common swap device; a non-swapped page is
> just a normal user address, which the pgd and mmu translate to a phys address
> 
> please refer to below information:
> crash> vm -p
> PID: 1565   TASK: ffe1fce32d00  CPU: 7   COMMAND: "system_server"
>MM   PGD  RSS  TOTAL_VM
> ffe264431c00  ffe1f54ad000  528472k  9780384k
>   VMA   START   END FLAGS FILE
> ffe0ea401300   12c0   12e0 100073
> VIRTUAL PHYSICAL
> ...
> 144fc000SWAP: /dev/block/zram0  OFFSET: 236750
> ...
> 1738e000SWAP: /dev/block/zram0  OFFSET: 73426
> 1738f000   21aa2c000
> 1739   1c3308000
> 17391000SWAP: /dev/block/zram0  OFFSET: 73431
> 17392000   19c162000
> 17393000   19c132000
> 17394000SWAP: /dev/block/zram0  OFFSET: 234576
> 17395000   19c369000
> 17396000   20b35c000
> 17397000   18011e000
> 17398000SWAP: /dev/block/zram0  OFFSET: 73433
> 17399000   1dc3d2000
> 1739a000   1bc59f000
> 1739b000SWAP: /dev/block/zram0  OFFSET: 73437
> 
> 
> crash> vtop -c 1565 144fc000
> VIRTUAL PHYSICAL
> 144fc000(not mapped)
> 
> PAGE DIRECTORY: ffe1f54ad000
>PGD: ffe1f54ad000 => 1f54ab003
>PMD: ffe1f54ab510 => 1f43b8003
>PTE: ffe1f43b87e0 => 39cce00
> 
>   PTE  SWAP  OFFSET
> 39cce00  /dev/block/zram0  236750
> 
>   VMA   START   END FLAGS FILE
> ffe148bafe40   144c   1454 100073
> 
> SWAP: /dev/block/zram0  OFFSET: 236750

Ok, so with respect to user-space virtual addresses, there is nothing
other than handling zram swap-backed memory.

So what you're proposing is that when reading user-space memory
that happens to be backed-up on a zram swap device, then the user
data could alternatively be read from the zram swap device, and
presented as if it were present in physical memory?

Are the physical RAM pages that make up the contents of a zram
device collected with a typical filtered compressed kdump?  If not,
what makedumpfile -d flag is required for them to be captured?
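
For reference, the PTE value and swap offset in the vtop output above are related by a simple shift.  The following is only a sketch: it assumes the arm64 swap-entry layout of kernels from that era (type in bits 2-7, offset from bit 8 upward), and the macro and function names are made up for illustration, so check arch/arm64/include/asm/pgtable.h for the kernel actually being debugged.

```c
#include <stdint.h>

/* Assumed arm64 swap-entry layout; not taken from the crash sources. */
#define DEMO_SWP_TYPE_SHIFT   2
#define DEMO_SWP_TYPE_BITS    6
#define DEMO_SWP_OFFSET_SHIFT (DEMO_SWP_TYPE_SHIFT + DEMO_SWP_TYPE_BITS)

/* Index of the swap device (position in swap_info[]) encoded in a swap PTE. */
static unsigned int demo_swp_type(uint64_t pte)
{
    return (unsigned int)((pte >> DEMO_SWP_TYPE_SHIFT) &
                          ((1u << DEMO_SWP_TYPE_BITS) - 1));
}

/* Page offset within that swap device. */
static uint64_t demo_swp_offset(uint64_t pte)
{
    return pte >> DEMO_SWP_OFFSET_SHIFT;
}
```

With the PTE shown above, demo_swp_offset(0x39cce00) yields 236750, which matches the OFFSET column, and demo_swp_type(0x39cce00) yields 0, i.e. the first swap device.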

Dave





Re: [Crash-utility] zram decompress support for gcore/crash-utility

2020-04-01 Thread Dave Anderson



- Original Message -

...

> > 
> > As far as the gcore extension module, that is maintained by Daisuke Hatayama,
> > and he makes all decisions w/respect to that codebase.  I've cc'd this response
> > to him.
> 
> Thanks Zhao for your patch set.
> Thanks for ccing me, Dave.
> 
> I agree that ZRAM support is useful, as you explained. On the other
> hand, it is useful not only for the crash gcore command but also for the
> crash utility itself. I think it would be more natural than the current
> implementation in your patch set to implement ZRAM support in the crash
> utility first and then use it from the crash gcore command.
>
> If the ZRAM support were transparent to the readmem() interface, there would
> be no need to implement anything in the crash gcore command at all. If not,
> new code for the ZRAM support would be needed, corresponding to the
> following stanza in 0001-gcore-add-support-zram-swap.patch:
> 
> @@ -225,6 +417,18 @@ void gcore_coredump(void)
>   strerror(errno));
> } else {
> pagefaultf("page fault at %lx\n", addr);
> +   if (paddr != 0) {
> +   pte_val = paddr;
> +   if(try_zram_decompress(pte_val, 
> (unsigned char *)buffer) == PAGE_SIZE)
> +   {
> +   error(WARNING, "zram 
> decompress successed\n");
> +   if (fwrite(buffer, PAGE_SIZE, 
> 1, gcore->fp) != 1)
> +   error(FATAL, "%s: 
> write: %s\n", gcore->corename, strerror(errno));
> +   continue;
> +   }
> +  
> +  }

I'm not clear on how zram is linked into the user-space mapping.  For user
space that has been swapped out to a zram swap device, I presume it's the
same as is, but it references the zram swap device.  But for other user-space
mappings (non-swapped), what does the "vm -p" display for user space virtual
address pages that are backed by zram?  And for that matter, what does
"vtop " show?

Thanks,
  Dave
 




Re: [Crash-utility] zram decompress support for gcore/crash-utility

2020-03-31 Thread Dave Anderson


- Original Message -
> 
> 
> Hello list,
> 
> When I try to use gcore to extract a coredump from a full dump, I hit the
> issues below, which make the coredump unusable.
>
> 1. Zram is a very common feature. Current Android systems support zram
> swap, but gcore/crash-utility does not, even though zram swap can be decoded
> from RAM, so many page faults occur for this reason.
>
> I added a zram decompress feature to gcore, and I'm also considering whether
> to support zram in crash-utility, but for this feature I have to add miniLZO
> to the codebase. I'm not sure if that's acceptable; please give some advice.
>
> miniLZO: miniLZO is a very lightweight subset of the LZO library, distributed
> under the terms of the GNU General Public License (GPL v2+).

If you build the upstream crash utility package with "make lzo", it will bring
in those libraries, which are required for dumpfiles compressed with LZO.  You
can enter the make command after the package has been built, as it simply 
re-compiles diskdump.c and includes the libraries in the build.

As far as the gcore extension module, that is maintained by Daisuke Hatayama,
and he makes all decisions w/respect to that codebase.  I've cc'd this response
to him.

Dave


.
> 
> http://www.oberhumer.com/opensource/lzo/
> 
> 
> 
> This change is a bit big,I attached it to the mail,if attachment is not
> available,you can also see these patch in github:
> https://github.com/zhaoqianli0202/crash-gcore/commits/upstream
> 
> Please review.
> 
> 
> 
> 2. For historical reasons, the kernel reserved the top 8/16 bytes of stacks, but
> since kernel 4.14 this reservation has been removed, so gcore needs to handle
> both cases.
>
> The kernel change is as below:
>
> commit 34be98f4944f99076f049a6806fc5f5207a755d3
> Author: Ard Biesheuvel < ard.biesheu...@linaro.org >
> Date: Thu Jul 20 17:15:45 2017 +0100
>
> arm64: kernel: remove {THREAD,IRQ_STACK}_START_SP
>
> For historical reasons, we leave the top 16 bytes of our task and IRQ
> stacks unused, a practice used to ensure that the SP can always be
> masked to find the base of the current stack (historically, where
> thread_info could be found).
>
> Patch for this issue:
>
> commit d1031df4617351a58b8edfb0121c306baaa34f9d
> Author: zhaoqianli 
> Date: Mon Mar 30 12:07:02 2020 +0800
>
> gcore: ARM/ARM64 reserved 8/16 bytes at the top of stacks before 4.14,
> but this reservation was removed after 4.14
>
> Without this patch, gcore couldn't parse the full call stack on versions after 4.14
>
> 
> diff --git a/gcore.c b/gcore.c
> index f75701d..f6e1787 100644
> --- a/gcore.c
> +++ b/gcore.c
> @@ -558,4 +558,16 @@ static void gcore_machdep_init(void)
>
> if (!gcore_arch_vsyscall_has_vm_alwaysdump_flag())
> gcore_machdep->vm_alwaysdump = 0x;
> +
> +#if defined(ARM) || defined(ARM64)
> +#ifdef ARM
> +#define STACK_RESERVE_SIZE 8
> +#else
> +#define STACK_RESERVE_SIZE 16
> +#endif
> + if (THIS_KERNEL_VERSION >= LINUX(4,14,0))
> + gcore_machdep->stack_reserve = 0;
> + else
> + gcore_machdep->stack_reserve = STACK_RESERVE_SIZE;
> +#endif
> }
> diff --git a/libgcore/gcore_arm.c b/libgcore/gcore_arm.c
> index 891d01e..c8aefdf 100644
> --- a/libgcore/gcore_arm.c
> +++ b/libgcore/gcore_arm.c
> @@ -29,7 +29,7 @@ static int gpr_get(struct task_context *target,
>
> BZERO(regs, sizeof(*regs));
>
> - readmem(machdep->get_stacktop(target->task) - 8 - SIZE(pt_regs), KVADDR,
> + readmem(machdep->get_stacktop(target->task) - gcore_machdep->stack_reserve - SIZE(pt_regs), KVADDR,
> regs, SIZE(pt_regs), "genregs_get: pt_regs",
> gcore_verbose_error_handle());
>
> diff --git a/libgcore/gcore_arm64.c b/libgcore/gcore_arm64.c
> index 3257389..ed3fdc8 100644
> --- a/libgcore/gcore_arm64.c
> +++ b/libgcore/gcore_arm64.c
> @@ -28,7 +28,7 @@ static int gpr_get(struct task_context *target,
>
> BZERO(regs, sizeof(*regs));
>
> - readmem(machdep->get_stacktop(target->task) - 16 - SIZE(pt_regs), KVADDR,
> + readmem(machdep->get_stacktop(target->task) - gcore_machdep->stack_reserve - SIZE(pt_regs), KVADDR,
> regs, sizeof(struct user_pt_regs), "gpr_get: user_pt_regs",
> gcore_verbose_error_handle());
>
> @@ -124,7 +124,7 @@ static int compat_gpr_get(struct task_context *target,
> BZERO(_regs, sizeof(pt_regs));
> BZERO(regs, sizeof(*regs));
>
> - readmem(machdep->get_stacktop(target->task) - 16 - SIZE(pt_regs), KVADDR,
> + readmem(machdep->get_stacktop(target->task) - gcore_machdep->stack_reserve - SIZE(pt_regs), KVADDR,
> _regs, sizeof(struct pt_regs), "compat_gpr_get: pt_regs",
> gcore_verbose_error_handle());
>
> diff --git a/libgcore/gcore_defs.h b/libgcore/gcore_defs.h
> index b0f5603..f31036c 100644
> --- a/libgcore/gcore_defs.h
> +++ b/libgcore/gcore_defs.h
> @@ -1177,6 +1177,7 @@ extern struct gcore_size_table 

Re: [Crash-utility] help debug number of CPU detect failure

2020-03-10 Thread Dave Anderson



- Original Message -
> 
> 
> - Original Message -
> 
> ...
> 
> > Hi Dave,
> > 
> > I did some more experiments and found that it is nothing to do with numa.
> >
> > I also found that the issue gets resolved when I insert "SYMBOL(_stext)="
> > into vmcoreinfo.
> > Meaning sometime crash needs _stext value along with kaslr & phys_base.
> >
> > Thanks,
> > Santosh
> 
> There is this comment in kaslr_init():
> 
>  *  Setting RELOC_AUTO will ensure that derive_kaslr_offset() is
>  *  called after the sorting operation has captured the vmlinux
>  *  file's "_stext" symbol value -- which it will compare to the
>  *  relocated "_stext" value found in either a dumpfile's vmcoreinfo
>  *  or in /proc/kallsyms on a live system.
> 
> You should report that back to the vm2core developers to fix that
> half-baked ELF header.
> 
> For example, when the KVM "virsh dump" feature is used by a KVM
> host to create a dumpfile of a guest VM, they have a mechanism in
> place that transfers the vmcoreinfo data from the guest back to
> the host, resulting in a vmcore that is identical to a bare-metal
> vmcore.

By the way, I note on the vm2core github site, they suggest using
the guest's System.map file on the command line (in addition to 
--kaslr and --machdep phys_base=).  Doing that would supply
the _stext value.  

Dave






Re: [Crash-utility] help debug number of CPU detect failure

2020-03-10 Thread Dave Anderson



- Original Message -

...

> Hi Dave,
> 
> I did some more experiments and found that it is nothing to do with numa.
>
> I also found that the issue gets resolved when I insert "SYMBOL(_stext)=" 
> into vmcoreinfo.
> Meaning sometime crash needs _stext value along with kaslr & phys_base.
>
> Thanks,
> Santosh

There is this comment in kaslr_init():

 *  Setting RELOC_AUTO will ensure that derive_kaslr_offset() is
 *  called after the sorting operation has captured the vmlinux
 *  file's "_stext" symbol value -- which it will compare to the
 *  relocated "_stext" value found in either a dumpfile's vmcoreinfo
 *  or in /proc/kallsyms on a live system.
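
As a rough illustration of the comparison that comment describes (this is not the code in derive_kaslr_offset(), and the helper name is made up), the offset is just the difference between the two "_stext" values.  The example values assume an x86_64 vmlinux linked with _stext at ffffffff81000000 and a relocated _stext of ffffffffbc600000; both are assumptions for the sake of the arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the KASLR offset from the link-time
 * "_stext" in vmlinux and the relocated "_stext" reported by
 * vmcoreinfo (dumpfile) or /proc/kallsyms (live system).
 */
static uint64_t demo_kaslr_offset(uint64_t vmlinux_stext,
                                  uint64_t relocated_stext)
{
    return relocated_stext - vmlinux_stext;
}
```

With those assumed values the result is 3b600000, i.e. 950MB.  Without a relocated _stext to compare against, there is nothing to subtract from, which is why the SYMBOL(_stext) line matters.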

You should report that back to the vm2core developers to fix that
half-baked ELF header.  

For example, when the KVM "virsh dump" feature is used by a KVM
host to create a dumpfile of a guest VM, they have a mechanism in
place that transfers the vmcoreinfo data from the guest back to
the host, resulting in a vmcore that is identical to a bare-metal
vmcore.

Dave
 




Re: [Crash-utility] [PATCH 0/1] extensions: proccgroup.c

2020-03-09 Thread Dave Anderson



- Original Message -
> Hi Dave,
> 
> I build the proccgroup extensions on
> https://people.redhat.com/anderson/extensions/proccgroup.c.
> and find a bug on centos 7. the following patch fix it.

Normally the owner/maintainer listed is responsible to accept any 
changes to their module.  But since it's an obvious fix, I have gone
ahead and updated https://people.redhat.com/anderson/extensions/proccgroup.c
and I've cc'd the owner/maintainer.

> I also want to ask another question:
> Why not put various extensions in the extensions directory of the source code?

Because that would imply some expectation of ongoing support, and I am
absolutely not interested in doing that.

Thanks,
  Dave


> 
> Wang Long (1):
>   extensions: proccgroup: fix the wrong method which detect whether to
> support getting subsys name
> 
>  extensions/proccgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --
> 1.8.3.1
> 
> 
> 
> 




Re: [Crash-utility] help debug number of CPU detect failure

2020-03-06 Thread Dave Anderson



- Original Message -
> On Thu, Mar 5, 2020 at 1:07 PM Santosh  wrote:
> >
> > On Thu, Mar 5, 2020 at 12:54 PM Dave Anderson  wrote:
> > >
> > > > > I suspect that it's a problem with either the --kaslr offset and/or
> > > > > the phys_base value that you have used.
> > > >
> > > > Is there method to know or print kaslr & phy_base in a running Linux
> > > > system?
> > >
> > > They are normally passed in the VMCOREINFO data that is contained in an
> > > ELF PT_NOTE
> > > in the dumpfile header.  For example, here's a dump of the normal
> > > VMCOREINFO data,
> > > where the phys_base and KASLR offsets are down near the bottom:
> > >
> > >   OSRELEASE=4.18.0-185.el8.x86_64
> > >   PAGESIZE=4096
> > >   SYMBOL(init_uts_ns)=bd812540
> > >   SYMBOL(node_online_map)=bda0f520
> > >   SYMBOL(swapper_pg_dir)=bd80a000
> > >   SYMBOL(_stext)=bc60
> > >   SYMBOL(vmap_area_list)=bd8d78b0
> > >   SYMBOL(mem_section)=956a3ffd2000
> > >   LENGTH(mem_section)=2048
> > >   SIZE(mem_section)=16
> > >   OFFSET(mem_section.section_mem_map)=0
> > >   SIZE(page)=64
> > >   SIZE(pglist_data)=171968
> > >   SIZE(zone)=1472
> > >   SIZE(free_area)=88
> > >   SIZE(list_head)=16
> > >   SIZE(nodemask_t)=128
> > >   OFFSET(page.flags)=0
> > >   OFFSET(page._refcount)=52
> > >   OFFSET(page.mapping)=24
> > >   OFFSET(page.lru)=8
> > >   OFFSET(page._mapcount)=48
> > >   OFFSET(page.private)=40
> > >   OFFSET(page.compound_dtor)=16
> > >   OFFSET(page.compound_order)=17
> > >   OFFSET(page.compound_head)=8
> > >   OFFSET(pglist_data.node_zones)=0
> > >   OFFSET(pglist_data.nr_zones)=171232
> > >   OFFSET(pglist_data.node_start_pfn)=171240
> > >   OFFSET(pglist_data.node_spanned_pages)=171256
> > >   OFFSET(pglist_data.node_id)=171264
> > >   OFFSET(zone.free_area)=192
> > >   OFFSET(zone.vm_stat)=1296
> > >   OFFSET(zone.spanned_pages)=112
> > >   OFFSET(free_area.free_list)=0
> > >   OFFSET(list_head.next)=0
> > >   OFFSET(list_head.prev)=8
> > >   OFFSET(vmap_area.va_start)=0
> > >   OFFSET(vmap_area.list)=48
> > >   LENGTH(zone.free_area)=11
> > >   SYMBOL(log_buf)=bd85b140
> > >   SYMBOL(log_buf_len)=bd85b13c
> > >   SYMBOL(log_first_idx)=be319778
> > >   SYMBOL(clear_idx)=be319744
> > >   SYMBOL(log_next_idx)=be319768
> > >   SIZE(printk_log)=16
> > >   OFFSET(printk_log.ts_nsec)=0
> > >   OFFSET(printk_log.len)=8
> > >   OFFSET(printk_log.text_len)=10
> > >   OFFSET(printk_log.dict_len)=12
> > >   LENGTH(free_area.free_list)=5
> > >   NUMBER(NR_FREE_PAGES)=0
> > >   NUMBER(PG_lru)=5
> > >   NUMBER(PG_private)=12
> > >   NUMBER(PG_swapcache)=9
> > >   NUMBER(PG_swapbacked)=18
> > >   NUMBER(PG_slab)=8
> > >   NUMBER(PG_hwpoison)=22
> > >   NUMBER(PG_head_mask)=32768
> > >   NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
> > >   NUMBER(HUGETLB_PAGE_DTOR)=2
> > >   NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
> > >===>   NUMBER(phys_base)=16437477376
> > >   SYMBOL(init_top_

Re: [Crash-utility] help debug number of CPU detect failure

2020-03-05 Thread Dave Anderson



- Original Message -

> > Is there method to know or print kaslr & phy_base in a running Linux
> > system?
> 
> Got it.
> 
> crash> p vmcoreinfo_data+1600
> $12 = (unsigned char *) 0x90ff7cdc3640
> "poison)=22\nNUMBER(PG_head_mask)=32768\nNUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128\nNUMBER(HUGETLB_PAGE_DTOR)=2\nNUMBER(phys_base)=-499122176\nSYMBOL(init_top_pgt)=a200a000\nSYMBOL(node_data)=a225d780\nLENGTH(node_data)=1024\nKERNELOFFSET=1fc0\nNUMB"...

Right -- that's easier than my first suggestion.  And on more recent kernels,
the VMCOREINFO data has also been added to /proc/kcore, so you could also
get it by entering "help -D".

Dave




Re: [Crash-utility] help debug number of CPU detect failure

2020-03-05 Thread Dave Anderson
> > I suspect that it's a problem with either the --kaslr offset and/or
> > the phys_base value that you have used.
> 
> Is there method to know or print kaslr & phy_base in a running Linux system?

They are normally passed in the VMCOREINFO data that is contained in an ELF
PT_NOTE in the dumpfile header.  For example, here's a dump of the normal
VMCOREINFO data, where the phys_base and KASLR offsets are down near the
bottom:

  OSRELEASE=4.18.0-185.el8.x86_64
  PAGESIZE=4096
  SYMBOL(init_uts_ns)=bd812540
  SYMBOL(node_online_map)=bda0f520
  SYMBOL(swapper_pg_dir)=bd80a000
  SYMBOL(_stext)=bc60
  SYMBOL(vmap_area_list)=bd8d78b0
  SYMBOL(mem_section)=956a3ffd2000
  LENGTH(mem_section)=2048
  SIZE(mem_section)=16
  OFFSET(mem_section.section_mem_map)=0
  SIZE(page)=64
  SIZE(pglist_data)=171968
  SIZE(zone)=1472
  SIZE(free_area)=88
  SIZE(list_head)=16
  SIZE(nodemask_t)=128
  OFFSET(page.flags)=0
  OFFSET(page._refcount)=52
  OFFSET(page.mapping)=24
  OFFSET(page.lru)=8
  OFFSET(page._mapcount)=48
  OFFSET(page.private)=40
  OFFSET(page.compound_dtor)=16
  OFFSET(page.compound_order)=17
  OFFSET(page.compound_head)=8
  OFFSET(pglist_data.node_zones)=0
  OFFSET(pglist_data.nr_zones)=171232
  OFFSET(pglist_data.node_start_pfn)=171240
  OFFSET(pglist_data.node_spanned_pages)=171256
  OFFSET(pglist_data.node_id)=171264
  OFFSET(zone.free_area)=192
  OFFSET(zone.vm_stat)=1296
  OFFSET(zone.spanned_pages)=112
  OFFSET(free_area.free_list)=0
  OFFSET(list_head.next)=0
  OFFSET(list_head.prev)=8
  OFFSET(vmap_area.va_start)=0
  OFFSET(vmap_area.list)=48
  LENGTH(zone.free_area)=11
  SYMBOL(log_buf)=bd85b140
  SYMBOL(log_buf_len)=bd85b13c
  SYMBOL(log_first_idx)=be319778
  SYMBOL(clear_idx)=be319744
  SYMBOL(log_next_idx)=be319768
  SIZE(printk_log)=16
  OFFSET(printk_log.ts_nsec)=0
  OFFSET(printk_log.len)=8
  OFFSET(printk_log.text_len)=10
  OFFSET(printk_log.dict_len)=12
  LENGTH(free_area.free_list)=5
  NUMBER(NR_FREE_PAGES)=0
  NUMBER(PG_lru)=5
  NUMBER(PG_private)=12
  NUMBER(PG_swapcache)=9
  NUMBER(PG_swapbacked)=18
  NUMBER(PG_slab)=8
  NUMBER(PG_hwpoison)=22
  NUMBER(PG_head_mask)=32768
  NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
  NUMBER(HUGETLB_PAGE_DTOR)=2
  NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
   ===>   NUMBER(phys_base)=16437477376
  SYMBOL(init_top_pgt)=bd80a000
  NUMBER(pgtable_l5_enabled)=0
  SYMBOL(node_data)=bda0ad20
  LENGTH(node_data)=1024
   ===>   KERNELOFFSET=3b60
  NUMBER(KERNEL_IMAGE_SIZE)=1073741824
  NUMBER(sme_mask)=0
  CRASHTIME=1583350919

But in your Azure-generated dumpfile, I note that each cpu's NT_PRSTATUS note
contains junk data, and while it does have a VMCOREINFO note, it contains this:

Elf64_Nhdr:
   n_namesz: 11 ("VMCOREINFO")
   n_descsz: 42
 n_type: 0 (unused)
 FAKE1=IGNORE1
 FAKE2=IGNORE2
 FAKE3=IGNORE3

So that's why you need to pass in the two arguments.
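
To make the dependency concrete, here is a minimal sketch of pulling those two values out of VMCOREINFO text.  It is not how the crash utility parses the note; the function name and its interface are invented for illustration, and it only assumes the "KEY=value" one-per-line format shown above.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look for a line starting with "key=" in
 * newline-separated VMCOREINFO text and parse its value in the given
 * base (10 for NUMBER(phys_base), 16 for KERNELOFFSET).
 * Returns 0 on success, -1 if the key is absent or has no digits.
 */
static int demo_vmcoreinfo_get(const char *info, const char *key, int base,
                               unsigned long long *out)
{
    size_t klen = strlen(key);
    const char *p = info;

    while (p && *p) {
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            const char *val = p + klen + 1;
            char *end;
            unsigned long long v = strtoull(val, &end, base);

            if (end == val)
                return -1;      /* "key=" with nothing parseable */
            *out = v;
            return 0;
        }
        p = strchr(p, '\n');    /* advance to the next line, if any */
        if (p)
            p++;
    }
    return -1;
}
```

Run against a note that only holds the FAKE1/FAKE2/FAKE3 lines, both lookups return -1, so the values have to come from the command line instead.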

Now, the crash utility should be able to be brought up successfully
on a live system without passing the arguments.  And once you've done
that, you could get the values like this:  

  crash> help -m | grep phys_base
  phys_base: 3d3c0
  crash> help -k | grep relocate
relocate: c4a0  (KASLR offset: 3b60 / 950MB)
  crash> 

But since they change with each reboot, you would have to capture them
while running on the live system, and save them somewhere for a subsequent
crash.  So 

Re: [Crash-utility] [PATCH] Add eBPF program name to "bpf -p|-P" options output

2020-03-05 Thread Dave Anderson



- Original Message -
> "bpftool prog list" command displays eBPF program name if available.
> Also, the crash "bpf -m|-M" options display eBPF map name.  But the
> "bpf -p|-P" options don't display its name.  It would be useful in
> finding the program which we want to see.

Looks good -- queued for crash-7.2.9:
  
  
https://github.com/crash-utility/crash/commit/007f844e6ddd53b777f6e0eb3261309bbc5e3fee

Thanks,
  Dave


> 
> Signed-off-by: Kazuhito Hagio 
> ---
>  bpf.c | 12 
>  defs.h|  1 +
>  help.c|  3 ++-
>  symbols.c |  2 ++
>  4 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/bpf.c b/bpf.c
> index 39ced88..cb6b0ed 100644
> --- a/bpf.c
> +++ b/bpf.c
> @@ -194,6 +194,7 @@ bpf_init(struct bpf_info *bpf)
>   MEMBER_OFFSET_INIT(bpf_prog_pages, "bpf_prog", "pages");
>   MEMBER_OFFSET_INIT(bpf_prog_aux_load_time, "bpf_prog_aux", 
> "load_time");
>   MEMBER_OFFSET_INIT(bpf_prog_aux_user, "bpf_prog_aux", "user");
> + MEMBER_OFFSET_INIT(bpf_prog_aux_name, "bpf_prog_aux", "name");
>   MEMBER_OFFSET_INIT(bpf_map_key_size, "bpf_map", "key_size");
>   MEMBER_OFFSET_INIT(bpf_map_value_size, "bpf_map", "value_size");
>   MEMBER_OFFSET_INIT(bpf_map_max_entries, "bpf_map", 
> "max_entries");
> @@ -452,6 +453,17 @@ do_bpf(ulong flags, ulong prog_id, ulong map_id, int
> radix)
>   bpf_prog_gpl_compatible(buf1, 
> (ulong)bpf->proglist[i].value);
>   fprintf(fp, " GPL_COMPATIBLE: %s", buf1);
>  
> + fprintf(fp, "  NAME: ");
> + if (VALID_MEMBER(bpf_prog_aux_name)) {
> + BCOPY(&bpf->bpf_prog_aux_buf[OFFSET(bpf_prog_aux_name)], buf1, 16);
> + buf1[16] = NULLCHAR;
> + if (strlen(buf1))
> + fprintf(fp, "\"%s\"", buf1);
> + else
> + fprintf(fp, "(unused)");
> + } else
> + fprintf(fp, "(unknown)");
> +
>   fprintf(fp, "  UID: ");
>   if (VALID_MEMBER(bpf_prog_aux_user) && 
> VALID_MEMBER(user_struct_uid)) {
>   user = ULONG(bpf->bpf_prog_aux_buf + 
> OFFSET(bpf_prog_aux_user));
> diff --git a/defs.h b/defs.h
> index fbd19b0..e852ddf 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -2078,6 +2078,7 @@ struct offset_table {/* stash of
> commonly-used offsets */
>   long bpf_map_memory;
>   long bpf_map_memory_pages;
>   long bpf_map_memory_user;
> + long bpf_prog_aux_name;
>  };
>  
>  struct size_table { /* stash of commonly-used sizes */
> diff --git a/help.c b/help.c
> index 5c313af..eda5ce9 100644
> --- a/help.c
> +++ b/help.c
> @@ -2412,7 +2412,8 @@ char *help_bpf[] = {
>  "-p ID  displays the basic information specific to the program ID, plus
>  the",
>  "   size in bytes of its translated bytecode, the size in bytes of
>  its",
>  "   jited code, the number of bytes locked into memory, the time
>  that",
> -"   the program was loaded, whether it is GPL compatible, and its
> UID.",
> +"   the program was loaded, whether it is GPL compatible, its name",
> +"   string, and its UID.",
>  "-P same as -p, but displays the basic and extra data for all
>  programs.",
>  "-m ID  displays the basic information specific to the map ID, plus
>  the",
>  "   size in bytes of its key and value, the maximum number of
>  key-value",
> diff --git a/symbols.c b/symbols.c
> index f1f659b..9c3032d 100644
> --- a/symbols.c
> +++ b/symbols.c
> @@ -10494,6 +10494,8 @@ dump_offset_table(char *spec, ulong makestruct)
>   OFFSET(bpf_prog_aux_load_time));
>   fprintf(fp, " bpf_prog_aux_user: %ld\n",
>   OFFSET(bpf_prog_aux_user));
> + fprintf(fp, " bpf_prog_aux_name: %ld\n",
> + OFFSET(bpf_prog_aux_name));
>   fprintf(fp, "   user_struct_uid: %ld\n",
>   OFFSET(user_struct_uid));
>  
> --
> 2.24.1
> 
> 




Re: [Crash-utility] help debug number of CPU detect failure

2020-03-04 Thread Dave Anderson
> Hello List,
>
> I have two ELF coredumps from two different HyperV VMs generated by this
> tool (https://github.com/Azure/azure-linux-utils/tree/master/vm2core).
>
> Crash works with one of these coredumps but does not work with the other.
>
> I've placed the output generated by crash tool here:
>
> Not ok with crash:
> ./crash/crash /usr/lib/debug/boot/vmlinux-4.15.0-88-generic
> vm1_numa_4gb_5cpu.coredump --kaslr 60 -m phys_base=4355784704 -d8
>  https://raw.githubusercontent.com/santoshx/temp/master/notok_with_crash.txt
>
> Ok with crash:
>  ./crash/crash /usr/lib/debug/boot/vmlinux-4.15.0-88-generic
> vm1_nonuma_4gb_5cpu.coredump --kaslr 3c0 -m phys_base=2344615936 -d8
>  https://raw.githubusercontent.com/santoshx/temp/master/ok_with_crash.txt
>
>
> The problem I see is that in the non-working case crash fails to detect the
> correct cpu_possible_mask:
>
> Relevant part of $ diff ok_with_crash.txt notok_with_crash.txt:
>
> <   cpu_active_mask: cpus: 0 1 2 3 4
> < FREEBUF(0)
> <  7ffe01722870>
> < 
> < read_netdump: addr: 86039f40 paddr: 91c39f40 cnt: 8 offset:
> 91c3a760
> ---
>> > 5638a35a2280>
>> 
>> read_netdump: addr: 826f2b60 paddr: 1060f2b60 cnt: 1024 offset:
>> fe0f3380
>> cpu_possible_mask: cpus: 3 4 5 6 8 13 14 18 20 21 22 26 28 29 30 33 36
>> 37 38 48 49 52 53 54 56 59 60 61 62 64 65 68 69 70 72 73 74 75 76 78 82
>> 83 85 86 90 91 93 94 96 99 101 102 104 105 108 109 110 114 116 117 118
>> 123 124 125 126 128 133 134 138 140 141 142 146 148 149 150 153 156 157
>> 158 168 169 172 173 174 176 179 180 181 182 184 185 188 189 190 192 193
>> 194 195 196 198 200 202 205 206 211 212 213 214 216 219 221 222 226 228
>> 229 230 232 233 234 235 236 238 242 243 245 246 248 251 253 254 256 257
>> 260 261 262 266 268 269 270 275 276 277 278 280 285 286 290 292 293 294
>> 298 300 301 302 305 308 309 310 320 321 324 325 326 328 331 332 333 334
>> 336 337 340 341 342 344 345 346 347 348 350 352 354 357 358 361 362 363
>> 365 366 370 372 373 374 376 378 381 382 385 388 389 390 392 393 394 395
>> 396 398 402 403 405 406 408 411 413 414 416 417 420 421 422 426 428 429
>> 430 435 436 437 438 440 445 446 450 452 453 454 458 460 461 462 465 468
>> 469 470 480 481 484 485 486 488 491 492 493 494 496 497 500 501 502 504
>> 505 506 507 508 510 514 515 517 518 520 523 525 526 528 529 532 533 534
>> 538 540 541 542 547 548 549
>
> I'm trying to find where the problem is: in the crash tool or in the tool that
> generated the ELF coredumps?

I suspect that it's a problem with either the --kaslr offset and/or
the phys_base value that you have used.

It appears that the read of the cpu_possible mask is not using the
correct virtual address, or perhaps the wrong physical address, and
as a result it is trying to translate bogus data.  In fact, the full
output txt file shows that every thing that it reads is garbage, e.g.,
the cpu masks, the utsname data structure, the linux_banner string, etc.
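
To show how both values enter the picture, here is a sketch of the x86_64 kernel-text translation.  The function name is invented, and the full 64-bit addresses in the usage note assume that the leading "ffffffff" digits were dropped from the read_netdump lines quoted above; treat it as an illustration rather than the crash utility's own code.

```c
#include <stdint.h>

/* x86_64 __START_KERNEL_map: base of the kernel text mapping. */
#define DEMO_START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Hypothetical helper: translate a kernel text/data virtual address to
 * a physical address.  The --kaslr offset determines which vaddr is
 * looked up for a symbol; phys_base determines where it lands.
 */
static uint64_t demo_ktext_vtop(uint64_t vaddr, uint64_t phys_base)
{
    return vaddr - DEMO_START_KERNEL_MAP + phys_base;
}
```

Under that assumption, demo_ktext_vtop(0xffffffff86039f40, 2344615936) gives 91c39f40 and demo_ktext_vtop(0xffffffff826f2b60, 4355784704) gives 1060f2b60, matching the addr/paddr pairs in the two logs.  So the arithmetic is applied consistently in both cases; if either input is wrong, the read simply lands on unrelated memory.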

Dave





Re: [Crash-utility] [PATCH] Fix for "bpf -m|-M" options on Linux 5.3 and later

2020-03-04 Thread Dave Anderson


Hi Kazu,

Thanks for catching this -- queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/af71d71f35372df7308788a6d49b539b75ea19b5
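
For anyone following along, the lookup that the patch quoted below adds amounts to summing two member offsets when a field moves into an embedded structure.  The sketch below uses made-up stand-in structures (demo_map and demo_map_memory are not the kernel's layouts) and offsetof() in place of crash's OFFSET() table, purely to show the addressing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in structures for illustration only. */
struct demo_map_memory {
    uint32_t pages;
    void *user;
};

struct demo_map {
    uint32_t key_size;
    struct demo_map_memory memory;  /* fields grouped into a sub-struct */
};

/* Read a 32-bit value out of a raw copy of the structure. */
static uint32_t demo_read_u32(const unsigned char *buf, size_t off)
{
    uint32_t v;

    memcpy(&v, buf + off, sizeof(v));
    return v;
}

/* Offset of the embedded struct plus offset of the member within it. */
static uint32_t demo_map_pages(const unsigned char *map_buf)
{
    return demo_read_u32(map_buf,
                         offsetof(struct demo_map, memory) +
                         offsetof(struct demo_map_memory, pages));
}
```

Because the embedded structure is not a pointer, no second read is needed; the value is still inside the buffer that was already filled.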

Dave


- Original Message -
> Fix for the "bpf -m|-M" options on Linux 5.3 and later kernels that
> contain commit 3539b96e041c06e4317082816d90ec09160aeb11, titled
> "bpf: group memory related fields in struct bpf_map_memory".
> Without the patch, the options prints "(unknown)" for MEMLOCK and UID.
> 
> Signed-off-by: Kazuhito Hagio 
> ---
>  bpf.c | 22 --
>  defs.h|  3 +++
>  symbols.c |  6 ++
>  3 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/bpf.c b/bpf.c
> index 056e286..39ced88 100644
> --- a/bpf.c
> +++ b/bpf.c
> @@ -202,6 +202,13 @@ bpf_init(struct bpf_info *bpf)
>   MEMBER_OFFSET_INIT(bpf_map_user, "bpf_map", "user");
>   MEMBER_OFFSET_INIT(user_struct_uid, "user_struct", "uid");
>  
> + /* Linux 5.3 */
> + MEMBER_OFFSET_INIT(bpf_map_memory, "bpf_map", "memory");
> + if (VALID_MEMBER(bpf_map_memory)) {
> + MEMBER_OFFSET_INIT(bpf_map_memory_pages, 
> "bpf_map_memory", "pages");
> + MEMBER_OFFSET_INIT(bpf_map_memory_user, 
> "bpf_map_memory", "user");
> + }
> +
>   if (!bpf_type_size_init()) {
>   bpf->status = FALSE;
>   command_not_supported();
> @@ -576,7 +583,11 @@ do_map_only:
>   fprintf(fp, "(unknown)");
>  
>   fprintf(fp, "  MEMLOCK: ");
> - if (VALID_MEMBER(bpf_map_pages)) {
> + if (VALID_MEMBER(bpf_map_memory) && 
> VALID_MEMBER(bpf_map_memory_pages)) {
> + map_pages = UINT(bpf->bpf_map_buf + 
> OFFSET(bpf_map_memory)
> + + OFFSET(bpf_map_memory_pages));
> + fprintf(fp, "%d\n", map_pages * PAGESIZE());
> + } else if (VALID_MEMBER(bpf_map_pages)) {
>   map_pages = UINT(bpf->bpf_map_buf + 
> OFFSET(bpf_map_pages));
>   fprintf(fp, "%d\n", map_pages * PAGESIZE());
>   } else
> @@ -594,8 +605,15 @@ do_map_only:
>   fprintf(fp, "(unknown)\n");
>  
>   fprintf(fp, "  UID: ");
> - if (VALID_MEMBER(bpf_map_user) && 
> VALID_MEMBER(user_struct_uid)) {
> + if (VALID_MEMBER(bpf_map_memory) && 
> VALID_MEMBER(bpf_map_memory_user))
> + user = ULONG(bpf->bpf_map_buf + 
> OFFSET(bpf_map_memory)
> + + OFFSET(bpf_map_memory_user));
> + else if (VALID_MEMBER(bpf_map_user))
>   user = ULONG(bpf->bpf_map_buf + 
> OFFSET(bpf_map_user));
> + else
> + user = 0;
> +
> + if (user && VALID_MEMBER(user_struct_uid)) {
>   if (readmem(user + OFFSET(user_struct_uid), KVADDR, &uid, sizeof(uint),
>   "user_struct.uid", QUIET|RETURN_ON_ERROR))
>   fprintf(fp, "%d\n", uid);
> diff --git a/defs.h b/defs.h
> index ac24a5d..fbd19b0 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -2075,6 +2075,9 @@ struct offset_table {/* stash of
> commonly-used offsets */
>   long device_private_knode_class;
>   long timerqueue_head_rb_root;
>   long rb_root_cached_rb_leftmost;
> + long bpf_map_memory;
> + long bpf_map_memory_pages;
> + long bpf_map_memory_user;
>  };
>  
>  struct size_table { /* stash of commonly-used sizes */
> diff --git a/symbols.c b/symbols.c
> index f04e8b5..f1f659b 100644
> --- a/symbols.c
> +++ b/symbols.c
> @@ -10479,6 +10479,12 @@ dump_offset_table(char *spec, ulong makestruct)
>   OFFSET(bpf_map_name));
>   fprintf(fp, "  bpf_map_user: %ld\n",
>   OFFSET(bpf_map_user));
> + fprintf(fp, "bpf_map_memory: %ld\n",
> + OFFSET(bpf_map_memory));
> + fprintf(fp, "  bpf_map_memory_pages: %ld\n",
> + OFFSET(bpf_map_memory_pages));
> + fprintf(fp, "   bpf_map_memory_user: %ld\n",
> + OFFSET(bpf_map_memory_user));
>  
>   fprintf(fp, " bpf_prog_aux_used_map_cnt: %ld\n",
>   OFFSET(bpf_prog_aux_used_map_cnt));
> --
> 2.24.1
> 
> 

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
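
The MEMLOCK and UID hunks in the patch above read a field that has moved into an embedded structure, so the address is the sum of two offsets: the offset of the embedded structure within the outer one, plus the offset of the field within the embedded structure. A minimal sketch of that arithmetic follows, assuming made-up stand-in structures (demo_map and demo_memory are hypothetical and do not reflect the kernel's real layout):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real kernel structures differ. */
struct demo_memory {
	unsigned int pages;
	void *user;
};

struct demo_map {
	int id;
	struct demo_memory memory;	/* embedded, not a pointer */
};

/*
 * Read the nested "pages" field out of a raw copy of the outer
 * structure, in the same way the patch adds OFFSET(bpf_map_memory)
 * and OFFSET(bpf_map_memory_pages) to the buffer address.
 */
static unsigned int
read_nested_pages(const char *map_buf, size_t buf_len)
{
	size_t off = offsetof(struct demo_map, memory) +
		     offsetof(struct demo_memory, pages);
	unsigned int pages;

	assert(off + sizeof(pages) <= buf_len);
	memcpy(&pages, map_buf + off, sizeof(pages));
	return pages;
}
```

Because the inner structure is embedded rather than referenced through a pointer, no second memory read is needed; one buffer holding the outer structure is enough.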



Re: [Crash-utility] [PATCH V2] extensions: add extend -s option to show all available shared object file

2020-03-03 Thread Dave Anderson


Thanks Wang -- queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/5dfbc7aa27392a095b207d31654cec7db94dd810
  
Dave


- Original Message -
> When we load extensions, sometimes we do not know the exact name of the shared
> object file.
>  
> This patch adds a -s option to the extend command to show all available shared
> object files.
> 
> for example:
> 
> crash> extend -s
> ./trace.so
> /usr/lib64/crash/extensions/dminfo.so
> /usr/lib64/crash/extensions/echo.so
> /usr/lib64/crash/extensions/eppic.so
> /usr/lib64/crash/extensions/snap.so
> /usr/lib64/crash/extensions/trace.so
> ./extensions/dminfo.so
> ./extensions/eppic.so
> ./extensions/echo.so
> ./extensions/snap.so
> ./extensions/trace.so
> crash> extend -s -l
> extend: -l and -s are mutually exclusive
> Usage:
>   extend [shared-object ...] | [-u [shared-object ...]] | -s
> Enter "help extend" for details.
> crash> extend -s -u
> extend: -u and -s are mutually exclusive
> Usage:
>   extend [shared-object ...] | [-u [shared-object ...]] | -s
> Enter "help extend" for details.
> crash>
> 
> Also, this patch updates the help for the extend command,
> adding the search order "5. the ./extensions subdirectory of the current
> directory"
> 
> Changes since v1:
> - the -s option also checks the current working directory
> - fix warning
> 
> Signed-off-by: Wang Long 
> ---
>  extensions.c | 75
>  
>  help.c   |  4 +++-
>  2 files changed, 74 insertions(+), 5 deletions(-)
> 
> diff --git a/extensions.c b/extensions.c
> index 24b91de..d23b1e3 100644
> --- a/extensions.c
> +++ b/extensions.c
> @@ -20,10 +20,13 @@
>  
>  static int in_extensions_library(char *, char *);
>  static char *get_extensions_directory(char *);
> +static void show_all_extensions(void);
> +static void show_extensions(char *);
>  
> -#define DUMP_EXTENSIONS   (0)
> -#define LOAD_EXTENSION    (1)
> -#define UNLOAD_EXTENSION  (2)
> +#define DUMP_EXTENSIONS     (0)
> +#define LOAD_EXTENSION      (1)
> +#define UNLOAD_EXTENSION    (2)
> +#define SHOW_ALL_EXTENSIONS (4)
>  
>  /*
>   *  Load, unload, or list the extension libaries.
> @@ -36,14 +39,30 @@ cmd_extend(void)
>  
>   flag = DUMP_EXTENSIONS;
>  
> -while ((c = getopt(argcnt, args, "lu")) != EOF) {
> +while ((c = getopt(argcnt, args, "lus")) != EOF) {
>  switch(c)
>  {
> + case 's':
> + if (flag & UNLOAD_EXTENSION) {
> + error(INFO,
> + "-s and -u are mutually exclusive\n");
> + argerrs++;
> + }else if (flag & LOAD_EXTENSION) {
> + error(INFO,
> + "-s and -l are mutually exclusive\n");
> + argerrs++;
> + } else
> + flag |= SHOW_ALL_EXTENSIONS;
> + break;
>   case 'l':
>   if (flag & UNLOAD_EXTENSION) {
>   error(INFO,
>   "-l and -u are mutually exclusive\n");
>   argerrs++;
> + } else if (flag & SHOW_ALL_EXTENSIONS) {
> + error(INFO,
> + "-l and -s are mutually exclusive\n");
> + argerrs++;
>   } else
>   flag |= LOAD_EXTENSION;
>   break;
> @@ -53,6 +72,10 @@ cmd_extend(void)
>  error(INFO,
>  "-u and -l are mutually
>  exclusive\n");
>  argerrs++;
> + } else if (flag & SHOW_ALL_EXTENSIONS) {
> + error(INFO,
> + "-u and -s are mutually exclusive\n");
> + argerrs++;
>  } else
>  flag |= UNLOAD_EXTENSION;
>   break;
> @@ -100,6 +123,11 @@ cmd_extend(void)
>   optind++;
>   }
>   break;
> +
> + case SHOW_ALL_EXTENSIONS:
> + show_all_extensions();
> + break;
> +
>   }
>  }
>  
> @@ -182,6 +210,45 @@ dump_extension_table(int verbose)
>   } while ((ext = ext->prev));
>  }
>  
> +static void
> +show_extensions(char *dir) {
> + DIR *dirp;
> + struct dirent *dp;
> + char filename[BUFSIZE*2];
> +
> +dirp = opendir(dir);
> + if (!dirp)
> + return;
> +
> +for (dp = readdir(dirp); dp != NULL; dp = readdir(dirp)) {
> + sprintf(filename, "%s%s%s", dir,
> + LASTCHAR(dir) == '/' ? "" : "/",
> + dp->d_name);
> +
> +  

Re: [Crash-utility] [PATCH] extensions: add extend -s option to show all available shared object file

2020-03-02 Thread Dave Anderson


- Original Message -
> When we load extensions, sometimes we do not know the exact name of the shared
> object file.
> 
> This patch adds a -s option to the extend command to show all available shared
> object files.

Hello Wang,

I think this patch is a good idea.  Couple things, though...

Building with "make warn" shows these warnings:

  $ make warn
  ...
  cc -c -g -DX86_64 -DLZO -DSNAPPY -DGDB_7_6  extensions.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security
  extensions.c:23:1: warning: function declaration isn’t a prototype [-Wstrict-prototypes]
   static void show_all_extensions();
   ^
  extensions.c:212:6: warning: no previous prototype for 'show_extensions' [-Wmissing-prototypes]
   void show_extensions(char *dir) {
        ^
  extensions.c: In function 'show_extensions':
  extensions.c:216:6: warning: variable 'found' set but not used [-Wunused-but-set-variable]
    int found;
        ^
  extensions.c: At top level:
  extensions.c:236:1: warning: function declaration isn’t a prototype [-Wstrict-prototypes]
   show_all_extensions()
   ^
  ...

And secondly, your show_extensions() function doesn't check the
current working directory.  Even though the description below 
only applies when an extension module argument is supplied, it 
makes sense that it should also apply to your -s option:

  $ help extend
  ...
If the shared-object filename is not expressed with a fully-qualified
pathname, the following directories will be searched in the order shown,
and the first instance of the file that is found will be selected:
  
   1. the current working directory
   2. the directory specified in the CRASH_EXTENSIONS environment variable
   3. /usr/lib64/crash/extensions (64-bit architectures)
   4. /usr/lib/crash/extensions
   5. the ./extensions subdirectory of the current directory
  ... 
  
Thanks,
  Dave
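
The review comment above asks that -s walk the same directories as the documented search order, starting with the current working directory. A rough sketch of that idea using POSIX opendir()/readdir() follows; the function names are hypothetical and this is not the code that was eventually committed:

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if name ends in ".so" and has at least one character before it. */
static int
has_so_suffix(const char *name)
{
	size_t len = strlen(name);

	return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

/*
 * Print every "*.so" entry in dir to out, one per line, and return
 * the number printed.  A directory that cannot be opened counts as empty.
 */
static int
list_shared_objects(const char *dir, FILE *out)
{
	DIR *dirp;
	struct dirent *dp;
	size_t dlen;
	int found = 0;

	assert(dir != NULL && out != NULL);
	dlen = strlen(dir);

	if ((dirp = opendir(dir)) == NULL)
		return 0;

	while ((dp = readdir(dirp)) != NULL) {
		if (!has_so_suffix(dp->d_name))
			continue;
		fprintf(out, "%s%s%s\n", dir,
			(dlen > 0 && dir[dlen - 1] == '/') ? "" : "/",
			dp->d_name);
		found++;
	}
	closedir(dirp);
	return found;
}

/* Walk a list of directories in order; NULL entries are skipped. */
static int
list_search_path(const char *const dirs[], size_t ndirs, FILE *out)
{
	size_t i;
	int total = 0;

	for (i = 0; i < ndirs; i++)
		if (dirs[i] != NULL)
			total += list_shared_objects(dirs[i], out);
	return total;
}
```

A caller would pass the directories in the documented order, for example ".", the value of getenv("CRASH_EXTENSIONS") (which may be NULL), "/usr/lib64/crash/extensions", "/usr/lib/crash/extensions" and "./extensions".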



> 
> for example:
> 
> crash> extend -s
> /usr/lib64/crash/extensions/dminfo.so
> /usr/lib64/crash/extensions/echo.so
> /usr/lib64/crash/extensions/eppic.so
> /usr/lib64/crash/extensions/snap.so
> /usr/lib64/crash/extensions/trace.so
> ./extensions/dminfo.so
> ./extensions/eppic.so
> ./extensions/echo.so
> ./extensions/snap.so
> ./extensions/trace.so
> crash> extend -s -l
> extend: -l and -s are mutually exclusive
> Usage:
>   extend [shared-object ...] | [-u [shared-object ...]] | -s
> Enter "help extend" for details.
> crash> extend -s -u
> extend: -u and -s are mutually exclusive
> Usage:
>   extend [shared-object ...] | [-u [shared-object ...]] | -s
> Enter "help extend" for details.
> crash>
> 
> Also, this patch updates the help for the extend command,
> adding the search order "5. the ./extensions subdirectory of the current
> directory"
> 
> Signed-off-by: Wang Long 
> ---
>  extensions.c | 72
>  
>  help.c   |  4 +++-
>  2 files changed, 71 insertions(+), 5 deletions(-)
> 
> diff --git a/extensions.c b/extensions.c
> index 24b91de..bdf9e93 100644
> --- a/extensions.c
> +++ b/extensions.c
> @@ -20,10 +20,12 @@
>  
>  static int in_extensions_library(char *, char *);
>  static char *get_extensions_directory(char *);
> +static void show_all_extensions();
>  
> -#define DUMP_EXTENSIONS   (0)
> -#define LOAD_EXTENSION    (1)
> -#define UNLOAD_EXTENSION  (2)
> +#define DUMP_EXTENSIONS     (0)
> +#define LOAD_EXTENSION      (1)
> +#define UNLOAD_EXTENSION    (2)
> +#define SHOW_ALL_EXTENSIONS (4)
>  
>  /*
>   *  Load, unload, or list the extension libaries.
> @@ -36,14 +38,30 @@ cmd_extend(void)
>  
>   flag = DUMP_EXTENSIONS;
>  
> -while ((c = getopt(argcnt, args, "lu")) != EOF) {
> +while ((c = getopt(argcnt, args, "lus")) != EOF) {
>  switch(c)
>  {
> + case 's':
> + if (flag & UNLOAD_EXTENSION) {
> + error(INFO,
> + "-s and -u are mutually exclusive\n");
> + argerrs++;
> + }else if (flag & LOAD_EXTENSION) {
> + error(INFO,
> + "-s and -l are mutually exclusive\n");
> + argerrs++;
> + } else
> + flag |= SHOW_ALL_EXTENSIONS;
> + break;
>   case 'l':
>   if (flag & UNLOAD_EXTENSION) {
>   error(INFO,
>   "-l and -u are mutually exclusive\n");
>   argerrs++;
> + } else if (flag & SHOW_ALL_EXTENSIONS) {
> + error(INFO,
> + "-l and -s are mutually exclusive\n");
> + argerrs++;
>   } else
>   flag |= LOAD_EXTENSION;
>   

Re: [Crash-utility] Using crash to dump trace information

2020-02-26 Thread Dave Anderson


- Original Message -
> 
> 
> Hi Dave:
> 
> I compiled crash-utility to run on X86 and handle vmcore files for a 32bit 
> arm.
> 
> I would like to use crash to dump the trace buffer, as described here:
> 
> https://access.redhat.com/solutions/239433
> 
> it relies on using a trace.so which is dependent on the kernel version.

That's news to me.  Why do you say that?

> 
> $ crash --osrelease vmcore*
> 4.9.170
> 
> Since this is a homegrown version of 4.9 I suspect that I’d need to make the
> trace.so locally.

I don't understand your question, but in the x86 host machine tree where you
built the binary with "make target=ARM", you can build the sample extension
modules (which include trace.c) by entering "make extensions".

Dave
 
> 
> Is that something I can do without a lot of work?
> 
> -pete delaney
> 


Re: [Crash-utility] [PATCH] Fix for the "log" command

2020-02-13 Thread Dave Anderson



- Original Message -
> On Thu, Feb 13, 2020 at 08:16:37AM -0500, Dave Anderson wrote:
> > 
> > What does this patch have to do with "log -a"?
> > 
> Sorry, I just used the "log" command to dump the kernel message buffer.
> The command fails with the following error message: "invalid log_buf
> entry encountered". So far the issue has happened only once, on one
> arm64 dump.

OK thanks -- queued for crash-7.2.9:

  
https://github.com/crash-utility/crash/commit/dcd6e6bbdf0c32ca61f02d52a5250c9eeb499430

Dave

   
> Because of the failure, the dump doesn't print the whole log held in log_buf.
> The printk-related symbol values are as follows:
> crash_test> log_next_idx
> log_next_idx = $1 = 1491656
> crash_test> log_first_idx
> log_first_idx = $2 = 1507956
> crash_test> log_buf_len
> log_buf_len = $3 = 2097152
> 
> We can see the log idx has exceeded the value of log_buf_len.
> crash_test> log > dmesg.txt
> 
> log: invalid log_buf entry encountered. idx=2115950
> 
> The original code simply breaks out of the loop with an error message.
> We need to take care of the issue.
> 
> Thanks!
> Qiwu
> 
> 
> 




Re: [Crash-utility] [PATCH] Fix for the "log -a" option

2020-02-13 Thread Dave Anderson


What does this patch have to do with "log -a"?

Dave

- Original Message -
> From: chenqiwu 
> 
> Fix for the "log -a" option. The printk logbuf is a ring buffer;
> if log_first_idx is larger than log_next_idx, there are two buffer
> zones that must be handled for logdump:
> 1) [log_first_idx, log_buf_len]
> 2) [0, log_next_idx]
> 
> However, the original code ignores the logdump for the second buffer
> zone if log_first_idx is larger than log_next_idx. Without this patch,
> the option fails with the following error message "duplicate log_buf
> message pointer".
> 
> Signed-off-by: chenqiwu 
> ---
>  kernel.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel.c b/kernel.c
> index 68ee282..7604fac 100644
> --- a/kernel.c
> +++ b/kernel.c
> @@ -5278,8 +5278,12 @@ dump_variable_length_record_log(int msg_flags)
>   idx = log_next(idx, logbuf);
>  
>   if (idx >= log_buf_len) {
> - error(INFO, "\ninvalid log_buf entry encountered\n");
> - break;
> + if (log_first_idx > log_next_idx)
> + idx = 0;
> + else {
> + error(INFO, "\ninvalid log_buf entry 
> encountered\n");
> + break;
> + }
>   }
>  
>   if (CRASHDEBUG(1) && (idx == log_next_idx))
> --
> 1.9.1
> 
> 
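
The wrap-around rule in the patch above can be sketched in isolation. This is a simplified model, assuming a hypothetical helper ring_advance() that is handed the record length directly; in crash itself the next index comes from log_next(), which reads the length from the record header:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Advance idx past a record of rec_len bytes in a ring buffer of
 * buf_len bytes.  first_idx is the oldest record and next_idx is the
 * write position.  Returns the new index, or -1 if the index ran off
 * the end of a buffer that has not wrapped.
 */
static long
ring_advance(size_t idx, size_t rec_len, size_t buf_len,
	     size_t first_idx, size_t next_idx)
{
	assert(buf_len > 0 && idx < buf_len);

	idx += rec_len;
	if (idx >= buf_len) {
		if (first_idx > next_idx)
			return 0;	/* wrapped: continue in [0, next_idx] */
		return -1;		/* not wrapped: genuinely invalid */
	}
	return (long)idx;
}
```

When first_idx is larger than next_idx the valid data spans two zones, so running past the end of the first zone is expected and the walk resumes at index 0; in every other case the same condition still indicates a corrupt entry.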




Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-07 Thread Dave Anderson



- Original Message -
> Dave Anderson wrote on Fri, Feb 07, 2020:
> > > Following up with patch, with a couple of remarks:
> > >  - I had to change member_to_datatype() to use datatype_info() directly
> > > instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if
> > > this will have any side effects, but things like 'struct foo.a,b' still
> > > work at least. You might have a better idea of what to check.
> > 
> > Hmmm, I'd prefer to keep the member_to_datatype() behavior as it is, and
> > not have datatype_info() re-initialize the incoming dm (except for
> > the setting of dm->member).  Maybe have a different flag for gathering
> > the size as you have, and keep the original functionality the same?
> > Or alternatively, leave the call to member_to_datatype() as-is, and if
> > do_datatype_addr() sees SHOW_RAW_DATA, additionally call MEMBER_SIZE()?
> 
> Ok, I just did that to save an extra gdb call as I was assuming that was
> slow, but it would probably be acceptable to call into gdb twice except
> for intellectual satisfaction... :)
> 
> Alternatively, have member_to_datatype() declare a temporary dm, pass
> that to datatype_info, and fill only dm->member_size ?
> I find it a bit weird to fill only member_offset and not the size...
> 
> Actually, for do_datatype_addr(), I'm pretty sure we never use the
> member_offset at all -- I just replaced member_to_datatype() by
> something that just assigned the member to dm->member and things just
> work for anything I could try (struct foo, struct foo.member, struct
> foo.member1,member2, struct foo.member1.innermember, struct -o foo,
> struct -o foo.member...)

I'm not following; your SHOW_RAW_DATA patch bumps the incoming addr by
dm->member_offset, right?  Or do you mean in the normal case?  You are
probably right there.

> 
> 
> > >  - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the
> > > non-raw case.
> > 
> > I think you mean just the opposite...
> 
> woops, I meant what I wrote here, the code I sent is wrong -- we're not
> getting the member_size in the ANON_MEMBER_QUERY so I didn't want it for
> a first approach.
> 
> > All of the ANON_xxx macros were added for getting information for members
> > that are declared inside of anonymous structures within a structure, where
> > the generic datatype_info() call fails.  In those cases, the request
> > gets directed to a gdb print command within anon_member_offset().  There is
> > no support for getting the size of such a member, so MEMBER_SIZE() would
> > fail.  So I don't think this feature would work for those types of members,
> > and would need some kind of ANON_MEMBER_SIZE() and accompanying
> > anon_member_size() functionality.
> 
> Hm, yeah, probably want to support these too.
> 
> anon_member_offset tricks gdb with the usual "&((type*)0)->member", we
> could get the size by printing the same "+1" and computing the
> difference.
> 
> Can get it in one call with casts, but that's not exactly pretty...
> "(u64)(&((type*)0)->member + 1) - (u64)&((type*)0)->member"...
> 
> Would that work for you?

I'd prefer to put it in a separate function and macro, i.e., a new
ANON_MEMBER_SIZE() macro and accompanying anon_member_size(), just to
follow convention.

Dave
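
The "+1" trick discussed above relies on pointer arithmetic: the address one element past a member, minus the address of the member itself, is the member's size in bytes. A small sketch of that property in plain C follows, assuming a hypothetical structure and a hypothetical MEMBER_SPAN() macro. It operates on a real object rather than on a null pointer, and it is not the proposed ANON_MEMBER_SIZE(), which would issue a gdb print request instead:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure with members inside an anonymous union (C11). */
struct demo {
	int id;
	union {
		long counter;
		char tag[16];
	};
};

/*
 * Size of a member, computed as the byte distance between the member's
 * address and the address one element past it.
 */
#define MEMBER_SPAN(objp, member) \
	((size_t)((const char *)(&(objp)->member + 1) - \
		  (const char *)&(objp)->member))

static size_t
demo_counter_size(void)
{
	struct demo d = { 0 };

	return MEMBER_SPAN(&d, counter);
}

static size_t
demo_tag_size(void)
{
	struct demo d = { 0 };

	return MEMBER_SPAN(&d, tag);
}
```

For the array member, &d.tag has type "pointer to array of 16 char", so adding 1 advances by the whole array and the result is the full member size, not the size of one element.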




Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-07 Thread Dave Anderson



- Original Message -
> Dave Anderson wrote on Thu, Feb 06, 2020:
> > Right, the time-consumer is the call into gdb to get the structure member
> > details.
> > 
> > I'm not following what you mean by "making 'datatype_member' static ...",
> > but off the top of my head, I was thinking of adding "tmp_offset" and
> > "tmp_size" fields to the global offset_table and size_table structures,
> > and adding a new pc->curcmd_flags bit.  Then, if a command wants to
> > support such a concept, it could:
> > 
> >   (1) check the new pc->curcmd_flags bit, which will always be 0 the first
> >       time the function gets called by the exec_args_input_file() loop.
> >   (2) if the new bit is 0, then set OFFSET(tmp_offset) and SIZE(tmp_size)
> >   (3) set the new flag in pc->curcmd_flags, and continue...
> > 
> > Then during the second and subsequent times through the loop,
> > pc->curcmd_flags will still be set/unchanged, because restore_sanity()
> > is not called from the exec_args_input_file() loop.
> > 
> > But that scheme falls down if a user requests a comma-separated list of
> > multiple members (argc_members would be > 1).  Although, it might be better
> > if the "struct -r" option rejects multiple-member arguments, especially
> > given that the output would be pretty much unreadable.
> 
> I would tend to agree with that, struct -r with multiple members might
> be somewhat parsable but if someone can do that they can dump the whole
> struct and parse it anyway, so let's go with only one member.
> 
> On the good news though, this whole caching isn't going to be
> immediately needed. I just finished the first part of this (allowing
> struct -r with a member), and struct -r is already infinitely faster
> than struct; so getting the offset wasn't the slow part:
>  - with a small 100-elements file, I'm already going down from 12s to
> near-instant on this old laptop.
>  - I didn't wait for 1000-elements to finish normally but it's just
> about one second with -r, which is acceptable enough for me.
> 
> Caching might make it another order of magnitude faster but for now I'm
> happy to wait a couple of minutes for my 100k elements list, it's better
> than not finishing in half an hour :)

Ok, so it must be the gdb-assisted print_struct() and parsing that's 
time-consuming, and not the gdb datatype query.

> 
> Following up with patch, with a couple of remarks:
>  - I had to change member_to_datatype() to use datatype_info() directly
> instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if
> this will have any side effects, but things like 'struct foo.a,b' still
> work at least. You might have a better idea of what to check.

Hmmm, I'd prefer to keep the member_to_datatype() behavior as it is, and
not have datatype_info() re-initialize the incoming dm (except for
the setting of dm->member).  Maybe have a different flag for gathering
the size as you have, and keep the original functionality the same?
Or alternatively, leave the call to member_to_datatype() as-is, and if
do_datatype_addr() sees SHOW_RAW_DATA, additionally call MEMBER_SIZE()?

>  - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the
> non-raw case. 

I think you mean just the opposite...

   if (!member_to_datatype(memberlist[i], dm,
-   ANON_MEMBER_QUERY))
+   (flags & SHOW_RAW_DATA) 
? ANON_MEMBER_QUERY : 0))
error(FATAL, "invalid data structure 
reference: %s.%s\n",
  dm->name, memberlist[i]);

The use of ANON_MEMBER_QUERY is just there for a fall-back option if the
MEMBER_OFFSET() call fails.  

> I'm not quite sure why we couldn't get the member size if
> it's an anon union/struct, but I'm not sure how one would name an
> anonymous field in the first place here? Anyway, one would get invalid
> data structure reference message there if they do. It might be better
> to always pass the argument and then check for SHOW_RAW_DATA set with
> dm->member_size still being 0 after call to give another more
> appropriate error if you think people might hit that.

All of the ANON_xxx macros were added for getting information for members
that are declared inside of anonymous structures within a structure, where
the generic datatype_info() call fails.  In those cases, the request
gets directed to a gdb print command within anon_member_offset().  There is
no support for getting the size of such a member, 

Re: [Crash-utility] Support for crash running on an ARM 32 bit host analyzing ARM 32 bit crash files? Looking unlikely. :(

2020-02-07 Thread Dave Anderson


- Original Message -
> 
> 
> Hi Dave:
> 
> I tried to build crash-utility to run on a 32 bit ARM to analyze 32 bit ARM
> crash dumps.
> 
> On looking at the Make file it appeared that ARM is supported.
> 
> #
> # Supported targets: X86 ALPHA PPC IA64 PPC64 SPARC64
> # TARGET and GDB_CONF_FLAGS will be configured automatically by configure
> #
> 
> I was a bit disappointed. ☹
> 
> Any hope (in the future perhaps)?
> 
> -piet

I forgot that comment in the Makefile even existed -- it's also missing X86_64,
ARM64, PPC64, S390 and S390X.

What happened when you tried to build a 32-bit ARM binary?  It's been "supported"
for about 10 years now, but depends upon patches, fixes, and testing by external
developers (outside Red Hat), given that we have no 32-bit ARM hardware to
test on.

Dave


Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-06 Thread Dave Anderson



- Original Message -
> Dave Anderson wrote on Wed, Feb 05, 2020:
> > > What might make sense is to use the "struct -r" option, which does a raw
> > > memory dump of a data structure.  But for a reason I do not recall, it
> > > prevents that option from being used with a "struct_name.field" argument.
> > > (see line 6628 of symbols.c).  But I don't see why that couldn't be made
> > > to work, though, since the end result is simply a call to
> > > raw_data_dump().
> 
> I'll give this a try tomorrow, probably just needs to add
> dm->member_offset to addr and dump dm->member_size long value, that
> looks straightforward enough.
> 
> > ...and then if you get "struct -r" to work with a "struct_name.field"
> > argument, the next challenge would be the caching aspect of your request.
> > 
> > Currently there's no manner in which command-specific information is
> > cached beyond the execution of a single command.  With "< file", the
> > command gets executed from scratch each time.
> 
> That does look more challenging... Or rather more a matter of taste? a
> kludge probably wouldn't be so bad to put in, but it's probably better
> to have something more generic than making 'datatype_member' static in
> cmd_datatype_common (well, it needs a bit more than that as the argument
> strings won't be useable from one call to the next...)
>  
> I assume the slow part in this will be the member_to_datatype call in
> do_datatype_addr? I'll first confirm that's the only slow bit, if there
> is only one spot to optimize away it might not be so bad.
> 
> But yeah, without caching I don't think it's realistic; and making the
> '< file' construct iterate within the function looks more work than
> trying to make struct cache some info.
> 
> Thanks!

Right, the time-consumer is the call into gdb to get the structure member
details.

I'm not following what you mean by "making 'datatype_member' static ...", but
off the top of my head, I was thinking of adding "tmp_offset" and "tmp_size"
fields to the global offset_table and size_table structures, and adding a new
pc->curcmd_flags bit.  Then, if a command wants to support such a concept,
it could:

  (1) check the new pc->curcmd_flags bit, which will always be 0 the first time
  the function gets called by the exec_args_input_file() loop.
  (2) if the new bit is 0, then set OFFSET(tmp_offset) and SIZE(tmp_size)
  (3) set the new flag in pc->curcmd_flags, and continue...

Then during the second and subsequent times through the loop, pc->curcmd_flags
will still be set/unchanged, because restore_sanity() is not called from the
exec_args_input_file() loop.

But that scheme falls down if a user requests a comma-separated list of
multiple members (argc_members would be > 1).  Although, it might be better
if the "struct -r" option rejects multiple-member arguments, especially given
that the output would be pretty much unreadable.

Dave


 
 > --
> Dominique
> 
> 
> --
> Crash-utility mailing list
> Crash-utility@redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
> 




Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-05 Thread Dave Anderson



- Original Message -
> 
> 
> - Original Message -
> > Hi,
> > 
> > I often find myself dumping a bunch of addresses to files to iterate
> > with 'struct_name.field < file_with_addresses', but that is horribly
> > slow for large number of iterations.
> > 
> > `help list` comment for -S vs. -s made me try to use `rd` instead,
> > e.g. get offset manually from `struct -o` then use rd instead like
> > `rd -o xx < addr_list | awk '{ print $2 }' > value_list` -- and that is
> > infinitely better.
> > 
> > 
> > Would it make sense to add a similar option to 'struct' instead so one
> > could do e.g. `struct -S struct_name.field addr` instead of the dance I was
> > doing?
> > (That would require to cache field offset in crash and not query it
> > again everytime, from a quick look at the code, but we could only cache
> > one and still gain a lot for such iterations...)
> > 
> > 
> > Am I missing another more practical way of doing this?
> > (I guess it's not so bad now I came up with using 'rd', but that was
> > non-obvious to me. My use case here involved following a couple of
> > pointers from a list so I dumped the first pointer to follow from list
> > with -S struct1.field1, but then the following iteration just wouldn't
> > end naively)
> 
> Dominique,
> 
> What might make sense is to use the "struct -r" option, which does a raw
> memory dump of a data structure.  But for a reason I do not recall, it
> prevents that option from being used with a "struct_name.field" argument.
> (see line 6628 of symbols.c).  But I don't see why that couldn't be made
> to work, though, since the end result is simply a call to raw_data_dump().
> 
> Dave

...and then if you get "struct -r" to work with a "struct_name.field" 
argument, the next challenge would be the caching aspect of your request.

Currently there's no manner in which command-specific information is
cached beyond the execution of a single command.  With "< file", the
command gets executed from scratch each time. 

Dave




Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-05 Thread Dave Anderson



- Original Message -
> Hi,
> 
> I often find myself dumping a bunch of addresses to files to iterate
> with 'struct_name.field < file_with_addresses', but that is horribly
> slow for large number of iterations.
> 
> `help list` comment for -S vs. -s made me try to use `rd` instead,
> e.g. get offset manually from `struct -o` then use rd instead like
> `rd -o xx < addr_list | awk '{ print $2 }' > value_list` -- and that is
> infinitely better.
> 
> 
> Would it make sense to add a similar option to 'struct' instead so one
> could do e.g. `struct -S struct_name.field addr` instead of the dance I was 
> doing?
> (That would require to cache field offset in crash and not query it
> again everytime, from a quick look at the code, but we could only cache
> one and still gain a lot for such iterations...)
> 
> 
> Am I missing another more practical way of doing this?
> (I guess it's not so bad now I came up with using 'rd', but that was
> non-obvious to me. My use case here involved following a couple of
> pointers from a list so I dumped the first pointer to follow from list
> with -S struct1.field1, but then the following iteration just wouldn't
> end naively)

Dominique,

What might make sense is to use the "struct -r" option, which does a raw
memory dump of a data structure.  But for a reason I do not recall, it
prevents that option from being used with a "struct_name.field" argument.
(see line 6628 of symbols.c).  But I don't see why that couldn't be made 
to work, though, since the end result is simply a call to raw_data_dump().  

Dave
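
Making "struct -r" accept a "struct_name.field" argument comes down to dumping only the bytes that belong to the member: start at the structure address plus the member offset and stop after the member size. A rough sketch of that slicing on a local buffer follows, assuming a hypothetical helper name; the real code path would end in raw_data_dump() on kernel memory rather than on a local array:

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/*
 * Format member_size bytes starting at base + member_offset as hex
 * digits into out.  Returns the number of characters written
 * (two per byte), not counting the terminating NUL.
 */
static size_t
raw_member_hex(const unsigned char *base, size_t struct_size,
	       size_t member_offset, size_t member_size,
	       char *out, size_t out_len)
{
	size_t i;

	assert(member_offset + member_size <= struct_size);
	assert(out_len >= member_size * 2 + 1);

	for (i = 0; i < member_size; i++)
		snprintf(out + i * 2, 3, "%02x", base[member_offset + i]);
	out[member_size * 2] = '\0';
	return member_size * 2;
}
```

No type information is consulted while dumping, only an offset and a length.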




[Crash-utility] [ANNOUNCE] crash-7.2.8 is available

2020-01-30 Thread Dave Anderson


Download from: http://people.redhat.com/anderson
 or
   https://github.com/crash-utility/crash/releases

The github master branch serves as a development branch that will contain 
all patches that are queued for the next release:

  $ git clone git://github.com/crash-utility/crash.git


Changelog:
  
 - Fix for Linux 5.4-rc1 and later kernels that contain commit 
   688fcbfc06e4fdfbb7e1d5a942a1460fe6379d2d, titled "mm/vmalloc: 
   modify struct vmap_area to reduce its size".  Without the
   patch "kmem -v" will display nothing; other architectures 
   that utilize the vmap_area_list to determine the base of 
   mapped/vmalloc address space will fail.
   (ander...@redhat.com)
 
 - Fix for Linux 5.4-rc1 and later kernels that contain commit/merge
   e0703556644a531e50b5dc61b9f6ea83af5f6604, titled "Merge tag
   'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux",
   which introduces symbol namespaces.  Without the patch, and depending
   upon the architecture:
    (1) the kernel module symbol list will contain garbage entries
    (2) the session fails during session initialization with a dump of
        the internal buffer allocation stats followed by the message
        "crash: cannot allocate any more memory!"
    (3) the session fails during session initialization with a
        segmentation violation.
   (ander...@redhat.com)
 
 - Fix for the "timer -r" option on Linux 5.4-rc1 and later kernels 
   that contain commit 511885d7061eda3eb1faf3f57dcc936ff75863f1, titled
   "lib/timerqueue: Rely on rbtree semantics for next timer".  Without
   the patch, the option fails with the following error "timer: invalid
   structure member offset: timerqueue_head_next".
   (k-ha...@ab.jp.nec.com)
 
 - Fix for a "[-Wstringop-truncation]" compiler warning emitted when
   symbols.c is built in a Fedora Rawhide environment with gcc-9.0.1
   or later.
   (ander...@redhat.com)
 
 - Fix for the "kmem -n" option on Linux-5.4-rc1 and later kernels that
   contain commit b6c88d3b9d38f9448e0fcf44847a075ea81d5ca2, titled
   "drivers/base/memory.c: don't store end_section_nr in memory blocks".
   Without the patch, the command option fails with the error message
   "kmem: invalid structure member offset: memory_block_end_section_nr".
   (msys.miz...@gmail.com)
 
 - Fix for Linux 4.19.5 and later 4.19-based x86_64 kernels which
   are NOT configured with CONFIG_RANDOMIZE_BASE and have backported
   kernel commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15, titled
   "x86/mm: Move LDT remap out of KASLR region on 5-level paging",
   which modified the 4-level and 5-level paging PAGE_OFFSET values.
   Without this patch, the crash session fails during initialization
   with the error message "crash: seek error: kernel virtual address:
type: "tss_struct ist array".
   (ander...@redhat.com)
 
 - Additional fix for the "kmem -n" option on Linux-5.4-rc1 and later 
   kernels that contain commit b6c88d3b9d38f9448e0fcf44847a075ea81d5ca2,
   titled "drivers/base/memory.c: don't store end_section_nr in memory 
   blocks".  The initial fix only addressed the x86_64 architecture; 
   this incremental patch addresses the other architectures.
   (msys.miz...@gmail.com)
 
 - In the unlikely event that the panic task in a dumpfile cannot be 
   determined by the normal means, scan the kernel log buffer for panic
   keywords, and if found, generate the panic task from the CPU number
   that is specified following the panic message.
   (chenq...@xiaomi.com)
 
 - Adjust a crash-7.1.8 patch for support of /proc/kcore as the live 
   memory source in Linux 4.8 and later x86_64 kernels configured with
   CONFIG_RANDOMIZE_BASE, which randomizes the unity-mapping PAGE_OFFSET
   value.  Since the problem only arises before the determination of the
   randomized PAGE_OFFSET value, restrict the patch such that it only 
   takes effect during session initialization.
   (ander...@redhat.com)
 
 - Add support for extended numbering in ELF dumpfiles to handle
   more than PN_XNUM (0x) program headers.  If the real number of 
   program header table entries is equal to or greater than PN_XNUM, the
   e_phnum field of the ELF header is set to PN_XNUM, and the actual 
   number is set in the sh_info field of the section header at index 0. 
   (k-ha...@ab.jp.nec.com)
 
 - Fix for a "warning: large integer implicitly truncated to unsigned 
   type [-Woverflow]" compiler message generated on 32-bit architectures
   as a result of the "Additional fix for the kmem -n option" patch
   above. 
   (ander...@redhat.com)
 
 - Add support for handling openSUSE vmlinux files which will be shipped
   in .xz compressed format.  Without the patch, only gzip and bzip2 
   formats are supported.
   (jirisl...@gmail.com)
 
 - Fix for the determination of the ARM64 page size on Linux 4.4 and 
   earlier kernels that do not have vmcoreinfo data.  Without the patch,
   the crash session fails during initialization 
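
The extended-numbering rule described in the PN_XNUM changelog entry above
can be sketched in C as follows.  This is only a minimal illustration under
stated assumptions, not the code from the crash patch: real_phnum() and
SKETCH_PN_XNUM are hypothetical names, and the 0xffff value is the ELF gABI
definition of PN_XNUM rather than something quoted in this changelog.

```c
#include <stdint.h>

/* PN_XNUM as defined by the ELF gABI; redefined locally (assumption) so the
 * sketch does not depend on <elf.h>. */
#define SKETCH_PN_XNUM 0xffffu

/*
 * Hypothetical helper, not a crash function: return the real number of
 * program headers given the ELF header's e_phnum and the sh_info field of
 * the section header at index 0.  have_sect0 is non-zero when section
 * header 0 could be read.  Returns 0 when e_phnum holds the escape value
 * but section header 0 is unavailable.
 */
static uint32_t
real_phnum(uint16_t e_phnum, int have_sect0, uint32_t sect0_sh_info)
{
	if (e_phnum != SKETCH_PN_XNUM)
		return e_phnum;		/* ordinary case: count fits in e_phnum */
	if (!have_sect0)
		return 0;		/* escape value cannot be resolved */
	return sect0_sh_info;		/* extended numbering */
}
```

A caller would read the ELF header first, and only fetch section header 0
when e_phnum equals the escape value.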

Re: [Crash-utility] [PATCH 0/1] arm64: Fix missing offset for modules_vaddr with aarch64 guest dump

2020-01-27 Thread Dave Anderson



- Original Message -
> On Mon, Jan 27, 2020 at 12:15:48PM -0500, Dave Anderson wrote:
> > 
> > 
> > - Original Message -
> > ...
> > > 
> > > Thanks, I didn't know qemu has '-device vmcoreinfo' option.
> > > 
> > > It seems that the vmcoreinfo option works for aarch64 as well.
> > > The KASLR issue and the modules_vaddr issue are gone with
> > > vmcoreinfo option. Great!
> > > 
> > > However, the VA_BITS issue still remains because the vmcoreinfo
> > > doesn't have 'NUMBER(tcr_el1_t1sz)'.
> > > I suppose we can use 'NUMBER(VA_BITS)' instead, so I'll post another
> > > patch later.
> > 
> > Right -- Bhupesh is still working on getting NUMBER(tcr_el1_t1sz) accepted 
> > upstream.
> 
> Great!
> So, should we wait until Bhupesh's patch is merged upstream?
> Or would the following workaround patch be useful until then?
> 
> diff --git a/arm64.c b/arm64.c
> index 7662d71..bf93404 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -3889,6 +3889,14 @@ arm64_calc_VA_BITS(void)
>                                 machdep->machspec->VA_BITS_ACTUAL = value;
>                                 machdep->machspec->VA_BITS = value;
>                                 machdep->machspec->VA_START = _VA_START(machdep->machspec->VA_BITS_ACTUAL);
> +                       } else if ((string = pc->read_vmcoreinfo("NUMBER(VA_BITS)"))) {
> +                               value = strtoll(string, NULL, 0);
> +                               if (CRASHDEBUG(1))
> +                                       fprintf(fp,  "vmcoreinfo : vabits_actual: %ld\n", value);
> +                               free(string);
> +                               machdep->machspec->VA_BITS_ACTUAL = value;
> +                               machdep->machspec->VA_BITS = value;
> +                               machdep->machspec->VA_START = _VA_START(machdep->machspec->VA_BITS_ACTUAL);
>                         } else
>                                 error(FATAL, "cannot determine VA_BITS_ACTUAL\n");
>                 }

But since that section of code above is gated by the existence of
"vabits_actual", it would really be a guess, correct?  (And the
CRASHDEBUG(1) statement is certainly misleading.)

The wholesale changes that the aarch64 developers keep making to their
virtual memory layout have made the crash utility's arm64.c a nightmare
to maintain.  And patches like the above (and below) only add to the
confusion.
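
To make the trade-off concrete, the lookup order under discussion can be
sketched in standalone C.  This is a rough illustration, not crash code:
vmcoreinfo_number() and sketch_vabits_actual() are hypothetical helpers, the
blob is assumed to be the usual "KEY=value" text with one entry per line, and
the 64 - T1SZ relation comes from the ARMv8 definition of TCR_EL1.T1SZ rather
than from this thread.  The NUMBER(VA_BITS) fallback is flagged as a guess
because nothing guarantees it equals vabits_actual.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "key=" at the start of a line in a vmcoreinfo
 * text blob and parse the value as an integer (decimal or 0x-prefixed).
 * Returns 1 if the key was found, 0 otherwise.
 */
static int
vmcoreinfo_number(const char *blob, const char *key, long *out)
{
	size_t klen;
	const char *p = blob;

	assert(blob != NULL && key != NULL && out != NULL);
	klen = strlen(key);

	while (p && *p) {
		if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
			*out = strtol(p + klen + 1, NULL, 0);
			return 1;
		}
		p = strchr(p, '\n');	/* advance to the next line, if any */
		if (p)
			p++;
	}
	return 0;
}

/*
 * Returns the number of virtual address bits, or -1 if neither entry is
 * present.  *guessed is set to 1 when the value came from NUMBER(VA_BITS),
 * which is only an approximation of vabits_actual.
 */
static long
sketch_vabits_actual(const char *blob, int *guessed)
{
	long v;

	*guessed = 0;
	if (vmcoreinfo_number(blob, "NUMBER(tcr_el1_t1sz)", &v))
		return 64 - v;		/* vabits_actual = 64 - T1SZ */
	if (vmcoreinfo_number(blob, "NUMBER(VA_BITS)", &v)) {
		*guessed = 1;		/* fallback: a guess, not a fact */
		return v;
	}
	return -1;
}
```

Keeping the "guessed" flag separate from the value would at least let any
debug output say where the number came from.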

> 
> > 
> > > 
> > > BTW, could you merge the patch which I posted today
> > > in case the '-device vmcoreinfo' isn't set to qemu?
> > > https://www.redhat.com/archives/crash-utility/2020-January/msg00010.html
> > 
> > Honestly, I'm leaning against doing it, especially since the other two
> > issues that you referenced (VA_BITS and KASLR) would still exist without
> > "-device vmcoreinfo".
> > 
> > I'd prefer not to put in a bunch of patches for problems that would
> > only exist because a user has not properly configured QEMU.  The whole
> > point of the vmcoreinfo device is specifically for its use by the crash
> > utility.
> 
> I think the patch is useful for older qemu/libvirt/kernel combinations
> such as RHEL8, in case the libvirt/qemu doesn't have the vmcoreinfo
> option and the kernel doesn't have the VA_BITS issue...

The RHEL8 kernel has the vmcoreinfo device since it's been upstream
since 4.17.  Are you saying that the RHEL8 userspace component does
not support it?  I thought I read somewhere that it went into libvirt 4.4,
and it looks like RHEL8's libvirt is based upon 4.5.0.

Dave

> 
> - Masa
> 
> > 
> > Comments?
> > 
> > Dave
> > 
> > 
> >  
> > > Thanks,
> > > Masa
> > > 
> > > > 
> > > > Anyway, Daisuke should be able to fill in the details.
> > > > 
> > > > Dave
> > > > 
> > > > 
> > > > > 
> > > > > Dave
> > > > >
> > > > > 
> > > > > > 
> > > > > > - Masa
> > > > > > 
> > > > > > > 
> > > > > > > Dave
> > > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > ./crash -d1 vmlinux-v5.4 dump.v5.4
> > > > > > > > ...
> > > > > > > > vmcore_data:
> > > > > > > >   flags: 

Re: [Crash-utility] [PATCH 0/1] arm64: Fix missing offset for modules_vaddr with aarch64 guest dump

2020-01-27 Thread Dave Anderson



- Original Message -
...
> 
> Thanks, I didn't know qemu has '-device vmcoreinfo' option.
> 
> It seems that the vmcoreinfo option works for aarch64 as well.
> The KASLR issue and the modules_vaddr issue are gone with
> vmcoreinfo option. Great!
> 
> However, the VA_BITS issue still remains because the vmcoreinfo doesn't
> have 'NUMBER(tcr_el1_t1sz)'.
> I suppose we can use 'NUMBER(VA_BITS)' instead, so I'll post
> another patch later.

Right -- Bhupesh is still working on getting NUMBER(tcr_el1_t1sz) 
accepted upstream.  

> 
> BTW, could you merge the patch which I posted today
> in case the '-device vmcoreinfo' isn't set to qemu?
> https://www.redhat.com/archives/crash-utility/2020-January/msg00010.html

Honestly, I'm leaning against doing it, especially since the other two 
issues that you referenced (VA_BITS and KASLR) would still exist without
"-device vmcoreinfo".  

I'd prefer not to put in a bunch of patches for problems that would only exist
because a user has not properly configured QEMU.  The whole point of the 
vmcoreinfo device is specifically for its use by the crash utility.

Comments?

Dave


 
> Thanks,
> Masa
> 
> > 
> > Anyway, Daisuke should be able to fill in the details.
> > 
> > Dave
> > 
> > 
> > > 
> > > Dave
> > >
> > > 
> > > > 
> > > > - Masa
> > > > 
> > > > > 
> > > > > Dave
> > > > > 
> > > > > 
> > > > > > 
> > > > > > ./crash -d1 vmlinux-v5.4 dump.v5.4
> > > > > > ...
> > > > > > vmcore_data:
> > > > > >   flags: c0 (KDUMP_LOCAL|KDUMP_ELF64)
> > > > > >ndfd: 3
> > > > > > ofp: a5e81588
> > > > > > header_size: 30896
> > > > > >num_pt_load_segments: 1
> > > > > >  pt_load_segment[0]:
> > > > > > file_offset: 78b0
> > > > > >  phys_start: 4000
> > > > > >phys_end: 26000
> > > > > >   zero_fill: 0
> > > > > >  elf_header: 2ed6d4e0
> > > > > >   elf32: 0
> > > > > > notes32: 0
> > > > > >  load32: 0
> > > > > >   elf64: 2ed6d4e0
> > > > > > notes64: 2ed6d520
> > > > > >  load64: 2ed6d558
> > > > > >sect0_64: 0
> > > > > > nt_prstatus: 2ed6d590
> > > > > > nt_prpsinfo: 0
> > > > > >   nt_taskstruct: 0
> > > > > > task_struct: 0
> > > > > >  arch_data1: (unused)
> > > > > >  arch_data2: (unused)
> > > > > >switch_stack: 0
> > > > > >   page_size: 0
> > > > > >  xen_kdump_data: (unused)
> > > > > >  num_prstatus_notes: 32
> > > > > >  num_qemu_notes: 0
> > > > > >  vmcoreinfo: 0
> > > > > > size_vmcoreinfo: 0
> > > > > >  nt_prstatus_percpu:
> > > > > > 2ed6d590 2ed6d950 2ed6dd10
> > > > > > 2ed6e0d0
> > > > > > 2ed6e490 2ed6e850 2ed6ec10
> > > > > > 2ed6efd0
> > > > > > 2ed6f390 2ed6f750 2ed6fb10
> > > > > > 2ed6fed0
> > > > > > 2ed70290 2ed70650 2ed70a10
> > > > > > 2ed70dd0
> > > > > > 2ed71190 2ed71550 2ed71910
> > > > > > 2ed71cd0
> > > > > > 2ed72090 2ed72450 2ed72810
> > > > > > 2ed72bd0
> > > > > > 2ed72f90 2ed73350 2ed73710
> > > > > > 2ed73ad0
> > > > > > 2ed73e90 2ed74250 2ed74610
> > > > > > 2ed749d0
> > > > > >  nt_qemu_percpu:
> > > > > >backup_src_start: 0
> > > > > > backup_src_size: 0
> > > > > >   backup_offset: 0
> > > > > > ...
> > > > > > 
> > > > > > Thanks,
> > > > > > Masa
> > > > > > 
> > > > > > > 
> > > > > > > Dave
> > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 1. KASLR
> > > > > > > >crash doesn't work in case KASLR is enabled on the guest.
> > > > > > > >That is because the memory dump doesn't have vmcoreinfo, so
> > > > > > > >we
> > > > > > > >cannot get the relocation position.
> > > > > > > >I suppose we need to implement calc_kaslr_offset() for
> > > > > > > >aarch64.
> > > > > > > >nokaslr with the guest kernel parameter is a workaround.
> > > > > > > > 
> > > > > > > > 2. VA_BITS
> > > > > > > >crash doesn't work in case the guest kernel is v5.4 and
> > > > > > > >later.
> > > > > > > >That is because the memory dump doesn't have vmcoreinfo, so
> > > > > > > >we
> > > > > > > >cannot get vabits_actual.
> > > > > > > >I think there's no workaround so far...
> > > > > > > > 
> > > > > > > > Masayoshi Mizuma (1):
> > > > > > > >   arm64: Fix missing offset for modules_vaddr with aarch64
> > > > > > > >   guest
> > > > > > > >   dump
> > > > > > > > 
> > > > > > > >  arm64.c | 

Re: [Crash-utility] [PATCH 0/1] arm64: Fix missing offset for modules_vaddr with aarch64 guest dump

2020-01-27 Thread Dave Anderson


Masa,

Check https://libvirt.org/formatdomain.html#elementsFeatures and grep for
"vmcoreinfo".  It would seem like "vmcoreinfo state" would be "on" by
default, but I'm not sure.

Dave


- Original Message -
> 
> 
> - Original Message -
> > 
> > 
> > - Original Message -
> > > On Mon, Jan 27, 2020 at 10:17:51AM -0500, Dave Anderson wrote:
> > > > 
> > > > 
> > > > - Original Message -
> > > > > On Mon, Jan 27, 2020 at 09:56:53AM -0500, Dave Anderson wrote:
> > > > > > 
> > > > > > 
> > > > > > - Original Message -
> > > > > > > From: Masayoshi Mizuma 
> > > > > > > 
> > > > > > > Fix for aarch64 with Linux v5.0 and later kernels that
> > > > > > > contain commit 91fc957c9b1d ("arm64/bpf: don't allocate
> > > > > > > BPF JIT programs in module memory") and the memory dump
> > > > > > > is captured by virsh dump.
> > > > > > > 
> > > > > > > Note: Another two issues remain for the memory dump captured by
> > > > > > > virsh dump with aarch64.
> > > > > > 
> > > > > > I'm confused -- the vmcoreinfo data has been passed to the KVM host
> > > > > > for the virsh dump for quite some time now.  Is it not passed back
> > > > > > to the host on aarch64?
> > > > > 
> > > > > The vmcore_data shows that vmcoreinfo size is 0, so I think
> > > > > vmcoreinfo
> > > > > isn't captured by virsh dump.
> > > > > Am I missing something...?
> > > > 
> > > > I'm not sure -- are you using "virsh dump --memory-only ..."?
> > > 
> > > Yes, I'm using the --memory-only option, like this:
> > > 
> > > virsh dump --crash --memory-only  
> > 
> > OK, then that's news to me.  We went through this a while ago on x86_64
> > because it required the vmcoreinfo "phys_base".  It took awhile to get
> > it upstream, but now the whole vmcoreinfo note is passed for virsh dump
> > to include with the dumpfile.  Maybe it's x86_64 only?
> 
> Looking back through old bugzillas, google, etc, it does seem to indicate
> that the support covers "arm/x86", which I presume covers aarch64.  See
> kernel commit 2d6d60a3d3eca50bbb20052278cb11dabcf4dff3 titled
> "fw_cfg: write vmcoreinfo details".
> 
> I see a few old postings from Daisuke that indicate that the vmcoreinfo
> device has to be registered with a "devices" or "features" file on the
> host (?) by using "virsh edit"?  Again, sorry, I'm clueless.
> 
> Anyway, Daisuke should be able to fill in the details.
> 
> Dave
> 
> 
> > 
> > Dave
> >
> > 
> > > 
> > > - Masa
> > > 
> > > > 
> > > > Dave
> > > > 
> > > > 
> > > > > 
> > > > > ./crash -d1 vmlinux-v5.4 dump.v5.4
> > > > > ...
> > > > > vmcore_data:
> > > > >   flags: c0 (KDUMP_LOCAL|KDUMP_ELF64)
> > > > >ndfd: 3
> > > > > ofp: a5e81588
> > > > > header_size: 30896
> > > > >num_pt_load_segments: 1
> > > > >  pt_load_segment[0]:
> > > > > file_offset: 78b0
> > > > >  phys_start: 4000
> > > > >phys_end: 26000
> > > > >   zero_fill: 0
> > > > >  elf_header: 2ed6d4e0
> > > > >   elf32: 0
> > > > > notes32: 0
> > > > >  load32: 0
> > > > >   elf64: 2ed6d4e0
> > > > > notes64: 2ed6d520
> > > > >  load64: 2ed6d558
> > > > >sect0_64: 0
> > > > > nt_prstatus: 2ed6d590
> > > > > nt_prpsinfo: 0
> > > > >   nt_taskstruct: 0
> > > > > task_struct: 0
> > > > >  arch_data1: (unused)
> > > > >  arch_data2: (unused)
> > > > >switch_stack: 0
> > > > >   page_size: 0
> > > > >  xen_kdump_data: (unused)
>

Re: [Crash-utility] [PATCH 0/1] arm64: Fix missing offset for modules_vaddr with aarch64 guest dump

2020-01-27 Thread Dave Anderson



- Original Message -
> 
> 
> - Original Message -
> > On Mon, Jan 27, 2020 at 10:17:51AM -0500, Dave Anderson wrote:
> > > 
> > > 
> > > - Original Message -
> > > > On Mon, Jan 27, 2020 at 09:56:53AM -0500, Dave Anderson wrote:
> > > > > 
> > > > > 
> > > > > - Original Message -
> > > > > > From: Masayoshi Mizuma 
> > > > > > 
> > > > > > Fix for aarch64 with Linux v5.0 and later kernels that
> > > > > > contain commit 91fc957c9b1d ("arm64/bpf: don't allocate
> > > > > > BPF JIT programs in module memory") and the memory dump
> > > > > > is captured by virsh dump.
> > > > > > 
> > > > > > Note: Another two issues remain for the memory dump captured by
> > > > > > virsh dump with aarch64.
> > > > > 
> > > > > I'm confused -- the vmcoreinfo data has been passed to the KVM host
> > > > > for the virsh dump for quite some time now.  Is it not passed back
> > > > > to the host on aarch64?
> > > > 
> > > > The vmcore_data shows that vmcoreinfo size is 0, so I think vmcoreinfo
> > > > isn't captured by virsh dump.
> > > > Am I missing something...?
> > > 
> > > I'm not sure -- are you using "virsh dump --memory-only ..."?
> > 
> > Yes, I'm using the --memory-only option, like this:
> > 
> > virsh dump --crash --memory-only  
> 
> OK, then that's news to me.  We went through this a while ago on x86_64
> because it required the vmcoreinfo "phys_base".  It took awhile to get
> it upstream, but now the whole vmcoreinfo note is passed for virsh dump
> to include with the dumpfile.  Maybe it's x86_64 only?

Looking back through old bugzillas, google, etc, it does seem to indicate
that the support covers "arm/x86", which I presume covers aarch64.  See 
kernel commit 2d6d60a3d3eca50bbb20052278cb11dabcf4dff3 titled
"fw_cfg: write vmcoreinfo details".

I see a few old postings from Daisuke that indicate that the vmcoreinfo
device has to be registered with a "devices" or "features" file on the
host (?) by using "virsh edit"?  Again, sorry, I'm clueless.

Anyway, Daisuke should be able to fill in the details.

Dave


> 
> Dave
>
> 
> > 
> > - Masa
> > 
> > > 
> > > Dave
> > > 
> > > 
> > > > 
> > > > ./crash -d1 vmlinux-v5.4 dump.v5.4
> > > > ...
> > > > vmcore_data:
> > > >   flags: c0 (KDUMP_LOCAL|KDUMP_ELF64)
> > > >ndfd: 3
> > > > ofp: a5e81588
> > > > header_size: 30896
> > > >num_pt_load_segments: 1
> > > >  pt_load_segment[0]:
> > > > file_offset: 78b0
> > > >  phys_start: 4000
> > > >phys_end: 26000
> > > >   zero_fill: 0
> > > >  elf_header: 2ed6d4e0
> > > >   elf32: 0
> > > > notes32: 0
> > > >  load32: 0
> > > >   elf64: 2ed6d4e0
> > > > notes64: 2ed6d520
> > > >  load64: 2ed6d558
> > > >sect0_64: 0
> > > > nt_prstatus: 2ed6d590
> > > > nt_prpsinfo: 0
> > > >   nt_taskstruct: 0
> > > > task_struct: 0
> > > >  arch_data1: (unused)
> > > >  arch_data2: (unused)
> > > >switch_stack: 0
> > > >   page_size: 0
> > > >  xen_kdump_data: (unused)
> > > >  num_prstatus_notes: 32
> > > >  num_qemu_notes: 0
> > > >  vmcoreinfo: 0
> > > > size_vmcoreinfo: 0
> > > >  nt_prstatus_percpu:
> > > > 2ed6d590 2ed6d950 2ed6dd10
> > > > 2ed6e0d0
> > > > 2ed6e490 2ed6e850 2ed6ec10
> > > > 2ed6efd0
> > > > 2ed6f390 2ed6f750 2ed6fb10
> > > > 2ed6fed0
> > > > 2ed70290 2ed70650 2ed70a10
> > > > 2ed70dd0
> > > > 2ed71190 00

Re: [Crash-utility] [PATCH 0/1] arm64: Fix missing offset for modules_vaddr with aarch64 guest dump

2020-01-27 Thread Dave Anderson



- Original Message -
> On Mon, Jan 27, 2020 at 10:17:51AM -0500, Dave Anderson wrote:
> > 
> > 
> > - Original Message -
> > > On Mon, Jan 27, 2020 at 09:56:53AM -0500, Dave Anderson wrote:
> > > > 
> > > > 
> > > > - Original Message -
> > > > > From: Masayoshi Mizuma 
> > > > > 
> > > > > Fix for aarch64 with Linux v5.0 and later kernels that
> > > > > contain commit 91fc957c9b1d ("arm64/bpf: don't allocate
> > > > > BPF JIT programs in module memory") and the memory dump
> > > > > is captured by virsh dump.
> > > > > 
> > > > > Note: Another two issues remain for the memory dump captured by
> > > > > virsh dump with aarch64.
> > > > 
> > > > I'm confused -- the vmcoreinfo data has been passed to the KVM host
> > > > for the virsh dump for quite some time now.  Is it not passed back
> > > > to the host on aarch64?
> > > 
> > > The vmcore_data shows that vmcoreinfo size is 0, so I think vmcoreinfo
> > > isn't captured by virsh dump.
> > > Am I missing something...?
> > 
> > I'm not sure -- are you using "virsh dump --memory-only ..."?
> 
> Yes, I'm using the --memory-only option, like this:
> 
> virsh dump --crash --memory-only  

OK, then that's news to me.  We went through this a while ago on x86_64
because it required the vmcoreinfo "phys_base".  It took awhile to get
it upstream, but now the whole vmcoreinfo note is passed for virsh dump
to include with the dumpfile.  Maybe it's x86_64 only?
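
For reference, checking whether a dumpfile's note segment carries a
vmcoreinfo note at all can be sketched as below.  This is a simplified
illustration under stated assumptions, not the crash implementation:
find_note_descsz() and note_align4() are hypothetical helpers, the note is
assumed to be named "VMCOREINFO" as in kdump ELF vmcores, the buffer is
assumed to hold the raw PT_NOTE contents, and the file's byte order is
assumed to match the host's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round up to the 4-byte alignment used for note name and desc fields. */
static size_t
note_align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: walk a PT_NOTE segment image and return the
 * descriptor size of the first note whose name equals `name`, or 0 if no
 * such note exists (or the segment is truncated or malformed).
 */
static size_t
find_note_descsz(const unsigned char *buf, size_t len, const char *name)
{
	size_t off = 0;
	size_t want;

	assert(buf != NULL && name != NULL);
	want = strlen(name) + 1;	/* n_namesz counts the trailing NUL */

	while (off <= len && len - off >= 12) {
		uint32_t hdr[3];	/* n_namesz, n_descsz, n_type */
		size_t name_off, desc_off, next;

		memcpy(hdr, buf + off, sizeof(hdr));
		name_off = off + sizeof(hdr);
		desc_off = name_off + note_align4(hdr[0]);
		next = desc_off + note_align4(hdr[1]);
		if (next > len || next <= off)
			break;		/* truncated or malformed note */
		if (hdr[0] == want && memcmp(buf + name_off, name, want) == 0)
			return hdr[1];
		off = next;
	}
	return 0;
}
```

A result of 0 here would correspond to the "size_vmcoreinfo: 0" line in the
-d1 output quoted in this thread.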

Dave
   

> 
> - Masa
> 
> > 
> > Dave
> > 
> > 
> > > 
> > > ./crash -d1 vmlinux-v5.4 dump.v5.4
> > > ...
> > > vmcore_data:
> > >   flags: c0 (KDUMP_LOCAL|KDUMP_ELF64)
> > >ndfd: 3
> > > ofp: a5e81588
> > > header_size: 30896
> > >num_pt_load_segments: 1
> > >  pt_load_segment[0]:
> > > file_offset: 78b0
> > >  phys_start: 4000
> > >phys_end: 26000
> > >   zero_fill: 0
> > >  elf_header: 2ed6d4e0
> > >   elf32: 0
> > > notes32: 0
> > >  load32: 0
> > >   elf64: 2ed6d4e0
> > > notes64: 2ed6d520
> > >  load64: 2ed6d558
> > >sect0_64: 0
> > > nt_prstatus: 2ed6d590
> > > nt_prpsinfo: 0
> > >   nt_taskstruct: 0
> > > task_struct: 0
> > >  arch_data1: (unused)
> > >  arch_data2: (unused)
> > >switch_stack: 0
> > >   page_size: 0
> > >  xen_kdump_data: (unused)
> > >  num_prstatus_notes: 32
> > >  num_qemu_notes: 0
> > >  vmcoreinfo: 0
> > > size_vmcoreinfo: 0
> > >  nt_prstatus_percpu:
> > > 2ed6d590 2ed6d950 2ed6dd10
> > > 2ed6e0d0
> > > 2ed6e490 2ed6e850 2ed6ec10
> > > 2ed6efd0
> > > 2ed6f390 2ed6f750 2ed6fb10
> > > 2ed6fed0
> > > 2ed70290 2ed70650 2ed70a10
> > > 2ed70dd0
> > > 2ed71190 2ed71550 2ed71910
> > > 2ed71cd0
> > > 2ed72090 2ed72450 2ed72810
> > > 2ed72bd0
> > > 2ed72f90 2ed73350 2ed73710
> > > 2ed73ad0
> > > 2ed73e90 2ed74250 2ed74610
> > > 2ed749d0
> > >  nt_qemu_percpu:
> > >backup_src_start: 0
> > > backup_src_size: 0
> > >   backup_offset: 0
> > > ...
> > > 
> > > Thanks,
> > > Masa
> > > 
> > > > 
> > > > Dave
> > > > 
> > > > > 
> > > > > 
> > > > > 1. KASLR
> > > > >crash doesn't work in case KASLR is enabled on the guest.
> > > > >That is because the memory dump doesn't have vmcoreinfo, so we
> > > > >cannot get the rel

Re: [Crash-utility] [PATCH 0/1] arm64: Fix missing offset for modules_vaddr with aarch64 guest dump

2020-01-27 Thread Dave Anderson



- Original Message -
> On Mon, Jan 27, 2020 at 09:56:53AM -0500, Dave Anderson wrote:
> > 
> > 
> > - Original Message -
> > > From: Masayoshi Mizuma 
> > > 
> > > Fix for aarch64 with Linux v5.0 and later kernels that
> > > contain commit 91fc957c9b1d ("arm64/bpf: don't allocate
> > > BPF JIT programs in module memory") and the memory dump
> > > is captured by virsh dump.
> > > 
> > > Note: Another two issues remain for the memory dump captured by
> > > virsh dump with aarch64.
> > 
> > I'm confused -- the vmcoreinfo data has been passed to the KVM host
> > for the virsh dump for quite some time now.  Is it not passed back
> > to the host on aarch64?
> 
> The vmcore_data shows that vmcoreinfo size is 0, so I think vmcoreinfo
> isn't captured by virsh dump.
> Am I missing something...?

I'm not sure -- are you using "virsh dump --memory-only ..."?

Dave


> 
> ./crash -d1 vmlinux-v5.4 dump.v5.4
> ...
> vmcore_data:
>   flags: c0 (KDUMP_LOCAL|KDUMP_ELF64)
>ndfd: 3
> ofp: a5e81588
> header_size: 30896
>num_pt_load_segments: 1
>  pt_load_segment[0]:
> file_offset: 78b0
>  phys_start: 4000
>phys_end: 26000
>   zero_fill: 0
>  elf_header: 2ed6d4e0
>   elf32: 0
> notes32: 0
>  load32: 0
>   elf64: 2ed6d4e0
> notes64: 2ed6d520
>  load64: 2ed6d558
>sect0_64: 0
> nt_prstatus: 2ed6d590
> nt_prpsinfo: 0
>   nt_taskstruct: 0
> task_struct: 0
>  arch_data1: (unused)
>  arch_data2: (unused)
>switch_stack: 0
>   page_size: 0
>  xen_kdump_data: (unused)
>  num_prstatus_notes: 32
>  num_qemu_notes: 0
>  vmcoreinfo: 0
> size_vmcoreinfo: 0
>  nt_prstatus_percpu:
> 2ed6d590 2ed6d950 2ed6dd10 2ed6e0d0
> 2ed6e490 2ed6e850 2ed6ec10 2ed6efd0
> 2ed6f390 2ed6f750 2ed6fb10 2ed6fed0
> 2ed70290 2ed70650 2ed70a10 2ed70dd0
> 2ed71190 2ed71550 2ed71910 2ed71cd0
> 2ed72090 2ed72450 2ed72810 2ed72bd0
> 2ed72f90 2ed73350 2ed73710 2ed73ad0
> 2ed73e90 2ed74250 2ed74610 2ed749d0
>  nt_qemu_percpu:
>backup_src_start: 0
> backup_src_size: 0
>   backup_offset: 0
> ...
> 
> Thanks,
> Masa
> 
> > 
> > Dave
> > 
> > > 
> > > 
> > > 1. KASLR
> > >crash doesn't work in case KASLR is enabled on the guest.
> > >That is because the memory dump doesn't have vmcoreinfo, so we
> > >cannot get the relocation position.
> > >I suppose we need to implement calc_kaslr_offset() for aarch64.
> > >nokaslr with the guest kernel parameter is a workaround.
> > > 
> > > 2. VA_BITS
> > >crash doesn't work in case the guest kernel is v5.4 and later.
> > >That is because the memory dump doesn't have vmcoreinfo, so we
> > >cannot get vabits_actual.
> > >I think there's no workaround so far...
> > > 
> > > Masayoshi Mizuma (1):
> > >   arm64: Fix missing offset for modules_vaddr with aarch64 guest dump
> > > 
> > >  arm64.c | 2 ++
> > >  defs.h  | 3 +++
> > >  2 files changed, 5 insertions(+)
> > > 
> > > --
> > > 2.18.1
> > > 
> > > 
> 
> 




Re: [Crash-utility] [PATCH 0/1] arm64: Fix missing offset for modules_vaddr with aarch64 guest dump

2020-01-27 Thread Dave Anderson



- Original Message -
> From: Masayoshi Mizuma 
> 
> Fix for aarch64 with Linux v5.0 and later kernels that
> contain commit 91fc957c9b1d ("arm64/bpf: don't allocate
> BPF JIT programs in module memory") and the memory dump
> is captured by virsh dump.
> 
> Note: Another two issues remain for the memory dump captured by
> virsh dump with aarch64.

I'm confused -- the vmcoreinfo data has been passed to the KVM host
for the virsh dump for quite some time now.  Is it not passed back
to the host on aarch64?

Dave

> 
> 1. KASLR
>crash doesn't work in case KASLR is enabled on the guest.
>That is because the memory dump doesn't have vmcoreinfo, so we
>cannot get the relocation position.
>I suppose we need to implement calc_kaslr_offset() for aarch64.
>nokaslr with the guest kernel parameter is a workaround.
> 
> 2. VA_BITS
>crash doesn't work in case the guest kernel is v5.4 and later.
>That is because the memory dump doesn't have vmcoreinfo, so we
>cannot get vabits_actual.
>I think there's no workaround so far...
> 
> Masayoshi Mizuma (1):
>   arm64: Fix missing offset for modules_vaddr with aarch64 guest dump
> 
>  arm64.c | 2 ++
>  defs.h  | 3 +++
>  2 files changed, 5 insertions(+)
> 
> --
> 2.18.1
> 
> 
> 




Re: [Crash-utility] Extensions: ptdump update v1.0.7

2020-01-27 Thread Dave Anderson


- Original Message -
> Hi Dave,
> 
> The attached file is updated extension module ptdump v1.0.7.
> Please replace old one in extension site [1] with this tarball.
> 
> Changelog:
> - ptdump: fix build warning: warning: this ‘if’ clause does not guard
> - ptdump: fix failure: ptdump: invalid size request: 0 type: "read page for write"
> - ptdump: fix heap memory and fd leak when fault happens
> 
> md5sum:
> b548afa3c44b6e7f81bce020297a1572
> 
> [1] http://people.redhat.com/anderson/extensions.html#PTDUMP
> 
> --
> Thanks.
> HATAYAMA, Daisuke
> 

Thanks Daisuke -- done!

Dave


Re: [Crash-utility] [PATCH][v2] Remove __exception_text_start and __exception_text_end for ARM64

2020-01-06 Thread Dave Anderson


- Original Message -
> > Hi Prabhakar,
> >
> > On Mon, Dec 30, 2019 at 9:55 AM Prabhakar Kushwaha
> >  wrote:
> >>
> >> Dear Bhupesh,
> >>
> >> On Tue, Dec 24, 2019 at 9:39 AM Prabhakar Kushwaha
> >>  wrote:
> >> >
> >> > On Tue, Dec 24, 2019 at 1:09 AM Bhupesh Sharma 
> >> wrote:
> >> > >
> >> > > On Mon, Dec 23, 2019 at 7:49 PM Prabhakar Kushwaha
> >> > >  wrote:
> >> > > >
> >> > > > On Mon, Dec 23, 2019 at 7:32 PM Dave Anderson
> >>  wrote:
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > - Original Message -
> >> > > > > > __exception_text_start and __exception_text_end are used to group
> >> > > > > > functions and place them, via the linker script, so as to
> >> > > > > > implement the kprobe blacklist.  Linux commit b6e43c0e3129
> >> > > > > > ("arm64: remove __exception annotations") removed
> >> > > > > > __exception_text_start and __exception_text_end and uses
> >> > > > > > NOKPROBE_SYMBOL() to blacklist kprobes instead.
> >> > > > > >
> >> > > > > > So remove the references to __exception_text_start and
> >> > > > > > __exception_text_end for ARM64.
> >> > > > >
> >> > > > > NAK for a couple of reasons...
> >> > > > >
> >> > > > > First, they cannot be removed for backward-compatibility purposes,
> >> > > > > and secondly an alternative method is required for
> >> > > > > arm64_back_trace_cmd() for handling exception frames.
> >> > > > >
> >> > > >
> >> > > > We are getting the following error from the crash tool with the
> >> > > > latest kernel:
> >> > > > crash: cannot resolve "__exception_text_start" error
> >> > > >
> >> > > > It is because of Linux commit b6e43c0e3129, which removes
> >> > > > __exception_text_start and __exception_text_end.
> >> > > > We need to find an alternate way of fixing it.
> >> > >
> >> > > I cannot seem to remember reproducing this issue with crash with
> >> > > latest kernel on my arm64 boards.
> >> > > Can you be more specific on the exact kernel version and the crash
> >> > > command you used, just to be sure we are able to reproduce the same?
> >> > >
> >> >
> >> > We are seeing this issue on the ThunderX platform (arm64).
> >> >
> >> > Please find details of commit id.
> >> > A) crash tool: https://github.com/crash-utility/crash.git
> >> > commit:  af7f78dc501b8acf7fee3f924f69e93513d0a74b (Fix for the "log
> >> -a" option.)
> >> >
> >> > B) Linux:
> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >> > commit:  46cf053efec6a3a5f343fead83efe8252a46 (Linux 5.5-rc3)
> >> >
> >> > C) Kexec:
> >> https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> >> > commit: bd077966e2b9041c24df5b689e67670f02be7b0d (kexec-tools: Fix
> >> > conversion overflow when compiling on 32-bit platforms)
> >> >
> >> >
> >> > Logs
> >> > root@ubuntu$ ./crash /proc/vmcore /usr/src/tovards/linux/vmlinux
> >> >
> >> > crash 7.2.7++
> >> > Copyright (C) 2002-2019  Red Hat, Inc.
> >> > Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> >> > Copyright (C) 1999-2006  Hewlett-Packard Co
> >> > Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> >> > Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> >> > Copyright (C) 2005, 2011  NEC Corporation
> >> > Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> >> > Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> >> > This program is free software, covered by the GNU General Public
> >> License,
> >> > and you are welcome to change it and/or distribute copies of it under
> >> > certain conditions.  Enter "help copying" to see the conditions.
> >> > This program has absolutely no warranty.  Enter "help warranty" for
> >> details.
> >> >

Re: [Crash-utility] crash on arm64

2020-01-05 Thread Dave Anderson


- Original Message -
> 
> 
> - Original Message -
> > Hello
> > 
> > I am getting the following error  when I use crash on arm64 platform (NXP
> > LS1043A, with 4 A53 cores).
> >   crash: read error: kernel virtual address: 7b616100  type:
> >   "memory section root table"
> > 
> > Additional information is that when I boot the kernel with nokaslr in
> > bootargs, the crash seems to work well.
> > 
> > Has anyone else seen a similar issue?
> > 
> > The kernel version is top of mainline with the below patch applied:
> > - arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
> > (http://lists.infradead.org/pipermail/kexec/2019-November/023962.html)
> > 
> > And the crash-utils is from here:
> > https://github.com/crash-utility/crash.git
> > commit 5e975dd8c817ea6aea35e1e15b83c378aee9c136
> > Author: Dave Anderson 
> > Date:   Tue Dec 24 08:43:52 2019 -0500
> > 
> > When determining the ARM64 kernel's "vabits_actual" value by reading
> > the new TCR_EL1.T1SZ vmcoreinfo entry, display its value during
> > session initialization only when invoking crash with "-d1" or larger
> > -d debug value.
> > (ander...@redhat.com)
> 
> What version of kexec-tools are you using?

Also -- how do things work on the live system, both with and without KASLR?

Dave

 
> 
> Dave
> 
> > 
> > Logs of crash -d1 are below:
> > nxa19049@lsv03080:~/data/ups/crash$ ./crash -d1 ../linux/vmlinux
> > ../linux/vmcore
> > 
> > crash 7.2.7++
> > Copyright (C) 2002-2019  Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> > Copyright (C) 1999-2006  Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> > Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011  NEC Corporation
> > Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions.  Enter "help copying" to see the conditions.
> > This program has absolutely no warranty.  Enter "help warranty" for
> > details.
> > 
> > vmcore_data:
> >   flags: c0 (KDUMP_LOCAL|KDUMP_ELF64)
> >    ndfd: 3
> >     ofp: 7fa9998cf620
> >     header_size: 8192
> >    num_pt_load_segments: 4
> >  pt_load_segment[0]:
> >     file_offset: 2000
> >  phys_start: 8208
> >    phys_end: 839bc000
> >   zero_fill: 0
> >  pt_load_segment[1]:
> >     file_offset: 193e000
> >  phys_start: 8000
> >    phys_end: d800
> >   zero_fill: 0
> >  pt_load_segment[2]:
> >     file_offset: 5993e000
> >  phys_start: fb00
> >    phys_end: fb80
> >   zero_fill: 0
> >  pt_load_segment[3]:
> >     file_offset: 5a13e000
> >  phys_start: fbc0
> >    phys_end: fbe0
> >   zero_fill: 0
> >  elf_header: 2724410
> >   elf32: 0
> >     notes32: 0
> >  load32: 0
> >   elf64: 2724410
> >     notes64: 2724450
> >  load64: 2724488
> >    sect0_64: 0
> >     nt_prstatus: 2725410
> >     nt_prpsinfo: 0
> >   nt_taskstruct: 0
> >     task_struct: 0
> >  arch_data1: (unused)
> >  arch_data2: (unused)
> >    switch_stack: 0
> >   page_size: 4096
> >  xen_kdump_data: (unused)
> >  num_prstatus_notes: 4
> >  num_qemu_notes: 0
> >  vmcoreinfo: 2725a98
> >     size_vmcoreinfo: 1939
> >  nt_prstatus_percpu:
> >     02725410 027255ac 02725748 027258e4
> >  nt_qemu_percpu:
> >    backup_src_start: 0
> >     backup_src_size: 0
> >   backup_offset: 0
> > 
> > Elf64_Ehdr:
> >     e_ident: \177ELF
> >   e_ident[EI_CLASS]: 2 (ELFCLASS64)
> >    e_ident[EI_DATA]: 1 (ELFDATA2LSB)
> >     e_ident[EI_VERSION]: 1 (EV_CURRENT)
> >

Re: [Crash-utility] crash on arm64

2020-01-05 Thread Dave Anderson


- Original Message -
> Hello
> 
> I am getting the following error when I use crash on an arm64 platform (NXP
> LS1043A, with 4 A53 cores):
>   crash: read error: kernel virtual address: 7b616100  type: "memory section root table"
> 
> Additional information: when I boot the kernel with nokaslr in the
> bootargs, crash seems to work well.
> 
> Has anyone else seen a similar issue?
> 
> The kernel version is top of mainline with the below patch applied:
> - arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
> (http://lists.infradead.org/pipermail/kexec/2019-November/023962.html)
> 
> And the crash-utils is from here: https://github.com/crash-utility/crash.git
> commit 5e975dd8c817ea6aea35e1e15b83c378aee9c136
> Author: Dave Anderson 
> Date:   Tue Dec 24 08:43:52 2019 -0500
> 
> When determining the ARM64 kernel's "vabits_actual" value by reading
> the new TCR_EL1.T1SZ vmcoreinfo entry, display its value during
> session initialization only when invoking crash with "-d1" or larger
> -d debug value.
> (ander...@redhat.com)

What version of kexec-tools are you using?

Dave

> 
> Logs of crash -d1 are below:
> nxa19049@lsv03080:~/data/ups/crash$ ./crash -d1 ../linux/vmlinux ../linux/vmcore
> 
> crash 7.2.7++
> Copyright (C) 2002-2019  Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> Copyright (C) 1999-2006  Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011  NEC Corporation
> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions.  Enter "help copying" to see the conditions.
> This program has absolutely no warranty.  Enter "help warranty" for details.
> 
> vmcore_data:
>   flags: c0 (KDUMP_LOCAL|KDUMP_ELF64)
>    ndfd: 3
>     ofp: 7fa9998cf620
>     header_size: 8192
>    num_pt_load_segments: 4
>  pt_load_segment[0]:
>     file_offset: 2000
>  phys_start: 8208
>    phys_end: 839bc000
>   zero_fill: 0
>  pt_load_segment[1]:
>     file_offset: 193e000
>  phys_start: 8000
>    phys_end: d800
>   zero_fill: 0
>  pt_load_segment[2]:
>     file_offset: 5993e000
>  phys_start: fb00
>    phys_end: fb80
>   zero_fill: 0
>  pt_load_segment[3]:
>     file_offset: 5a13e000
>  phys_start: fbc0
>    phys_end: fbe0
>   zero_fill: 0
>  elf_header: 2724410
>   elf32: 0
>     notes32: 0
>  load32: 0
>   elf64: 2724410
>     notes64: 2724450
>  load64: 2724488
>    sect0_64: 0
>     nt_prstatus: 2725410
>     nt_prpsinfo: 0
>   nt_taskstruct: 0
>     task_struct: 0
>  arch_data1: (unused)
>  arch_data2: (unused)
>    switch_stack: 0
>   page_size: 4096
>  xen_kdump_data: (unused)
>  num_prstatus_notes: 4
>  num_qemu_notes: 0
>  vmcoreinfo: 2725a98
>     size_vmcoreinfo: 1939
>  nt_prstatus_percpu:
>     02725410 027255ac 02725748 027258e4
>  nt_qemu_percpu:
>    backup_src_start: 0
>     backup_src_size: 0
>   backup_offset: 0
> 
> Elf64_Ehdr:
>     e_ident: \177ELF
>   e_ident[EI_CLASS]: 2 (ELFCLASS64)
>    e_ident[EI_DATA]: 1 (ELFDATA2LSB)
>     e_ident[EI_VERSION]: 1 (EV_CURRENT)
>   e_ident[EI_OSABI]: 0 (ELFOSABI_SYSV)
> e_ident[EI_ABIVERSION]: 0
>  e_type: 4 (ET_CORE)
>   e_machine: 183 (EM_AARCH64)
>   e_version: 1 (EV_CURRENT)
>     e_entry: 0
>     e_phoff: 40
>     e_shoff: 0
>     e_flags: 0
>    e_ehsize: 40
>     e_phentsize: 38
>     e_phnum: 5
>     e_shentsize: 0
>     e_shnum: 0
>  e_shstrndx: 0
> Elf64_Phdr:
>  p_type: 4 (PT_NOTE)
>    p_offset: 4096 (1000)
>     p_vaddr: 0
>     p_paddr: 0
>  

Re: [Crash-utility] [PATCH][v2] Remove __exception_text_start and __exception_text_end for ARM64

2019-12-30 Thread Dave Anderson
> Hi Prabhakar,
>
> On Mon, Dec 30, 2019 at 9:55 AM Prabhakar Kushwaha
>  wrote:
>>
>> Dear Bhupesh,
>>
>> On Tue, Dec 24, 2019 at 9:39 AM Prabhakar Kushwaha
>>  wrote:
>> >
>> > On Tue, Dec 24, 2019 at 1:09 AM Bhupesh Sharma 
>> wrote:
>> > >
>> > > On Mon, Dec 23, 2019 at 7:49 PM Prabhakar Kushwaha
>> > >  wrote:
>> > > >
>> > > > On Mon, Dec 23, 2019 at 7:32 PM Dave Anderson
>>  wrote:
>> > > > >
>> > > > >
>> > > > >
>> > > > > - Original Message -
>> > > > > > __exception_text_start and __exception_text_end is used to group functions
>> > > > > > and place according to linker script in such a way to achieve
>> > > > > > kprobe blacklist. Linux commit b6e43c0e3129 ("arm64: remove __exception
>> > > > > > annotations") has removed __exception_text_start and
>> > > > > > __exception_text_end and uses NOKPROBE_SYMBOL() for blacklist kprobes.
>> > > > > >
>> > > > > > So removing references of __exception_text_start and __exception_text_end
>> > > > > > for ARM64.
>> > > > >
>> > > > > NAK for a couple of reasons...
>> > > > >
>> > > > > First, they cannot be removed for backward-compatibility purposes, and secondly
>> > > > > an alternative method is required for arm64_back_trace_cmd() for handling
>> > > > > exception frames.
>> > > > >
>> > > >
>> > > > We are getting following error with crash tool with latest kernel.
>> > > > crash: cannot resolve "__exception_text_start" error
>> > > >
>> > > > it is because of Linux commit b6e43c0e3129 which removes
>> > > > __exception_text_start and __exception_text_end.
>> > > > We need to find alternate way of fixing it.
>> > >
>> > > I cannot seem to remember reproducing this issue with crash with
>> > > latest kernel on my arm64 boards.
>> > > Can you be more specific on the exact kernel version and the crash
>> > > command you used, just to be sure we are able to reproduce the same?
>> > >
>> >
>> > We are seeing this issue on a ThunderX platform (arm64).
>> >
>> > The commit IDs are as follows.
>> > A) crash tool: https://github.com/crash-utility/crash.git
>> > commit:  af7f78dc501b8acf7fee3f924f69e93513d0a74b (Fix for the "log -a" option.)
>> >
>> > B) Linux: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> > commit:  46cf053efec6a3a5f343fead83efe8252a46 (Linux 5.5-rc3)
>> >
>> > C) Kexec: https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
>> > commit: bd077966e2b9041c24df5b689e67670f02be7b0d (kexec-tools: Fix
>> > conversion overflow when compiling on 32-bit platforms)
>> >
>> >
>> > Logs
>> > root@ubuntu$ ./crash /proc/vmcore /usr/src/tovards/linux/vmlinux
>> >
>> > crash 7.2.7++
>> > Copyright (C) 2002-2019  Red Hat, Inc.
>> > Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>> > Copyright (C) 1999-2006  Hewlett-Packard Co
>> > Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>> > Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>> > Copyright (C) 2005, 2011  NEC Corporation
>> > Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>> > Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>> > This program is free software, covered by the GNU General Public License,
>> > and you are welcome to change it and/or distribute copies of it under
>> > certain conditions.  Enter "help copying" to see the conditions.
>> > This program has absolutely no warranty.  Enter "help warranty" for details.
>> >
>> > vmcoreinfo : vabits_actual: 48
>> > GNU gdb (GDB) 7.6
>> > Copyright (C) 2013 Free Software Foundation, Inc.
>> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> > This is free software: you are free to change and redistribute it.
>> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> > and "show warranty" for details.

Re: [Crash-utility] [PATCH][v2] Remove __exception_text_start and __exception_text_end for ARM64

2019-12-23 Thread Dave Anderson



- Original Message -
> __exception_text_start and __exception_text_end is used to group functions
> and place according to linker script in such a way to achieve
> kprobe blacklist. Linux commit b6e43c0e3129 ("arm64: remove __exception
> annotations") has removed __exception_text_start and
> __exception_text_end and uses NOKPROBE_SYMBOL() for blacklist kprobes.
> 
> So removing references of __exception_text_start and __exception_text_end
> for ARM64.

NAK for a couple of reasons...  

First, they cannot be removed for backward-compatibility purposes, and secondly
an alternative method is required for arm64_back_trace_cmd() for handling
exception frames.
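
The backward-compatibility half of that objection can be illustrated by
treating the two symbols as optional, much as the __irqentry_text_* pair
is looked up in the patch context below.  This is only a standalone
sketch, not crash's code: find_symbol(), struct sym and struct text_range
are hypothetical stand-ins for kernel_symbol_search() and the
machine_specific fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel symbol table entry. */
struct sym {
	const char *name;
	unsigned long value;
};

/* Hypothetical address range; start == end == 0 means "not present". */
struct text_range {
	unsigned long start;
	unsigned long end;
};

/* Look a symbol up by name; returns NULL when the kernel lacks it. */
static const struct sym *
find_symbol(const struct sym *tab, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}

/* Record the range only when both bounding symbols exist. */
static void
init_optional_range(struct text_range *r, const struct sym *tab, size_t n,
		    const char *start_name, const char *end_name)
{
	const struct sym *sp1, *sp2;

	assert(r != NULL);
	r->start = r->end = 0;
	if ((sp1 = find_symbol(tab, n, start_name)) &&
	    (sp2 = find_symbol(tab, n, end_name))) {
		r->start = sp1->value;
		r->end = sp2->value;
	}
}

/* A missing range never matches, so the caller falls through to other checks. */
static int
in_range(const struct text_range *r, unsigned long ptr)
{
	return r->start && r->end && ptr >= r->start && ptr < r->end;
}
```

On an older kernel the range is populated and checked as before; on a
kernel without the symbols it simply never matches.  It does not address
the second point, which still needs a different way to recognize
exception frames.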

Dave

> Signed-off-by: Prabhakar Kushwaha 
> Cc: James Morse 
> ---
> Changes for v2: Updated description and added CC-By
> 
>  arm64.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arm64.c b/arm64.c
> index 1b024a4..56ec088 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -673,8 +673,6 @@ arm64_dump_machdep_table(ulong arg)
>   fprintf(fp, "kimage_voffset: %016lx\n", ms->kimage_voffset);
>   }
>   fprintf(fp, "   phys_offset: %lx\n", ms->phys_offset);
> - fprintf(fp, "__exception_text_start: %lx\n", ms->__exception_text_start);
> - fprintf(fp, "  __exception_text_end: %lx\n", ms->__exception_text_end);
>   fprintf(fp, " __irqentry_text_start: %lx\n", ms->__irqentry_text_start);
>   fprintf(fp, "   __irqentry_text_end: %lx\n", ms->__irqentry_text_end);
>   fprintf(fp, "  exp_entry1_start: %lx\n", ms->exp_entry1_start);
> @@ -1644,10 +1642,6 @@ arm64_stackframe_init(void)
>   machdep->machspec->kern_eframe_offset = SIZE(pt_regs);
>   }
>  
> - machdep->machspec->__exception_text_start =
> - symbol_value("__exception_text_start");
> - machdep->machspec->__exception_text_end =
> - symbol_value("__exception_text_end");
>   if ((sp1 = kernel_symbol_search("__irqentry_text_start")) &&
>   (sp2 = kernel_symbol_search("__irqentry_text_end"))) {
>   machdep->machspec->__irqentry_text_start = sp1->value;
> @@ -1861,9 +1855,7 @@ arm64_in_exception_text(ulong ptr)
>  {
>   struct machine_specific *ms = machdep->machspec;
>  
> - if ((ptr >= ms->__exception_text_start) &&
> - (ptr < ms->__exception_text_end))
> - return TRUE;
> + return TRUE;
>  
>   if (ms->__irqentry_text_start && ms->__irqentry_text_end &&
>   ((ptr >= ms->__irqentry_text_start) &&
> --
> 2.17.1
> 
> 
> 




Re: [Crash-utility] [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting crash notes

2019-12-16 Thread Dave Anderson


- Original Message -
> Hi Dave,
> I find that 32-bit ARM has the same issue, so the patch should also be applied
> to 32-bit ARM.
> Since I have no 32-bit ARM kdump at hand, could you please help test the
> attached patch?
> 
> Best regards,
> Qiwu

Your patch doesn't compile -- as shown here in a tree originally created with 
"make target=ARM":

  $ make
  TARGET: ARM
   CRASH: 7.2.8rc23
 GDB: 7.6
  
  cc -c -g -DARM -m32 -D_FILE_OFFSET_BITS=64 -DLZO -DSNAPPY -DGDB_7_6  build_data.c
  cc -c -g -DARM -m32 -D_FILE_OFFSET_BITS=64 -DLZO -DSNAPPY -DGDB_7_6  arm.c
  arm.c: In function ‘arm_get_crash_notes’:
  arm.c:556:3: warning: ‘return’ with a value, in function returning void [enabled by default]
     return FALSE;
     ^
  arm.c:661:10: error: ‘struct machine_specific’ has no member named ‘panic_task_regs’
     free(ms->panic_task_regs);
            ^
  arm.c:662:5: error: ‘struct machine_specific’ has no member named ‘panic_task_regs’
     ms->panic_task_regs = NULL;
       ^
  make[4]: *** [arm.o] Error 1
  make[3]: *** [gdb] Error 2
  make[2]: *** [rebuild] Error 2
  make[1]: *** [gdb_merge] Error 2
  make: *** [all] Error 2
  $ 
  
I fixed the errors and checked it in for crash-7.2.8:

  
https://github.com/crash-utility/crash/commit/63df9c067de0b2017f50f5d236954890bbb42fe3

Thanks,
  Dave


Re: [Crash-utility] [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting crash notes

2019-12-15 Thread Dave Anderson


- Original Message -
> Hi Dave,
> 
> I tested your attached patch; it behaves as expected with your
> modifications.

Thanks Qiwu -- the patch is queued for crash-7.2.8:

  
https://github.com/crash-utility/crash/commit/c408862daff0b07f0d98a1c309febcf6590ccf0c

Dave
  
> 
> WARNING: cpu 0: invalid NT_PRSTATUS note (name != "CORE")
> WARNING: cpu 3: invalid NT_PRSTATUS note (name != "CORE")
> WARNING: cpu 4: invalid NT_PRSTATUS note (name != "CORE")
> please wait... (determining panic task)
> WARNING: cannot determine starting stack frame for task ffdd0066cd80
> 
> WARNING: cannot determine starting stack frame for task ffdd0066ae80
> 
> WARNING: cannot determine starting stack frame for task ffdcb7e40f80
> 
> crash> bt -a
> PID: 244TASK: ffdcb7e40f80  CPU: 0   COMMAND: "hang_detect"
> bt: WARNING: cannot determine starting stack frame for task ffdcb7e40f80
> 
> PID: 0  TASK: ffdd0066dd00  CPU: 1   COMMAND: "swapper/1"
>  #0 [ffdd3fe4ea40] mrdump_stop_noncore_cpu at ffa5f6932a0c
>  #1 [ffdd3fe4ebc0] flush_smp_call_function_queue at ffa5f58b8684
>  #2 [ffdd3fe4ec20] generic_smp_call_function_single_interrupt at
>  ffa5f58ba8c4
>  #3 [ffdd3fe4ec30] handle_IPI at ffa5f56a8a10
>  #4 [ffdd3fe4eca0] gic_handle_irq at ffa5f5682294
> ---  ---
>  #5 [ffdd006c7de0] el1_irq at ffa5f56844e4
>  PC: ffa5f7120b28  [cpuidle_enter_state+336]
>  LR: ffa5f7120d70  [cpuidle_enter_state+920]
>  SP: ffdd006c7df0  PSTATE: 10c00145
> X29: ffdd006c7df0  X28: 0001  X27: 0001
> X26: ffa5fa04e000  X25: 0001  X24: 378c55cbccab
> X23: ffa5f9970980  X22: 0001  X21: ffdcb6f6a400
> X20: ffa5fa04efa8  X19: 378c56210393  X18: 
> X17:   X16:   X15: 
> X14: ffdd0066dd00  X13: 0037475a6000  X12: 3455591d
> X11:   X10: 1000   X9: 
>  X8: ff8ba00d8f6c   X7:    X6: 1ff4bf57ac45
>  X5: 00209246a095bf14   X4: 348a8d795b93   X3: 431bde82d7b634db
>  X2: 1ffba00cdba3   X1:    X0: 
>  #6 [ffdd006c7df0] cpuidle_enter_state at ffa5f7120b24
>  #7 [ffdd006c7e70] cpuidle_enter at ffa5f712150c
>  #8 [ffdd006c7e80] call_cpuidle at ffa5f5805e24
>  #9 [ffdd006c7eb0] do_idle at ffa5f5806320
> #10 [ffdd006c7f80] cpu_startup_entry at ffa5f5806910
> #11 [ffdd006c7fa0] secondary_start_kernel at ffa5f56a7fac
> 
> PID: 0  TASK: ffdd00669f00  CPU: 2   COMMAND: "swapper/2"
>  #0 [ffdd3fe6ca40] mrdump_stop_noncore_cpu at ffa5f6932a0c
>  #1 [ffdd3fe6cbc0] flush_smp_call_function_queue at ffa5f58b8684
>  #2 [ffdd3fe6cc20] generic_smp_call_function_single_interrupt at
>  ffa5f58ba8c4
>  #3 [ffdd3fe6cc30] handle_IPI at ffa5f56a8a10
>  #4 [ffdd3fe6cca0] gic_handle_irq at ffa5f5682294
> ---  ---
>  #5 [ffdd006cfde0] el1_irq at ffa5f56844e4
>  PC: ffa5f7120b28  [cpuidle_enter_state+336]
>  LR: ffa5f7120d70  [cpuidle_enter_state+920]
>  SP: ffdd006cfdf0  PSTATE: 10c00145
> X29: ffdd006cfdf0  X28: 0001  X27: 0001
> X26: ffa5fa04e000  X25: 0001  X24: 378c5463ce04
> X23: ffa5f9970980  X22: 0001  X21: ffdcb6f69b00
> X20: ffa5fa04efa8  X19: 378c5621219f  X18: 
> X17:   X16:   X15: 
> X14: ffdd00669f00  X13: 0037475c4000  X12: 3455591d
> X11:   X10: 1000   X9: 
>  X8: ff8ba00d9f6c   X7:    X6: 1ff4bf57ac5d
>  X5: 00209246a095bf14   X4: 348a8d795b93   X3: 431bde82d7b634db
>  X2: 1ffba00cd3e3   X1:    X0: 
>  #6 [ffdd006cfdf0] cpuidle_enter_state at ffa5f7120b24
>  #7 [ffdd006cfe70] cpuidle_enter at ffa5f712150c
>  #8 [ffdd006cfe80] call_cpuidle at ffa5f5805e24
>  #9 [ffdd006cfeb0] do_idle at ffa5f5806320
> #10 [ffdd006cff80] cpu_startup_entry at ffa5f580690c
> #11 [ffdd006cffa0] secondary_start_kernel at ffa5f56a7fac
> 
> PID: 0  TASK: ffdd0066cd80  CPU: 3   COMMAND: "swapper/3"
> bt: WARNING: cannot determine starting stack frame for task ffdd0066cd80
> 
> PID: 0  TASK: ffdd0066ae80  CPU: 4   COMMAND: "swapper/4"
> bt: WARNING: cannot deter

Re: [Crash-utility] [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting crash notes

2019-12-12 Thread Dave Anderson


- Original Message -
> Hi Dave,
> Following your suggestion, I made changes for patch v2.
> 
> Best regards,
> Qiwu

Sorry, but I'm going to NAK this patch in its current form.  I'm not exactly
sure what you're trying to accomplish, but in my testing, it breaks things
unnecessarily.

For example, moving the call to arm64_get_crash_notes() from POST_VM to
POST_INIT breaks this logic in task_init():

        else {
                if (KDUMP_DUMPFILE())
                        map_cpus_to_prstatus();
                else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE())
                        map_cpus_to_prstatus_kdump_cmprs();
                please_wait("determining panic task");
                set_context(get_panic_context(), NO_PID);
                please_wait_done();
        }

The get_panic_context() call requires that the ARM64 arm64_get_crash_notes()
has already been called, and that the ms->panic_task_regs[] array has been
allocated and populated.  But with your patch applied, it has not been called
yet, and therefore ms->panic_task_regs is still NULL when get_panic_context()
is called above.
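
The ordering dependency can be shown in isolation.  The following is a
minimal sketch under stated assumptions, not the crash source: struct
regs, struct machspec, read_crash_notes() and get_start_frame() are
hypothetical, reduced stand-ins for the ARM64 machine-specific data and
the code that consumes it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced register set and machine-specific state. */
struct regs {
	unsigned long pc, sp;
};

struct machspec {
	struct regs *panic_task_regs;	/* NULL until the notes are read */
	int cpus;
};

/* Stand-in for reading the notes: allocate one zeroed slot per cpu. */
static int
read_crash_notes(struct machspec *ms, int cpus)
{
	assert(cpus > 0);
	ms->panic_task_regs = calloc((size_t)cpus, sizeof(struct regs));
	if (!ms->panic_task_regs)
		return 0;
	ms->cpus = cpus;
	return 1;
}

/* Returns 1 and fills *pc and *sp only if saved registers exist for the cpu. */
static int
get_start_frame(const struct machspec *ms, int cpu,
		unsigned long *pc, unsigned long *sp)
{
	if (!ms->panic_task_regs || cpu < 0 || cpu >= ms->cpus)
		return 0;
	if (!ms->panic_task_regs[cpu].pc)
		return 0;
	*pc = ms->panic_task_regs[cpu].pc;
	*sp = ms->panic_task_regs[cpu].sp;
	return 1;
}
```

In this sketch, any lookup made before read_crash_notes() has run fails
for every cpu, which corresponds to the "cannot determine starting stack
frame" warnings.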
  
As a result, for example, on a compressed kdump clone created by virsh dump,
I now see this during initialization:

  please wait... (determining panic task) 
  WARNING: cannot determine starting stack frame for task 08c25280
  
  WARNING: cannot determine starting stack frame for task 8000fa01b000
  
  WARNING: cannot determine starting stack frame for task 8000fa01c000
  
  WARNING: cannot determine starting stack frame for task 8000fa01d000
  
But all of those active tasks have NT_PRSTATUS notes and backtraces:
  
  crash> help -D | grep PC:
     LR: 08085938   SP: 08be3ee0   PC: 08099cf8
     LR: 08085938   SP: 8000fa06ff30   PC: 08099cf8
     LR: 08085938   SP: 8000fa073f30   PC: 08099cf8
     LR: 08085938   SP: 8000fa077f30   PC: 08099cf8
  crash>

And therefore have legitimate starting stack frames:

  crash> bt 08c25280 8000fa01b000 8000fa01c000 8000fa01d000
  PID: 0  TASK: 08c25280  CPU: 0   COMMAND: "swapper/0"
   #0 [08be3ee0] cpu_do_idle at 08099cf4
   #1 [08be3f10] default_idle_call at 08766f90
   #2 [08be3f20] cpu_startup_entry at 08110fe4
   #3 [08be3f80] rest_init at 08761674
   #4 [08be3f90] start_kernel at 08ab0be8
  
  PID: 0  TASK: 8000fa01b000  CPU: 1   COMMAND: "swapper/1"
   #0 [8000fa06ff30] cpu_do_idle at 08099cf4
   #1 [8000fa06ff60] default_idle_call at 08766f90
   #2 [8000fa06ff70] cpu_startup_entry at 08110fe4
   #3 [8000fa06ffd0] secondary_start_kernel at 0808ecb8
  
  PID: 0  TASK: 8000fa01c000  CPU: 2   COMMAND: "swapper/2"
   #0 [8000fa073f30] cpu_do_idle at 08099cf4
   #1 [8000fa073f60] default_idle_call at 08766f90
   #2 [8000fa073f70] cpu_startup_entry at 08110fe4
   #3 [8000fa073fd0] secondary_start_kernel at 0808ecb8
  
  PID: 0  TASK: 8000fa01d000  CPU: 3   COMMAND: "swapper/3"
   #0 [8000fa077f30] cpu_do_idle at 08099cf4
   #1 [8000fa077f60] default_idle_call at 08766f90
   #2 [8000fa077f70] cpu_startup_entry at 08110fe4
   #3 [8000fa077fd0] secondary_start_kernel at ffff0808ecb8
  crash> 

Dave


  


> -Original Message-
> From: Dave Anderson 
> Sent: Wednesday, December 11, 2019 12:51 AM
> To: qiwuche...@gmail.com
> Cc: crash-utility@redhat.com; 陈启武 
> Subject: [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting
> crash notes
> 
> 
> 
> - Original Message -
> > From: chenqiwu 
> >
> > 1) ARM64 call arm64_get_crash_notes() to retrieve active task
> > registers when POST_VM before calling map_cpus_to_prstatus() to remap
> > the NT_PRSTATUS elf notes to the online cpus. It's better to call
> > arm64_get_crash_notes() when POST_INIT.
> > 2) arm64_get_crash_notes() check the sanity of NT_PRSTATUS notes only
> > for online cpus. If one cpu contains invalid note, it's better to
> > continue finding the crash notes for other online cpus.
> > So we can extract the backtraces for the online cpus which contain
> > valid note by using command "bt -a".
> > 3) map_cpus_to_prstatus() remap the NT_PRSTATUS notes only to the
> > online cpus. Make sure there must be a one-to-one relationship between
> > the number of online cpus and the number of notes.
> 
> The code in map_cpus_to_prstatus()

Re: [Crash-utility] [PATCH] Debugging xen hypervisor failed

2019-12-12 Thread Dave Anderson



- Original Message -
> Hi Dave,
> 
> On Wednesday, 11 December 2019, 20:28:49 CET, Dave Anderson wrote:
> > 
> ...
> > 
> > If you don't have a 32-bit x86 machine, or don't have the proper
> > libraries to build a 32-bit crash binary on an x86_64 host with
> > "make target=X86", just re-post the patch with your best effort
> > and I'll build-test it.
> > 
> > Thanks,
> >   Dave
> 
> Sorry, my bad, I didn't think of 32-bit.
> I installed the 32-bit libraries and hopefully fixed the issues.
> But I'm not able to test the 32-bit variant because I have no proper dumps.
> Many thanks.
> 
> Dietmar.


Thanks Dietmar -- your patch is queued for crash-7.2.8:

  
https://github.com/crash-utility/crash/commit/4e4e5859731da650d3520150d7ea2ef07094c7af

Dave


> 
> Signed-off-by: Dietmar Hahn 
> 
> ---
>  x86.c   | 12 ++--
>  x86_64.c| 20 +++-
>  xen_hyper.c | 22 --
>  xen_hyper_defs.h|  8 
>  xen_hyper_dump_tables.c |  8 
>  5 files changed, 41 insertions(+), 29 deletions(-)
> 
> diff --git a/x86.c b/x86.c
> index 88562b6..de0d3d3 100644
> --- a/x86.c
> +++ b/x86.c
> @@ -5600,18 +5600,18 @@ x86_get_stackbase_hyper(ulong task)
>  
>   if (symbol_exists("init_tss")) {
>   init_tss = symbol_value("init_tss");
> - init_tss += XEN_HYPER_SIZE(tss_struct) * pcpu;
> + init_tss += XEN_HYPER_SIZE(tss) * pcpu;
>   } else {
>   init_tss = symbol_value("per_cpu__init_tss");
>   init_tss = xen_hyper_per_cpu(init_tss, pcpu);
>   }
>   
> - buf = GETBUF(XEN_HYPER_SIZE(tss_struct));
> + buf = GETBUF(XEN_HYPER_SIZE(tss));
>   if (!readmem(init_tss, KVADDR, buf,
> - XEN_HYPER_SIZE(tss_struct), "init_tss", RETURN_ON_ERROR)) {
> + XEN_HYPER_SIZE(tss), "init_tss", RETURN_ON_ERROR)) {
>   error(FATAL, "cannot read init_tss.\n");
>   }
> - esp = ULONG(buf + XEN_HYPER_OFFSET(tss_struct_esp0));
> + esp = ULONG(buf + XEN_HYPER_OFFSET(tss_esp0));
>   FREEBUF(buf);
>   base = esp & (~(STACKSIZE() - 1));
>  
> @@ -5745,8 +5745,8 @@ x86_init_hyper(int when)
>  #endif
>   XEN_HYPER_STRUCT_SIZE_INIT(cpu_time, "cpu_time");
>   XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86");
> - XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct");
> - XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_esp0, "tss_struct", "esp0");
> + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct");
> + XEN_HYPER_MEMBER_OFFSET_INIT(tss_esp0, "tss_struct", "esp0");
>   XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_local_tsc_stamp, "cpu_time",
>   "local_tsc_stamp");
>   XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_local_stamp, "cpu_time",
>   "stime_local_stamp");
>   XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_master_stamp, "cpu_time",
>   "stime_master_stamp");
> diff --git a/x86_64.c b/x86_64.c
> index a4138ed..4f1a6d7 100644
> --- a/x86_64.c
> +++ b/x86_64.c
> @@ -7973,13 +7973,23 @@ x86_64_init_hyper(int when)
>  
>   case POST_GDB:
>   XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86");
> - XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct");
> - if (MEMBER_EXISTS("tss_struct", "__blh")) {
> - XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = MEMBER_OFFSET("tss_struct", "__blh") + sizeof(short unsigned int);
> + if (symbol_exists("per_cpu__tss_page")) {
> + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss64");
> + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) =
> + MEMBER_OFFSET("tss64", "rsp0");
> + XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss64", "ist");
>   } else {
> - XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = MEMBER_OFFSET("tss_struct", "rsp0");
> + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct");
> + XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss_struct", "ist");
> + if (MEMBER

Re: [Crash-utility] [PATCH] Debugging xen hypervisor failed

2019-12-11 Thread Dave Anderson


- Original Message -
> Hi Dave,
> 
> debugging newer xen hypervisors failed with:
> 
> crash: cannot resolve "init_tss"
> 
> This is caused by a change in the xen hypervisor with commit 78884406256,
> from 4.12.0-rc5-763-g7888440625. In this patch the struct tss_struct was
> renamed to tss64 and the structure tss_page was introduced which contains a
> single tss64.
> Now the tss information is accessible via the symbol "per_cpu__tss_page".
> 
> The code is as follows:
> 
> struct __packed tss64 {
> uint32_t :32;
> uint64_t rsp0, rsp1, rsp2;
> uint64_t :64;
> /*
>  * Interrupt Stack Table is 1-based so tss->ist[0] corresponds to an IST
>  * value of 1 in an Interrupt Descriptor.
>  */
> uint64_t ist[7];
> uint64_t :64;
> uint16_t :16, bitmap;
> };
> struct tss_page {
> struct tss64 __aligned(PAGE_SIZE) tss;
> };
> DECLARE_PER_CPU(struct tss_page, tss_page);
> 
> To keep the change simple and small I renamed xen_hyper_size_table.tss_struct
> to xen_hyper_size_table.tss and consequently I did the same for
> tss_struct_rsp0, tss_struct_esp0 and tss_struct_ist.
> But I'm not sure this is the way to go.
> Thanks.
> 
> Dietmar.
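
For reference, the member offsets implied by the quoted tss64 layout can
be checked with a small standalone program.  This is a sketch under
assumptions: struct tss64_sketch is a hypothetical copy that names the
reserved fields instead of using anonymous bit-fields, and it uses the
GCC/Clang packed attribute in place of Xen's __packed.  crash itself
obtains these values from debuginfo rather than hard-coding them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the quoted tss64 layout with named reserved fields. */
struct tss64_sketch {
	uint32_t reserved0;
	uint64_t rsp0, rsp1, rsp2;
	uint64_t reserved1;
	uint64_t ist[7];
	uint64_t reserved2;
	uint16_t reserved3, bitmap;
} __attribute__((packed));

/* Byte offset of rsp0 within the structure. */
static size_t
tss_rsp0_offset(void)
{
	return offsetof(struct tss64_sketch, rsp0);
}

/*
 * Byte offset of the stack pointer for a 1-based IST value, as used in
 * an interrupt descriptor: IST value 1 corresponds to ist[0].
 */
static size_t
tss_ist_offset(int ist_value)
{
	assert(ist_value >= 1 && ist_value <= 7);
	return offsetof(struct tss64_sketch, ist) +
	    (size_t)(ist_value - 1) * sizeof(uint64_t);
}

/* Total size of one packed structure. */
static size_t
tss_size(void)
{
	return sizeof(struct tss64_sketch);
}
```

With this packed layout rsp0 sits at byte 4, ist[] starts at byte 36, and
the structure is 104 bytes long.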

Hi Dietmar,

The patch looks good to me, and doesn't break backwards compatibility
with my old sample hypervisor dumps -- but the tss name changes break
the 32-bit x86 build:

$ make warn
TARGET: X86
 CRASH: 7.2.8rc22
   GDB: 7.6

... [ cut ] ...

cc -c -g -DX86 -m32 -D_FILE_OFFSET_BITS=64 -DLZO -DSNAPPY -DGDB_7_6  x86.c -DMCLX
In file included from x86.c:54:0:
x86.c: In function ‘x86_get_stackbase_hyper’:
xen_hyper_defs.h:766:61: error: ‘struct xen_hyper_size_table’ has no member named ‘tss_struct’
 #define XEN_HYPER_SIZE(X)  (SIZE_verify(xen_hyper_size_table.X, (char *)__FUNCTION__, __FILE__, __LINE__, #X))
 ^
x86.c:5603:15: note: in expansion of macro ‘XEN_HYPER_SIZE’
   init_tss += XEN_HYPER_SIZE(tss_struct) * pcpu;
   ^
In file included from x86.c:53:0:
xen_hyper_defs.h:766:61: error: ‘struct xen_hyper_size_table’ has no member named ‘tss_struct’
 #define XEN_HYPER_SIZE(X)  (SIZE_verify(xen_hyper_size_table.X, (char *)__FUNCTION__, __FILE__, __LINE__, #X))
 ^
defs.h:5070:35: note: in definition of macro ‘GETBUF’
 #define GETBUF(X)   getbuf((long)(X))
   ^
x86.c:5609:15: note: in expansion of macro ‘XEN_HYPER_SIZE’
  buf = GETBUF(XEN_HYPER_SIZE(tss_struct));
   ^
In file included from x86.c:54:0:
xen_hyper_defs.h:766:61: error: ‘struct xen_hyper_size_table’ has no member named ‘tss_struct’
 #define XEN_HYPER_SIZE(X)  (SIZE_verify(xen_hyper_size_table.X, (char *)__FUNCTION__, __FILE__, __LINE__, #X))
 ^
x86.c:5611:4: note: in expansion of macro ‘XEN_HYPER_SIZE’
XEN_HYPER_SIZE(tss_struct), "init_tss", RETURN_ON_ERROR)) {
^
In file included from x86.c:53:0:
xen_hyper_defs.h:767:67: error: ‘struct xen_hyper_offset_table’ has no member named ‘tss_struct_esp0’
 #define XEN_HYPER_OFFSET(X)  (OFFSET_verify(xen_hyper_offset_table.X, (char *)__FUNCTION__, __FILE__, __LINE__, #X))
   ^
defs.h:2376:46: note: in definition of macro ‘ULONG’
 #define ULONG(ADDR) *((ulong *)((char *)(ADDR)))
  ^
x86.c:5614:20: note: in expansion of macro ‘XEN_HYPER_OFFSET’
  esp = ULONG(buf + XEN_HYPER_OFFSET(tss_struct_esp0));
^
In file included from x86.c:54:0:
x86.c: In function ‘x86_init_hyper’:
xen_hyper_defs.h:774:55: error: ‘struct xen_hyper_size_table’ has no member named ‘tss_struct’
 #define XEN_HYPER_ASSIGN_SIZE(X) (xen_hyper_size_table.X)
   ^
xen_hyper_defs.h:777:43: note: in expansion of macro ‘XEN_HYPER_ASSIGN_SIZE’
 #define XEN_HYPER_STRUCT_SIZE_INIT(X, Y) (XEN_HYPER_ASSIGN_SIZE(X) = STRUCT_SIZE(Y))
   ^
x86.c:5748:3: note: in expansion of macro ‘XEN_HYPER_STRUCT_SIZE_INIT’
   XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct");
   ^
xen_hyper_defs.h:775:59: error: ‘struct xen_hyper_offset_table’ has no member named ‘tss_struct_esp0’
 #define XEN_HYPER_ASSIGN_OFFSET(X) (xen_hyper_offset_table.X)
   ^
xen_hyper_defs.h:779:48: note: in expansion of macro ‘XEN_HYPER_ASSIGN_OFFSET’
 #define XEN_HYPER_MEMBER_OFFSET_INIT(X, Y, Z) (XEN_HYPER_ASSIGN_OFFSET(X) = MEMBER_OFFSET(Y, Z))
^
x86.c:5749:3: note: in expansion of macro ‘XEN_HYPER_MEMBER_OFFSET_INIT’
   XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_esp0, "tss_struct", "esp0");
   ^
make[4]: *** [x86.o] Error 1
make[3]: *** [gdb] Error 2
make[2]: *** [rebuild] Error 2
make[1]: *** [gdb_merge] Error 2
make: *** [warn] Error 2

$

If you 

Re: [Crash-utility] [PATCH] Bugfix and optimization for ARM64 getting crash notes

2019-12-10 Thread Dave Anderson



- Original Message -
> From: chenqiwu 
> 
> 1) ARM64 call arm64_get_crash_notes() to retrieve active task
> registers when POST_VM before calling map_cpus_to_prstatus()
> to remap the NT_PRSTATUS elf notes to the online cpus. It's
> better to call arm64_get_crash_notes() when POST_INIT.
> 2) arm64_get_crash_notes() check the sanity of NT_PRSTATUS notes
> only for online cpus. If one cpu contains invalid note, it's
> better to continue finding the crash notes for other online cpus.
> So we can extract the backtraces for the online cpus which contain
> valid note by using command "bt -a".
> 3) map_cpus_to_prstatus() remap the NT_PRSTATUS notes only to the
> online cpus. Make sure there must be a one-to-one relationship
> between the number of online cpus and the number of notes.

The code in map_cpus_to_prstatus() and map_cpus_to_prstatus_kdump_cmprs()
has been in place forever.  Both the nd->nt_prstatus_percpu[] and
dd->nt_prstatus_percpu[] arrays are per-cpu regardless of whether
the cpus are online or offline.  However, since kdump only creates
NT_PRSTATUS notes for online cpus, the "i" index is needed for
each cpu, and the "j" index is needed for the existing NT_PRSTATUS
notes.  If a cpu is offline, its nt_prstatus_percpu[] entry will be
zeroed out.
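
The two-index scheme described above can be sketched as follows.  This
is an illustration only, not the code in netdump.c or diskdump.c:
map_notes_to_cpus() and its parameters are hypothetical names, and the
online[] array stands in for however the caller determines which cpus
were online.

```c
#include <assert.h>
#include <stddef.h>

/*
 * notes[]  : the nr_notes NT_PRSTATUS notes found in the dump, in order
 * online[] : nonzero for each cpu that was online
 * percpu[] : output; percpu[i] is cpu i's note, or NULL if cpu i is offline
 */
static void
map_notes_to_cpus(void *percpu[], const int online[], int nr_cpus,
		  void *const notes[], int nr_notes)
{
	int i, j;

	assert(nr_cpus >= 0 && nr_notes >= 0);

	/* "i" walks every cpu; "j" advances only when a note is consumed. */
	for (i = 0, j = 0; i < nr_cpus; i++) {
		if (online[i] && j < nr_notes)
			percpu[i] = notes[j++];
		else
			percpu[i] = NULL;
	}
}
```

With cpus 0, 2 and 3 online and three notes in the dump, cpu 1's slot is
cleared and cpus 2 and 3 receive the second and third notes.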

I'm not disputing that the arm64 online-cpu handling may be suspect, but
your patch should not be making changes to architecture-neutral code
unless the issue affects all architectures.  So please leave those
two functions alone.

Dave


> 
> Signed-off-by: chenqiwu 
> ---
>  arm64.c| 49 +
>  diskdump.c |  9 +++--
>  netdump.c  |  4 ++--
>  3 files changed, 34 insertions(+), 28 deletions(-)
> 
> diff --git a/arm64.c b/arm64.c
> index 233029d..cbad461 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -458,7 +458,7 @@ arm64_init(int when)
>   arm64_stackframe_init()
>   break;
>  
> - case POST_VM:
> + case POST_INIT:
>   /*
>* crash_notes contains machine specific information about the
>* crash. In particular, it contains CPU registers at the time
> @@ -3587,7 +3587,7 @@ arm64_get_crash_notes(void)
>   ulong offset;
>   char *buf, *p;
>   ulong *notes_ptrs;
> - ulong i;
> + ulong i, j;
>  
>   if (!symbol_exists("crash_notes"))
>   return FALSE;
> @@ -3620,12 +3620,12 @@ arm64_get_crash_notes(void)
>   if (!(ms->panic_task_regs = calloc((size_t)kt->cpus, sizeof(struct arm64_pt_regs))))
>   error(FATAL, "cannot calloc panic_task_regs space\n");
>   
> - for  (i = 0; i < kt->cpus; i++) {
> -
> + for  (i = 0, j = 0; i < kt->cpus; i++) {
>   if (!readmem(notes_ptrs[i], KVADDR, buf, SIZE(note_buf),
>   "note_buf_t", RETURN_ON_ERROR)) {
> - error(WARNING, "failed to read note_buf_t\n");
> - goto fail;
> + error(WARNING, "cpu#%d: failed to read note_buf_t\n", 
> i);
> + ++j;
> + continue;
>   }
>  
>   /*
> @@ -3655,19 +3655,29 @@ arm64_get_crash_notes(void)
>   note->n_descsz == notesz)
>   BCOPY((char *)note, buf, notesz);
>   } else {
> - error(WARNING,
> - "cannot find NT_PRSTATUS note for cpu: %d\n", i);
> + if (CRASHDEBUG(1))
> + error(WARNING,
> + "cpu#%d: cannot find NT_PRSTATUS note\n", i);
> + ++j;
>   continue;
>   }
>   }
>  
> + /*
> +  * Check the sanity of NT_PRSTATUS note only for each online cpu.
> +  * If this cpu has invalid note, continue to find the crash notes
> +  * for other online cpus.
> +  */
>   if (note->n_type != NT_PRSTATUS) {
> - error(WARNING, "invalid note (n_type != NT_PRSTATUS)\n");
> - goto fail;
> + error(WARNING, "cpu#%d: invalid note (n_type != NT_PRSTATUS)\n", i);
> + ++j;
> + continue;
>   }
> - if (p[0] != 'C' || p[1] != 'O' || p[2] != 'R' || p[3] != 'E') {
> - error(WARNING, "invalid note (name != \"CORE\"\n");
> - goto fail;
> +
> + if (!STRNEQ(p, "CORE")) {
> + error(WARNING, "cpu#%d: invalid note (name != \"CORE\")\n", i);
> + ++j;
> + continue;
>   }
>  
>   /*
> @@ -3684,14 +3694,13 @@ arm64_get_crash_notes(void)
>  
>   FREEBUF(buf);
>   FREEBUF(notes_ptrs);
> - 

Re: [Crash-utility] README: Reword crash support for arch different than host arch

2019-12-06 Thread Dave Anderson
The point is that regardless of whether the target vmcore is from
a different architecture, the "alternate" crash binary that is
built with "make target=XXX" must match the architecture of the
host.

Dave

- Original Message -
> Hi,
> 
> May be this rewording is needed?
> 
> crash$ git diff README
> diff --git a/README b/README
> index 62855c2..3bef5ec 100644
> --- a/README
> +++ b/README
> @@ -82,9 +82,10 @@
> must be configured and built. Alternatively, the crash source RPM file
> may be installed and built, and the resultant crash binary RPM file
> installed.
> 
> - The crash binary can only be used on systems of the same architecture as
> - the host build system. There are a few optional manners of building the
> - crash binary:
> + The crash binary built as above can only be used on systems of the same
> + architecture as the host build system. However crash can be built to handle
> + dumpfiles from different architecture than host build system, here are the
> + available build options:
> 
> o On an x86_64 host, a 32-bit x86 binary that can be used to analyze
> 32-bit x86 dumpfiles may be built by typing "make target=X86".
> 
> Thanks,
> Santosh
> 



Re: [Crash-utility] [PATCH v2] Obtain KASLR offset from early S390X dumps

2019-11-26 Thread Dave Anderson



Hi Mikhail,

Your patch is queued for crash-7.2.8:

  
https://github.com/crash-utility/crash/commit/6664cb3f4ea2eac1b6d482e541b56d7792a4be04

Note that I made the s390x_lc_kaslr check gated upon a successful return from
readmem(), just for the highly unlikely case of a failed read.

Thanks,
  Dave


- Original Message -
> If the kernel crashes before vmcoreinfo initialization, there is
> no way to extract KASLR offset for such early s390 dumps. With a new s390
> kernel patch, the KASLR offset will be stored in the lowcore memory during
> early boot and then overwritten after vmcoreinfo is initialized.
> This patch allows crash to identify the KASLR offset stored in the lowcore
> memory for s390 dumps.
> 
> Signed-off-by: Mikhail Zaslonko 
> ---
>  s390x.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/s390x.c b/s390x.c
> index 4a1a466..5e28ea4 100644
> --- a/s390x.c
> +++ b/s390x.c
> @@ -46,6 +46,8 @@
>  
>  #define S390X_PSW_MASK_PSTATE  0x0001UL
>  
> +#define S390X_LC_VMCORE_INFO 0xe0c
> +
>  /*
>   * Flags for Region and Segment table entries.
>   */
> @@ -460,6 +462,8 @@ static void s390x_check_live(void)
>  void
>  s390x_init(int when)
>  {
> + ulong s390x_lc_kaslr;
> +
>   switch (when)
>   {
>   case SETUP_ENV:
> @@ -486,6 +490,23 @@ s390x_init(int when)
>   machdep->verify_paddr = generic_verify_paddr;
>   machdep->get_kvaddr_ranges = s390x_get_kvaddr_ranges;
>   machdep->ptrs_per_pgd = PTRS_PER_PGD;
> + if (DUMPFILE() && !(kt->flags & RELOC_SET)) {
> + /* Read the value from well-known lowcore location*/
> + readmem(S390X_LC_VMCORE_INFO, PHYSADDR, &s390x_lc_kaslr,
> + sizeof(s390x_lc_kaslr), "s390x_lc_kaslr",
> + QUIET|RETURN_ON_ERROR);
> + /* Check for explicit kaslr offset flag */
> + if (s390x_lc_kaslr & 0x1UL) {
> + /* Drop the last bit to get an offset value */
> + s390x_lc_kaslr &= ~(0x1UL);
> + /* Make sure the offset is aligned by 0x1000 */
> + if (s390x_lc_kaslr && !(s390x_lc_kaslr & 0xfff)) {
> + kt->relocate = s390x_lc_kaslr * (-1);
> + kt->flags |= RELOC_SET;
> + kt->flags2 |= KASLR;
> + }
> + }
> + }
>   break;
>  
>   case PRE_GDB:
> --
> 2.17.1
> 
> 
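
The gated check can be sketched in isolation as follows.  This is only an
illustration under stated assumptions: decode_lowcore_kaslr() and its
read_ok parameter are hypothetical names standing in for the readmem()
return value, and the lowcore word is assumed to be 64 bits wide:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not crash code.
 * read_ok: whether the lowcore read succeeded.
 * raw:     the word read from the lowcore location.
 * On success, stores the negated KASLR offset (modulo 2^64, matching
 * "offset * (-1)" on an unsigned long) in *relocate and returns true.
 */
bool decode_lowcore_kaslr(bool read_ok, uint64_t raw, uint64_t *relocate)
{
    uint64_t offset;

    if (!read_ok)               /* failed read: do nothing, quietly */
        return false;
    if (!(raw & 0x1ULL))        /* explicit kaslr offset flag not set */
        return false;

    offset = raw & ~0x1ULL;     /* drop the flag bit */
    if (offset == 0 || (offset & 0xfffULL))
        return false;           /* must be non-zero and 0x1000-aligned */

    *relocate = (uint64_t)0 - offset;
    return true;
}
```

Only a successful read with the flag bit set and a non-zero, 4K-aligned
remainder yields a relocation value; every other case leaves the caller's
state untouched.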




Re: [Crash-utility] [PATCH] Obtain KASLR offset from early S390X dumps

2019-11-25 Thread Dave Anderson



- Original Message -
> 
> - Original Message -
> > If the kernel crashes before vmcoreinfo initialization, there is
> > no way to extract KASLR offset for such early s390 dumps.
> > With a new s390 kernel patch, the KASLR offset will be stored in the
> > lowcore
> > memory during early boot and then overwritten after vmcoreinfo is
> > initialized.
> > This patch allows crash to identify the KASLR offset stored in lowcore
> > memory for s390 dumps.
> > 
> > Signed-off-by: Mikhail Zaslonko 
> > ---
> >  s390x.c | 21 +
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/s390x.c b/s390x.c
> > index 4a1a466..d2c6702 100644
> > --- a/s390x.c
> > +++ b/s390x.c
> > @@ -46,6 +46,8 @@
> >  
> >  #define S390X_PSW_MASK_PSTATE  0x0001UL
> >  
> > +#define S390X_LC_VMCORE_INFO   0xe0c
> > +
> >  /*
> >   * Flags for Region and Segment table entries.
> >   */
> > @@ -460,6 +462,8 @@ static void s390x_check_live(void)
> >  void
> >  s390x_init(int when)
> >  {
> > +   ulong s390x_lc_kaslr;
> > +
> > switch (when)
> > {
> > case SETUP_ENV:
> > @@ -486,6 +490,23 @@ s390x_init(int when)
> > machdep->verify_paddr = generic_verify_paddr;
> > machdep->get_kvaddr_ranges = s390x_get_kvaddr_ranges;
> > machdep->ptrs_per_pgd = PTRS_PER_PGD;
> > +   if (!(kt->flags & RELOC_SET)) {
> > +   /* Read the value from well-known lowcore location*/
> > +   readmem(S390X_LC_VMCORE_INFO, PHYSADDR, &s390x_lc_kaslr,
> > +   sizeof(s390x_lc_kaslr), "s390x_lc_kaslr",
> > +   FAULT_ON_ERROR);
> > +   /* Check for explicit kaslr offset flag */
> > +   if (s390x_lc_kaslr & 0x1UL) {
> > +   /* Drop the last bit to get an offset value */
> > +   s390x_lc_kaslr &= ~(0x1UL);
> > +   /* Make sure that the offset is aligned by 0x1000 */
> > +   if (s390x_lc_kaslr && !(s390x_lc_kaslr & 0xfff)) {
> > +   kt->relocate = s390x_lc_kaslr * (-1);
> > +   kt->flags |= RELOC_SET;
> > +   kt->flags2 |= KASLR;
> > +   }
> > +   }
> > +   }
> > break;
> >  
> > case PRE_GDB:
> > --
> 
> Hi Mikhail,
>  
> Your patch fails on a live system that utilizes /proc/kcore as the memory
> source:
> 
>   # ./crash
> 
>   crash 7.2.7++
>   Copyright (C) 2002-2019  Red Hat, Inc.
>   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>   Copyright (C) 1999-2006  Hewlett-Packard Co
>   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>   Copyright (C) 2005, 2011  NEC Corporation
>   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>   This program is free software, covered by the GNU General Public License,
>   and you are welcome to change it and/or distribute copies of it under
>   certain conditions.  Enter "help copying" to see the conditions.
>   This program has absolutely no warranty.  Enter "help warranty" for
>   details.
>  
>   crash: read error: physical address: e0c  type: "s390x_lc_kaslr"
>   #
> 
> That's because the newly-introduced readmem() becomes the very first memory
> read access, and because you call readmem() with PHYSADDR and FAULT_ON_ERROR,
> you don't allow crash to pivot from /dev/mem to /proc/kcore when it does its
> first KVADDR readmem() later on during initialization:
> 
>   # ./crash -d4
>   ... [ cut ] ...
>  
>   readmem: read_dev_mem() -> /dev/mem
>   
>   
>   /dev/mem: Operation not permitted
>   crash: read(/dev/mem, e0c, 8): -1 ()
>   crash: read error: physical address: e0c  type: "s390x_lc_kaslr"
>   #
> 
> Also, if there is *ever* a chance that the readmem() could fail to read
> that physical address from a dumpfile, I would also suggest that you allow
> it to fail quietly by changing the readmem() flag from FAULT_ON_ERROR
> to QUIET|RETURN_ON_ERROR like this:
> 
> /* Read the value from well-known lowcore location*/
> if (readmem(S390X_LC_VMCORE_INFO, PHYSADDR, &s390x_lc_kaslr,
> sizeof(s390x_lc_kaslr), "s390x_lc_kaslr",
> QUIET|RETURN_ON_ERROR)) {
> /* Check for explicit kaslr offset flag */
> if (s390x_lc_kaslr & 0x1UL) {
> /* Drop the last bit to get an offset value */
> s390x_lc_kaslr &= ~(0x1UL);
> /* Make sure that the offset is
> 

Re: [Crash-utility] [PATCH] Obtain KASLR offset from early S390X dumps

2019-11-25 Thread Dave Anderson


- Original Message -
> If the kernel crashes before vmcoreinfo initialization, there is
> no way to extract KASLR offset for such early s390 dumps.
> With a new s390 kernel patch, the KASLR offset will be stored in the lowcore
> memory during early boot and then overwritten after vmcoreinfo is initialized.
> This patch allows crash to identify the KASLR offset stored in lowcore
> memory for s390 dumps.
> 
> Signed-off-by: Mikhail Zaslonko 
> ---
>  s390x.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/s390x.c b/s390x.c
> index 4a1a466..d2c6702 100644
> --- a/s390x.c
> +++ b/s390x.c
> @@ -46,6 +46,8 @@
>  
>  #define S390X_PSW_MASK_PSTATE  0x0001UL
>  
> +#define S390X_LC_VMCORE_INFO 0xe0c
> +
>  /*
>   * Flags for Region and Segment table entries.
>   */
> @@ -460,6 +462,8 @@ static void s390x_check_live(void)
>  void
>  s390x_init(int when)
>  {
> + ulong s390x_lc_kaslr;
> +
>   switch (when)
>   {
>   case SETUP_ENV:
> @@ -486,6 +490,23 @@ s390x_init(int when)
>   machdep->verify_paddr = generic_verify_paddr;
>   machdep->get_kvaddr_ranges = s390x_get_kvaddr_ranges;
>   machdep->ptrs_per_pgd = PTRS_PER_PGD;
> + if (!(kt->flags & RELOC_SET)) {
> + /* Read the value from well-known lowcore location*/
> + readmem(S390X_LC_VMCORE_INFO, PHYSADDR, &s390x_lc_kaslr,
> + sizeof(s390x_lc_kaslr), "s390x_lc_kaslr",
> + FAULT_ON_ERROR);
> + /* Check for explicit kaslr offset flag */
> + if (s390x_lc_kaslr & 0x1UL) {
> + /* Drop the last bit to get an offset value */
> + s390x_lc_kaslr &= ~(0x1UL);
> + /* Make sure that the offset is aligned by 0x1000 */
> + if (s390x_lc_kaslr && !(s390x_lc_kaslr & 0xfff)) {
> + kt->relocate = s390x_lc_kaslr * (-1);
> + kt->flags |= RELOC_SET;
> + kt->flags2 |= KASLR;
> + }
> + }
> + }
>   break;
>  
>   case PRE_GDB:
> --

Hi Mikhail,
 
Your patch fails on a live system that utilizes /proc/kcore as the memory
source:

  # ./crash

  crash 7.2.7++
  Copyright (C) 2002-2019  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
 
  crash: read error: physical address: e0c  type: "s390x_lc_kaslr"
  #

That's because the newly-introduced readmem() becomes the very first memory
read access, and because you call readmem() with PHYSADDR and FAULT_ON_ERROR,
you don't allow crash to pivot from /dev/mem to /proc/kcore when it does its
first KVADDR readmem() later on during initialization:

  # ./crash -d4
  ... [ cut ] ...
 
  readmem: read_dev_mem() -> /dev/mem 
  
  
  /dev/mem: Operation not permitted
  crash: read(/dev/mem, e0c, 8): -1 ()
  crash: read error: physical address: e0c  type: "s390x_lc_kaslr"
  # 

Also, if there is *ever* a chance that the readmem() could fail to read
that physical address from a dumpfile, I would also suggest that you allow
it to fail quietly by changing the readmem() flag from FAULT_ON_ERROR
to QUIET|RETURN_ON_ERROR like this:

/* Read the value from well-known lowcore location*/
if (readmem(S390X_LC_VMCORE_INFO, PHYSADDR, &s390x_lc_kaslr,
sizeof(s390x_lc_kaslr), "s390x_lc_kaslr",
QUIET|RETURN_ON_ERROR)) {
/* Check for explicit kaslr offset flag */
if (s390x_lc_kaslr & 0x1UL) {
/* Drop the last bit to get an offset value */
s390x_lc_kaslr &= ~(0x1UL);
/* Make sure that the offset is aligned by 0x1000 */
if (s390x_lc_kaslr && !(s390x_lc_kaslr & 0xfff)) {
kt->relocate = s390x_lc_kaslr * (-1);
kt->flags |= RELOC_SET;
   

Re: [Crash-utility] [PATCH] Fix typo for 'bf -FF' command

2019-11-21 Thread Dave Anderson



Hi Austin,

Thanks for catching that, but you missed one -- there's another one just
above it!
Queued for crash-7.2.8:

  
https://github.com/crash-utility/crash/commit/b259940b228cc7025904f9b7372348b56f73a4d2

Dave
 
 
- Original Message -
> When we use 'help bt' command, the instruction of 'bt' is printed as below.
> crash> help bt
> 
> NAME
>   bt - backtrace
> ...
> crash> bf -FF
> ...
>  #4 [810072b47f10] vfs_write at 800789d8
> 810072b47f18: [81007e020380:files_cache]
> [81007e2c2880:filp]
> 810072b47f28: 0002 fff7
> 810072b47f38: 2b141825d000 sys_write+69
>  #5 [810072b47f40] sys_write at 80078f75
> 
> But it seems that 'bf -FF' shows misleading information
> because invalid output is displayed using 'bf -FF' command as below.
> crash>  bf -FF 1
> crash: command not found: bf
> 
> But 'bt -FF 1' shows valid output.
> crash> bt -FF 1
> PID: 1  TASK: cf932d40  CPU: 0   COMMAND: "systemd"
>  #0 [] (__schedule) from []
> [PC: c0c1609c  LR: c0c162c0  SP: cf92fe18  SIZE: 72]
> cf92fe18: 40060093 _end+238341700 _end+239562316
> trace_buffer_unlock_commit_regs+280
> cf92fe28: 001c754b schedule+132 schedule+124 0004
> cf92fe38: 001c754b  __stack_chk_guard 
> cf92fe48:  80060013 _end+239554564 
> cf92fe58: _end+239562340 schedule+132
>  #1 [] (schedule) from []
> [PC: c0c162c0  LR: c0c1ab5c  SP: cf92fe60  SIZE: 8]
> cf92fe60: _end+239562452 schedule_hrtimeout_range_clock+300
> 
> So fix typo for 'bf -FF' command as below to avoid confusion.
> 
> Signed-off-by: Austin Kim 
> ---
>  help.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/help.c b/help.c
> index 2b2285b..934d8bc 100644
> --- a/help.c
> +++ b/help.c
> @@ -2133,7 +2133,7 @@ char *help_bt[] = {
>  "810072b47f38: 2b141825d000 sys_write+69   ",
>  " #5 [810072b47f40] sys_write at 80078f75",
>  "...",
> -"%s> bf -FF",
> +"%s> bt -FF",
>  "...",
>  " #4 [810072b47f10] vfs_write at 800789d8",
>  "810072b47f18: [81007e020380:files_cache]
>  [81007e2c2880:filp]",
> --
> 2.6.2
> 
> 




Re: [Crash-utility] [PATCH] crash/arm64: Determine vabits_actual value from 'TCR_EL1.T1SZ' value in vmcoreinfo

2019-11-19 Thread Dave Anderson


- Original Message -
> I have recently sent a kernel patch upstream to add 'TCR_EL1.T1SZ' to
> vmcoreinfo for arm64 (see [0]), instead of VA_BITS_ACTUAL.
> 
> 'crash' can read the 'TCR_EL1.T1SZ' value from vmcoreinfo
> [which indicates the size offset of the memory region addressed by
> TTBR1_EL1] and hence can be used for determining the vabits_actual
> value.

Thanks Bhupesh -- your patch has been queued for crash-7.2.8:

  
https://github.com/crash-utility/crash/commit/bfd9a651f9426d86250295ac875d7e33d8de2a97

Dave


> 
> [0].http://lists.infradead.org/pipermail/kexec/2019-November/023962.html
> 
> Cc: Dave Anderson 
> Cc: AKASHI Takahiro 
> Cc: Prabhakar Kushwaha 
> Cc: crash-utility@redhat.com
> Signed-off-by: Bhupesh Sharma 
> ---
>  arm64.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arm64.c b/arm64.c
> index af7147d24e20..083491331985 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -3856,8 +3856,17 @@ arm64_calc_VA_BITS(void)
>   } else if (ACTIVE())
>   error(FATAL, "cannot determine VA_BITS_ACTUAL: please use /proc/kcore\n");
>   else {
> - if ((string = pc->read_vmcoreinfo("NUMBER(VA_BITS_ACTUAL)"))) {
> - value = atol(string);
> + if ((string = pc->read_vmcoreinfo("NUMBER(tcr_el1_t1sz)"))) {
> + /* See ARMv8 ARM for the description of
> +  * TCR_EL1.T1SZ and how it can be used
> +  * to calculate the vabits_actual
> +  * supported by underlying kernel.
> +  *
> +  * Basically:
> +  * vabits_actual = 64 - T1SZ;
> +  */
> + value = 64 - strtoll(string, NULL, 0);
> + fprintf(fp,  "vmcoreinfo : vabits_actual: %ld\n", value);
>   free(string);
>   machdep->machspec->VA_BITS_ACTUAL = value;
>   machdep->machspec->VA_BITS = value;
> --
> 2.7.4
> 
> 
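
The "vabits_actual = 64 - T1SZ" calculation in the patch above can be
sketched on its own.  The helper name vabits_from_t1sz() is hypothetical,
and the range check assumes T1SZ is a 6-bit field (0..63); the patch itself
performs no such validation:

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not crash code.
 * Parses the string form of a NUMBER(tcr_el1_t1sz) vmcoreinfo value
 * (decimal, or hex with a 0x prefix, as accepted by strtoll base 0)
 * and returns 64 - T1SZ, or -1 if the string is missing or malformed.
 */
long vabits_from_t1sz(const char *string)
{
    char *end;
    long long t1sz;

    if (string == NULL)
        return -1;

    t1sz = strtoll(string, &end, 0);
    if (end == string || t1sz < 0 || t1sz > 63)
        return -1;              /* nothing parsed, or out of range */

    return 64 - (long)t1sz;
}
```

For example, a T1SZ of 16 (or "0x10") corresponds to 48 virtual address
bits, and a T1SZ of 25 corresponds to 39.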




[Crash-utility] Undeliverable email sent to crash-utility@redhat.com

2019-11-15 Thread Dave Anderson



There is a new problem that has arisen concerning all Red Hat external mailing
lists that is related to a new DMARC policy that our security team has recently
changed.  As a result, the moderators of several Red Hat mailing lists
(including me) have started seeing issues where posts to their mailing
lists are being rejected as Undeliverable.  There have been multiple
internal support tickets
filed in hopes of a resolution, but there is no workaround that I am aware of
(except for some kind of mailing list configuration that needs to be done at the
sender's site, which is unacceptable).  

Hopefully it will be fixed soon.  If you receive such a response, please
re-send it and cc: ander...@redhat.com.

The response looks like this:



us-smtp-1.mimecast.com rejected your message to the following email addresses:

Discussion list for crash utility usage, maintenance and development 
(crash-utility@redhat.com)
Your message couldn't be delivered. It appears that the email address you sent 
your message to wasn't found at the destination domain, or the recipient's 
mailbox is unavailable. The email address might be misspelled or it might not 
exist. Try to fix the problem by doing one or more of the following:

Send the message again. Before you do, delete and retype the complete address. 
If your email program automatically suggests an address to use don't select it.

Clear the recipient Auto-Complete List entry in your email program by following 
the steps in this article. Then resend the message, but before you do, delete 
and retype the complete address. If your email program suggests an address to 
use don't select it.

Contact the recipient by some other means (by phone, for example) to confirm 
you're using the right address. Ask them if they've set up an email forwarding 
rule that could be forwarding your message to an incorrect address.
If you're still unable to fix the problem, ask the recipient to tell their 
email admin about the problem, and give them the server that reported the error 
below.

For Email Admins
When Office 365 tried to send the message, the external email server returned 
the error below. This error was reported by an email server outside Office 365, 
and if the sender is unable to fix the problem by correcting the recipient's 
email address or clearing the Auto-Complete List entry, then it's likely a 
problem that only the recipient's email admin can fix.

Check the error for information about where the problem is happening. For 
example, look for a domain name. The domain name will tell you which 
organization was responsible for the error. The recipient's email server could 
be causing the problem, or it could be due to a third-party service that your 
organization or the recipient's organization is using to process or filter 
email messages.

If you can't fix the problem, contact the responsible party's email admin. This 
could be the recipient's email admin, your smart host service admin, or someone 
similar. Give them the error and the name of the server that reported the error 
to help them troubleshoot the issue.
Unfortunately, Office 365 support is unlikely to be able to help with these 
kinds of externally reported errors.



us-smtp-1.mimecast.com gave this error:
Remote server returned unknown recipient or mailbox unavailable -> 550 Invalid 
Recipient - https://community.mimecast.com/docs/DOC-1369#550 
[vVcOhPBXNFukJRNZ9UedSA.us264] 







Re: [Crash-utility] Fix for the determination of the ARM64 page size

2019-11-15 Thread Dave Anderson


- Original Message -
> Hi Dave and yueyi,
> I understand what you mean. Based on your suggestions, I made changes for patch v2.

Looks good -- queued for crash-7.2.8:

  
https://github.com/crash-utility/crash/commit/babd7ae62d4e8fd6f93fd30b88040d9376522aa3

Thanks,
  Dave

> 
> Best regards,
> Qiwu
> 
> 
> -Original Message-
> From: Yueyi Li 
> Sent: Friday, November 15, 2019 12:17 AM
> To: Discussion list for crash utility usage, maintenance and development
> ; 陈启武 
> Subject: [External Mail]Re: [Crash-utility] Fix for the determination of the
> ARM64 page size
> 
> 
> 
> On 2019/11/14 22:14, Dave Anderson wrote:
> >
> >
> > - Original Message -
> >> Hi Dave,
> >> Since linux 4.4 and later kernels will always be able to determine
> >> the page size by reading the kernel flags (if there is no
> >> vmcoreinfo), I agree checking for (THIS_KERNEL_VERSION < LINUX(4,4,0)) is
> >> more reasonable.
> Hi Qiwu,
> 
> Did you notice this line?
> 
>   182 if (!machdep->pagesize &&
>   183 kernel_symbol_exists("swapper_pg_dir") &&
>   184 kernel_symbol_exists("idmap_pg_dir")) {
> 
> That means "pagesize" cannot be read from VMCOREINFO or the kernel image
> header, so the kernel version must be earlier than Linux 4.4 if this code
> section is executed. So just changing it back should be OK.
> 
> Besides, I can only see your messages where Dave has quoted them. Could you
> please add the mailing list crash-uil...@redhat.com to the 'CC' list, or just
> send mail to the mailing list for any discussion?
> 
> Thanks,
> Yueyi
> >
> > Did you finish reading my response from yesterday?
> >
> > There is no reason to check for (THIS_KERNEL_VERSION < LINUX(4,4,0)),
> > because the code section will *only* be executed if the kernel is earlier
> > than Linux 4.4.
> >
> > Again: the "else" section is dead code because it can never be executed:
> >
> > +   if (THIS_KERNEL_VERSION < LINUX(4,16,0)) {
> > +   value = symbol_value("swapper_pg_dir") -
> > +   symbol_value("idmap_pg_dir");
> > +   } else {
> > +   if (kernel_symbol_exists("tramp_pg_dir"))
> > +   value =
> > symbol_value("tramp_pg_dir");
> > +   else if
> > (kernel_symbol_exists("reserved_ttbr0"))
> > +   value =
> > symbol_value("reserved_ttbr0");
> > +   else
> > +   value =
> > + symbol_value("swapper_pg_dir");
> > +
> > +   value -= symbol_value("idmap_pg_dir");
> > +   }
> >
> > You can just use "swapper_pg_dir" and "idmap_pg_dir".
> >
> > Dave
> >
> >
> >
> >
> >
> >>
> >> Best regards,
> >> Qiwu
> >>
> >> -Original Message-
> >> From: Dave Anderson 
> >> Sent: Wednesday, November 13, 2019 11:28 PM
> >> To: 陈启武 
> >> Cc: Discussion list for crash utility usage, maintenance and
> >> development 
> >> Subject: Re: [External Mail]Re: Fix for the determination of the
> >> ARM64 page size
> >>
> >>
> >>
> >> - Original Message -
> >>> Hi Dave,
> >>> I find the bug from an ELF format arm64 ramdump (not vmcoreinfo)
> >>> with linux 3.18.
> >>> As we know, given the page size flags entry was introduced on Linux
> >>> 4.4 -rc1 and later versions, so the PAGE_SIZE cannot be determinated
> >>> by the following steps for ELF format
> >>> arm64 ramdump files with previous Linux 4.4 versions:
> >>>(1) checking the vmcoreinfo data, and
> >>>(2) checking the kernel image header for the flags field.
> >>>
> >>> If we ignore the following two steps, could the PAGE_SIZE be
> >>> determinated by the third step for previous Linux 4.16 versions?
> >>> I think the answer is no, because the symbols order from lowest to
> >>> highest value is idmap_pg_dir -> swapper_pg_dir -> reserved_ttbr0 ->
> >>> tramp_pg_dir.
> >>>  idmap_pg_dir = .;
> >>>  . += IDMAP_DIR_SIZE;
&

Re: [Crash-utility] Fix for the determination of the ARM64 page size

2019-11-14 Thread Dave Anderson


- Original Message -
> Hi Dave,
> Since linux 4.4 and later kernels will always be able to determine the page
> size by reading the kernel flags (if there is no vmcoreinfo), I agree
> checking for (THIS_KERNEL_VERSION < LINUX(4,4,0)) is more reasonable.

Did you finish reading my response from yesterday?  

There is no reason to check for (THIS_KERNEL_VERSION < LINUX(4,4,0)), because
the code section will *only* be executed if the kernel is earlier than
Linux 4.4.

Again: the "else" section is dead code because it can never be executed:

+   if (THIS_KERNEL_VERSION < LINUX(4,16,0)) {
+   value = symbol_value("swapper_pg_dir") -
+   symbol_value("idmap_pg_dir");
+   } else {
+   if (kernel_symbol_exists("tramp_pg_dir"))
+   value = symbol_value("tramp_pg_dir");
+   else if (kernel_symbol_exists("reserved_ttbr0"))
+   value = symbol_value("reserved_ttbr0");
+   else
+   value = symbol_value("swapper_pg_dir");
+
+   value -= symbol_value("idmap_pg_dir");
+   }

You can just use "swapper_pg_dir" and "idmap_pg_dir".

Dave
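
A minimal sketch of deriving the page size from only "swapper_pg_dir" and
"idmap_pg_dir".  The helper name pagesize_from_pgdirs() is hypothetical,
and the sketch assumes that the idmap directory spans two or three pages
and that only 4K and 64K page sizes need to be considered:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not crash code.
 * Returns 4096 or 65536 if the distance between the two symbols is
 * two or three pages of that size, otherwise 0 (undetermined).
 */
unsigned long pagesize_from_pgdirs(uint64_t idmap_pg_dir,
                                   uint64_t swapper_pg_dir)
{
    static const unsigned long candidates[] = { 4096UL, 65536UL };
    uint64_t value = swapper_pg_dir - idmap_pg_dir;
    unsigned long pages;
    size_t c;

    for (c = 0; c < sizeof(candidates) / sizeof(candidates[0]); c++)
        for (pages = 2; pages <= 3; pages++)
            if (value == (uint64_t)candidates[c] * pages)
                return candidates[c];

    return 0;
}
```

With the symbol values quoted elsewhere in this thread for the 3.18 dump
(idmap_pg_dir at ffc0024df000, swapper_pg_dir at ffc0024e2000), the distance
is 0x3000, i.e. three 4K pages.  Substituting tramp_pg_dir (ffc0024e4000)
gives 0x5000, which matches neither pattern.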





> 
> Best regards,
> Qiwu
> 
> -Original Message-
> From: Dave Anderson 
> Sent: Wednesday, November 13, 2019 11:28 PM
> To: 陈启武 
> Cc: Discussion list for crash utility usage, maintenance and development
> 
> Subject: Re: [External Mail]Re: Fix for the determination of the ARM64 page
> size
> 
> 
> 
> - Original Message -
> > Hi Dave,
> > I find the bug from an ELF format arm64 ramdump (not vmcoreinfo) with linux
> > 3.18.
> > As we know, given the page size flags entry was introduced on Linux
> > 4.4 -rc1 and later versions, so the PAGE_SIZE cannot be determinated
> > by the following steps for ELF format
> > arm64 ramdump files with previous Linux 4.4 versions:
> >   (1) checking the vmcoreinfo data, and
> >   (2) checking the kernel image header for the flags field.
> >
> > If we ignore the following two steps, could the PAGE_SIZE be
> > determinated by the third step for previous Linux 4.16 versions?
> > I think the answer is no, because the symbols order from lowest to
> > highest value is idmap_pg_dir -> swapper_pg_dir -> reserved_ttbr0 ->
> > tramp_pg_dir.
> > idmap_pg_dir = .;
> > . += IDMAP_DIR_SIZE;
> > swapper_pg_dir = .;
> > . += SWAPPER_DIR_SIZE;
> >
> > #ifdef CONFIG_ARM64_SW_TTBR0_PAN
> > reserved_ttbr0 = .;
> > . += RESERVED_TTBR0_SIZE;
> > #endif
> >
> > #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> > tramp_pg_dir = .;
> > . += PAGE_SIZE;
> > #endif
> >
> > For Linux 4.16 and later kernels with commit
> > 1e1b8c04fa3451e2b7190930adae43c95f0fae31 have changed the symbols
> > order, from lowest to highest value is idmap_pg_dir -> tramp_pg_dir ->
> > reserved_ttbr0 -> swapper_pg_dir.
> > idmap_pg_dir = .;
> > . += IDMAP_DIR_SIZE;
> >
> > #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> > tramp_pg_dir = .;
> > . += PAGE_SIZE;
> > #endif
> >
> > #ifdef CONFIG_ARM64_SW_TTBR0_PAN
> > reserved_ttbr0 = .;
> > . += RESERVED_TTBR0_SIZE;
> > #endif
> > swapper_pg_dir = .;
> > . += PAGE_SIZE;
> > swapper_pg_end = .;
> >
> > so we must consider the case on previous Linux 4.16 kernels,
> > especially for previous Linux 4.4 kernels without commit
> > 9d372c9fab34cd8803141871195141995f85c7f7.
> 
> But we really only need to consider kernels that are earlier than Linux 4.4,
> because Linux 4.4 and later kernels will always be able to determine the
> page size by reading the kernel flags (if there is no vmcoreinfo).  So the
> code below that you are patching will only be executed if:
> 
>   (1) there is no vmcoreinfo, and
>   (2) no kernel flags (in kernels earlier than Linux 4.4):
> 
> That being the case, I don't see how it would ever be possible for the "else"
> section below to ever be executed:
> 
> +   if (THIS_KERNEL_VERSION < LINUX(4,16,0)) {
> +   value = symbol_value("swapper_pg_dir") -
> +   symbol_value("idmap_pg_dir");
> +   

Re: [Crash-utility] [External Mail]Re: Fix for the determination of the ARM64 page size

2019-11-13 Thread Dave Anderson


- Original Message -
> Hi Dave,
> I found the bug in an ELF-format arm64 ramdump (no vmcoreinfo) with Linux
> 3.18.
> Since the page size flags entry was only introduced in Linux 4.4-rc1,
> the PAGE_SIZE cannot be determined by the following steps for ELF-format
> arm64 ramdump files from kernels earlier than Linux 4.4:
>   (1) checking the vmcoreinfo data, and
>   (2) checking the kernel image header for the flags field.
> 
> If we ignore these two steps, could the PAGE_SIZE be determined by
> the third step for kernels earlier than Linux 4.16?
> I think the answer is no, because the symbol order from lowest to highest
> value is idmap_pg_dir -> swapper_pg_dir -> reserved_ttbr0 -> tramp_pg_dir.
> idmap_pg_dir = .;
> . += IDMAP_DIR_SIZE;
> swapper_pg_dir = .;
> . += SWAPPER_DIR_SIZE;
> 
> #ifdef CONFIG_ARM64_SW_TTBR0_PAN
> reserved_ttbr0 = .;
> . += RESERVED_TTBR0_SIZE;
> #endif
> 
> #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> tramp_pg_dir = .;
> . += PAGE_SIZE;
> #endif
> 
> Linux 4.16 and later kernels, with commit
> 1e1b8c04fa3451e2b7190930adae43c95f0fae31, changed the symbol order;
> from lowest to highest value it is idmap_pg_dir -> tramp_pg_dir ->
> reserved_ttbr0 -> swapper_pg_dir.
> idmap_pg_dir = .;
> . += IDMAP_DIR_SIZE;
> 
> #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> tramp_pg_dir = .;
> . += PAGE_SIZE;
> #endif
> 
> #ifdef CONFIG_ARM64_SW_TTBR0_PAN
> reserved_ttbr0 = .;
> . += RESERVED_TTBR0_SIZE;
> #endif
> swapper_pg_dir = .;
> . += PAGE_SIZE;
> swapper_pg_end = .;
> 
> So we must consider kernels earlier than Linux 4.16, especially
> kernels earlier than Linux 4.4 that lack commit
> 9d372c9fab34cd8803141871195141995f85c7f7.

But we really only need to consider kernels that are earlier than Linux 4.4,
because Linux 4.4 and later kernels will always be able to determine the page
size by reading the kernel flags (if there is no vmcoreinfo).  So the code below
that you are patching will only be executed if:

  (1) there is no vmcoreinfo, and
  (2) no kernel flags (in kernels earlier than Linux 4.4):

That being the case, I don't see how it would ever be possible for the
"else" section below to ever be executed:

+   if (THIS_KERNEL_VERSION < LINUX(4,16,0)) {
+   value = symbol_value("swapper_pg_dir") -
+   symbol_value("idmap_pg_dir");
+   } else {
+   if (kernel_symbol_exists("tramp_pg_dir"))
+   value = symbol_value("tramp_pg_dir");
+   else if (kernel_symbol_exists("reserved_ttbr0"))
+   value = symbol_value("reserved_ttbr0");
+   else
+   value = symbol_value("swapper_pg_dir");
+
+   value -= symbol_value("idmap_pg_dir");
+   }

I was going to suggest checking for (THIS_KERNEL_VERSION < LINUX(4,4,0)),
but I don't think that's even necessary given that the code sequence above
will *only* be executed if the kernel is earlier than Linux 4.4.  So the
"else" section has become dead code.

Dave


> Best regards,
> Qiwu
> 
> 
> -Original Message-
> From: Dave Anderson 
> Sent: Tuesday, November 12, 2019 11:34 PM
> To: 陈启武 
> Cc: Discussion list for crash utility usage, maintenance and development
> 
> Subject: [External Mail]Re: Fix for the determination of the ARM64 page size
> 
> 
> - Original Message -
> > Hi Dave,
> > There is a bug in the determination of the ARM64 page size that occurs
> > with a kernel 3.18 kdump.
> 
> If it is a kdump, there should be a PAGESIZE vmcoreinfo entry.
> As far as I can tell, the PAGE_SIZE has always been exported as the second
> item for all architectures here:
> 
>   static int __init crash_save_vmcoreinfo_init(void)
>   {
>   VMCOREINFO_OSRELEASE(init_uts_ns.name.release);
>   VMCOREINFO_PAGESIZE(PAGE_SIZE);
>   ...
> 
> What does "help -D" show for the vmcoreinfo data on your dumpfile?
> 
> 
> > The crash session failed immediately with the error message "crash: cannot
> > determine page size" since the page size cannot be determined from the
> > kernel image header flags field or from the subtraction of symbol addresses.
> > ffc0024df000 A idmap_pg_dir
> > ffc0024

Re: [Crash-utility] Fix for the determination of the ARM64 page size

2019-11-12 Thread Dave Anderson


- Original Message -
> Hi Dave,
> There is a bug in the determination of the ARM64 page size that occurs with
> a kernel 3.18 kdump.

If it is a kdump, there should be a PAGESIZE vmcoreinfo entry.  
As far as I can tell, the PAGE_SIZE has always been exported
as the second item for all architectures here:

  static int __init crash_save_vmcoreinfo_init(void)
  {
  VMCOREINFO_OSRELEASE(init_uts_ns.name.release);
  VMCOREINFO_PAGESIZE(PAGE_SIZE);
  ...

What does "help -D" show for the vmcoreinfo data on your dumpfile?


> The crash session failed immediately with the error message "crash: cannot
> determine page size" since the page size cannot be determined from the kernel
> image header flags field or from the subtraction of symbol addresses.
> ffc0024df000 A idmap_pg_dir
> ffc0024e2000 A swapper_pg_dir
> ffc0024e4000 A tramp_pg_dir
> So value = symbol_value("tramp_pg_dir") - symbol_value("idmap_pg_dir") = 5 *
> PAGE_SIZE; the resulting value is determined by the order of the symbol
> addresses:
> [kernel-3.18/arch/arm64/kernel/vmlinux.lds.S]
> BSS_SECTION(0, 0, 0)
> 
> . = ALIGN(PAGE_SIZE);
> idmap_pg_dir = .;
> . += IDMAP_DIR_SIZE;
> swapper_pg_dir = .;
> . += SWAPPER_DIR_SIZE;
> 
> #ifdef CONFIG_ARM64_SW_TTBR0_PAN
> reserved_ttbr0 = .;
> . += RESERVED_TTBR0_SIZE;
> #endif
> 
> #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> tramp_pg_dir = .;
> . += PAGE_SIZE;
> #endif
> 
> Linux 4.16 and later kernels changed the order of the symbol definitions
> with commit 1e1b8c04fa3451e2b7190930adae43c95f0fae31, and crash utility
> upstream commit 764e2d09978bb3f87dfaff4c6a59d4a5cc00f277 fixed that case,
> but it ignores the determination of the ARM64 page size on kernels prior to
> Linux 4.16.
> 
> So I recommend this patch to fix it.
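
For reference, the arithmetic in the report above can be checked with a short
sketch.  It uses the three symbol addresses exactly as they appear in the
quoted symbol list and assumes a 4K page size, which is what the reported
"5 * PAGE_SIZE" implies; pages_between() is a hypothetical helper, not a
crash function:

```c
#include <stdint.h>

/* Symbol addresses copied as shown in the quoted symbol list. */
#define IDMAP_PG_DIR   0xffc0024df000ULL
#define SWAPPER_PG_DIR 0xffc0024e2000ULL
#define TRAMP_PG_DIR   0xffc0024e4000ULL

/* Assumed page size for this illustration only. */
#define ASSUMED_PAGE_SIZE 4096ULL

/* Number of whole pages between two symbol addresses (low <= high). */
static uint64_t pages_between(uint64_t low, uint64_t high, uint64_t page_size)
{
	return (high - low) / page_size;
}
```

With these addresses, tramp_pg_dir lies 5 pages above idmap_pg_dir while
swapper_pg_dir lies only 3 pages above it, so the result depends on which
symbol is subtracted.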

I have several old arm64 dumpfiles, with kernel versions 3.19, 4.2, 4.4, 4.5,
4.7, 4.9 and 4.14.  However, none of them reach your patch because the code
section that you are patching is only used as a third option after:

  (1) checking the vmcoreinfo data, and 
  (2) checking the kernel image header for the flags field.  
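
In other words, the order of precedence looks roughly like this.  This is only
a sketch of the ordering described above, with hypothetical names
(struct page_size_sources, pick_page_size) and 0 standing for "not available";
it is not the actual crash code:

```c
/* Hypothetical container: each field is 0 when that source is unavailable. */
struct page_size_sources {
	unsigned long vmcoreinfo_pagesize;     /* PAGESIZE vmcoreinfo entry */
	unsigned long image_header_pagesize;   /* kernel image header flags */
	unsigned long symbol_derived_pagesize; /* pg_dir symbol subtraction */
};

/* Return the page size from the first source that provides one, else 0. */
static unsigned long pick_page_size(const struct page_size_sources *s)
{
	if (s->vmcoreinfo_pagesize)
		return s->vmcoreinfo_pagesize;
	if (s->image_header_pagesize)
		return s->image_header_pagesize;
	/* Third option: only consulted when both sources above are missing. */
	return s->symbol_derived_pagesize;
}
```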

In Linux 4.4, this patch added the page size to the kernel image header:

  commit 9d372c9fab34cd8803141871195141995f85c7f7
  Author: Ard Biesheuvel 
  Date:   Mon Oct 19 14:19:36 2015 +0100

arm64: Add page size to the kernel image header

This patch adds the page size to the arm64 kernel image header
so that one can infer the PAGESIZE used by the kernel. This will
be helpful to diagnose failures to boot the kernel with page size
not supported by the CPU.

And later on in Linux 4.6, "_kernel_flags_le" was replaced by
"_kernel_flags_le_lo32" and "_kernel_flags_le_hi32":

  commit 6ad1fe5d9077a1ab40bf74b61994d2e770b00b14
  Author: Ard Biesheuvel 
  Date:   Sat Dec 26 13:48:02 2015 +0100

arm64: avoid R_AARCH64_ABS64 relocations for Image header fields

Unfortunately, the current way of using the linker to emit build time
constants into the Image header will no longer work once we switch to
the use of PIE executables. The reason is that such constants are emitted
into the binary using R_AARCH64_ABS64 relocations, which are resolved at
runtime, not at build time, and the places targeted by those relocations
will contain zeroes before that.

So refactor the endian swapping linker script constant generation code so
that it emits the upper and lower 32-bit words separately.
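
As a sketch of how the page size can be read out of that flags field: the
arm64 boot protocol documentation defines bits 1-2 of the image header flags
as the page size (0 = unspecified, 1 = 4K, 2 = 16K, 3 = 64K).  The helper names
below are hypothetical, and the flags value is assumed to have already been
converted to host byte order:

```c
#include <stdint.h>

/* Rebuild the 64-bit flags value from the split halves used since 4.6. */
static uint64_t kernel_flags_from_halves(uint32_t lo32, uint32_t hi32)
{
	return ((uint64_t)hi32 << 32) | lo32;
}

/* Decode bits 1-2 of the flags; returns 0 when the size is unspecified. */
static unsigned long page_size_from_kernel_flags(uint64_t flags)
{
	switch ((flags >> 1) & 0x3) {
	case 1:
		return 4096;
	case 2:
		return 16384;
	case 3:
		return 65536;
	default:
		return 0;
	}
}
```

A return value of 0 corresponds to the case where the flags cannot be used and
the next option has to be tried.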

Anyway, given that the page size flags entry was introduced in Linux 4.4, I
don't believe your patch checking for LINUX(4,16,0) is correct:

+   if (THIS_KERNEL_VERSION < LINUX(4,16,0)) {
+   value = symbol_value("swapper_pg_dir") -
+   symbol_value("idmap_pg_dir");
+   } else {

Do you agree?

Dave




Re: [Crash-utility] [External Mail]Re: [PATCH] Fix a potential segfault for the ARM64 "bt -S " command

2019-11-05 Thread Dave Anderson

Ok, so since this is simply a fix to prevent a SIGSEGV, then my alternative
suggestion to have arm64_is_kernel_exception_frame() return FALSE if the
"regs" address assignment is invalid should suffice.

Thanks,
  Dave

- Original Message -
> Hi Dave,
> 1. Is this a kdump-generated dumpfile?
> It's a kdump-generated dumpfile for arm64.
> 
> 2. Have you looked into why you get the "bt: WARNING: cannot determine
> starting stack frame for task ffcd74122000" message?
> Because the kernel didn't enable the crash_notes symbol to save the active
> task's registers.
> 
> 3. You didn't show the results of your patch -- if you apply it, does the
> backtrace get displayed correctly?
> From the result of my patch, it shows a bad stack frame for sp address
> 0xff800c42ba00.
> crash> bt -S ff800c42ba00 108
> PID: 108    TASK: ffcd74122000  CPU: 5   COMMAND: "rtmm_reclaim"
> bt: WARNING: cannot determine starting stack frame for task ffcd74122000
>  #0 [ff9c29e4fa10] (null) at fffc
> 
> 4. Since the "bt -S" option is almost never used, would it be possible to
> restrict your patch to fix/verify things in the section where it handles the
> bt->hp->sp setting?
> I added the following change to print the values where it handles the
> bt->hp->sp setting:
> --- a/arm64.c
> +++ b/arm64.c
> @@ -2542,7 +2542,8 @@ arm64_back_trace_cmd(struct bt_info *bt)
>  *   x+8: contains stackframe.pc -- text return address
>  *  x+16: is the stackframe.sp address
>  */
> -
> +   fprintf(stderr, "bt:flags=%llx, bptr=%lx, eip=%lx, esp=%lx,
> stkptr=%lx, instptr=%lx, frameptr=%lx\n",
> +   bt->flags, bt->bptr, bt->hp->eip, bt->hp->esp, bt->stkptr,
> bt->instptr, bt->frameptr);
> if (bt->flags & BT_KDUMP_ADJUST) {
> if (arm64_on_irq_stack(bt->tc->processor, bt->bptr)) {
> arm64_set_irq_stack(bt);
> @@ -2572,6 +2573,7 @@ arm64_back_trace_cmd(struct bt_info *bt)
> stackframe.fp = bt->frameptr;
> }
> 
> +   fprintf(stderr, "stackframe:sp=%lx, pc=%lx, fp=%lx\n", stackframe.sp,
> stackframe.pc, stackframe.fp);
> if (bt->flags & BT_TEXT_SYMBOLS) {
> arm64_print_text_symbols(bt, , ofp);
>  if (BT_REFERENCE_FOUND(bt)) {
> 
> The result shows as below:
> crash> bt -S ff800c42ba00 108
> bt:flags=4, bptr=0, eip=0, esp=ff800c42ba00,
> stkptr=ff800c42ba00, instptr=0, frameptr=0
> stackframe:sp=ff800c42ba08, pc=0, fp=ff9c29e4fa10
> 
> It seems that invalid stackframe.sp and pc values are calculated by
> GET_STACK_ULONG(bt->hp->esp). I think this must result from an invalid
> bt->stackbuf address.
> (gdb) p /x *(struct bt_info *) 0x7fffd640
> $4 = {task = 0xffcd74122000, flags = 0x0, instptr = 0x0, stkptr =
> 0xff800c42ba00, bptr = 0x0, stackbase = 0xff800c428000,
>   stacktop = 0xff800c42c000, stackbuf = 0x55f23ae0, tc =
>   0x596e1778, hp = 0x7fffd5f0, textlist = 0x0, ref = 0x0, frameptr =
>   0x0,
>   call_target = 0x0, machdep = 0x0, debug = 0x0, eframe_ip = 0x0, radix =
>   0x0, cpumask = 0x0}
> 
> So this is the reason for the stackframe.pc and stackframe.fp values shown
> above.
> 
> Best regards,
> Qiwu
> 
> 
> 
> -Original Message-
> From: Dave Anderson 
> Sent: Monday, November 4, 2019 11:39 PM
> To: Discussion list for crash utility usage, maintenance and development
> 
> Cc: 陈启武 
> Subject: [External Mail]Re: [Crash-utility] [PATCH] Fix a potential segfault
> for the ARM64 "bt -S " command
> 
> 
> 
> - Original Message -
> 
> > > The stackframe.fp (0xff9c29e4f8e0) is larger than the stacktop
> > > address, which leads to a segmentation violation generated by accessing
> > > regs->sp:
> > > (gdb) p /x 18446743644915693792//stkptr
> > > $5 = 0xff9c29e4f8e0
> > > (gdb) p /x
> > > 0xff9c29e4f8e0-0xff800c428000//STACK_OFFSET_TYPE(stkptr)
> > > $6 = 0x1c1da278e0
> > > (gdb) p /x regs
> > > $7 = 0x55717394b3c0
> > > (gdb) p *(struct arm64_pt_regs *) 0x55717394b3c0
> > > Cannot access memory at address 0x55717394b3c0
> > >
> > > To fix this, I think a condition
> > > "arm64_in_exception_text(stackframe.pc) && INSTACK(stackframe.fp, bt)"
> > > must be added to avoid an invalid exception frame before transitioning to
> > > the process stack.
> 
> Or alternatively, would it be better to have
> arm64_is_kernel_exception_frame() verify that th
