Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-07 Thread Dominique Martinet
Dave Anderson wrote on Fri, Feb 07, 2020:
> > Actually, for do_datatype_addr(), I'm pretty sure we never use the
> > member_offset at all -- I just replaced member_to_datatype() by
> > something that just assigned the member to dm->member and things just
> > work for anything I could try (struct foo, struct foo.member, struct
> > foo.member1,member2, struct foo.member1.innermember, struct -o foo,
> > struct -o foo.member...)
> 
> I'm not following, your SHOW_RAW_DATA patch bumps the incoming addr by
> dm->member_offset right?  Or do you mean in the normal case?  You are 
> probably right there.

Yes, I meant the normal case does not seem to use the result of
member_to_datatype() at all, except for it setting dm->member to
memberlist[i]

Sorry, I'm just mostly thinking out loud, it's just annoying me to call
datatype_info twice when most of the time it will get both offset and
size filled in.
I just tried and with 1000 elements time spent here goes up from 0.79s
to 0.97s but that still is very acceptable compared to... 132.30s for
the version without -r.
It's the stingy part of me that doesn't like doing the same gdb call
twice :)


> > Can get it in one call with casts, but that's not exactly pretty...
> > "(u64)(&((type*)0)->member + 1) - (u64)&((type*)0)->member"...
> > 
> > Would that work for you?
> 
> I'd prefer to put it a separate function and macro, i.e., a new 
> ANON_MEMBER_SIZE()
> macro and accompanying anon_member_size(), just to follow convention.

Yes, I meant doing that in a new function, was just worried the
size computation is a bit too convoluted.

I will do as you suggested and submit a new patch when I find some time
to cleanup what I have.


Cheers,
-- 
Dominique


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-07 Thread Dave Anderson



- Original Message -
> Dave Anderson wrote on Fri, Feb 07, 2020:
> > > Following up with patch, with a couple of remarks:
> > >  - I had to change member_to_datatype() to use datatype_info() directly
> > > instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if
> > > this will have any side effects, but things like 'struct foo.a,b' still
> > > work at least. You might have a better idea of what to check.
> > 
> > Hmmm, I'd prefer to keep the member_datatype() behavior as it is, and
> > not have datatype_info() re-initialize the incoming dm.  (except for
> > the setting of dm->member).  Maybe have a different flag for gathering
> > the size as you have, and keep the original functionality the same?
> > Or alternatively, leave the call to member_datatype() as-is, and if
> > do_datatype_addr() sees SHOW_RAW_DATA, additionally call MEMBER_SIZE()?
> 
> Ok, I just did that to save an extra gdb call as I was assuming that was
> slow, but it would probably be acceptable to call into gdb twice except
> for intellectual satisfaction... :)
> 
> Alternatively, have member_to_datatype() declare a temporary dm, pass
> that to datatype_info, and fill only dm->member_size ?
> I find it a bit weird to fill only member_offset and not the size...
> 
> Actually, for do_datatype_addr(), I'm pretty sure we never use the
> member_offset at all -- I just replaced member_to_datatype() by
> something that just assigned the member to dm->member and things just
> work for anything I could try (struct foo, struct foo.member, struct
> foo.member1,member2, struct foo.member1.innermember, struct -o foo,
> struct -o foo.member...)

I'm not following, your SHOW_RAW_DATA patch bumps the incoming addr by
dm->member_offset right?  Or do you mean in the normal case?  You are 
probably right there.

> 
> 
> > >  - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the
> > > non-raw case.
> > 
> > I think you mean just the opposite...
> 
> woops, I meant what I wrote here, the code I sent is wrong -- we're not
> getting the member_size in the ANON_MEMBER_QUERY so I didn't want it for
> a first approach.
> 
> > All of the ANON_xxx macros were added for getting information for members
> > that are declared inside of anonymous structures within a structure, where
> > where the generic datatype_info() call fails.  In those cases, the request
> > gets directed to a gdb print command within anon_member_offset().  There is
> > no support for getting the size of such a member, so MEMBER_SIZE() would
> > fail.  So I don't think this feature would work for those types of members,
> > and would need some kind of ANON_MEMBER_SIZE() and accompanying
> > anon_member_size() functionality.
> 
> Hm, yeah, probably want to support these too.
> 
> anon_member_offset tricks gdb with the usual "&((type*)0)->member", we
> could get the size by printing the same "+1" and computing the
> difference.
> 
> Can get it in one call with casts, but that's not exactly pretty...
> "(u64)(&((type*)0)->member + 1) - (u64)&((type*)0)->member"...
> 
> Would that work for you?

I'd prefer to put it a separate function and macro, i.e., a new 
ANON_MEMBER_SIZE()
macro and accompanying anon_member_size(), just to follow convention.

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-07 Thread Dominique Martinet
Dave Anderson wrote on Fri, Feb 07, 2020:
> > Following up with patch, with a couple of remarks:
> >  - I had to change member_to_datatype() to use datatype_info() directly
> > instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if
> > this will have any side effects, but things like 'struct foo.a,b' still
> > work at least. You might have a better idea of what to check.
> 
> Hmmm, I'd prefer to keep the member_datatype() behavior as it is, and
> not have datatype_info() re-initialize the incoming dm.  (except for
> the setting of dm->member).  Maybe have a different flag for gathering
> the size as you have, and keep the original functionality the same?  
> Or alternatively, leave the call to member_datatype() as-is, and if 
> do_datatype_addr() sees SHOW_RAW_DATA, additionally call MEMBER_SIZE()?

Ok, I just did that to save an extra gdb call as I was assuming that was
slow, but it would probably be acceptable to call into gdb twice except
for intellectual satisfaction... :)

Alternatively, have member_to_datatype() declare a temporary dm, pass
that to datatype_info, and fill only dm->member_size ?
I find it a bit weird to fill only member_offset and not the size...

Actually, for do_datatype_addr(), I'm pretty sure we never use the
member_offset at all -- I just replaced member_to_datatype() by
something that just assigned the member to dm->member and things just
work for anything I could try (struct foo, struct foo.member, struct
foo.member1,member2, struct foo.member1.innermember, struct -o foo,
struct -o foo.member...)



> >  - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the
> > non-raw case. 
> 
> I think you mean just the opposite...

woops, I meant what I wrote here, the code I sent is wrong -- we're not
getting the member_size in the ANON_MEMBER_QUERY so I didn't want it for
a first approach.

> All of the ANON_xxx macros were added for getting information for members
> that are declared inside of anonymous structures within a structure, where
> where the generic datatype_info() call fails.  In those cases, the request
> gets directed to a gdb print command within anon_member_offset().  There is
> no support for getting the size of such a member, so MEMBER_SIZE() would
> fail.  So I don't think this feature would work for those types of members,
> and would need some kind of ANON_MEMBER_SIZE() and accompanying 
> anon_member_size() functionality.

Hm, yeah, probably want to support these too.

anon_member_offset tricks gdb with the usual "&((type*)0)->member", we
could get the size by printing the same "+1" and computing the
difference.

Can get it in one call with casts, but that's not exactly pretty...
"(u64)(&((type*)0)->member + 1) - (u64)&((type*)0)->member"...

Would that work for you?

-- 
Dominique


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-07 Thread Dave Anderson



- Original Message -
> Dave Anderson wrote on Thu, Feb 06, 2020:
> > Right, the time-consumer is the call into gdb to get the structure member
> > details.
> > 
> > I'm not following what you mean by "making 'datatype_member' static ...",
> > but
> > off the top of my head, I was thinking of adding a "tmp_offset" and
> > "tmp_size"
> > fields to the global offset_table and size_table structures, and adding a
> > new
> > pc->curcmd_flags bit.  Then, if a command that wants to support such a
> > concept,
> > it could:
> > 
> >   (1) check the new pc->curcmd_flags bit, which will always be 0 the first
> >   time
> >   the function gets called by the exec_args_input_file() loop.
> >   (2) if the new bit is 0, then set OFFSET(tmp_offset) and SIZE(tmp_size)
> >   (3) set the new flag in pc->curcmd_flags, and continue...
> > 
> > Then during the second and subsequent times through the loop,
> > pc->curcmd_flags
> > will still be set/unchanged, because restore_sanity() is not called from
> > the
> > exec_args_input_flags() loop.
> > 
> > But that scheme falls down if a user requests a comma-separated list of
> > multiple members (argc_members would be > 1).  Although, it might be better
> > if the "struct -r' option rejects multiple-member arguments, especially
> > given
> > that the output would be pretty much unreadable.
> 
> I would tend to agree with that, struct -r with multiple members might
> be somewhat parsable but if someone can do that they can dump the whole
> struct and parse it anyway, so let's go with only one number.
> 
> On the good news though, this whole caching isn't going to be
> immediately needed. I just finished the first part of this (allowing
> struct -r with a member), and struct -r is already infinitely faster
> than struct; so getting the offset wasn't the slow part:
>  - with a small 100-elements file, I'm already going down from 12s to
> near-instant on this old laptop.
>  - I didn't wait for 1000-elements to finish normally but it's just
> about one second with -r, which is acceptable enough for me.
> 
> Caching might make it another order of magnitude faster but for now I'm
> happy to wait a couple of minutes for my 100k elements list, it's better
> than not finishing in half an hour :)

Ok, so it must be the gdb-assisted print_struct() and parsing that's 
time-consuming, and not the gdb datatype query.

> 
> Following up with patch, with a couple of remarks:
>  - I had to change member_to_datatype() to use datatype_info() directly
> instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if
> this will have any side effects, but things like 'struct foo.a,b' still
> work at least. You might have a better idea of what to check.

Hmmm, I'd prefer to keep the member_datatype() behavior as it is, and
not have datatype_info() re-initialize the incoming dm.  (except for
the setting of dm->member).  Maybe have a different flag for gathering
the size as you have, and keep the original functionality the same?  
Or alternatively, leave the call to member_datatype() as-is, and if 
do_datatype_addr() sees SHOW_RAW_DATA, additionally call MEMBER_SIZE()?

>  - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the
> non-raw case. 

I think you mean just the opposite...

   if (!member_to_datatype(memberlist[i], dm,
-   ANON_MEMBER_QUERY))
+   (flags & SHOW_RAW_DATA) 
? ANON_MEMBER_QUERY : 0))
error(FATAL, "invalid data structure 
reference: %s.%s\n",
  dm->name, memberlist[i]);

The use of ANON_MEMBER_QUERY is just there for a fall-back option if the
MEMBER_OFFSET() call fails.  

>I'm not quite sure why we couldn't get the member size if
> it's an anon union/stuct, but I'm not sure how one would name an
> anonymous field in the first place here? Anyway, one would get invalid
> data structure reference message there if they do. It might be better
> to always pass the argument and then check for SHOW_RAW_DATA set with
> dm->member_size still being 0 after call to give another more
> appropriate error if you think people might hit that.

All of the ANON_xxx macros were added for getting information for members
that are declared inside of anonymous structures within a structure, where
where the generic datatype_info() call fails.  In those cases, the request
gets directed to a gdb print command within anon_member_offset().  There is
no support for getting the size of such a member, so MEMBER_SIZE() would
fail.  So I don't think this feature would work for those types of members,
and would need some kind of ANON_MEMBER_SIZE() and accompanying 
anon_member_size() functionality.

Dave

> Anyway, thanks for the advices.
> --
> Dominique
> 
> 
> --
> Crash-utility mailing list
> Crash-utility@redhat.com
> 

Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-06 Thread Dominique Martinet
Dave Anderson wrote on Thu, Feb 06, 2020:
> Right, the time-consumer is the call into gdb to get the structure member 
> details.
> 
> I'm not following what you mean by "making 'datatype_member' static ...", but
> off the top of my head, I was thinking of adding a "tmp_offset" and "tmp_size"
> fields to the global offset_table and size_table structures, and adding a new
> pc->curcmd_flags bit.  Then, if a command that wants to support such a 
> concept, 
> it could:
> 
>   (1) check the new pc->curcmd_flags bit, which will always be 0 the first 
> time
>   the function gets called by the exec_args_input_file() loop.
>   (2) if the new bit is 0, then set OFFSET(tmp_offset) and SIZE(tmp_size)
>   (3) set the new flag in pc->curcmd_flags, and continue...
> 
> Then during the second and subsequent times through the loop, pc->curcmd_flags
> will still be set/unchanged, because restore_sanity() is not called from the 
> exec_args_input_flags() loop.
> 
> But that scheme falls down if a user requests a comma-separated list of
> multiple members (argc_members would be > 1).  Although, it might be better
> if the "struct -r' option rejects multiple-member arguments, especially given
> that the output would be pretty much unreadable.

I would tend to agree with that, struct -r with multiple members might
be somewhat parsable but if someone can do that they can dump the whole
struct and parse it anyway, so let's go with only one number.

On the good news though, this whole caching isn't going to be
immediately needed. I just finished the first part of this (allowing
struct -r with a member), and struct -r is already infinitely faster
than struct; so getting the offset wasn't the slow part:
 - with a small 100-elements file, I'm already going down from 12s to
near-instant on this old laptop.
 - I didn't wait for 1000-elements to finish normally but it's just
about one second with -r, which is acceptable enough for me.

Caching might make it another order of magnitude faster but for now I'm
happy to wait a couple of minutes for my 100k elements list, it's better
than not finishing in half an hour :)


Following up with patch, with a couple of remarks:
 - I had to change member_to_datatype() to use datatype_info() directly
instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if
this will have any side effects, but things like 'struct foo.a,b' still
work at least. You might have a better idea of what to check.
 - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the
non-raw case. I'm not quite sure why we couldn't get the member size if
it's an anon union/stuct, but I'm not sure how one would name an
anonymous field in the first place here? Anyway, one would get invalid
data structure reference message there if they do. It might be better
to always pass the argument and then check for SHOW_RAW_DATA set with
dm->member_size still being 0 after call to give another more
appropriate error if you think people might hit that.

Anyway, thanks for the advices.
-- 
Dominique


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-06 Thread Dave Anderson



- Original Message -
> Dave Anderson wrote on Wed, Feb 05, 2020:
> > > What might make sense is to use the "struct -r" option, which does a raw
> > > memory dump of a data structure.  But for a reason I do not recall, it
> > > prevents that option from being used with a "struct_name.field" argument.
> > > (see line 6628 of symbols.c).  But I don't see why that couldn't be made
> > > to work, though, since the end result is simply a call to
> > > raw_data_dump().
> 
> I'll give this a try tomorrow, probably just needs to add
> dm->member_offset to addr and dump dm->member_size long value, that
> looks straightforward enough.
> 
> > ...and then if you get "struct -r" to work with a "struct_name.field"
> > argument, the next challenge would be the caching aspect of your request.
> > 
> > Currently there's no manner in which command-specific information is
> > cached beyond the execution of a single command.  With "< file", the
> > command gets executed from scratch each time.
> 
> That does look more challenging... Or rather more a matter of taste? a
> kludge probably wouldn't be so bad to put in, but it's probably better
> to have something more generic than making 'datatype_member' static in
> cmd_datatype_common (well, it needs a bit more than that as the argument
> strings won't be useable from one call to the next...)
>  
> I assume the slow part in this will be the member_to_datatype call in
> do_datatype_addr? I'll first confirm that's the only slow bit, if there
> is only one spot to optimize away it might not be so bad.
> 
> But yeah, without caching I don't think it's realistic; and making the
> '< file' construct iterate within the function looks more work than
> trying to make struct cache some info.
> 
> Thanks!

Right, the time-consumer is the call into gdb to get the structure member 
details.

I'm not following what you mean by "making 'datatype_member' static ...", but
off the top of my head, I was thinking of adding a "tmp_offset" and "tmp_size"
fields to the global offset_table and size_table structures, and adding a new
pc->curcmd_flags bit.  Then, if a command that wants to support such a concept, 
it could:

  (1) check the new pc->curcmd_flags bit, which will always be 0 the first time
  the function gets called by the exec_args_input_file() loop.
  (2) if the new bit is 0, then set OFFSET(tmp_offset) and SIZE(tmp_size)
  (3) set the new flag in pc->curcmd_flags, and continue...

Then during the second and subsequent times through the loop, pc->curcmd_flags
will still be set/unchanged, because restore_sanity() is not called from the 
exec_args_input_flags() loop.

But that scheme falls down if a user requests a comma-separated list of
multiple members (argc_members would be > 1).  Although, it might be better
if the "struct -r' option rejects multiple-member arguments, especially given
that the output would be pretty much unreadable.

Dave


 
 > --
> Dominique
> 
> 
> --
> Crash-utility mailing list
> Crash-utility@redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
> 

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-05 Thread Dominique Martinet
Dave Anderson wrote on Wed, Feb 05, 2020:
> > What might make sense is to use the "struct -r" option, which does a raw
> > memory dump of a data structure.  But for a reason I do not recall, it
> > prevents that option from being used with a "struct_name.field" argument.
> > (see line 6628 of symbols.c).  But I don't see why that couldn't be made
> > to work, though, since the end result is simply a call to raw_data_dump().

I'll give this a try tomorrow, probably just needs to add
dm->member_offset to addr and dump dm->member_size long value, that
looks straightforward enough.

> ...and then if you get "struct -r" to work with a "struct_name.field" 
> argument, the next challenge would be the caching aspect of your request.
> 
> Currently there's no manner in which command-specific information is
> cached beyond the execution of a single command.  With "< file", the
> command gets executed from scratch each time. 

That does look more challenging... Or rather more a matter of taste? a
kludge probably wouldn't be so bad to put in, but it's probably better
to have something more generic than making 'datatype_member' static in
cmd_datatype_common (well, it needs a bit more than that as the argument
strings won't be useable from one call to the next...)


I assume the slow part in this will be the member_to_datatype call in
do_datatype_addr? I'll first confirm that's the only slow bit, if there
is only one spot to optimize away it might not be so bad.

But yeah, without caching I don't think it's realistic; and making the
'< file' construct iterate within the function looks more work than
trying to make struct cache some info.

Thanks!
-- 
Dominique


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-05 Thread Dave Anderson



- Original Message -
> 
> 
> - Original Message -
> > Hi,
> > 
> > I often find myself dumping a bunch of addresses to files to iterate
> > with 'struct_name.field < file_with_addresses', but that is horribly
> > slow for large number of iterations.
> > 
> > `help list` comment for -S vs. -s made me try to use `rd` instead,
> > e.g. get offset manually from `struct -o` then use rd instead like
> > `rd -o xx < addr_list | awk '{ print $2 }' > value_list` -- and that is
> > infinitely better.
> > 
> > 
> > Would it make sense to add a similar option to 'struct' instead so one
> > could do e.g. `struct -S struct_name.field addr` instead of the dance I was
> > doing?
> > (That would require to cache field offset in crash and not query it
> > again everytime, from a quick look at the code, but we could only cache
> > one and still gain a lot for such iterations...)
> > 
> > 
> > Am I missing another more practical way of doing this?
> > (I guess it's not so bad now I came up with using 'rd', but that was
> > non-obvious to me. My use case here involved following a couple of
> > pointers from a list so I dumped the first pointer to follow from list
> > with -S struct1.field1, but then the following iteration just wouldn't
> > end naively)
> 
> Dominique,
> 
> What might make sense is to use the "struct -r" option, which does a raw
> memory dump of a data structure.  But for a reason I do not recall, it
> prevents that option from being used with a "struct_name.field" argument.
> (see line 6628 of symbols.c).  But I don't see why that couldn't be made
> to work, though, since the end result is simply a call to raw_data_dump().
> 
> Dave

...and then if you get "struct -r" to work with a "struct_name.field" 
argument, the next challenge would be the caching aspect of your request.

Currently there's no manner in which command-specific information is
cached beyond the execution of a single command.  With "< file", the
command gets executed from scratch each time. 

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility



Re: [Crash-utility] Faster iteration on list of struct.field

2020-02-05 Thread Dave Anderson



- Original Message -
> Hi,
> 
> I often find myself dumping a bunch of addresses to files to iterate
> with 'struct_name.field < file_with_addresses', but that is horribly
> slow for large number of iterations.
> 
> `help list` comment for -S vs. -s made me try to use `rd` instead,
> e.g. get offset manually from `struct -o` then use rd instead like
> `rd -o xx < addr_list | awk '{ print $2 }' > value_list` -- and that is
> infinitely better.
> 
> 
> Would it make sense to add a similar option to 'struct' instead so one
> could do e.g. `struct -S struct_name.field addr` instead of the dance I was 
> doing?
> (That would require to cache field offset in crash and not query it
> again everytime, from a quick look at the code, but we could only cache
> one and still gain a lot for such iterations...)
> 
> 
> Am I missing another more practical way of doing this?
> (I guess it's not so bad now I came up with using 'rd', but that was
> non-obvious to me. My use case here involved following a couple of
> pointers from a list so I dumped the first pointer to follow from list
> with -S struct1.field1, but then the following iteration just wouldn't
> end naively)

Dominique,

What might make sense is to use the "struct -r" option, which does a raw
memory dump of a data structure.  But for a reason I do not recall, it
prevents that option from being used with a "struct_name.field" argument.
(see line 6628 of symbols.c).  But I don't see why that couldn't be made 
to work, though, since the end result is simply a call to raw_data_dump().  

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility