Re: [Crash-utility] Faster iteration on list of struct.field
Dave Anderson wrote on Fri, Feb 07, 2020: > > Actually, for do_datatype_addr(), I'm pretty sure we never use the > > member_offset at all -- I just replaced member_to_datatype() by > > something that just assigned the member to dm->member and things just > > work for anything I could try (struct foo, struct foo.member, struct > > foo.member1,member2, struct foo.member1.innermember, struct -o foo, > > struct -o foo.member...) > > I'm not following, your SHOW_RAW_DATA patch bumps the incoming addr by > dm->member_offset right? Or do you mean in the normal case? You are > probably right there. Yes, I meant the normal case does not seem to use the result of member_to_datatype() at all, except for it setting dm->member to memberlist[i] Sorry, I'm just mostly thinking out loud, it's just annoying me to call datatype_info twice when most of the time it will get both offset and size filled in. I just tried and with 1000 elements time spent here goes up from 0.79s to 0.97s but that still is very acceptable compared to... 132.30s for the version without -r. It's the stingy part of me that doesn't like doing the same gdb call twice :) > > Can get it in one call with casts, but that's not exactly pretty... > > "(u64)(&((type*)0)->member + 1) - (u64)&((type*)0)->member"... > > > > Would that work for you? > > I'd prefer to put it a separate function and macro, i.e., a new > ANON_MEMBER_SIZE() > macro and accompanying anon_member_size(), just to follow convention. Yes, I meant doing that in a new function, was just worried the size computation is a bit too convoluted. I will do as you suggested and submit a new patch when I find some time to cleanup what I have. Cheers, -- Dominique -- Crash-utility mailing list Crash-utility@redhat.com https://www.redhat.com/mailman/listinfo/crash-utility
Re: [Crash-utility] Faster iteration on list of struct.field
- Original Message - > Dave Anderson wrote on Fri, Feb 07, 2020: > > > Following up with patch, with a couple of remarks: > > > - I had to change member_to_datatype() to use datatype_info() directly > > > instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if > > > this will have any side effects, but things like 'struct foo.a,b' still > > > work at least. You might have a better idea of what to check. > > > > Hmmm, I'd prefer to keep the member_datatype() behavior as it is, and > > not have datatype_info() re-initialize the incoming dm. (except for > > the setting of dm->member). Maybe have a different flag for gathering > > the size as you have, and keep the original functionality the same? > > Or alternatively, leave the call to member_datatype() as-is, and if > > do_datatype_addr() sees SHOW_RAW_DATA, additionally call MEMBER_SIZE()? > > Ok, I just did that to save an extra gdb call as I was assuming that was > slow, but it would probably be acceptable to call into gdb twice except > for intellectual satisfaction... :) > > Alternatively, have member_to_datatype() declare a temporary dm, pass > that to datatype_info, and fill only dm->member_size ? > I find it a bit weird to fill only member_offset and not the size... > > Actually, for do_datatype_addr(), I'm pretty sure we never use the > member_offset at all -- I just replaced member_to_datatype() by > something that just assigned the member to dm->member and things just > work for anything I could try (struct foo, struct foo.member, struct > foo.member1,member2, struct foo.member1.innermember, struct -o foo, > struct -o foo.member...) I'm not following, your SHOW_RAW_DATA patch bumps the incoming addr by dm->member_offset right? Or do you mean in the normal case? You are probably right there. > > > > > - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the > > > non-raw case. > > > > I think you mean just the opposite... > > woops, I meant what I wrote here, the code I sent is wrong -- we're not > getting the member_size in the ANON_MEMBER_QUERY so I didn't want it for > a first approach. > > > All of the ANON_xxx macros were added for getting information for members > > that are declared inside of anonymous structures within a structure, where > > where the generic datatype_info() call fails. In those cases, the request > > gets directed to a gdb print command within anon_member_offset(). There is > > no support for getting the size of such a member, so MEMBER_SIZE() would > > fail. So I don't think this feature would work for those types of members, > > and would need some kind of ANON_MEMBER_SIZE() and accompanying > > anon_member_size() functionality. > > Hm, yeah, probably want to support these too. > > anon_member_offset tricks gdb with the usual "&((type*)0)->member", we > could get the size by printing the same "+1" and computing the > difference. > > Can get it in one call with casts, but that's not exactly pretty... > "(u64)(&((type*)0)->member + 1) - (u64)&((type*)0)->member"... > > Would that work for you? I'd prefer to put it a separate function and macro, i.e., a new ANON_MEMBER_SIZE() macro and accompanying anon_member_size(), just to follow convention. Dave -- Crash-utility mailing list Crash-utility@redhat.com https://www.redhat.com/mailman/listinfo/crash-utility
Re: [Crash-utility] Faster iteration on list of struct.field
Dave Anderson wrote on Fri, Feb 07, 2020: > > Following up with patch, with a couple of remarks: > > - I had to change member_to_datatype() to use datatype_info() directly > > instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if > > this will have any side effects, but things like 'struct foo.a,b' still > > work at least. You might have a better idea of what to check. > > Hmmm, I'd prefer to keep the member_datatype() behavior as it is, and > not have datatype_info() re-initialize the incoming dm. (except for > the setting of dm->member). Maybe have a different flag for gathering > the size as you have, and keep the original functionality the same? > Or alternatively, leave the call to member_datatype() as-is, and if > do_datatype_addr() sees SHOW_RAW_DATA, additionally call MEMBER_SIZE()? Ok, I just did that to save an extra gdb call as I was assuming that was slow, but it would probably be acceptable to call into gdb twice except for intellectual satisfaction... :) Alternatively, have member_to_datatype() declare a temporary dm, pass that to datatype_info, and fill only dm->member_size ? I find it a bit weird to fill only member_offset and not the size... Actually, for do_datatype_addr(), I'm pretty sure we never use the member_offset at all -- I just replaced member_to_datatype() by something that just assigned the member to dm->member and things just work for anything I could try (struct foo, struct foo.member, struct foo.member1,member2, struct foo.member1.innermember, struct -o foo, struct -o foo.member...) > > - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the > > non-raw case. > > I think you mean just the opposite... woops, I meant what I wrote here, the code I sent is wrong -- we're not getting the member_size in the ANON_MEMBER_QUERY so I didn't want it for a first approach. > All of the ANON_xxx macros were added for getting information for members > that are declared inside of anonymous structures within a structure, where > where the generic datatype_info() call fails. In those cases, the request > gets directed to a gdb print command within anon_member_offset(). There is > no support for getting the size of such a member, so MEMBER_SIZE() would > fail. So I don't think this feature would work for those types of members, > and would need some kind of ANON_MEMBER_SIZE() and accompanying > anon_member_size() functionality. Hm, yeah, probably want to support these too. anon_member_offset tricks gdb with the usual "&((type*)0)->member", we could get the size by printing the same "+1" and computing the difference. Can get it in one call with casts, but that's not exactly pretty... "(u64)(&((type*)0)->member + 1) - (u64)&((type*)0)->member"... Would that work for you? -- Dominique -- Crash-utility mailing list Crash-utility@redhat.com https://www.redhat.com/mailman/listinfo/crash-utility
Re: [Crash-utility] Faster iteration on list of struct.field
- Original Message - > Dave Anderson wrote on Thu, Feb 06, 2020: > > Right, the time-consumer is the call into gdb to get the structure member > > details. > > > > I'm not following what you mean by "making 'datatype_member' static ...", > > but > > off the top of my head, I was thinking of adding a "tmp_offset" and > > "tmp_size" > > fields to the global offset_table and size_table structures, and adding a > > new > > pc->curcmd_flags bit. Then, if a command that wants to support such a > > concept, > > it could: > > > > (1) check the new pc->curcmd_flags bit, which will always be 0 the first > > time > > the function gets called by the exec_args_input_file() loop. > > (2) if the new bit is 0, then set OFFSET(tmp_offset) and SIZE(tmp_size) > > (3) set the new flag in pc->curcmd_flags, and continue... > > > > Then during the second and subsequent times through the loop, > > pc->curcmd_flags > > will still be set/unchanged, because restore_sanity() is not called from > > the > > exec_args_input_flags() loop. > > > > But that scheme falls down if a user requests a comma-separated list of > > multiple members (argc_members would be > 1). Although, it might be better > > if the "struct -r' option rejects multiple-member arguments, especially > > given > > that the output would be pretty much unreadable. > > I would tend to agree with that, struct -r with multiple members might > be somewhat parsable but if someone can do that they can dump the whole > struct and parse it anyway, so let's go with only one number. > > On the good news though, this whole caching isn't going to be > immediately needed. I just finished the first part of this (allowing > struct -r with a member), and struct -r is already infinitely faster > than struct; so getting the offset wasn't the slow part: > - with a small 100-elements file, I'm already going down from 12s to > near-instant on this old laptop. > - I didn't wait for 1000-elements to finish normally but it's just > about one second with -r, which is acceptable enough for me. > > Caching might make it another order of magnitude faster but for now I'm > happy to wait a couple of minutes for my 100k elements list, it's better > than not finishing in half an hour :) Ok, so it must be the gdb-assisted print_struct() and parsing that's time-consuming, and not the gdb datatype query. > > Following up with patch, with a couple of remarks: > - I had to change member_to_datatype() to use datatype_info() directly > instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if > this will have any side effects, but things like 'struct foo.a,b' still > work at least. You might have a better idea of what to check. Hmmm, I'd prefer to keep the member_datatype() behavior as it is, and not have datatype_info() re-initialize the incoming dm. (except for the setting of dm->member). Maybe have a different flag for gathering the size as you have, and keep the original functionality the same? Or alternatively, leave the call to member_datatype() as-is, and if do_datatype_addr() sees SHOW_RAW_DATA, additionally call MEMBER_SIZE()? > - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the > non-raw case. I think you mean just the opposite... if (!member_to_datatype(memberlist[i], dm, - ANON_MEMBER_QUERY)) + (flags & SHOW_RAW_DATA) ? ANON_MEMBER_QUERY : 0)) error(FATAL, "invalid data structure reference: %s.%s\n", dm->name, memberlist[i]); The use of ANON_MEMBER_QUERY is just there for a fall-back option if the MEMBER_OFFSET() call fails. >I'm not quite sure why we couldn't get the member size if > it's an anon union/stuct, but I'm not sure how one would name an > anonymous field in the first place here? Anyway, one would get invalid > data structure reference message there if they do. It might be better > to always pass the argument and then check for SHOW_RAW_DATA set with > dm->member_size still being 0 after call to give another more > appropriate error if you think people might hit that. All of the ANON_xxx macros were added for getting information for members that are declared inside of anonymous structures within a structure, where where the generic datatype_info() call fails. In those cases, the request gets directed to a gdb print command within anon_member_offset(). There is no support for getting the size of such a member, so MEMBER_SIZE() would fail. So I don't think this feature would work for those types of members, and would need some kind of ANON_MEMBER_SIZE() and accompanying anon_member_size() functionality. Dave > Anyway, thanks for the advices. > -- > Dominique > > > -- > Crash-utility mailing list > Crash-utility@redhat.com >
Re: [Crash-utility] Faster iteration on list of struct.field
Dave Anderson wrote on Thu, Feb 06, 2020: > Right, the time-consumer is the call into gdb to get the structure member > details. > > I'm not following what you mean by "making 'datatype_member' static ...", but > off the top of my head, I was thinking of adding a "tmp_offset" and "tmp_size" > fields to the global offset_table and size_table structures, and adding a new > pc->curcmd_flags bit. Then, if a command that wants to support such a > concept, > it could: > > (1) check the new pc->curcmd_flags bit, which will always be 0 the first > time > the function gets called by the exec_args_input_file() loop. > (2) if the new bit is 0, then set OFFSET(tmp_offset) and SIZE(tmp_size) > (3) set the new flag in pc->curcmd_flags, and continue... > > Then during the second and subsequent times through the loop, pc->curcmd_flags > will still be set/unchanged, because restore_sanity() is not called from the > exec_args_input_flags() loop. > > But that scheme falls down if a user requests a comma-separated list of > multiple members (argc_members would be > 1). Although, it might be better > if the "struct -r' option rejects multiple-member arguments, especially given > that the output would be pretty much unreadable. I would tend to agree with that, struct -r with multiple members might be somewhat parsable but if someone can do that they can dump the whole struct and parse it anyway, so let's go with only one number. On the good news though, this whole caching isn't going to be immediately needed. I just finished the first part of this (allowing struct -r with a member), and struct -r is already infinitely faster than struct; so getting the offset wasn't the slow part: - with a small 100-elements file, I'm already going down from 12s to near-instant on this old laptop. - I didn't wait for 1000-elements to finish normally but it's just about one second with -r, which is acceptable enough for me. Caching might make it another order of magnitude faster but for now I'm happy to wait a couple of minutes for my 100k elements list, it's better than not finishing in half an hour :) Following up with patch, with a couple of remarks: - I had to change member_to_datatype() to use datatype_info() directly instead of MEMBER_OFFSET(), to fill dm->member_size. I'm not sure if this will have any side effects, but things like 'struct foo.a,b' still work at least. You might have a better idea of what to check. - I'm only passing ANON_MEMBER_QUERY to member_to_datatype() in the non-raw case. I'm not quite sure why we couldn't get the member size if it's an anon union/stuct, but I'm not sure how one would name an anonymous field in the first place here? Anyway, one would get invalid data structure reference message there if they do. It might be better to always pass the argument and then check for SHOW_RAW_DATA set with dm->member_size still being 0 after call to give another more appropriate error if you think people might hit that. Anyway, thanks for the advices. -- Dominique -- Crash-utility mailing list Crash-utility@redhat.com https://www.redhat.com/mailman/listinfo/crash-utility
Re: [Crash-utility] Faster iteration on list of struct.field
- Original Message - > Dave Anderson wrote on Wed, Feb 05, 2020: > > > What might make sense is to use the "struct -r" option, which does a raw > > > memory dump of a data structure. But for a reason I do not recall, it > > > prevents that option from being used with a "struct_name.field" argument. > > > (see line 6628 of symbols.c). But I don't see why that couldn't be made > > > to work, though, since the end result is simply a call to > > > raw_data_dump(). > > I'll give this a try tomorrow, probably just needs to add > dm->member_offset to addr and dump dm->member_size long value, that > looks straightforward enough. > > > ...and then if you get "struct -r" to work with a "struct_name.field" > > argument, the next challenge would be the caching aspect of your request. > > > > Currently there's no manner in which command-specific information is > > cached beyond the execution of a single command. With "< file", the > > command gets executed from scratch each time. > > That does look more challenging... Or rather more a matter of taste? a > kludge probably wouldn't be so bad to put in, but it's probably better > to have something more generic than making 'datatype_member' static in > cmd_datatype_common (well, it needs a bit more than that as the argument > strings won't be useable from one call to the next...) > > I assume the slow part in this will be the member_to_datatype call in > do_datatype_addr? I'll first confirm that's the only slow bit, if there > is only one spot to optimize away it might not be so bad. > > But yeah, without caching I don't think it's realistic; and making the > '< file' construct iterate within the function looks more work than > trying to make struct cache some info. > > Thanks! Right, the time-consumer is the call into gdb to get the structure member details. I'm not following what you mean by "making 'datatype_member' static ...", but off the top of my head, I was thinking of adding a "tmp_offset" and "tmp_size" fields to the global offset_table and size_table structures, and adding a new pc->curcmd_flags bit. Then, if a command that wants to support such a concept, it could: (1) check the new pc->curcmd_flags bit, which will always be 0 the first time the function gets called by the exec_args_input_file() loop. (2) if the new bit is 0, then set OFFSET(tmp_offset) and SIZE(tmp_size) (3) set the new flag in pc->curcmd_flags, and continue... Then during the second and subsequent times through the loop, pc->curcmd_flags will still be set/unchanged, because restore_sanity() is not called from the exec_args_input_flags() loop. But that scheme falls down if a user requests a comma-separated list of multiple members (argc_members would be > 1). Although, it might be better if the "struct -r' option rejects multiple-member arguments, especially given that the output would be pretty much unreadable. Dave > -- > Dominique > > > -- > Crash-utility mailing list > Crash-utility@redhat.com > https://www.redhat.com/mailman/listinfo/crash-utility > -- Crash-utility mailing list Crash-utility@redhat.com https://www.redhat.com/mailman/listinfo/crash-utility
Re: [Crash-utility] Faster iteration on list of struct.field
Dave Anderson wrote on Wed, Feb 05, 2020: > > What might make sense is to use the "struct -r" option, which does a raw > > memory dump of a data structure. But for a reason I do not recall, it > > prevents that option from being used with a "struct_name.field" argument. > > (see line 6628 of symbols.c). But I don't see why that couldn't be made > > to work, though, since the end result is simply a call to raw_data_dump(). I'll give this a try tomorrow, probably just needs to add dm->member_offset to addr and dump dm->member_size long value, that looks straightforward enough. > ...and then if you get "struct -r" to work with a "struct_name.field" > argument, the next challenge would be the caching aspect of your request. > > Currently there's no manner in which command-specific information is > cached beyond the execution of a single command. With "< file", the > command gets executed from scratch each time. That does look more challenging... Or rather more a matter of taste? a kludge probably wouldn't be so bad to put in, but it's probably better to have something more generic than making 'datatype_member' static in cmd_datatype_common (well, it needs a bit more than that as the argument strings won't be useable from one call to the next...) I assume the slow part in this will be the member_to_datatype call in do_datatype_addr? I'll first confirm that's the only slow bit, if there is only one spot to optimize away it might not be so bad. But yeah, without caching I don't think it's realistic; and making the '< file' construct iterate within the function looks more work than trying to make struct cache some info. Thanks! -- Dominique -- Crash-utility mailing list Crash-utility@redhat.com https://www.redhat.com/mailman/listinfo/crash-utility
Re: [Crash-utility] Faster iteration on list of struct.field
- Original Message - > > > - Original Message - > > Hi, > > > > I often find myself dumping a bunch of addresses to files to iterate > > with 'struct_name.field < file_with_addresses', but that is horribly > > slow for large number of iterations. > > > > `help list` comment for -S vs. -s made me try to use `rd` instead, > > e.g. get offset manually from `struct -o` then use rd instead like > > `rd -o xx < addr_list | awk '{ print $2 }' > value_list` -- and that is > > infinitely better. > > > > > > Would it make sense to add a similar option to 'struct' instead so one > > could do e.g. `struct -S struct_name.field addr` instead of the dance I was > > doing? > > (That would require to cache field offset in crash and not query it > > again everytime, from a quick look at the code, but we could only cache > > one and still gain a lot for such iterations...) > > > > > > Am I missing another more practical way of doing this? > > (I guess it's not so bad now I came up with using 'rd', but that was > > non-obvious to me. My use case here involved following a couple of > > pointers from a list so I dumped the first pointer to follow from list > > with -S struct1.field1, but then the following iteration just wouldn't > > end naively) > > Dominique, > > What might make sense is to use the "struct -r" option, which does a raw > memory dump of a data structure. But for a reason I do not recall, it > prevents that option from being used with a "struct_name.field" argument. > (see line 6628 of symbols.c). But I don't see why that couldn't be made > to work, though, since the end result is simply a call to raw_data_dump(). > > Dave ...and then if you get "struct -r" to work with a "struct_name.field" argument, the next challenge would be the caching aspect of your request. Currently there's no manner in which command-specific information is cached beyond the execution of a single command. With "< file", the command gets executed from scratch each time. Dave -- Crash-utility mailing list Crash-utility@redhat.com https://www.redhat.com/mailman/listinfo/crash-utility
Re: [Crash-utility] Faster iteration on list of struct.field
- Original Message - > Hi, > > I often find myself dumping a bunch of addresses to files to iterate > with 'struct_name.field < file_with_addresses', but that is horribly > slow for large number of iterations. > > `help list` comment for -S vs. -s made me try to use `rd` instead, > e.g. get offset manually from `struct -o` then use rd instead like > `rd -o xx < addr_list | awk '{ print $2 }' > value_list` -- and that is > infinitely better. > > > Would it make sense to add a similar option to 'struct' instead so one > could do e.g. `struct -S struct_name.field addr` instead of the dance I was > doing? > (That would require to cache field offset in crash and not query it > again everytime, from a quick look at the code, but we could only cache > one and still gain a lot for such iterations...) > > > Am I missing another more practical way of doing this? > (I guess it's not so bad now I came up with using 'rd', but that was > non-obvious to me. My use case here involved following a couple of > pointers from a list so I dumped the first pointer to follow from list > with -S struct1.field1, but then the following iteration just wouldn't > end naively) Dominique, What might make sense is to use the "struct -r" option, which does a raw memory dump of a data structure. But for a reason I do not recall, it prevents that option from being used with a "struct_name.field" argument. (see line 6628 of symbols.c). But I don't see why that couldn't be made to work, though, since the end result is simply a call to raw_data_dump(). Dave -- Crash-utility mailing list Crash-utility@redhat.com https://www.redhat.com/mailman/listinfo/crash-utility