Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
labath added a comment. In https://reviews.llvm.org/D24610#554587, @omjavaid wrote: > Give this approach a rethink I dont see a lot of problems with this final > implementation unless it fails on other architectures. > We are already hacking our way to have these byte selection watchpoints > working in existing code. New code seems to be improving the hack in my > opinion. - I don't believe that the presence of one hack justifies the addition of another. If anything, then it's the opposite. - The fact that we have to fiddle with the byte selects to get the kernel to accept our watchpoints could be thought of as a "hack", but I think there it's justified, as otherwise we would be unable to watch anything that isn't dword-aligned. And most importantly, the issue is contained within the SetHardwareWatchpoint function -- noone outside of that needs to know about the fact that we are doing something funny with byte-selects. This is not the case here, as I explain below. > Can you think of any scenarios which might fail for this approach? I can. Try the following sequence of commands: - position the PC on an instruction that will write to address A (aligned) - set a byte watchpoint on A+0 - set a byte watchpoint on A+1 - clear the second watchpoint - single-step What will happen with your patch applied? (*) Nothing, as the watchpoint will not get hit because you disabled it. Now this is a pretty dubious example, but I think it demonstrates, the problems I have with this solution very nicely. **You are lying to the client about which watchpoints are enabled**. The client asked you to remove just one watchpoint, so as far as he is concerned the other one is still active. But you have disabled the other one as well. And now you make this a feature, as without it single stepping a multi-watchpoint will not work. And you are here relying on the fact that the client implements the step-over-watchpoint as a simple $z2, $s, $Z2 packet sequence. If the client decides to do anything more complicated, say evaluate an expression to get some state before the update, then your feature will break, as an intermediate $c, will reinstate your hidden watchpoint (in fact, this is maybe possible already, if a conditional breakpoint and a watchpoint are hit simultaneously). And you don't even fix the step-over-instruction-triggering-multiple-watchpoints problem definitively - this can still happen e.g., if you have a watch on 0x1000 and 0x1008, and a single instruction triggers both.(**) This is why I think this change is bad, as it is making an invisible line between the lowest levels of the server and the highest levels of the client, which now have to be kept in sync, otherwise things will unexpectedly break. I don't believe that having the ability to watch more than 4 locations at once is worth it, particularly when it can be easily worked around by the user (just set one big watchpoint instead of multiple small ones). Do you have a specific use case where this would be necessary/useful? I personally have never used more than one watchpoint per debug session in my life, and I expect this to be true for most people (I think the typical use case is "who is corrupting my variable?"), So, I believe that it is better to have support for fewer watchpoints, but have it working well, than the opposite. (*) Now when I tried this out on lldb without this change, it still did not work as expected - for some reason, after hitting and stepping over the watchpoint, lldb decided to to issue $c, and lost control of the inferior. I think that tracking this issue and fixing it would have more impact than adding the multi-watchpoint support (and it will reduce the number of corner cases, where we are wrong rather than increasing it). (**) Another issue whose solving would have more impact than this. https://reviews.llvm.org/D24610 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
omjavaid updated this revision to Diff 72723. omjavaid added a comment. Give this approach a rethink I dont see a lot of problems with this final implementation unless it fails on other architectures. We are already hacking our way to have these byte selection watchpoints working in existing code. New code seems to be improving the hack in my opinion. Let me explain what I am doing and may be you can provide your suggestion and feedback. Watchpoint Install => Register watchpoint into hardware watchpoint register cache Watchpoint Enable => Enable in cache and write registers using ptrace interface. Ideally we should be able to install/uninstall watchpoints in cache and then enable them all on a resume. In case of arm If a watchpoint is hit we should be able to disable that watchpoint. Step over watchpoint instruction and then re-enable the watchpoint. Our existing implementation will require a lot of changes to do that so thats why here is what i am doing. SetHardwareWatchpoint => Performs Install and Enable - If a new watchpoint slot is going to be used we Install and enable. - For new watchpoint we should be able to complete both Install and or we report an error. - If a duplicate slot is going to be used we Install and enable if required. - Install means updating size if needed plus updating ref count. - Enable means updating registers if size was updated. ClearHardwareWatchpoint - Disable and uinstall watchpoint means - Decrement ref count and clear hardware watchpoint regsiters. - Advantage of keeping ref counts is: - If refcount is greater than zero then SetHardwareWatchpoint cannot use this slot for a new watchpoint (new address). - But SetHardwareWatchpoint can be use this slot to install duplicate watchpoints (same address but different byte or word) ClearAllHardwareWatchpoint -- Just clear the whole watchpoint cache and call SetHardwareWatchpoint for all available watchpoints. NativeThreadLinux: On Watchpoint Remove -> Invalidate watchpoint cache On Resume - > Re-validate watchpoints by creating a new cache and re-enabling all watchpoints So this fixes our step-over issue and also preserves watchpoint slot if it is being used by multiple watchpoints. Can you think of any scenarios which might fail for this approach? https://reviews.llvm.org/D24610 Files: packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/Makefile packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/TestWatchpointMultipleSlots.py packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp source/Plugins/Process/Linux/NativeThreadLinux.cpp source/Plugins/Process/Linux/NativeThreadLinux.h Index: source/Plugins/Process/Linux/NativeThreadLinux.h === --- source/Plugins/Process/Linux/NativeThreadLinux.h +++ source/Plugins/Process/Linux/NativeThreadLinux.h @@ -108,6 +108,7 @@ using WatchpointIndexMap = std::map; WatchpointIndexMap m_watchpoint_index_map; cpu_set_t m_original_cpu_set; // For single-step workaround. + bool m_invalidate_watchpoints; }; typedef std::shared_ptr NativeThreadLinuxSP; Index: source/Plugins/Process/Linux/NativeThreadLinux.cpp === --- source/Plugins/Process/Linux/NativeThreadLinux.cpp +++ source/Plugins/Process/Linux/NativeThreadLinux.cpp @@ -87,7 +87,8 @@ NativeThreadLinux::NativeThreadLinux(NativeProcessLinux *process, lldb::tid_t tid) : NativeThreadProtocol(process, tid), m_state(StateType::eStateInvalid), - m_stop_info(), m_reg_context_sp(), m_stop_description() {} + m_stop_info(), m_reg_context_sp(), m_stop_description(), + m_invalidate_watchpoints(false) {} std::string NativeThreadLinux::GetName() { NativeProcessProtocolSP process_sp = m_process_wp.lock(); @@ -185,8 +186,10 @@ return Error(); uint32_t wp_index = wp->second; m_watchpoint_index_map.erase(wp); - if (GetRegisterContext()->ClearHardwareWatchpoint(wp_index)) + if (GetRegisterContext()->ClearHardwareWatchpoint(wp_index)) { +m_invalidate_watchpoints = true; return Error(); + } return Error("Clearing hardware watchpoint failed."); } @@ -198,8 +201,13 @@ m_stop_info.reason = StopReason::eStopReasonNone; m_stop_description.clear(); - // If watchpoints have been set, but none on this thread, - // then this is a new thread. So set all existing watchpoints. + // Invalidate watchpoint index map for a re-sync + if (m_invalidate_watchpoints) { +m_invalidate_watchpoints = false; +m_watchpoint_index_map.clear(); + } + + // Re-sync all available watchpoints. if (m_watchpoint_index_map.empty()) { NativeProcessLinux &process = GetProcess(); Index: source/Plugins/Process/Linux/NativeRegisterContextLinux_ar
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
labath requested changes to this revision. labath added a comment. This revision now requires changes to proceed. In https://reviews.llvm.org/D24610#553331, @omjavaid wrote: > This is a new version of what seems to me fully implementing functionality we > intend to have here. > > On a second thought nuking ClearHardwareWatchpoint function seems to be the > wrong approach here. I spent some time taking different approaches and it > turns out that if we do not ClearHardwareWatchpoint when back-end asks us to > remove it then we wont be able to step over watchpoints. On ARM targets we > have to first clear and then reinstall watchpoints to step over the > watchpoint instruction. > > On the other hand if we call > NativeRegisterContextLinux_arm::ClearHardwareWatchpoint then that watchpoint > stands removed if call is just to delete watch on one of the bytes. And if we > follow up with creating a new watchpoint on a different word the slot being > used may appear vaccant which is actually inconsistent behavior. > > So I have a new approach that does clear watchpoint registers if > NativeRegisterContextLinux_arm::ClearHardwareWatchpoint is called but we > still track reference counts by re-introducing refcount that I removed in my > last patch. This will mean that a follow up create may fail just because > there are still references to disabled watchpoint and watchpoint slots are > still not vaccant. I have made changes to the test to reflect this behaviour. > > Please comment if you have any reservation about this approach. Hmm I am indeed starting to have serious reservations about this. The more I think about this, the more it starts to look like a big hack. So, now ClearHardwareWatchpoint still maintains a refcount on the number of users of the watchpoint slot, but it disables the slot everytime the slot usage count is decremented ? (as opposed to when the refcount reaches zero). And this is supposed to be the reason that step-over watchpoint (a pretty critical piece of functionality) works ? And the reason why we are still able to do the watchpoints is that before a continue (but not before a single-step, because that would break the previous item) we nuke the watchpoint registers (and their reference counts) and start over? I am not convinced that having watchpoint slot sharing is important enough to balance the amount of technical debt this introduces. https://reviews.llvm.org/D24610 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
omjavaid updated this revision to Diff 72589. omjavaid added a comment. This is a new version of what seems to me fully implementing functionality we intend to have here. On a second thought nuking ClearHardwareWatchpoint function seems to be the wrong approach here. I spent some time taking different approaches and it turns out that if we do not ClearHardwareWatchpoint when back-end asks us to remove it then we wont be able to step over watchpoints. On ARM targets we have to first clear and then reinstall watchpoints to step over the watchpoint instruction. On the other hand if we call NativeRegisterContextLinux_arm::ClearHardwareWatchpoint then that watchpoint stands removed if call is just to delete watch on one of the bytes. And if we follow up with creating a new watchpoint on a different word the slot being used may appear vaccant which is actually inconsistent behavior. So I have a new approach that does clear watchpoint registers if NativeRegisterContextLinux_arm::ClearHardwareWatchpoint is called but we still track reference counts by re-introducing refcount that I removed in my last patch. This will mean that a follow up create may fail just because there are still references to disabled watchpoint and watchpoint slots are still not vaccant. I have made changes to the test to reflect this behaviour. Please comment if you have any reservation about this approach. https://reviews.llvm.org/D24610 Files: packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/Makefile packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/TestWatchpointMultipleSlots.py packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp source/Plugins/Process/Linux/NativeThreadLinux.cpp Index: source/Plugins/Process/Linux/NativeThreadLinux.cpp === --- source/Plugins/Process/Linux/NativeThreadLinux.cpp +++ source/Plugins/Process/Linux/NativeThreadLinux.cpp @@ -198,8 +198,10 @@ m_stop_info.reason = StopReason::eStopReasonNone; m_stop_description.clear(); - // If watchpoints have been set, but none on this thread, - // then this is a new thread. So set all existing watchpoints. + // Invalidate watchpoint index map for a re-sync + m_watchpoint_index_map.clear(); + + // Re-sync all available watchpoints. if (m_watchpoint_index_map.empty()) { NativeProcessLinux &process = GetProcess(); Index: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp === --- source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp +++ source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp @@ -508,8 +508,19 @@ if (error.Fail()) return LLDB_INVALID_INDEX32; - uint32_t control_value = 0, wp_index = 0, addr_word_offset = 0, byte_mask = 0; + uint32_t control_value = 0; lldb::addr_t real_addr = addr; + uint32_t wp_index = LLDB_INVALID_INDEX32; + + // Find out how many bytes we need to watch after 4-byte alignment boundary. + uint8_t watch_size = (addr & 0x03) + size; + + // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary. + if (size == 0 || watch_size > 4) +return LLDB_INVALID_INDEX32; + + // Strip away last two bits of address for byte/half-word/word selection. + addr &= ~((lldb::addr_t)3); // Check if we are setting watchpoint other than read/write/access // Also update watchpoint flag to match Arm write-read bit configuration. @@ -526,86 +537,63 @@ return LLDB_INVALID_INDEX32; } - // Can't watch zero bytes - // Can't watch more than 4 bytes per WVR/WCR pair + // Iterate over stored watchpoints and find a free or duplicate wp_index + for (uint32_t i = 0; i < m_max_hwp_supported; i++) { +if ((m_hwp_regs[i].control & 1) == 0 && (m_hwp_regs[i].refcount <= 0)) { + wp_index = i; // Mark last free slot +} else if (m_hwp_regs[i].address == addr) { + wp_index = i; // Mark duplicate index + break;// Stop searching here +} + } - if (size == 0 || size > 4) + // No vaccant slot available and no duplicate slot found. + if (wp_index == LLDB_INVALID_INDEX32) return LLDB_INVALID_INDEX32; - // Check 4-byte alignment for hardware watchpoint target address. - // Below is a hack to recalculate address and size in order to - // make sure we can watch non 4-byte alligned addresses as well. - if (addr & 0x03) { -uint8_t watch_mask = (addr & 0x03) + size; - -if (watch_mask > 0x04) - return LLDB_INVALID_INDEX32; -else if (watch_mask <= 0x02) - size = 2; -else if (watch_mask <= 0x04) - size = 4; - -addr = addr & (~0x03); + uint8_t current_watch_size, new_watch_size; + // Calculate overall size width to be watched by current hardware watchpoint slot. + current_watch_size =
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
labath accepted this revision. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521 @@ -513,1 +512,11 @@ + + // Find out how many bytes we need to watch after 4-byte alignment boundary. + uint8_t watch_size = (addr & 0x03) + size; + + // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary. + if (size == 0 || watch_size > 4) +return LLDB_INVALID_INDEX32; + + // Strip away last two bits of address for byte/half-word/word selection. + addr &= ~((lldb::addr_t)3); omjavaid wrote: > labath wrote: > > zturner wrote: > > > This block of code is a bit confusing to me. Is this equivalent to: > > > > > > ``` > > > lldb::addr_t start = llvm::alignDown(addr, 4); > > > lldb::addr_t end = addr + size; > > > if (start == end || (end-start)>4) > > > return LLDB_INVALID_INDEX32; > > > ``` > > I am not sure this is much clearer, especially, as we will later need a > > separate varaible for `end-start` anyway. > > > > +1 for `llvm::alignDown` though. > There is significant performance difference when we choose addr = addr & > (~0x03); over llvm::alignDown > > It will eventually effect responsiveness if we keep increasing code size like > this. > > Even if we use -Os then alignDown is squeezed down to 8-9 instructions. I > havnt tried clang though. > > Instructions needed for addr = addr & (~0x03): > 4008bd: 48 8b 45 e8 mov-0x18(%rbp),%rax > 4008c1: 48 83 e0 fc and$0xfffc,%rax > 4008c5: 48 89 45 e8 mov%rax,-0x18(%rbp) > > Call penalty for llvm::alignDown > 400918: ba 00 00 00 00 mov$0x0,%edx > 40091d: 48 89 cemov%rcx,%rsi > 400920: 48 89 c7mov%rax,%rdi > 400923: e8 ae 00 00 00 callq 4009d6 <_Z9alignDownmmm> > 400928: 49 89 c4mov%rax,%r12 > 40092b: 48 8b 5d d8 mov-0x28(%rbp),%rbx > > Disassembly for llvm::alignDown > 004009d6 <_Z9alignDownmmm>: > 4009d6: 55 push %rbp > 4009d7: 48 89 e5mov%rsp,%rbp > 4009da: 48 89 7d f8 mov%rdi,-0x8(%rbp) > 4009de: 48 89 75 f0 mov%rsi,-0x10(%rbp) > 4009e2: 48 89 55 e8 mov%rdx,-0x18(%rbp) > 4009e6: 48 8b 45 e8 mov-0x18(%rbp),%rax > 4009ea: ba 00 00 00 00 mov$0x0,%edx > 4009ef: 48 f7 75 f0 divq -0x10(%rbp) > 4009f3: 48 89 55 e8 mov%rdx,-0x18(%rbp) > 4009f7: 48 8b 45 f8 mov-0x8(%rbp),%rax > 4009fb: 48 2b 45 e8 sub-0x18(%rbp),%rax > 4009ff: ba 00 00 00 00 mov$0x0,%edx > 400a04: 48 f7 75 f0 divq -0x10(%rbp) > 400a08: 48 0f af 45 f0 imul -0x10(%rbp),%rax > 400a0d: 48 89 c2mov%rax,%rdx > 400a10: 48 8b 45 e8 mov-0x18(%rbp),%rax > 400a14: 48 01 d0add%rdx,%rax > 400a17: 5d pop%rbp > 400a18: c3 retq > 400a19: 0f 1f 80 00 00 00 00nopl 0x0(%rax) > > Number of instructions generated for alignDown with gcc -Os > > 400892: 48 8b 6c 24 08 mov0x8(%rsp),%rbp > 400897: 48 8b 4c 24 10 mov0x10(%rsp),%rcx > 40089c: 31 d2 xor%edx,%edx > 40089e: be c2 0a 40 00 mov$0x400ac2,%esi > 4008a3: bf a0 11 60 00 mov$0x6011a0,%edi > 4008a8: 48 89 e8mov%rbp,%rax > 4008ab: 48 f7 f1div%rcx > 4008ae: 48 0f af c1 imul %rcx,%rax > 4008b2: 48 89 c3mov%rax,%rbx I tried this with clang-3.6 -O3. The entire alignDown function call compiled down to `andq $-4, %rdi`. I'll leave this up to you, as I think it is readable enough right now, but I don't think we should be afraid of using utility functions like this. I think LLVM cares a lot more about performance than we, so I believe we can rely on them knowing what they're doing. Also we have much bigger higher-level issues affecting performance, the least of which is the change below, where you re-calculate all watchpoints on every resume. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552 @@ -559,1 +551,3 @@ + uint8_t current_watch_size, new_watch_size; + // Calculate overall size width to be watched by current hardware watchpoint slot. omjavaid wrote: > labath wrote: > > Looks much better. Any reason for not using `NextPowerOf2` ? Among other > > things, it is self-documenting, so you do not need the comment above that. > so llvm::NextPowerOf2 doesnt serve the intended behaviour. > > llvm::NextPowerOf2 returns 2 for 1, 4 for 2 or 3, and 8 for 4.
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
omjavaid added a comment. Answers to comments. I will upload a updated patch after corrections and updates. Comment at: packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c:23 @@ +22,3 @@ +{ +printf("About to write byteArray[%d] ...\n", i); // About to write byteArray + labath wrote: > zturner wrote: > > What's up with all the double spaced source code? Is this intentional? > Indeed. The spaces seem superfluous. Agreed. Will be removed in next iteration. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521 @@ -513,1 +512,11 @@ + + // Find out how many bytes we need to watch after 4-byte alignment boundary. + uint8_t watch_size = (addr & 0x03) + size; + + // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary. + if (size == 0 || watch_size > 4) +return LLDB_INVALID_INDEX32; + + // Strip away last two bits of address for byte/half-word/word selection. + addr &= ~((lldb::addr_t)3); labath wrote: > zturner wrote: > > This block of code is a bit confusing to me. Is this equivalent to: > > > > ``` > > lldb::addr_t start = llvm::alignDown(addr, 4); > > lldb::addr_t end = addr + size; > > if (start == end || (end-start)>4) > > return LLDB_INVALID_INDEX32; > > ``` > I am not sure this is much clearer, especially, as we will later need a > separate varaible for `end-start` anyway. > > +1 for `llvm::alignDown` though. There is significant performance difference when we choose addr = addr & (~0x03); over llvm::alignDown It will eventually effect responsiveness if we keep increasing code size like this. Even if we use -Os then alignDown is squeezed down to 8-9 instructions. I havnt tried clang though. Instructions needed for addr = addr & (~0x03): 4008bd: 48 8b 45 e8 mov-0x18(%rbp),%rax 4008c1: 48 83 e0 fc and$0xfffc,%rax 4008c5: 48 89 45 e8 mov%rax,-0x18(%rbp) Call penalty for llvm::alignDown 400918: ba 00 00 00 00 mov$0x0,%edx 40091d: 48 89 cemov%rcx,%rsi 400920: 48 89 c7mov%rax,%rdi 400923: e8 ae 00 00 00 callq 4009d6 <_Z9alignDownmmm> 400928: 49 89 c4mov%rax,%r12 40092b: 48 8b 5d d8 mov-0x28(%rbp),%rbx Disassembly for llvm::alignDown 004009d6 <_Z9alignDownmmm>: 4009d6: 55 push %rbp 4009d7: 48 89 e5mov%rsp,%rbp 4009da: 48 89 7d f8 mov%rdi,-0x8(%rbp) 4009de: 48 89 75 f0 mov%rsi,-0x10(%rbp) 4009e2: 48 89 55 e8 mov%rdx,-0x18(%rbp) 4009e6: 48 8b 45 e8 mov-0x18(%rbp),%rax 4009ea: ba 00 00 00 00 mov$0x0,%edx 4009ef: 48 f7 75 f0 divq -0x10(%rbp) 4009f3: 48 89 55 e8 mov%rdx,-0x18(%rbp) 4009f7: 48 8b 45 f8 mov-0x8(%rbp),%rax 4009fb: 48 2b 45 e8 sub-0x18(%rbp),%rax 4009ff: ba 00 00 00 00 mov$0x0,%edx 400a04: 48 f7 75 f0 divq -0x10(%rbp) 400a08: 48 0f af 45 f0 imul -0x10(%rbp),%rax 400a0d: 48 89 c2mov%rax,%rdx 400a10: 48 8b 45 e8 mov-0x18(%rbp),%rax 400a14: 48 01 d0add%rdx,%rax 400a17: 5d pop%rbp 400a18: c3 retq 400a19: 0f 1f 80 00 00 00 00nopl 0x0(%rax) Number of instructions generated for alignDown with gcc -Os 400892: 48 8b 6c 24 08 mov0x8(%rsp),%rbp 400897: 48 8b 4c 24 10 mov0x10(%rsp),%rcx 40089c: 31 d2 xor%edx,%edx 40089e: be c2 0a 40 00 mov$0x400ac2,%esi 4008a3: bf a0 11 60 00 mov$0x6011a0,%edi 4008a8: 48 89 e8mov%rbp,%rax 4008ab: 48 f7 f1div%rcx 4008ae: 48 0f af c1 imul %rcx,%rax 4008b2: 48 89 c3mov%rax,%rbx Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552 @@ -559,1 +551,3 @@ + uint8_t current_watch_size, new_watch_size; + // Calculate overall size width to be watched by current hardware watchpoint slot. labath wrote: > Looks much better. Any reason for not using `NextPowerOf2` ? Among other > things, it is self-documenting, so you do not need the comment above that. so llvm::NextPowerOf2 doesnt serve the intended behaviour. llvm::NextPowerOf2 returns 2 for 1, 4 for 2 or 3, and 8 for 4. We just want to make sure that our new_size is a power of 2 if its already not which will only be the case if size turns out to be 3.
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
labath added inline comments. Comment at: packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c:23 @@ +22,3 @@ +{ +printf("About to write byteArray[%d] ...\n", i); // About to write byteArray + zturner wrote: > What's up with all the double spaced source code? Is this intentional? Indeed. The spaces seem superfluous. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521 @@ -513,1 +512,11 @@ + + // Find out how many bytes we need to watch after 4-byte alignment boundary. + uint8_t watch_size = (addr & 0x03) + size; + + // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary. + if (size == 0 || watch_size > 4) +return LLDB_INVALID_INDEX32; + + // Strip away last two bits of address for byte/half-word/word selection. + addr &= ~((lldb::addr_t)3); zturner wrote: > This block of code is a bit confusing to me. Is this equivalent to: > > ``` > lldb::addr_t start = llvm::alignDown(addr, 4); > lldb::addr_t end = addr + size; > if (start == end || (end-start)>4) > return LLDB_INVALID_INDEX32; > ``` I am not sure this is much clearer, especially, as we will later need a separate varaible for `end-start` anyway. +1 for `llvm::alignDown` though. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552 @@ -559,1 +551,3 @@ + uint8_t current_watch_size, new_watch_size; + // Calculate overall size width to be watched by current hardware watchpoint slot. Looks much better. Any reason for not using `NextPowerOf2` ? Among other things, it is self-documenting, so you do not need the comment above that. Comment at: source/Plugins/Process/Linux/NativeThreadLinux.cpp:202 @@ +201,3 @@ + // Invalidate watchpoint index map to re-sync watchpoint registers. + m_watchpoint_index_map.clear(); + If you add this, then the comment below becomes obsolete. Seems like a pretty elegant solution to the incremental watchpoint update problem. I am wondering whether we need to do it on every resume though. I think it should be enough to do it when a watchpoint gets deleted (`NTL::RemoveWatchpoint`). Also, we should throw out the implementation of `NativeRegisterContextLinux_arm::ClearHardwareWatchpoint` -- it's no longer necessary, and it's not even correct anymore. https://reviews.llvm.org/D24610 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
zturner added a subscriber: zturner. Comment at: packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c:23 @@ +22,3 @@ +{ +printf("About to write byteArray[%d] ...\n", i); // About to write byteArray + What's up with all the double spaced source code? Is this intentional? Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521 @@ -513,1 +512,11 @@ + + // Find out how many bytes we need to watch after 4-byte alignment boundary. + uint8_t watch_size = (addr & 0x03) + size; + + // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary. + if (size == 0 || watch_size > 4) +return LLDB_INVALID_INDEX32; + + // Strip away last two bits of address for byte/half-word/word selection. + addr &= ~((lldb::addr_t)3); This block of code is a bit confusing to me. Is this equivalent to: ``` lldb::addr_t start = llvm::alignDown(addr, 4); lldb::addr_t end = addr + size; if (start == end || (end-start)>4) return LLDB_INVALID_INDEX32; ``` https://reviews.llvm.org/D24610 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
omjavaid updated this revision to Diff 71633. omjavaid added a comment. Herald added subscribers: srhines, danalbert, tberghammer. I have added a new test case that tests suggested scnario without changing any previous test cases. Also I have made sure we re validate all watchpoint installed on thread resume to make sure we have the latest values assigned to hardware watchpoint registers. This passes on ARM (RaspberryPi3, Samsung Chromebook). I have not yet tested on android. This will fail on targets which dont support multiple watchpoint slots. Also this should fail on AArch64 which I am currently working on. https://reviews.llvm.org/D24610 Files: packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/Makefile packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/TestWatchpointMultipleSlots.py packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp source/Plugins/Process/Linux/NativeThreadLinux.cpp Index: source/Plugins/Process/Linux/NativeThreadLinux.cpp === --- source/Plugins/Process/Linux/NativeThreadLinux.cpp +++ source/Plugins/Process/Linux/NativeThreadLinux.cpp @@ -198,6 +198,9 @@ m_stop_info.reason = StopReason::eStopReasonNone; m_stop_description.clear(); + // Invalidate watchpoint index map to re-sync watchpoint registers. + m_watchpoint_index_map.clear(); + // If watchpoints have been set, but none on this thread, // then this is a new thread. So set all existing watchpoints. if (m_watchpoint_index_map.empty()) { Index: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp === --- source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp +++ source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp @@ -410,15 +410,13 @@ if ((m_hbr_regs[bp_index].control & 1) == 0) { m_hbr_regs[bp_index].address = addr; m_hbr_regs[bp_index].control = control_value; -m_hbr_regs[bp_index].refcount = 1; // PTRACE call to set corresponding hardware breakpoint register. error = WriteHardwareDebugRegs(eDREGTypeBREAK, bp_index); if (error.Fail()) { m_hbr_regs[bp_index].address = 0; m_hbr_regs[bp_index].control &= ~1; - m_hbr_regs[bp_index].refcount = 0; return LLDB_INVALID_INDEX32; } @@ -508,8 +506,19 @@ if (error.Fail()) return LLDB_INVALID_INDEX32; - uint32_t control_value = 0, wp_index = 0, addr_word_offset = 0, byte_mask = 0; + uint32_t control_value = 0; lldb::addr_t real_addr = addr; + uint32_t wp_index = LLDB_INVALID_INDEX32; + + // Find out how many bytes we need to watch after 4-byte alignment boundary. + uint8_t watch_size = (addr & 0x03) + size; + + // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary. + if (size == 0 || watch_size > 4) +return LLDB_INVALID_INDEX32; + + // Strip away last two bits of address for byte/half-word/word selection. + addr &= ~((lldb::addr_t)3); // Check if we are setting watchpoint other than read/write/access // Also update watchpoint flag to match Arm write-read bit configuration. @@ -526,86 +535,58 @@ return LLDB_INVALID_INDEX32; } - // Can't watch zero bytes - // Can't watch more than 4 bytes per WVR/WCR pair - - if (size == 0 || size > 4) -return LLDB_INVALID_INDEX32; - - // Check 4-byte alignment for hardware watchpoint target address. - // Below is a hack to recalculate address and size in order to - // make sure we can watch non 4-byte alligned addresses as well. - if (addr & 0x03) { -uint8_t watch_mask = (addr & 0x03) + size; - -if (watch_mask > 0x04) - return LLDB_INVALID_INDEX32; -else if (watch_mask <= 0x02) - size = 2; -else if (watch_mask <= 0x04) - size = 4; - -addr = addr & (~0x03); + // Iterate over stored watchpoints and find a free or duplicate wp_index + for (uint32_t i = 0; i < m_max_hwp_supported; i++) { +if ((m_hwp_regs[i].control & 1) == 0) { + wp_index = i; // Mark last free slot +} else if (m_hwp_regs[i].address == addr) { + wp_index = i; // Mark duplicate index + break;// Stop searching here +} } - // We can only watch up to four bytes that follow a 4 byte aligned address - // per watchpoint register pair, so make sure we can properly encode this. - addr_word_offset = addr % 4; - byte_mask = ((1u << size) - 1u) << addr_word_offset; - - // Check if we need multiple watchpoint register - if (byte_mask > 0xfu) + // No vaccant slot available and no duplicate slot found. + if (wp_index == LLDB_INVALID_INDEX32) return LLDB_INVALID_INDEX32; + uint8_t current_watch_size, new_watch_size; + // Calculate overall size width to be watched by current hardware watchpoint slot. + current_watch_
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
omjavaid added a comment. comments inline. Comment at: packages/Python/lldbsuite/test/functionalities/watchpoint/watchpoint_size/TestWatchpointSizes.py:43 @@ -42,2 +42,3 @@ """Test to selectively watch different bytes in a 8-byte array.""" -self.run_watchpoint_size_test('byteArray', 8, '1') +if self.getArchitecture() in ['arm']: +self.run_watchpoint_size_test('byteArray', 8, '1', 1) labath wrote: > It's not clear to me why you need to modify the existing test for this > change. You are adding functionality, so all existing tests should pass as-is > (which will also validate that your change did not introduce regressions). As we keep on adding these tests its increasing our overall testing time. I just thought using the same test with some changes will cover the cases we want to test. Anyways we can write separate tests as well. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:556 @@ +555,3 @@ +uint32_t current_size = GetWatchpointSize(wp_index); +if ((current_size == 4) || (current_size == 2 && watch_mask <= 2) || +(current_size == 1 && watch_mask == 1)) labath wrote: > The logic here is extremely convoluted. Doesn't this code basically boil down > to: > ``` > current_size = m_hwp_regs[wp_index].control & 1 ? GetWatchpointSize(wp_index) > : 0; > new_size = llvm::NextPowerOf2(std::max(current_size, watch_mask)); > // update the control value, write the debug registers... > ``` > > Also `watch_mask` should probably be renamed to `watch_size`, as it doesn't > appear to be a mask. Seems legit. I ll update this in next patch. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:602 @@ -612,3 +601,3 @@ bool NativeRegisterContextLinux_arm::ClearHardwareWatchpoint( uint32_t wp_index) { labath wrote: > This looks a bit worrying. > What will happen after the following sequence of events: > - client tells us to set a watchpoint at 0x1000 > - we set the watchpoint > - client tells us to set a watchpoint at 0x1001 > - we extend the previous watchpoint to watch this address as well > - client tells us to delete the watchpoint at 0x1000 > - ??? > > Will we remain watching the address 0x1001? I don't see how will you be able > to do that without maintaining a some info about the original watchpoints the > client requested (and I have not seen that code). Please add a test for this. I just realized that I missed a crucial change in my last patch that we need to make all this work. Let me get back with correction and desired test cases. https://reviews.llvm.org/D24610 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
clayborg requested changes to this revision. clayborg added a comment. Great fix. Just fix the testing so that it isn't ARM specific. There shouldn't be any: if self.getArchitecture() in ['arm']: do arm stuff else: do non arm stuff Also we will need to be able to test the set watch at 0x1000, then at 0x1001 by sharing, clear 0x1000 make sure 0x1001 still works, etc. https://reviews.llvm.org/D24610 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
labath requested changes to this revision. labath added a comment. This revision now requires changes to proceed. I have some doubts about the validity of this patch. We should make sure those are cleared before putting this in. Comment at: packages/Python/lldbsuite/test/functionalities/watchpoint/watchpoint_size/TestWatchpointSizes.py:43 @@ -42,2 +42,3 @@ """Test to selectively watch different bytes in a 8-byte array.""" -self.run_watchpoint_size_test('byteArray', 8, '1') +if self.getArchitecture() in ['arm']: +self.run_watchpoint_size_test('byteArray', 8, '1', 1) It's not clear to me why you need to modify the existing test for this change. You are adding functionality, so all existing tests should pass as-is (which will also validate that your change did not introduce regressions). Comment at: packages/Python/lldbsuite/test/functionalities/watchpoint/watchpoint_size/TestWatchpointSizes.py:154 @@ +153,3 @@ +# Continue after hitting watchpoint twice (Read/Write) +if slots != 0: +self.runCmd("process continue") It looks like this test will test something completely different on arm than on other architectures. You would probably be better off writing a new test for this. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:556 @@ +555,3 @@ +uint32_t current_size = GetWatchpointSize(wp_index); +if ((current_size == 4) || (current_size == 2 && watch_mask <= 2) || +(current_size == 1 && watch_mask == 1)) The logic here is extremely convoluted. Doesn't this code basically boil down to: ``` current_size = m_hwp_regs[wp_index].control & 1 ? GetWatchpointSize(wp_index) : 0; new_size = llvm::NextPowerOf2(std::max(current_size, watch_mask)); // update the control value, write the debug registers... ``` Also `watch_mask` should probably be renamed to `watch_size`, as it doesn't appear to be a mask. Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:602 @@ -612,3 +601,3 @@ bool NativeRegisterContextLinux_arm::ClearHardwareWatchpoint( uint32_t wp_index) { This looks a bit worrying. What will happen after the following sequence of events: - client tells us to set a watchpoint at 0x1000 - we set the watchpoint - client tells us to set a watchpoint at 0x1001 - we extend the previous watchpoint to watch this address as well - client tells us to delete the watchpoint at 0x1000 - ??? Will we remain watching the address 0x1001? I don't see how will you be able to do that without maintaining a some info about the original watchpoints the client requested (and I have not seen that code). Please add a test for this. https://reviews.llvm.org/D24610 ___ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits