Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-28 Thread Pavel Labath via lldb-commits
labath added a comment.

In https://reviews.llvm.org/D24610#554587, @omjavaid wrote:

> Give this approach a rethink I dont see a lot of problems with this final 
> implementation unless it fails on other architectures.
>  We are already hacking our way to have these byte selection watchpoints 
> working in existing code. New code seems to be improving the hack in my 
> opinion.


- I don't believe that the presence of one hack justifies the addition of 
another. If anything, then it's the opposite.
- The fact that we have to fiddle with the byte selects to get the kernel to 
accept our watchpoints could be thought of as a "hack", but I think there it's 
justified, as otherwise we would be unable to watch anything that isn't 
dword-aligned. And most importantly, the issue is contained within the 
SetHardwareWatchpoint function -- noone outside of that needs to know about the 
fact that we are doing something funny with byte-selects. This is not the case 
here, as I explain below.

> Can you think of any scenarios which might fail for this approach?


I can. Try the following sequence of commands:

- position the PC on an instruction that will write to address A (aligned)
- set a byte watchpoint on A+0
- set a byte watchpoint on A+1
- clear the second watchpoint
- single-step

What will happen with your patch applied? (*) Nothing, as the watchpoint will 
not get hit because you disabled it. Now this is a pretty dubious example, but 
I think it demonstrates, the problems I have with this solution very nicely.

**You are lying to the client about which watchpoints are enabled**. The client 
asked you to remove just one watchpoint, so as far as he is concerned the other 
one is still active. But you have disabled the other one as well. And now you 
make this a feature, as without it single stepping a multi-watchpoint will not 
work. And you are here relying on the fact that the client implements the 
step-over-watchpoint as a simple $z2, $s, $Z2 packet sequence. If the client 
decides to do anything more complicated, say evaluate an expression to get some 
state before the update, then your feature will break, as an intermediate $c, 
will reinstate your hidden watchpoint (in fact, this is maybe possible already, 
if a conditional breakpoint and a watchpoint are hit simultaneously). And you 
don't even fix the step-over-instruction-triggering-multiple-watchpoints 
problem definitively - this can still happen e.g., if you have a watch on 
0x1000 and 0x1008, and a single instruction triggers both.(**)

This is why I think this change is bad, as it is making an invisible line 
between the lowest levels of the server and the highest levels of the client, 
which now have to be kept in sync, otherwise things will unexpectedly break. I 
don't believe that having the ability to watch more than 4 locations at once is 
worth it, particularly when it can be easily worked around by the user (just 
set one big watchpoint instead of multiple small ones). Do you have a specific 
use case where this would be necessary/useful? I personally have never used 
more than one watchpoint per debug session in my life, and I expect this to be 
true for most people (I think the typical use case is "who is corrupting my 
variable?"), So, I believe that it is better to have support for fewer 
watchpoints, but have it working well, than the opposite.

(*) Now when I tried this out on lldb without this change, it still did not 
work as expected - for some reason, after hitting and stepping over the 
watchpoint, lldb decided to to issue $c, and lost control of the inferior. I 
think that tracking this issue and fixing it would have more impact than adding 
the multi-watchpoint support (and it will reduce the number of corner cases, 
where we are wrong rather than increasing it).

(**) Another issue whose solving would have more impact than this.


https://reviews.llvm.org/D24610



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-27 Thread Muhammad Omair Javaid via lldb-commits
omjavaid updated this revision to Diff 72723.
omjavaid added a comment.

Give this approach a rethink I dont see a lot of problems with this final 
implementation unless it fails on other architectures.
We are already hacking our way to have these byte selection watchpoints working 
in existing code. New code seems to be improving the hack in my opinion.

Let me explain what I am doing and may be you can provide your suggestion and 
feedback.

Watchpoint Install => Register watchpoint into hardware watchpoint register 
cache
Watchpoint Enable => Enable in cache and write registers using ptrace interface.

Ideally we should be able to install/uninstall watchpoints in cache and then 
enable them all on a resume.
In case of arm If a watchpoint is hit we should be able to disable that 
watchpoint. 
Step over watchpoint instruction and then re-enable the watchpoint.
Our existing implementation will require a lot of changes to do that so thats 
why here is what i am doing.

SetHardwareWatchpoint => Performs Install and Enable

- If a new watchpoint slot is going to be used we Install and enable.
- For new watchpoint we should be able to complete both Install and or we 
report an error.
- If a duplicate slot is going to be used we Install and enable if required.
- Install means updating size if needed plus updating ref count.
- Enable means updating registers if size was updated.

ClearHardwareWatchpoint

- Disable and uinstall watchpoint means
- Decrement ref count and clear hardware watchpoint regsiters.
- Advantage of keeping ref counts is:
- If refcount is greater than zero then SetHardwareWatchpoint cannot use this 
slot for a new watchpoint (new address).
- But SetHardwareWatchpoint can be use this slot to install duplicate 
watchpoints (same address but different byte or word)

ClearAllHardwareWatchpoint -- Just clear the whole watchpoint cache and call 
SetHardwareWatchpoint for all available watchpoints.

NativeThreadLinux:

  On Watchpoint Remove -> Invalidate watchpoint cache
  On Resume - > Re-validate watchpoints by creating a new cache and re-enabling 
all watchpoints

So this fixes our step-over issue and also preserves watchpoint slot if it is 
being used by multiple watchpoints.

Can you think of any scenarios which might fail for this approach?


https://reviews.llvm.org/D24610

Files:
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/Makefile
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/TestWatchpointMultipleSlots.py
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c
  source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
  source/Plugins/Process/Linux/NativeThreadLinux.cpp
  source/Plugins/Process/Linux/NativeThreadLinux.h

Index: source/Plugins/Process/Linux/NativeThreadLinux.h
===
--- source/Plugins/Process/Linux/NativeThreadLinux.h
+++ source/Plugins/Process/Linux/NativeThreadLinux.h
@@ -108,6 +108,7 @@
   using WatchpointIndexMap = std::map;
   WatchpointIndexMap m_watchpoint_index_map;
   cpu_set_t m_original_cpu_set; // For single-step workaround.
+  bool m_invalidate_watchpoints;
 };
 
 typedef std::shared_ptr NativeThreadLinuxSP;
Index: source/Plugins/Process/Linux/NativeThreadLinux.cpp
===
--- source/Plugins/Process/Linux/NativeThreadLinux.cpp
+++ source/Plugins/Process/Linux/NativeThreadLinux.cpp
@@ -87,7 +87,8 @@
 NativeThreadLinux::NativeThreadLinux(NativeProcessLinux *process,
  lldb::tid_t tid)
 : NativeThreadProtocol(process, tid), m_state(StateType::eStateInvalid),
-  m_stop_info(), m_reg_context_sp(), m_stop_description() {}
+  m_stop_info(), m_reg_context_sp(), m_stop_description(),
+  m_invalidate_watchpoints(false) {}
 
 std::string NativeThreadLinux::GetName() {
   NativeProcessProtocolSP process_sp = m_process_wp.lock();
@@ -185,8 +186,10 @@
 return Error();
   uint32_t wp_index = wp->second;
   m_watchpoint_index_map.erase(wp);
-  if (GetRegisterContext()->ClearHardwareWatchpoint(wp_index))
+  if (GetRegisterContext()->ClearHardwareWatchpoint(wp_index)) {
+m_invalidate_watchpoints = true;
 return Error();
+  }
   return Error("Clearing hardware watchpoint failed.");
 }
 
@@ -198,8 +201,13 @@
   m_stop_info.reason = StopReason::eStopReasonNone;
   m_stop_description.clear();
 
-  // If watchpoints have been set, but none on this thread,
-  // then this is a new thread. So set all existing watchpoints.
+  // Invalidate watchpoint index map for a re-sync
+  if (m_invalidate_watchpoints) {
+m_invalidate_watchpoints = false;
+m_watchpoint_index_map.clear();
+  }
+
+  // Re-sync all available watchpoints.
   if (m_watchpoint_index_map.empty()) {
 NativeProcessLinux &process = GetProcess();
 
Index: source/Plugins/Process/Linux/NativeRegisterContextLinux_ar

Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-27 Thread Pavel Labath via lldb-commits
labath requested changes to this revision.
labath added a comment.
This revision now requires changes to proceed.

In https://reviews.llvm.org/D24610#553331, @omjavaid wrote:

> This is a new version of what seems to me fully implementing functionality we 
> intend to have here.
>
> On a second thought nuking ClearHardwareWatchpoint function seems to be the 
> wrong approach here. I spent some time taking different approaches and it 
> turns out that if we do not ClearHardwareWatchpoint when back-end asks us to 
> remove it then we wont be able to step over watchpoints. On ARM targets we 
> have to first clear and then reinstall watchpoints to step over the 
> watchpoint instruction.
>
> On the other hand if we call 
> NativeRegisterContextLinux_arm::ClearHardwareWatchpoint then that watchpoint 
> stands removed if call is just to delete watch on one of the bytes. And if we 
> follow up with creating a new watchpoint on a different word the slot being 
> used may appear vaccant which is actually inconsistent behavior.
>
> So I have a new approach that does clear watchpoint registers if 
> NativeRegisterContextLinux_arm::ClearHardwareWatchpoint is called but we 
> still track reference counts by re-introducing refcount that I removed in my 
> last patch. This will mean that a follow up create may fail just because 
> there are still references to disabled watchpoint and watchpoint slots are 
> still not vaccant. I have made changes to the test to reflect this behaviour.
>
> Please comment if you have any reservation about this approach.


Hmm I am indeed starting to have serious reservations about this.

The more I think about this, the more it starts to look like a big hack. So, 
now ClearHardwareWatchpoint still maintains a refcount on the number of users 
of the watchpoint slot, but it disables the slot everytime the slot usage count 
is decremented ? (as opposed to when the refcount reaches zero). And this is 
supposed to be the reason that step-over watchpoint (a pretty critical piece of 
functionality) works ? And the reason why we are still able to do the 
watchpoints is that before a continue (but not before a single-step, because 
that would break the previous item) we nuke the watchpoint registers (and their 
reference counts) and start over?

I am not convinced that having watchpoint slot sharing is important enough to 
balance the amount of technical debt this introduces.


https://reviews.llvm.org/D24610



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-26 Thread Muhammad Omair Javaid via lldb-commits
omjavaid updated this revision to Diff 72589.
omjavaid added a comment.

This is a new version of what seems to me fully implementing functionality we 
intend to have here.

On a second thought nuking ClearHardwareWatchpoint function seems to be the 
wrong approach here. I spent some time taking different approaches and it turns 
out that if we do not ClearHardwareWatchpoint when back-end asks us to remove 
it then we wont be able to step over watchpoints. On ARM targets we have to 
first clear and then reinstall watchpoints to step over the watchpoint 
instruction.

On the other hand if we call 
NativeRegisterContextLinux_arm::ClearHardwareWatchpoint then that watchpoint 
stands removed if call is just to delete watch on one of the bytes. And if we 
follow up with creating a new watchpoint on a different word the slot being 
used may appear vaccant which is actually inconsistent behavior.

So I have a new approach that does clear watchpoint registers if 
NativeRegisterContextLinux_arm::ClearHardwareWatchpoint is called but we still 
track reference counts by re-introducing refcount that I removed in my last 
patch. This will mean that a follow up create may fail just because there are 
still references to disabled watchpoint and watchpoint slots are still not 
vaccant. I have made changes to the test to reflect this behaviour.

Please comment if you have any reservation about this approach.


https://reviews.llvm.org/D24610

Files:
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/Makefile
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/TestWatchpointMultipleSlots.py
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c
  source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
  source/Plugins/Process/Linux/NativeThreadLinux.cpp

Index: source/Plugins/Process/Linux/NativeThreadLinux.cpp
===
--- source/Plugins/Process/Linux/NativeThreadLinux.cpp
+++ source/Plugins/Process/Linux/NativeThreadLinux.cpp
@@ -198,8 +198,10 @@
   m_stop_info.reason = StopReason::eStopReasonNone;
   m_stop_description.clear();
 
-  // If watchpoints have been set, but none on this thread,
-  // then this is a new thread. So set all existing watchpoints.
+  // Invalidate watchpoint index map for a re-sync
+  m_watchpoint_index_map.clear();
+
+  // Re-sync all available watchpoints.
   if (m_watchpoint_index_map.empty()) {
 NativeProcessLinux &process = GetProcess();
 
Index: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
===
--- source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
+++ source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
@@ -508,8 +508,19 @@
   if (error.Fail())
 return LLDB_INVALID_INDEX32;
 
-  uint32_t control_value = 0, wp_index = 0, addr_word_offset = 0, byte_mask = 0;
+  uint32_t control_value = 0;
   lldb::addr_t real_addr = addr;
+  uint32_t wp_index = LLDB_INVALID_INDEX32;
+
+  // Find out how many bytes we need to watch after 4-byte alignment boundary.
+  uint8_t watch_size = (addr & 0x03) + size;
+
+  // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+  if (size == 0 || watch_size > 4)
+return LLDB_INVALID_INDEX32;
+
+  // Strip away last two bits of address for byte/half-word/word selection.
+  addr &= ~((lldb::addr_t)3);
 
   // Check if we are setting watchpoint other than read/write/access
   // Also update watchpoint flag to match Arm write-read bit configuration.
@@ -526,86 +537,63 @@
 return LLDB_INVALID_INDEX32;
   }
 
-  // Can't watch zero bytes
-  // Can't watch more than 4 bytes per WVR/WCR pair
+  // Iterate over stored watchpoints and find a free or duplicate wp_index
+  for (uint32_t i = 0; i < m_max_hwp_supported; i++) {
+if ((m_hwp_regs[i].control & 1) == 0 && (m_hwp_regs[i].refcount <= 0)) {
+  wp_index = i; // Mark last free slot
+} else if (m_hwp_regs[i].address == addr) {
+  wp_index = i; // Mark duplicate index
+  break;// Stop searching here
+}
+  }
 
-  if (size == 0 || size > 4)
+  // No vaccant slot available and no duplicate slot found.
+  if (wp_index == LLDB_INVALID_INDEX32)
 return LLDB_INVALID_INDEX32;
 
-  // Check 4-byte alignment for hardware watchpoint target address.
-  // Below is a hack to recalculate address and size in order to
-  // make sure we can watch non 4-byte alligned addresses as well.
-  if (addr & 0x03) {
-uint8_t watch_mask = (addr & 0x03) + size;
-
-if (watch_mask > 0x04)
-  return LLDB_INVALID_INDEX32;
-else if (watch_mask <= 0x02)
-  size = 2;
-else if (watch_mask <= 0x04)
-  size = 4;
-
-addr = addr & (~0x03);
+  uint8_t current_watch_size, new_watch_size;
+  // Calculate overall size width to be watched by current hardware watchpoint slot.
+  current_watch_size =

Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-22 Thread Pavel Labath via lldb-commits
labath accepted this revision.


Comment at: 
source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521
@@ -513,1 +512,11 @@
+
+  // Find out how many bytes we need to watch after 4-byte alignment boundary.
+  uint8_t watch_size = (addr & 0x03) + size;
+
+  // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+  if (size == 0 || watch_size > 4)
+return LLDB_INVALID_INDEX32;
+
+  // Strip away last two bits of address for byte/half-word/word selection.
+  addr &= ~((lldb::addr_t)3);
 

omjavaid wrote:
> labath wrote:
> > zturner wrote:
> > > This block of code is a bit confusing to me.  Is this equivalent to:
> > > 
> > > ```
> > > lldb::addr_t start = llvm::alignDown(addr, 4);
> > > lldb::addr_t end = addr + size;
> > > if (start == end || (end-start)>4)
> > >   return LLDB_INVALID_INDEX32;
> > > ```
> > I am not sure this is much clearer, especially, as we will later need a 
> > separate varaible for `end-start` anyway.
> > 
> > +1 for `llvm::alignDown` though.
> There is significant performance difference when we choose addr = addr & 
> (~0x03); over llvm::alignDown
> 
> It will eventually effect responsiveness if we keep increasing code size like 
> this. 
> 
> Even if we use -Os then alignDown is squeezed down to 8-9 instructions. I 
> havnt tried clang though.
> 
> Instructions needed for addr = addr & (~0x03):
>   4008bd: 48 8b 45 e8 mov-0x18(%rbp),%rax
>   4008c1: 48 83 e0 fc and$0xfffc,%rax
>   4008c5: 48 89 45 e8 mov%rax,-0x18(%rbp)
> 
> Call penalty for llvm::alignDown
> 400918:   ba 00 00 00 00  mov$0x0,%edx
>   40091d: 48 89 cemov%rcx,%rsi
>   400920: 48 89 c7mov%rax,%rdi
>   400923: e8 ae 00 00 00  callq  4009d6 <_Z9alignDownmmm>
>   400928: 49 89 c4mov%rax,%r12
>   40092b: 48 8b 5d d8 mov-0x28(%rbp),%rbx
> 
> Disassembly for llvm::alignDown
> 004009d6 <_Z9alignDownmmm>:
>   4009d6: 55  push   %rbp
>   4009d7: 48 89 e5mov%rsp,%rbp
>   4009da: 48 89 7d f8 mov%rdi,-0x8(%rbp)
>   4009de: 48 89 75 f0 mov%rsi,-0x10(%rbp)
>   4009e2: 48 89 55 e8 mov%rdx,-0x18(%rbp)
>   4009e6: 48 8b 45 e8 mov-0x18(%rbp),%rax
>   4009ea: ba 00 00 00 00  mov$0x0,%edx
>   4009ef: 48 f7 75 f0 divq   -0x10(%rbp)
>   4009f3: 48 89 55 e8 mov%rdx,-0x18(%rbp)
>   4009f7: 48 8b 45 f8 mov-0x8(%rbp),%rax
>   4009fb: 48 2b 45 e8 sub-0x18(%rbp),%rax
>   4009ff: ba 00 00 00 00  mov$0x0,%edx
>   400a04: 48 f7 75 f0 divq   -0x10(%rbp)
>   400a08: 48 0f af 45 f0  imul   -0x10(%rbp),%rax
>   400a0d: 48 89 c2mov%rax,%rdx
>   400a10: 48 8b 45 e8 mov-0x18(%rbp),%rax
>   400a14: 48 01 d0add%rdx,%rax
>   400a17: 5d  pop%rbp
>   400a18: c3  retq   
>   400a19: 0f 1f 80 00 00 00 00nopl   0x0(%rax)
> 
> Number of instructions generated for alignDown with gcc -Os
> 
>   400892: 48 8b 6c 24 08  mov0x8(%rsp),%rbp
>   400897: 48 8b 4c 24 10  mov0x10(%rsp),%rcx
>   40089c: 31 d2   xor%edx,%edx
>   40089e: be c2 0a 40 00  mov$0x400ac2,%esi
>   4008a3: bf a0 11 60 00  mov$0x6011a0,%edi
>   4008a8: 48 89 e8mov%rbp,%rax
>   4008ab: 48 f7 f1div%rcx
>   4008ae: 48 0f af c1 imul   %rcx,%rax
>   4008b2: 48 89 c3mov%rax,%rbx
I tried this with clang-3.6 -O3. The entire alignDown function call compiled 
down to `andq $-4, %rdi`. I'll leave this up to you, as I think it is readable 
enough right now, but I don't think we should be afraid of using utility 
functions like this. I think LLVM cares a lot more about performance than we, 
so I believe we can rely on them knowing what they're doing. Also we have much 
bigger higher-level issues affecting performance, the least of which is the 
change below, where you re-calculate all watchpoints on every resume.


Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552
@@ -559,1 +551,3 @@
 
+  uint8_t current_watch_size, new_watch_size;
+  // Calculate overall size width to be watched by current hardware watchpoint 
slot.

omjavaid wrote:
> labath wrote:
> > Looks much better. Any reason for not using `NextPowerOf2` ? Among other 
> > things, it is self-documenting, so you do not need the comment above that.
> so llvm::NextPowerOf2 doesnt serve the intended behaviour.
> 
> llvm::NextPowerOf2 returns 2 for 1, 4 for 2 or 3, and 8 for 4.

Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-19 Thread Muhammad Omair Javaid via lldb-commits
omjavaid added a comment.

Answers to comments. I will upload a updated patch after corrections and 
updates.



Comment at: 
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c:23
@@ +22,3 @@
+{
+printf("About to write byteArray[%d] ...\n", i); // About to write 
byteArray
+

labath wrote:
> zturner wrote:
> > What's up with all the double spaced source code?  Is this intentional?
> Indeed. The spaces seem superfluous.
Agreed. Will be removed in next iteration.


Comment at: 
source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521
@@ -513,1 +512,11 @@
+
+  // Find out how many bytes we need to watch after 4-byte alignment boundary.
+  uint8_t watch_size = (addr & 0x03) + size;
+
+  // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+  if (size == 0 || watch_size > 4)
+return LLDB_INVALID_INDEX32;
+
+  // Strip away last two bits of address for byte/half-word/word selection.
+  addr &= ~((lldb::addr_t)3);
 

labath wrote:
> zturner wrote:
> > This block of code is a bit confusing to me.  Is this equivalent to:
> > 
> > ```
> > lldb::addr_t start = llvm::alignDown(addr, 4);
> > lldb::addr_t end = addr + size;
> > if (start == end || (end-start)>4)
> >   return LLDB_INVALID_INDEX32;
> > ```
> I am not sure this is much clearer, especially, as we will later need a 
> separate varaible for `end-start` anyway.
> 
> +1 for `llvm::alignDown` though.
There is significant performance difference when we choose addr = addr & 
(~0x03); over llvm::alignDown

It will eventually effect responsiveness if we keep increasing code size like 
this. 

Even if we use -Os then alignDown is squeezed down to 8-9 instructions. I havnt 
tried clang though.

Instructions needed for addr = addr & (~0x03):
  4008bd:   48 8b 45 e8 mov-0x18(%rbp),%rax
  4008c1:   48 83 e0 fc and$0xfffc,%rax
  4008c5:   48 89 45 e8 mov%rax,-0x18(%rbp)

Call penalty for llvm::alignDown
400918: ba 00 00 00 00  mov$0x0,%edx
  40091d:   48 89 cemov%rcx,%rsi
  400920:   48 89 c7mov%rax,%rdi
  400923:   e8 ae 00 00 00  callq  4009d6 <_Z9alignDownmmm>
  400928:   49 89 c4mov%rax,%r12
  40092b:   48 8b 5d d8 mov-0x28(%rbp),%rbx

Disassembly for llvm::alignDown
004009d6 <_Z9alignDownmmm>:
  4009d6:   55  push   %rbp
  4009d7:   48 89 e5mov%rsp,%rbp
  4009da:   48 89 7d f8 mov%rdi,-0x8(%rbp)
  4009de:   48 89 75 f0 mov%rsi,-0x10(%rbp)
  4009e2:   48 89 55 e8 mov%rdx,-0x18(%rbp)
  4009e6:   48 8b 45 e8 mov-0x18(%rbp),%rax
  4009ea:   ba 00 00 00 00  mov$0x0,%edx
  4009ef:   48 f7 75 f0 divq   -0x10(%rbp)
  4009f3:   48 89 55 e8 mov%rdx,-0x18(%rbp)
  4009f7:   48 8b 45 f8 mov-0x8(%rbp),%rax
  4009fb:   48 2b 45 e8 sub-0x18(%rbp),%rax
  4009ff:   ba 00 00 00 00  mov$0x0,%edx
  400a04:   48 f7 75 f0 divq   -0x10(%rbp)
  400a08:   48 0f af 45 f0  imul   -0x10(%rbp),%rax
  400a0d:   48 89 c2mov%rax,%rdx
  400a10:   48 8b 45 e8 mov-0x18(%rbp),%rax
  400a14:   48 01 d0add%rdx,%rax
  400a17:   5d  pop%rbp
  400a18:   c3  retq   
  400a19:   0f 1f 80 00 00 00 00nopl   0x0(%rax)

Number of instructions generated for alignDown with gcc -Os

  400892:   48 8b 6c 24 08  mov0x8(%rsp),%rbp
  400897:   48 8b 4c 24 10  mov0x10(%rsp),%rcx
  40089c:   31 d2   xor%edx,%edx
  40089e:   be c2 0a 40 00  mov$0x400ac2,%esi
  4008a3:   bf a0 11 60 00  mov$0x6011a0,%edi
  4008a8:   48 89 e8mov%rbp,%rax
  4008ab:   48 f7 f1div%rcx
  4008ae:   48 0f af c1 imul   %rcx,%rax
  4008b2:   48 89 c3mov%rax,%rbx


Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552
@@ -559,1 +551,3 @@
 
+  uint8_t current_watch_size, new_watch_size;
+  // Calculate overall size width to be watched by current hardware watchpoint 
slot.

labath wrote:
> Looks much better. Any reason for not using `NextPowerOf2` ? Among other 
> things, it is self-documenting, so you do not need the comment above that.
so llvm::NextPowerOf2 doesnt serve the intended behaviour.

llvm::NextPowerOf2 returns 2 for 1, 4 for 2 or 3, and 8 for 4.

We just want to make sure that our new_size is a power of 2 if its already not 
which will only be the case if size turns out to be 3.


Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-18 Thread Pavel Labath via lldb-commits
labath added inline comments.


Comment at: 
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c:23
@@ +22,3 @@
+{
+printf("About to write byteArray[%d] ...\n", i); // About to write 
byteArray
+

zturner wrote:
> What's up with all the double spaced source code?  Is this intentional?
Indeed. The spaces seem superfluous.


Comment at: 
source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521
@@ -513,1 +512,11 @@
+
+  // Find out how many bytes we need to watch after 4-byte alignment boundary.
+  uint8_t watch_size = (addr & 0x03) + size;
+
+  // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+  if (size == 0 || watch_size > 4)
+return LLDB_INVALID_INDEX32;
+
+  // Strip away last two bits of address for byte/half-word/word selection.
+  addr &= ~((lldb::addr_t)3);
 

zturner wrote:
> This block of code is a bit confusing to me.  Is this equivalent to:
> 
> ```
> lldb::addr_t start = llvm::alignDown(addr, 4);
> lldb::addr_t end = addr + size;
> if (start == end || (end-start)>4)
>   return LLDB_INVALID_INDEX32;
> ```
I am not sure this is much clearer, especially, as we will later need a 
separate varaible for `end-start` anyway.

+1 for `llvm::alignDown` though.


Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552
@@ -559,1 +551,3 @@
 
+  uint8_t current_watch_size, new_watch_size;
+  // Calculate overall size width to be watched by current hardware watchpoint 
slot.

Looks much better. Any reason for not using `NextPowerOf2` ? Among other 
things, it is self-documenting, so you do not need the comment above that.


Comment at: source/Plugins/Process/Linux/NativeThreadLinux.cpp:202
@@ +201,3 @@
+  // Invalidate watchpoint index map to re-sync watchpoint registers.
+  m_watchpoint_index_map.clear();
+

If you add this, then the comment below becomes obsolete.

Seems like a pretty elegant solution to the incremental watchpoint update 
problem. I am wondering whether we need to do it on every resume though. I 
think it should be enough to do it when a watchpoint gets deleted 
(`NTL::RemoveWatchpoint`). Also, we should throw out the implementation of 
`NativeRegisterContextLinux_arm::ClearHardwareWatchpoint` -- it's no longer 
necessary, and it's not even correct anymore.


https://reviews.llvm.org/D24610



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-16 Thread Zachary Turner via lldb-commits
zturner added a subscriber: zturner.


Comment at: 
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c:23
@@ +22,3 @@
+{
+printf("About to write byteArray[%d] ...\n", i); // About to write 
byteArray
+

What's up with all the double spaced source code?  Is this intentional?


Comment at: 
source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521
@@ -513,1 +512,11 @@
+
+  // Find out how many bytes we need to watch after 4-byte alignment boundary.
+  uint8_t watch_size = (addr & 0x03) + size;
+
+  // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+  if (size == 0 || watch_size > 4)
+return LLDB_INVALID_INDEX32;
+
+  // Strip away last two bits of address for byte/half-word/word selection.
+  addr &= ~((lldb::addr_t)3);
 

This block of code is a bit confusing to me.  Is this equivalent to:

```
lldb::addr_t start = llvm::alignDown(addr, 4);
lldb::addr_t end = addr + size;
if (start == end || (end-start)>4)
  return LLDB_INVALID_INDEX32;
```


https://reviews.llvm.org/D24610



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-16 Thread Muhammad Omair Javaid via lldb-commits
omjavaid updated this revision to Diff 71633.
omjavaid added a comment.
Herald added subscribers: srhines, danalbert, tberghammer.

I have added a new test case that tests suggested scnario without changing any 
previous test cases.

Also I have made sure we re validate all watchpoint installed on thread resume 
to make sure we have the latest values assigned to hardware watchpoint 
registers.

This passes on ARM (RaspberryPi3, Samsung Chromebook). I have not yet tested on 
android.

This will fail on targets which dont support multiple watchpoint slots.

Also this should fail on AArch64 which I am currently working on.


https://reviews.llvm.org/D24610

Files:
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/Makefile
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/TestWatchpointMultipleSlots.py
  
packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c
  source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
  source/Plugins/Process/Linux/NativeThreadLinux.cpp

Index: source/Plugins/Process/Linux/NativeThreadLinux.cpp
===
--- source/Plugins/Process/Linux/NativeThreadLinux.cpp
+++ source/Plugins/Process/Linux/NativeThreadLinux.cpp
@@ -198,6 +198,9 @@
   m_stop_info.reason = StopReason::eStopReasonNone;
   m_stop_description.clear();
 
+  // Invalidate watchpoint index map to re-sync watchpoint registers.
+  m_watchpoint_index_map.clear();
+
   // If watchpoints have been set, but none on this thread,
   // then this is a new thread. So set all existing watchpoints.
   if (m_watchpoint_index_map.empty()) {
Index: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
===
--- source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
+++ source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp
@@ -410,15 +410,13 @@
   if ((m_hbr_regs[bp_index].control & 1) == 0) {
 m_hbr_regs[bp_index].address = addr;
 m_hbr_regs[bp_index].control = control_value;
-m_hbr_regs[bp_index].refcount = 1;
 
 // PTRACE call to set corresponding hardware breakpoint register.
 error = WriteHardwareDebugRegs(eDREGTypeBREAK, bp_index);
 
 if (error.Fail()) {
   m_hbr_regs[bp_index].address = 0;
   m_hbr_regs[bp_index].control &= ~1;
-  m_hbr_regs[bp_index].refcount = 0;
 
   return LLDB_INVALID_INDEX32;
 }
@@ -508,8 +506,19 @@
   if (error.Fail())
 return LLDB_INVALID_INDEX32;
 
-  uint32_t control_value = 0, wp_index = 0, addr_word_offset = 0, byte_mask = 0;
+  uint32_t control_value = 0;
   lldb::addr_t real_addr = addr;
+  uint32_t wp_index = LLDB_INVALID_INDEX32;
+
+  // Find out how many bytes we need to watch after 4-byte alignment boundary.
+  uint8_t watch_size = (addr & 0x03) + size;
+
+  // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+  if (size == 0 || watch_size > 4)
+return LLDB_INVALID_INDEX32;
+
+  // Strip away last two bits of address for byte/half-word/word selection.
+  addr &= ~((lldb::addr_t)3);
 
   // Check if we are setting watchpoint other than read/write/access
   // Also update watchpoint flag to match Arm write-read bit configuration.
@@ -526,86 +535,58 @@
 return LLDB_INVALID_INDEX32;
   }
 
-  // Can't watch zero bytes
-  // Can't watch more than 4 bytes per WVR/WCR pair
-
-  if (size == 0 || size > 4)
-return LLDB_INVALID_INDEX32;
-
-  // Check 4-byte alignment for hardware watchpoint target address.
-  // Below is a hack to recalculate address and size in order to
-  // make sure we can watch non 4-byte alligned addresses as well.
-  if (addr & 0x03) {
-uint8_t watch_mask = (addr & 0x03) + size;
-
-if (watch_mask > 0x04)
-  return LLDB_INVALID_INDEX32;
-else if (watch_mask <= 0x02)
-  size = 2;
-else if (watch_mask <= 0x04)
-  size = 4;
-
-addr = addr & (~0x03);
+  // Iterate over stored watchpoints and find a free or duplicate wp_index
+  for (uint32_t i = 0; i < m_max_hwp_supported; i++) {
+if ((m_hwp_regs[i].control & 1) == 0) {
+  wp_index = i; // Mark last free slot
+} else if (m_hwp_regs[i].address == addr) {
+  wp_index = i; // Mark duplicate index
+  break;// Stop searching here
+}
   }
 
-  // We can only watch up to four bytes that follow a 4 byte aligned address
-  // per watchpoint register pair, so make sure we can properly encode this.
-  addr_word_offset = addr % 4;
-  byte_mask = ((1u << size) - 1u) << addr_word_offset;
-
-  // Check if we need multiple watchpoint register
-  if (byte_mask > 0xfu)
+  // No vaccant slot available and no duplicate slot found.
+  if (wp_index == LLDB_INVALID_INDEX32)
 return LLDB_INVALID_INDEX32;
 
+  uint8_t current_watch_size, new_watch_size;
+  // Calculate overall size width to be watched by current hardware watchpoint slot.
+  current_watch_

Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-16 Thread Muhammad Omair Javaid via lldb-commits
omjavaid added a comment.

comments inline.



Comment at: 
packages/Python/lldbsuite/test/functionalities/watchpoint/watchpoint_size/TestWatchpointSizes.py:43
@@ -42,2 +42,3 @@
 """Test to selectively watch different bytes in a 8-byte array."""
-self.run_watchpoint_size_test('byteArray', 8, '1')
+if self.getArchitecture() in ['arm']:
+self.run_watchpoint_size_test('byteArray', 8, '1', 1)

labath wrote:
> It's not clear to me why you need to modify the existing test for this 
> change. You are adding functionality, so all existing tests should pass as-is 
> (which will also validate that your change did not introduce regressions).
As we keep on adding these tests its increasing our overall testing time. I 
just thought using the same test with some changes will cover the cases we want 
to test.

Anyways we can write separate tests as well.


Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:556
@@ +555,3 @@
+uint32_t current_size = GetWatchpointSize(wp_index);
+if ((current_size == 4) || (current_size == 2 && watch_mask <= 2) ||
+(current_size == 1 && watch_mask == 1))

labath wrote:
> The logic here is extremely convoluted. Doesn't this code basically boil down 
> to:
> ```
> current_size = m_hwp_regs[wp_index].control & 1 ? GetWatchpointSize(wp_index) 
> : 0;
> new_size = llvm::NextPowerOf2(std::max(current_size, watch_mask));
> // update the control value, write the debug registers...
> ```
> 
> Also `watch_mask` should probably be renamed to `watch_size`, as it doesn't 
> appear to be a mask.
Seems legit. I ll update this in next patch.


Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:602
@@ -612,3 +601,3 @@
 
 bool NativeRegisterContextLinux_arm::ClearHardwareWatchpoint(
 uint32_t wp_index) {

labath wrote:
> This looks a bit worrying.
> What will happen after the following sequence of events:
> - client tells us to set a watchpoint at 0x1000
> - we set the watchpoint
> - client tells us to set a watchpoint at 0x1001
> - we extend the previous watchpoint to watch this address as well
> - client tells us to delete the watchpoint at 0x1000
> - ???
> 
> Will we remain watching the address 0x1001? I don't see how will you be able 
> to do that without maintaining a some info about the original watchpoints the 
> client requested (and I have not seen that code). Please add a test for this.
I just realized that I missed a crucial change in my last patch that we need to 
make all this work.

Let me get back with correction and desired test cases.


https://reviews.llvm.org/D24610



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-15 Thread Greg Clayton via lldb-commits
clayborg requested changes to this revision.
clayborg added a comment.

Great fix. Just fix the testing so that it isn't ARM specific. There shouldn't 
be any:

  if self.getArchitecture() in ['arm']:
do arm stuff
  else:
do non arm stuff

Also we will need to be able to test the set watch at 0x1000, then at 0x1001 by 
sharing, clear 0x1000 make sure 0x1001 still works, etc.


https://reviews.llvm.org/D24610



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

2016-09-15 Thread Pavel Labath via lldb-commits
labath requested changes to this revision.
labath added a comment.
This revision now requires changes to proceed.

I have some doubts about the validity of this patch. We should make sure those 
are cleared before putting this in.



Comment at: 
packages/Python/lldbsuite/test/functionalities/watchpoint/watchpoint_size/TestWatchpointSizes.py:43
@@ -42,2 +42,3 @@
 """Test to selectively watch different bytes in a 8-byte array."""
-self.run_watchpoint_size_test('byteArray', 8, '1')
+if self.getArchitecture() in ['arm']:
+self.run_watchpoint_size_test('byteArray', 8, '1', 1)

It's not clear to me why you need to modify the existing test for this change. 
You are adding functionality, so all existing tests should pass as-is (which 
will also validate that your change did not introduce regressions).


Comment at: 
packages/Python/lldbsuite/test/functionalities/watchpoint/watchpoint_size/TestWatchpointSizes.py:154
@@ +153,3 @@
+# Continue after hitting watchpoint twice (Read/Write)
+if slots != 0:
+self.runCmd("process continue")

It looks like this test will test something completely different on arm than on 
other architectures. You would probably be better off writing a new test for 
this.


Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:556
@@ +555,3 @@
+uint32_t current_size = GetWatchpointSize(wp_index);
+if ((current_size == 4) || (current_size == 2 && watch_mask <= 2) ||
+(current_size == 1 && watch_mask == 1))

The logic here is extremely convoluted. Doesn't this code basically boil down 
to:
```
current_size = m_hwp_regs[wp_index].control & 1 ? GetWatchpointSize(wp_index) : 
0;
new_size = llvm::NextPowerOf2(std::max(current_size, watch_mask));
// update the control value, write the debug registers...
```

Also `watch_mask` should probably be renamed to `watch_size`, as it doesn't 
appear to be a mask.


Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:602
@@ -612,3 +601,3 @@
 
 bool NativeRegisterContextLinux_arm::ClearHardwareWatchpoint(
 uint32_t wp_index) {

This looks a bit worrying.
What will happen after the following sequence of events:
- client tells us to set a watchpoint at 0x1000
- we set the watchpoint
- client tells us to set a watchpoint at 0x1001
- we extend the previous watchpoint to watch this address as well
- client tells us to delete the watchpoint at 0x1000
- ???

Will we remain watching the address 0x1001? I don't see how will you be able to 
do that without maintaining a some info about the original watchpoints the 
client requested (and I have not seen that code). Please add a test for this.


https://reviews.llvm.org/D24610



___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits