D4505: util: lower water mark when removing nodes after cost limit reached

2018-09-12 Thread indygreg (Gregory Szorc)
This revision was automatically updated to reflect the committed changes.
Closed by commit rHGf296c0b366c8: util: lower water mark when removing nodes 
after cost limit reached (authored by indygreg, committed by ).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D4505?vs=10824&id=10944

REVISION DETAIL
  https://phab.mercurial-scm.org/D4505

AFFECTED FILES
  mercurial/util.py
  tests/test-lrucachedict.py

CHANGE DETAILS

diff --git a/tests/test-lrucachedict.py b/tests/test-lrucachedict.py
--- a/tests/test-lrucachedict.py
+++ b/tests/test-lrucachedict.py
@@ -314,10 +314,10 @@
         # Inserting new element should free multiple elements so we hit
         # low water mark.
         d.insert('e', 'vd', cost=25)
-        self.assertEqual(len(d), 3)
+        self.assertEqual(len(d), 2)
         self.assertNotIn('a', d)
         self.assertNotIn('b', d)
-        self.assertIn('c', d)
+        self.assertNotIn('c', d)
         self.assertIn('d', d)
         self.assertIn('e', d)
 
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -1472,11 +1472,21 @@
         # to walk the linked list and doing this in a loop would be
         # quadratic. So we find the first non-empty node and then
         # walk nodes until we free up enough capacity.
+        #
+        # If we only removed the minimum number of nodes to free enough
+        # cost at insert time, chances are high that the next insert would
+        # also require pruning. This would effectively constitute quadratic
+        # behavior for insert-heavy workloads. To mitigate this, we set a
+        # target cost that is a percentage of the max cost. This will tend
+        # to free more nodes when the high water mark is reached, which
+        # lowers the chances of needing to prune on the subsequent insert.
+        targetcost = int(self.maxcost * 0.75)
+
         n = self._head.prev
         while n.key is _notset:
             n = n.prev
 
-        while len(self) > 1 and self.totalcost > self.maxcost:
+        while len(self) > 1 and self.totalcost > targetcost:
             del self._cache[n.key]
             self.totalcost -= n.cost
             n.markempty()
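The low-water-mark eviction in the hunk above can be sketched without Mercurial's hand-rolled doubly linked list. The following is a minimal illustration built on `collections.OrderedDict`; the class and parameter names (`CostLimitedLRU`, `watermark`) are hypothetical and this is not the `util.py` implementation.

```python
from collections import OrderedDict


class CostLimitedLRU:
    """Minimal sketch of a cost-limited LRU mapping with a low water
    mark. Uses OrderedDict insertion order as the eviction order; a
    real LRU would also reorder entries on read access."""

    def __init__(self, maxcost, watermark=0.75):
        self._d = OrderedDict()  # key -> (value, cost)
        self.maxcost = maxcost
        self.totalcost = 0
        self._watermark = watermark

    def insert(self, key, value, cost=0):
        if key in self._d:
            self.totalcost -= self._d.pop(key)[1]
        self._d[key] = (value, cost)
        self.totalcost += cost

        if self.totalcost <= self.maxcost:
            return

        # Prune down to the low water mark rather than just below
        # maxcost, so the next few inserts likely need no pruning.
        targetcost = int(self.maxcost * self._watermark)
        while len(self._d) > 1 and self.totalcost > targetcost:
            _, (_, evictedcost) = self._d.popitem(last=False)
            self.totalcost -= evictedcost

    def __contains__(self, key):
        return key in self._d

    def __len__(self):
        return len(self._d)
```

With `maxcost=100`, inserting items costing 50, 50, and 25 overshoots on the third insert and prunes back to the 75 target, evicting only the oldest entry.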



To: indygreg, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D4505: util: lower water mark when removing nodes after cost limit reached

2018-09-06 Thread indygreg (Gregory Szorc)
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  See the inline comment for the reasoning here. This is a pretty
  common strategy for garbage collectors and other cache-like primitives.
  
  The performance impact is substantial:
  
  $ hg perflrucachedict --size 4 --gets 100 --sets 100 --mixed 100 --costlimit 100
  ! inserts w/ cost limit
  ! wall 1.659181 comb 1.65 user 1.65 sys 0.00 (best of 7)
  ! wall 1.722122 comb 1.72 user 1.72 sys 0.00 (best of 6)
  ! mixed w/ cost limit
  ! wall 1.139955 comb 1.14 user 1.14 sys 0.00 (best of 9)
  ! wall 1.182513 comb 1.18 user 1.18 sys 0.00 (best of 9)
  
  $ hg perflrucachedict --size 1000 --gets 100 --sets 100 --mixed 100 --costlimit 1
  ! inserts
  ! wall 0.679546 comb 0.68 user 0.68 sys 0.00 (best of 15)
  ! sets
  ! wall 0.825147 comb 0.83 user 0.83 sys 0.00 (best of 13)
  ! inserts w/ cost limit
  ! wall 25.105273 comb 25.08 user 25.08 sys 0.00 (best of 3)
  ! wall  1.724397 comb  1.72 user  1.72 sys 0.00 (best of 6)
  ! mixed
  ! wall 0.807096 comb 0.81 user 0.81 sys 0.00 (best of 13)
  ! mixed w/ cost limit
  ! wall 12.104470 comb 12.07 user 12.07 sys 0.00 (best of 3)
  ! wall  1.190563 comb  1.19 user  1.19 sys 0.00 (best of 9)
  
  $ hg perflrucachedict --size 1000 --gets 100 --sets 100 --mixed 100 --costlimit 1 --mixedgetfreq 90
  ! inserts
  ! wall 0.711177 comb 0.71 user 0.71 sys 0.00 (best of 14)
  ! sets
  ! wall 0.846992 comb 0.85 user 0.85 sys 0.00 (best of 12)
  ! inserts w/ cost limit
  ! wall 25.963028 comb 25.96 user 25.96 sys 0.00 (best of 3)
  ! wall  2.184311 comb  2.18 user  2.18 sys 0.00 (best of 5)
  ! mixed
  ! wall 0.728256 comb 0.73 user 0.73 sys 0.00 (best of 14)
  ! mixed w/ cost limit
  ! wall 3.174256 comb 3.17 user 3.17 sys 0.00 (best of 4)
  ! wall 0.773186 comb 0.77 user 0.77 sys 0.00 (best of 13)
  
  $ hg perflrucachedict --size 10 --gets 100 --sets 100 --mixed 100 --mixedgetfreq 90 --costlimit 500
  ! gets
  ! wall 1.191368 comb 1.19 user 1.19 sys 0.00 (best of 9)
  ! wall 1.195304 comb 1.19 user 1.19 sys 0.00 (best of 9)
  ! inserts
  ! wall 0.950995 comb 0.95 user 0.95 sys 0.00 (best of 11)
  ! inserts w/ cost limit
  ! wall 1.589732 comb 1.59 user 1.59 sys 0.00 (best of 7)
  ! sets
  ! wall 1.094941 comb 1.10 user 1.09 sys 0.01 (best of 9)
  ! mixed
  ! wall 0.936420 comb 0.94 user 0.93 sys 0.01 (best of 10)
  ! mixed w/ cost limit
  ! wall 0.882780 comb 0.87 user 0.87 sys 0.00 (best of 11)
  
  This puts us ~2x slower than caches without cost accounting. And for
  read-heavy workloads (the prime use cases for caches), performance is
  nearly identical.
  
  In the worst case (pure write workloads with cost accounting enabled),
  we're looking at ~1.5us per insert on large caches. That seems "fast
  enough."
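
The pruning-on-every-insert pathology described above can be illustrated with a toy simulation. `count_prunes` is a hypothetical helper, not the perflrucachedict harness; it only mirrors the shape of the eviction loop with uniform item costs.

```python
def count_prunes(watermark, inserts=1000, maxcost=100, cost=10):
    """Count how many inserts trigger a prune pass when every item
    has the same cost and eviction simply subtracts that cost."""
    total = 0
    items = 0
    prunes = 0
    for _ in range(inserts):
        total += cost
        items += 1
        if total > maxcost:
            prunes += 1
            target = int(maxcost * watermark)
            # Evict oldest items until total cost reaches the target.
            while items > 1 and total > target:
                total -= cost
                items -= 1
    return prunes

# Pruning back only to maxcost (watermark=1.0) triggers a prune pass
# on nearly every insert once the cache is full; a 0.75 low water
# mark amortizes one pass over several subsequent inserts.
print(count_prunes(watermark=1.0))   # 990
print(count_prunes(watermark=0.75))  # 248
```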

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D4505

AFFECTED FILES
  mercurial/util.py
  tests/test-lrucachedict.py

CHANGE DETAILS

diff --git a/tests/test-lrucachedict.py b/tests/test-lrucachedict.py
--- a/tests/test-lrucachedict.py
+++ b/tests/test-lrucachedict.py
@@ -310,10 +310,10 @@
         # Inserting new element should free multiple elements so we hit
         # low water mark.
         d.insert('e', 'vd', cost=25)
-        self.assertEqual(len(d), 3)
+        self.assertEqual(len(d), 2)
         self.assertNotIn('a', d)
         self.assertNotIn('b', d)
-        self.assertIn('c', d)
+        self.assertNotIn('c', d)
         self.assertIn('d', d)
         self.assertIn('e', d)
 
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -1472,11 +1472,21 @@
         # to walk the linked list and doing this in a loop would be
         # quadratic. So we find the first non-empty node and then
         # walk nodes until we free up enough capacity.
+        #
+        # If we only removed the minimum number of nodes to free enough
+        # cost at insert time, chances are high that the next insert would
+        # also require pruning. This would effectively constitute quadratic
+        # behavior for insert-heavy workloads. To mitigate this, we set a
+        # target cost that is a percentage of the max cost. This will tend
+        # to free more nodes when the high water mark is reached, which
+        # lowers the chances of needing to prune on the subsequent insert.
+        targetcost = int(self.maxcost * 0.75)
+
         n = self._head.prev
         while n.key is _notset:
             n = n.prev
 
-        while len(self) > 1 and self.totalcost > self.maxcost:
+        while len(self) > 1 and self.totalcost > targetcost:
             del self._cache[n.key]
             self.totalcost -= n.cost
             n.markempty()