This needed more time: more work on realistic tests, and a fix that addresses all issues. Sorry for the delay.
Summary:
- Bug: locks can result in false error reports or missing error
reports when the locks are not interpreted in the same way by all
vim functions and commands.
When this is fixed, locks
- can reliably be used to find mistakes
- do not terminate production runs (or test runs) because of
unexpected false positives.
- Bug: on an attempt to change a locked item, extend() exits with
an error, but all changes made up to that point are kept
(exit-on-error).
- Bug: when the target dict is locked, extend() reports false errors
for 'extend(d, {})' and 'extend({"a": 1}, {"a": 2}, "keep")'.
- New, less artificial tests show that a lock check loop adds less
overhead than suggested by previous tests, esp. for common use
cases.
For the more common use cases, a separate lock test loop adds an
overhead of
avg.: 17% stddev: ± 16, max. 40%.
Across the whole range of use cases, the tests show an overhead of
avg.: 34% stddev: ± 27, max. 100%.
These tests are still somewhat artificial. The overhead shrinks
beyond these values when larger item values (e.g. real-life
strings) are used.
- Fix: all above bugs are fixed for extend() by using a lock check
loop that is only used when the target dict has locks set.
This implements all-or-nothing behavior at near-zero cost when no
locks are used on the target dict.
When locks are used on the target dict, the slowdown is as shown
in above tests.
- Open issue: are lock check loops needed in other situations (e.g.
for lists), and should a similar lock count optimization be used?
On 02-May-15, Bram Moolenaar wrote:
>
> Olaf Dabrunz wrote:
>
> > > > Example:
> > > >
> > > > :let d = {'a': 99, 'b': 100}
> > > > :lockvar 1 d
> > > > :call extend(d, {'a': 123})
> > > > E741: Value is locked: extend() argument
> > > > :let d.a = 123
> > > > :let d
> > > > d {'a': 123, 'b': 100}
> > > >
> > > > The dictionary is locked, but the 'a' item is not, so extend() on 'a'
> > > > should succeed in the same way as the 'let' does. According to ':he
> > > > lockvar', a dictionary lock protects against item deletion and addition,
> > > > but not against changing an existing item.
> > >
> > > Hmm, one could argue that extend() adds the new key/value pair, thus
> > > that it should fail.
> >
> > For lists, I agree -- extend(list, list2[, where]) does not touch
> > existing items.
> >
> > For dicts, extend() modifies existing items and adds new items when
> > their keys do not exist yet.
> >
> > > Zyx's argument that it may change one existing key/value but then fail
> > > on a following one that adds a key makes sense.
> >
> > This was also discussed by you and ZyX for patch 7.4.419 ("when part of
> > a list is locked it's possible to make changes"), which is a related
> > situation. ZyX proposed several approaches, and the result was "[list]
> > items will be removed [or changed] only if all items are not locked".
> >
> > > Perhaps we should just
> > > state "extend() on a locked list or dict does not work".
The implementation advantage is that the dict lock only needs to be
checked once. This achieves all-or-nothing behavior, and a lock check
loop can be avoided. This is what the current implementation does; it
only needs to be documented.
But see below.
> >
> > If we let extend() fail here, then ':let' (and map()) should fail as
> > well. The lock should not have a different effect when the change is
> > attempted by different means.
>
> Well, it depends on how you look at it. Although extend() can be used
> to change the value of an existing item, it is not what it is normally
> used for. Using an assignment to an item with a specific key is a more
> obvious way to change the value. Of course, if the key does not exist
> the assignment would fail if the dict is locked.
>
> Also: It's really easy to document that extend() does not work on a
> locked dictionary. That is also an indication that this is easy to
> understand. Trying to explain that it might work in specific situations
> is more complicated, indicating it's also harder to understand for
> users.
Hmmm, yes, from the viewpoint of "how do I use extend()?" it is simple:
extend() does not change a locked dict. This is easy to understand, and
should be easy to use.
But on the other hand, from the viewpoint of "how can I protect data
with locks?", extend() is a special case. (Which also needs special
documentation in ':he extend()' and ':he lockvar'.)
The main problem is that we generally cannot predict whether extend() or
:let or some other function or command will be used on a dict.
So we cannot know if the dict will be protected according to extend()'s
interpretation of locks or according to the interpretation by :let or
others (or by both, at different points).
Example:
:let d = {'x': 1, 'func': function('Userfunc')}
:lockvar 1 d
:lockvar d.func
:
:call SomeFunc(d)
Is d.x locked or not?
To find out if extend() or :let or something else is used on the
dict, we need to analyze all possible code paths, including
functions called in other scripts, triggered autocommands and used
expressions ('statusline', 'diffexpr', ...). The latter can also
have been set up by any script.
We cannot predict all scripts and script versions that a user has
installed, or future code changes. And even the analysis of the
known code can be tedious (without tools or help from VimL).
So we cannot know if d.x will be treated as locked or not.
But locks should help with code analysis, rather than require one.
So to make locks helpful, the interpretation of locks should be the same
everywhere.
False positives by inconsistent locks can break production runs:
----------------------------------------------------------------
In a production situation, a new version of a foreign script may switch
from using :let to use extend(). If this touches a dict that was locked
by our script, this may generate false (= unintended) errors which could
terminate an expression ('diffexpr' or 'statusline', etc.), an 'abort'
function or a :try block.
So inconsistent locks can break production runs.
Example use case for a consistent dict lock that permits item changes:
----------------------------------------------------------------------
I had a 'statusline' expression that used a dict as a cache for
statusline items.
Sometimes bogus new items were added to the cache. A dict lock would
throw an error on item addition, to detect the source of the bogus new
items.
But the dict lock also threw errors for item changes, when done by
extend(). And unfortunately, it was not easily possible to selectively
allow item updates while still catching the bogus item additions.
(To do that, I would have needed to know where the bogus updates came
from.)
As the statusline expression terminates on the first error, item updates
terminated it before it got to the point where the bogus item
additions appeared.
Note that termination on first error can also happen in 'abort'
functions, in a :try block and in some other expressions.
I ended up debugging this with many ':echo's.
Lessons for locks:
- Inconsistent locks may generate false positives, which may prevent
reaching the point where the bug can be reproduced.
- A more specific dict lock that does not lock items against changes
is useful in finding specific mistakes.
Other problems:
---------------
- When an item change is prevented by a dict item lock, then
extend() returns with an error, keeping previously made changes.
:let d = {'a': 1, 'b': 2}
:lockvar d.b
:call extend(d, {'a': 99, 'b': 101})
E741: Value is locked: extend() argument
:let d
d {'a': 99, 'b': 2}
For all-or-nothing behavior, we need a separate lock check
loop.
When this separate lock check loop is used, the main advantage of
interpreting a dict lock (!) also as a change lock for all items
is lost: a separate lock check loop is needed anyway.
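The check-then-apply pattern behind this can be modeled outside of C. Below is a minimal Python sketch (hypothetical names, not the actual Vim code; ValueError stands in for E741). It follows the ':he lockvar' interpretation: the dict lock forbids adding keys, an item lock forbids changing that item's value.

```python
def extend_all_or_nothing(d1, d2, locked_keys, dict_locked=False):
    """Model of all-or-nothing extend() on dicts: check every lock
    first, then apply all changes, so a lock error leaves d1
    completely untouched."""
    # Pass 1: lock check loop -- no mutation yet.
    for key in d2:
        if key not in d1:
            if dict_locked:          # dict lock forbids adding keys
                raise ValueError("E741: Value is locked")
        elif key in locked_keys:     # item lock forbids changing the value
            raise ValueError("E741: Value is locked")
    # Pass 2: change loop -- can no longer fail on locks.
    for key, val in d2.items():
        d1[key] = val

d = {'a': 1, 'b': 2}
try:
    extend_all_or_nothing(d, {'a': 99, 'b': 101}, locked_keys={'b'})
except ValueError:
    pass
# d is unchanged here -- unlike in the exit-on-error example above,
# where 'a' was already overwritten when the error was reported.
```

The second pass repeats the lookups of the first pass; this duplication is exactly the overhead measured in the tests below.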
- extend(d1, d2, action) reports lock errors even when no changes
would have been made.
This happens when d2 is empty, and when d2 would only change
existing items in d1 and 'action' is "keep".
:let d = {'a': 1, 'b': 2}
:lockvar 1 d
:call extend(d, {})
E741: Value is locked: extend() argument
:call extend(d, {'a': 99, 'b': 101}, 'keep')
E741: Value is locked: extend() argument
When "keep" is used, d2 items that would change existing items in
d1 should silently be ignored.
The lock detects a false positive.
Fix: add a loop over d2 items to find an addition to d1, then
report a lock error. If d2 has NO additions to d1, then return
without error. In both cases, d1 remains unchanged.
For this fix, the checks *could* also be integrated into the
normal change loop: no changes need to be undone, because "keep"
together with a dict lock prevents all changes anyway (the dict
lock does not protect against item changes, but "keep" already
does).
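The fix for these false positives can be modeled the same way: report the dict lock only when d2 would actually add a key. A hypothetical Python sketch (not the Vim code; ValueError stands in for E741):

```python
def extend_keep_locked(d1, d2, dict_locked=True):
    """Model of extend(d1, d2, 'keep') on a dict-locked d1: report a
    lock error only when d2 would add a new key; otherwise succeed
    without changing anything."""
    # Check loop: look for an addition to d1.
    for key in d2:
        if key not in d1 and dict_locked:
            raise ValueError("E741: Value is locked")
    # Change loop: 'keep' never overwrites existing keys, so with a
    # locked d1 and no additions this loop changes nothing.
    for key, val in d2.items():
        d1.setdefault(key, val)

d = {'a': 1, 'b': 2}
extend_keep_locked(d, {})                    # no error: nothing to add
extend_keep_locked(d, {'a': 99, 'b': 101})   # no error: only existing keys
```

In both calls d stays {'a': 1, 'b': 2} and no error is raised, matching the "d1 remains unchanged" requirement above.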
Here are some use cases that show that extend() is useful when mixing
item addition and change:
---------------------------------------------------------------------
These can be found in scripts on http://www.vim.org/scripts and in my
own scripts.
- Config dict: override / augment defaults with user settings.
- Option dict for a function: override / augment defaults with
caller-supplied options.
- l:self: initialize or update a dict function's data.
- Associative cache: add items, or update already cached items.
- Transactional update (e.g. at the end of a function): compute
several values, then update / augment the target dict in a single
transaction: extend(target, {'keyA': valA, 'keyB': valB, ...}).
Reasons to use extend() here:
- extend() combines add-or-change operation with all-or-nothing.
No additional save and rollback is needed to implement
transactional item addition / update with extend().
(Unless several dictionaries need to be updated within a single
transaction. But I see this happen less often, and even in this
case extend()s can be used to simplify the code.)
- extend() makes code more readable and simplifies it:
- It summarizes updates in one place: a temp dict can be created
from plain local variables (which have been set up during the
execution of the function) just when needed in the extend().
- Items can be initialized and updated by the same code.
- extend() is more efficient than using :let in a loop. E.g. when
the source dict is prepared in some other software layer and its
contents need to be integrated into a target dict.
> > This would be a different lock model.
> >
> >
> > The all-or-nothing semantics you and ZyX mention can be implemented for
> > extend(dict, dict2[, how]) as well. But note:
> >
> > - extend(dict, dict2[, how]) currently already has several
> > exit-on-error conditions:
> >
> > - adding a function to scopes l: or g: that conflicts with a
> > builtin function
> >
> > - adding or replacing an item in a variable scope, when the key
> > is not a valid variable name
> >
> > - changing an existing dict item while 'how' is 'error'
> >
> > - changing a locked dict item (since 7.4.698).
> >
> > For all-or-nothing semantics, all of these need to be moved into a
> > separate checking loop.
> >
> > (This is the reason why I considered all-or-nothing semantics
> > beyond the scope of my patch: the current implementation has
> > exit-on-error semantics.)
> >
> >
> > - A separate checking loop needs to repeat all dictionary lookups.
> > For my tests, this made extend(dict, ...) take twice as much time
> > to complete.
>
> Yes, this would be the reason to only use this for error conditions that
> do not require looping over all the items (in the dict and in the
> arguments).
>
> > :let d = {}
> > :for i in range(10000) | let d[i] = 't'.i | endfor
> > :for t in range(3)
> > : let start = reltime()
> > : for i in range(10000) | call extend(d, d) | endfor
> > : echomsg reltimestr(reltime(start))
> > :endfor
> >
> > Unpatched:
> >
> > 3.255444
> > 3.243517
> > 3.248523
> >
> > Old patch (exit-on-error semantics):
> >
> > 3.249932
> > 3.256253
> > 3.255438
> >
> > New patch with separate checking loop (see below):
> >
> > 6.452288
> > 6.390474
> > 6.420924
> >
> > I am not sure if taking twice as long is a good tradeoff for
> > having all-or-nothing semantics.
>
> For things where the script writer made an obvious mistake, such as
> trying to overwrite a builtin function, the slowness is not good.
> For the cases where the dict would be halfway a change, it was already
> modified when an unexpected error condition is encountered, it would be
> worth it.
Yes. So for all-or-nothing, only locks need to be checked before
entering the change loop that does the remaining checks.
A separate checking loop is still needed though, for all-or-nothing
behavior when checking locks.
The limitation to only do some checks saves only a small fraction of
the overall time, as most of the time is spent on loading an item
into the CPU caches. After that, additional checks are fast, they
do not need to wait for cache loads.
Additionally, when dict items are mostly contiguous in memory or can
be accessed in a striding pattern (because they were added to the
dict without interjecting memory allocations of random size, or
memory deallocations), then automatic cache prefetching should pick
up the next item while we do more checks on the current item. This
means that our bottleneck -- cache fill -- runs at top speed while
we make more checks.
But to find out how helpful the speedup by prefetching is in
practice, we would need to find out how prevalent striding patterns
in dicts are.
However, even if the checks are not completely "hidden" by
concurrent cache preloads, the cache load latency should already
dwarf the time spent in checks (as mentioned above).
Anyhow, none of us saw a necessity to do the more esoteric checks in
all-or-nothing mode in the separate check loop. This simplifies the
code when the separate check loop is only run under some conditions (as
in my patch below), because checks do not have to be done in both the
separate check loop and the change loop.
> It's a tough decision. In those cases I prefer to leave it as it is.
I hope that the new patch below and the new test results will help.
> > - In case of an out of memory error, extend() skips that item and
> > continues with the next one (continue-after-error semantics).
> >
> > For OOM, all-or-nothing semantics can only be achieved by
> > implementing a rollback, because OOM can only happen when we
> > actually execute the changes.
> >
> > A rollback requires copies of the changed items. Making them is
> > relatively expensive.
> >
> > Note that list functions do not implement this: 'extend(list,
> > list2, how)' and ':let list[i:j] = list2' and probably others do
> > have exit-on-OOM semantics.
> >
> > One reason for not implementing all-or-nothing for OOM may be that
> > OOM rarely happens: as discussed in another thread, users tend to
> > terminate other processes before OOM (to avoid a slow system that
> > is swapping or thrashing), or an OOM-killer may kill some other
> > process to avoid OOM.
> >
> > It may not be worth the slowdown just to have a rollback on OOM.
> >
> > An updated patch that implements all-or-nothing semantics in
> > extend(dict, dict2[, how]) (except for OOM) is at the end of this mail.
> >
> >
> > On another note, while looking at the 7.4.419 patch again, I realized
> > that it adds checks to ':unlet' for item locks before it deletes any
> > items. This is not needed under the locking model described in ':he
> > lockvar': item locks protect against item change, not against item
> > deletion.
> >
> > The list lock, which protects against deletion, is checked before
> > already.
> >
> > Here is a patch that removes the unneeded item lock checks. Note that
> > no changes to the tests are needed, as the list lock already makes sure
> > the items cannot be deleted.
>
> So this makes extend() much slower. Not sure what part of this I would
> include.
New test results suggest that extend() is not slowed down so much in
actual use.
For overlaps of 0% and 10% between source and target dict keys, the
measured lock check loop overhead averages around 17%:
avg.: 17% stddev: ± 16, max. 40%.
Many use cases seem to have dict overlaps in this range.
For overlaps from 0% to 100%, the tests show an overhead of
avg.: 34% stddev: ± 27, max. 100%.
The worst case for realistic, but uncommon use cases is 83% overhead
(for 1 million items in both the source dict and the target dict, and
100% overlap between them).
The above results include measurements for dictionaries with numbers as
values. With longer, real-life strings as values, the overhead tends to
shrink considerably (see also below).
Last minute addition: some applications use long strings as keys. I
have no test results, but the overhead of a lock check loop should
grow in these cases: the dict lookup in both loops (lock check and
change loop) needs to compare the keys. Long keys make this
comparison a more dominant factor of the overall time and the time
spent on item addition and value copying relatively smaller, so
both loops approach the same running time and the overhead gets
closer to 100%.
But this also seems to be used more for temporary data (counting
string occurrences, or an intermediate step when making a list of
unique strings), where locks are less likely to be used.
Here are the test results for the lock check loop overhead for different
dict sizes and different overlaps. These tests use the new patch (see
below):
Using short strings (eval('"t".i')) as dict values:
overlap between dicts: 100% 90% 50% 10% 0%
----------------------------------------------------------------
10000000 items 76.0% 65.8% 45.4% 39.9% crash
1000000 items 59.5% 58.4% 28.9% 24.7% 18.1%
100000 items 66.6% 57.2% 43.6% 15.3% 9.5%
10000 items 52.3% 26.6% 31.9% 23.9% 13.6%
1000 items 38.7% 34.2% 29.6% 16.3% 16.3%
100 items 6.4% 28.4% 31.5% -2.7% -16.9%
10 items 10.0% 8.0% 8.1% 13.0% 21.2%
Using numbers as dict values:
overlap between dicts: 100% 90% 50% 10% 0%
----------------------------------------------------------------
10000000 items 100.3% 84.5% 53.7% 39.4% crash
1000000 items 82.6% 77.4% 28.5% 26.4% 23.6%
100000 items 80.7% 79.2% 64.0% 16.6% 17.2%
10000 items 69.5% 71.3% 58.0% 38.2% 36.2%
1000 items 76.3% 61.1% 42.3% 34.7% 18.6%
100 items 41.8% 32.4% 30.5% -12.8% -14.7%
10 items 0.5% 5.1% 15.0% -5.2% -0.2%
Note: the source dict and the target dict have the same preloaded
size in each test.
Disclaimer: test results varied to some extent, although I tried to
minimize disturbing influences and removed outliers. See also the
attached test scripts. Also, for various reasons, I could not run
the tests on a dedicated test system.
There is also a crash for 10 million items and 0% overlap. So far,
I did not put sufficient time into debugging that crash. But it
happens in both the patched vim and the unpatched vim.
Lock test loop overhead becomes smaller when the overlap between source
dict and target dict becomes smaller.
With more items to be added to the target dict, more new items need
to be allocated and filled (instead of just copying the item
values). The time spent in the creation of new items becomes more
dominant, so that a smaller share of the total time is spent in the
lock test loop.
Lock test loop overhead also shrinks when the values of dict items use
more memory, e.g. for longer strings.
Similarly, copying larger values makes the time spent for that more
dominant, so that a smaller portion of the total time is spent in
the lock test loop.
The worst lock check loop overheads appear when the values are plain
numbers.
It also seems that lock test loop overhead is smaller when the looked up
items "just fit" into a CPU cache layer, while the change loop is "too
large" for that CPU cache layer, because it accesses more memory to add
items and to copy values between the dictionaries.
In these cases, the lock check loop can benefit from CPU cache
prefetches because it does not kick out prefetched cache lines
before it needs them, while in the change loop the additional memory
accesses contribute to kicking out more prefetched cache lines
before they can be used.
This seems to happen for different dict sizes, whenever the data
needed in the lock check loop "just fits" into one CPU cache at one
level in the cache hierarchy, while item creation and value copying
in the change loop produces more cache collisions in that cache.
Note that my computer took about 60 seconds to create a dictionary
with 10 million items in an item-by-item add loop, and about 6 seconds
for 1 million items. This seems to be the limit for usable dict sizes.
Please see the full test results in the attached file.
Also, the test scripts are in another attachment. The scripts also
create a test log with more detailed results.
New patch:
----------
This patch adds a separate lock check loop to extend(). The loop is
only executed when needed.
When no locks are used on the target dict, it has near zero cost.
Otherwise, the slowdown is shown in the above test results.
The lock check loop checks only for lock violations and, when "error"
mode is used, also for item changes, so that the expected error is
reported. Other error conditions are checked only later, in the change
loop.
To quickly find out when the lock check loop can be skipped, a counter
for locked items is kept in each dictionary. When the counter is
non-zero or the dict lock is set, then the lock check loop is executed.
For variable scope dicts (v:, l:, ...), the counter is initialized
to a base count of 1, so they are always checked. These dicts can
contain items with DI_FLAGS_RO/RO_SBX (which are not individually
counted).
User locks (VAR_LOCKED) can only be added or removed by
:(un)lockvar. Counting is done in these commands.
All functions that add items to a dict already make sure that the
locks on the added items are cleared. (All functions that
ultimately call hash_add() or hash_add_item() on a dict's hash.)
So there is no need to count locks there.
The counter also works when the dict is referenced from another
variable, as the same dict header (dict_T) is referenced.
Note that VAR_FIXED protects only against item deletion. This is
not relevant for extend(). And deletion with 'remove(dict, key)'
and ':unlet dict.key ...' only removes one item from a dict at a
time (:unlet essentially does exit-on-error when more than one
'dict.key' is given to it), and 'filter(dict, "v:key != key")' does
exit-on-error.
So no need to count VAR_FIXED.
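The counter logic described above can be sketched as a small model (Python, hypothetical names; in the patch the counter is the dv_ilocked field of dict_T, updated by :lockvar and :unlockvar):

```python
class LockCountedDict:
    """Toy model of a Vim dict with a dict lock and a locked-item
    counter that gates the separate lock check loop in extend()."""
    def __init__(self, scope=False):
        self.items = {}
        self.lock = False        # models VAR_LOCKED on the dict itself
        self.locked = set()      # keys that carry an item lock
        # Scope dicts (v:, l:, ...) may hold read-only items that are
        # not individually counted, so they start at a base count of 1
        # and are always checked.
        self.ilocked = 1 if scope else 0

    def lockvar(self, key):      # models :lockvar d.key
        if key not in self.locked:
            self.locked.add(key)
            self.ilocked += 1

    def unlockvar(self, key):    # models :unlockvar d.key
        if key in self.locked:
            self.locked.remove(key)
            self.ilocked -= 1

    def needs_lock_check(self):
        # extend() runs the separate lock check loop only when this
        # returns True; otherwise the check loop is skipped entirely.
        return self.ilocked > 0 or self.lock

d = LockCountedDict()
assert not d.needs_lock_check()  # no locks: near-zero cost path
d.lockvar('b')
assert d.needs_lock_check()      # item lock present: run check loop
```

Since items added through extend() always have their locks cleared, only :lockvar/:unlockvar need to touch the counter, which is what makes the gating cheap.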
---
src/eval.c | 43 ++++++++++++++++++++++++++++++++++++++-----
src/structs.h | 1 +
src/testdir/test55.in | 24 ++++++++++++++++++++++++
src/testdir/test55.ok | 7 +++++++
4 files changed, 70 insertions(+), 5 deletions(-)
Index: b/src/eval.c
===================================================================
--- a/src/eval.c 2015-06-21 15:43:04.278647940 +0200
+++ b/src/eval.c 2015-06-22 23:57:55.304056700 +0200
@@ -3810,8 +3810,11 @@ do_lock_var(lp, name_end, deep, lock)
/* (un)lock a List item. */
item_lock(&lp->ll_li->li_tv, deep, lock);
else
+ {
/* un(lock) a Dictionary item. */
item_lock(&lp->ll_di->di_tv, deep, lock);
+ lp->ll_dict->dv_ilocked += lock ? 1 : -1;
+ }
return ret;
}
@@ -3879,6 +3882,7 @@ item_lock(tv, deep, lock)
{
--todo;
item_lock(&HI2DI(hi)->di_tv, deep - 1, lock);
+ d->dv_ilocked += lock ? 1 : -1;
}
}
}
@@ -7195,6 +7199,7 @@ dict_alloc()
hash_init(&d->dv_hashtab);
d->dv_lock = 0;
+ d->dv_ilocked = 0;
d->dv_scope = 0;
d->dv_refcount = 0;
d->dv_copyID = 0;
@@ -10509,6 +10514,37 @@ dict_extend(d1, d2, action)
int todo;
char_u *arg_errmsg = (char_u *)N_("extend() argument");
+ /* When d1 has locks, first check for lock breaks and overwrite errors. */
+ if (d1->dv_ilocked || d1->dv_lock & (VAR_LOCKED | VAR_FIXED))
+ {
+ todo = (int)d2->dv_hashtab.ht_used;
+ for (hi2 = d2->dv_hashtab.ht_array; todo > 0; ++hi2)
+ {
+ if (!HASHITEM_EMPTY(hi2))
+ {
+ --todo;
+ di1 = dict_find(d1, hi2->hi_key, -1);
+ if (di1 == NULL)
+ {
+ if (tv_check_lock(d1->dv_lock, arg_errmsg, TRUE))
+ return;
+ }
+ else if (*action == 'e')
+ {
+ EMSG2(_("E737: Key already exists: %s"), hi2->hi_key);
+ return;
+ }
+ else if (*action == 'f' && HI2DI(hi2) != di1)
+ {
+ if (tv_check_lock(di1->di_tv.v_lock, arg_errmsg, TRUE)
+ || var_check_ro(di1->di_flags, arg_errmsg, TRUE))
+ return;
+ }
+ }
+ }
+ }
+
+ /* Check for errors and make changes. No need to check locks again. */
todo = (int)d2->dv_hashtab.ht_used;
for (hi2 = d2->dv_hashtab.ht_array; todo > 0; ++hi2)
{
@@ -10542,9 +10578,6 @@ dict_extend(d1, d2, action)
}
else if (*action == 'f' && HI2DI(hi2) != di1)
{
- if (tv_check_lock(di1->di_tv.v_lock, arg_errmsg, TRUE)
- || var_check_ro(di1->di_flags, arg_errmsg, TRUE))
- break;
clear_tv(&di1->di_tv);
copy_tv(&HI2DI(hi2)->di_tv, &di1->di_tv);
}
@@ -10608,8 +10641,7 @@ f_extend(argvars, rettv)
d1 = argvars[0].vval.v_dict;
d2 = argvars[1].vval.v_dict;
- if (d1 != NULL && !tv_check_lock(d1->dv_lock, arg_errmsg, TRUE)
- && d2 != NULL)
+ if (d1 != NULL && d2 != NULL)
{
/* Check the third argument. */
if (argvars[2].v_type != VAR_UNKNOWN)
@@ -21330,6 +21362,7 @@ init_var_dict(dict, dict_var, scope)
{
hash_init(&dict->dv_hashtab);
dict->dv_lock = 0;
+ dict->dv_ilocked = 1; /* must expect DI_FLAGS_RO* on items */
dict->dv_scope = scope;
dict->dv_refcount = DO_NOT_FREE_CNT;
dict->dv_copyID = 0;
Index: b/src/structs.h
===================================================================
--- a/src/structs.h 2015-06-21 15:42:20.922696781 +0200
+++ b/src/structs.h 2015-06-22 23:43:25.029495192 +0200
@@ -1219,6 +1219,7 @@ struct dictvar_S
int dv_refcount; /* reference count */
int dv_copyID; /* ID used by deepcopy() */
hashtab_T dv_hashtab; /* hashtab that refers to the items */
+ long_u dv_ilocked; /* locked items + 1 if DI_FLAGS_RO* possible */
dict_T *dv_copydict; /* copied dict used by deepcopy() */
dict_T *dv_used_next; /* next dict in used dicts list */
dict_T *dv_used_prev; /* previous dict in used dicts list */
Index: b/src/testdir/test55.in
===================================================================
--- a/src/testdir/test55.in 2015-06-21 15:42:20.922696781 +0200
+++ b/src/testdir/test55.in 2015-06-21 20:26:18.759475460 +0200
@@ -408,6 +408,30 @@ let l = [0, 1, 2, 3]
:endtry
:$put =string(d)
:"
+:$put ='extend() after lock on dict:'
+:unlet! d
+:let d = {'a': 99, 'b': 100, 'd': 101}
+:lockvar 1 d
+:$put =string(extend(d, {'a': 123}))
+:try
+: $put =string(extend(d, {'x': 'new'}))
+: $put ='wrong: did extend() with new item'
+:catch
+: $put =v:exception[:14]
+:endtry
+:$put =string(d)
+:unlet! d2
+:let d2 = {'a': 789, 'b': 42, 'c': 'new', 'd': 52}
+:$put ='key order is ok to recognize error modes: '.(keys(d2)[0] != 'c').(keys(d2)[-1] != 'c')
+:try
+: $put =string(extend(d, d2))
+: $put ='wrong: did extend() with mixed-in new item'
+:catch
+: $put =v:exception[:14]
+:endtry
+:$put =string(d)
+:unlet d d2
+:"
:$put ='No remove() of write-protected scope-level variable:'
:fun! Tfunc(this_is_a_loooooooooong_parameter_name)
: try
Index: b/src/testdir/test55.ok
===================================================================
--- a/src/testdir/test55.ok 2015-06-21 15:42:20.922696781 +0200
+++ b/src/testdir/test55.ok 2015-06-21 20:26:18.763475458 +0200
@@ -138,6 +138,13 @@ did map()
No extend() after lock on dict item:
Vim(put):E741:
{'a': 99, 'b': 100}
+extend() after lock on dict:
+{'a': 123, 'b': 100, 'd': 101}
+Vim(put):E741:
+{'a': 123, 'b': 100, 'd': 101}
+key order is ok to recognize error modes: 11
+Vim(put):E741:
+{'a': 123, 'b': 100, 'd': 101}
No remove() of write-protected scope-level variable:
Vim(put):E795:
No extend() of write-protected scope-level variable:
--
Olaf Dabrunz (oda <at> fctrace.org)
Measurement of the overhead of the lock checking loop used in 'rollback'
mode (all-or-nothing) over 'stop' mode (exit-on-error) in vim's extend()
function.
------------------------------------------------------------------------
Test notes:
-----------
- Tests for smaller dictionaries are done more often, keeping the
product (test iterations * dict size) constant: 1000 tests * 10000
items, 10000 test * 1000 items, ...
This keeps the running time of a test loop large enough to be
accurately measured.
- There is some variance between test results on my test system,
probably caused by L2/L3 cache utilization by other processes (on
other cores / HTs) and by CPU frequency throttling due to heat (as
done by the systems management BIOS).
To minimize the influence of disturbances, each test loop was
repeated 11 times and the results were averaged.
Before averaging, outliers were removed: the measurements beyond a
factor of 1.1 below the end of the first quarter of test results
and above the start of the last quarter of test results.
With 11 tests, two measurements lie in the first quarter and two
in the last quarter, and these can possibly be recognized as
outliers. In practice, this seems sufficient to detect and remove
outliers.
- To derive the number of cache events caused by extend(), cache
overhead for everything but the extend() is measured separately
and subtracted from cache event counts measured when the full test
is run (including extend()).
- Tests for dict sizes of 1000 items or above measure the time spent
in extend() directly.
This prevents influences from variations in overhead (most notably
in copy()) from distorting the measurements for extend().
When dict sizes become smaller, the tests appear to become less
susceptible to variations, possibly because the whole test becomes
more likely to run completely from L2 or L1 cache, reducing or
eliminating memory bus and shared cache contention from other
processes.
Also, for small dict sizes, the running time spent in a single
extend() became too small to measure it directly with high enough
accuracy.
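The averaging with outlier removal described in the notes above can be sketched as follows (Python; the exact quartile boundary indices are my reading of the description, not taken from the attached scripts):

```python
def robust_mean(samples, factor=1.1):
    """Average measurements after dropping outliers: values more than
    `factor` below the end of the first quarter of the sorted results,
    or more than `factor` above the start of the last quarter."""
    s = sorted(samples)
    n = len(s)
    q1_end = s[n // 4]            # end of first quarter (assumed index)
    q4_start = s[n - n // 4 - 1]  # start of last quarter (assumed index)
    kept = [x for x in s if q1_end / factor <= x <= q4_start * factor]
    return sum(kept) / len(kept)

# With 11 measurements, up to two values at each end can be dropped:
times = [3.2, 3.25, 3.21, 3.22, 3.26, 3.24, 3.23, 3.25, 3.24, 2.4, 4.9]
avg = robust_mean(times)  # 2.4 and 4.9 are discarded before averaging
```

With a normal spread the thresholds keep everything, so the result is a plain mean; only measurements distorted by throttling or contention are cut off.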
Testing the "check-when-needed" patch:
--------------------------------------
To test the overhead of the lock check loop, the vim test script creates
and locks a special item when needed to activate the lock test loop in
extend().
Using strings (eval('"t".i')) as dict values:
overlap between dicts: 100% 90% 50% 10% 0%
------------------------------------------------------------------
10000000 items 76.0% 65.8% 45.4% 39.9% crash
L1 dcache-loads -128.0% 33.2% -37.1% -0.6% crash
L1 dcache-stores 34.6% 27.4% 13.0% 9.9% crash
L1 dcache-l+s 242.2% 31.0% -23.7% 3.4% crash
1000000 items 59.5% 58.4% 28.9% 24.7% 18.1%
L1 dcache-loads 38.0% 33.6% 19.1% 13.2% 10.6%
L1 dcache-stores 33.4% 28.5% 14.9% 10.9% 8.6%
L1 dcache-l+s 36.4% 31.7% 17.5% 12.3% 9.8%
100000 items 66.6% 57.2% 43.6% 15.3% 9.5%
L1 dcache-loads 37.9% 32.1% 21.0% 12.9% 11.6%
L1 dcache-stores 36.5% 29.4% 19.9% 11.5% 10.5%
L1 dcache-l+s 37.4% 31.1% 20.6% 12.3% 11.2%
10000 items 52.3% 26.6% 31.9% 23.9% 13.6%
L1 dcache-loads 37.3% 30.4% 19.6% 12.5% 10.9%
L1 dcache-stores 34.9% 26.7% 16.7% 11.0% 9.7%
L1 dcache-l+s 36.4% 29.0% 18.4% 11.9% 10.4%
1000 items 38.7% 34.2% 29.6% 16.3% 16.3%
L1 dcache-loads 39.0% 32.7% 21.4% 15.4% 13.4%
L1 dcache-stores 36.4% 29.3% 18.8% 16.5% 14.4%
L1 dcache-l+s 38.1% 31.4% 20.4% 15.8% 13.8%
100 items 6.4% 28.4% 31.5% -2.7% -16.9%
L1 dcache-loads 36.8% 33.8% 20.4% 6.4% 2.1%
L1 dcache-stores 36.3% 31.7% 19.1% 4.3% -2.0%
L1 dcache-l+s 36.6% 33.0% 19.9% 5.6% 0.6%
10 items 10.0% 8.0% 8.1% 13.0% 21.2%
L1 dcache-loads 45.1% 43.0% 27.5% 20.6% 18.7%
L1 dcache-stores 47.9% 45.7% 27.5% 20.3% 20.3%
L1 dcache-l+s 46.1% 44.0% 27.5% 20.5% 19.3%
No priming with copy() (not needed as overlap is 100%):
-------------------------------------------------------
overlap between dicts:    100%
------------------------------
10000000 items           58.7%
  L1 dcache-loads        40.3%
  L1 dcache-stores       34.6%
  L1 dcache-l+s          38.2%
1000000 items            50.5%
  L1 dcache-loads        38.2%
  L1 dcache-stores       33.7%
  L1 dcache-l+s          36.6%
100000 items             67.2%
  L1 dcache-loads        37.0%
  L1 dcache-stores       33.4%
  L1 dcache-l+s          35.7%
10000 items              44.7%
  L1 dcache-loads        36.1%
  L1 dcache-stores       33.4%
  L1 dcache-l+s          35.1%
1000 items               36.0%
  L1 dcache-loads        36.8%
  L1 dcache-stores       33.3%
  L1 dcache-l+s          35.5%
100 items                17.9%
  L1 dcache-loads        34.1%
  L1 dcache-stores       32.6%
  L1 dcache-l+s          33.5%
10 items                 20.6%
  L1 dcache-loads        24.9%
  L1 dcache-stores       25.4%
  L1 dcache-l+s          25.1%
extend dict with itself:
------------------------------
10000000 items          117.5%
  L1 dcache-loads        98.0%
  L1 dcache-stores      100.0%
  L1 dcache-l+s          98.6%
1000000 items           107.9%
  L1 dcache-loads        98.9%
  L1 dcache-stores      101.3%
  L1 dcache-l+s          99.7%
100000 items             42.5%
  L1 dcache-loads        97.6%
  L1 dcache-stores       99.0%
  L1 dcache-l+s          98.0%
10000 items             128.2%
  L1 dcache-loads        98.5%
  L1 dcache-stores      100.5%
  L1 dcache-l+s          99.2%
1000 items               87.5%
  L1 dcache-loads        97.5%
  L1 dcache-stores       99.1%
  L1 dcache-l+s          98.0%
100 items                41.7%
  L1 dcache-loads        90.5%
  L1 dcache-stores       91.9%
  L1 dcache-l+s          91.0%
10 items                  1.1%
  L1 dcache-loads        53.8%
  L1 dcache-stores       55.0%
  L1 dcache-l+s          54.2%
Same tests with numbers as dict values:
overlap between dicts:    100%     90%     50%     10%      0%
--------------------------------------------------------------
10000000 items          100.3%   84.5%   53.7%   39.4%   crash
  L1 dcache-loads        92.5%   68.0%   32.7%   19.7%   crash
  L1 dcache-stores       89.6%   61.0%   24.7%   15.3%   crash
  L1 dcache-l+s          91.5%   65.5%   29.5%   17.9%   crash
1000000 items            82.6%   77.4%   28.5%   26.4%   23.6%
  L1 dcache-loads        81.2%   68.6%   29.3%   19.8%   16.6%
  L1 dcache-stores       75.4%   59.3%   21.8%   15.9%   13.4%
  L1 dcache-l+s          79.2%   65.2%   26.2%   18.2%   15.3%
100000 items             80.7%   79.2%   64.0%   16.6%   17.2%
  L1 dcache-loads        90.4%   68.0%   35.7%   19.1%   16.7%
  L1 dcache-stores       91.7%   67.1%   33.6%   16.5%   14.1%
  L1 dcache-l+s          90.8%   67.6%   34.9%   18.0%   15.6%
10000 items              69.5%   71.3%   58.0%   38.2%   36.2%
  L1 dcache-loads        85.7%   61.2%   33.0%   19.4%   17.1%
  L1 dcache-stores       83.7%   53.3%   28.5%   17.0%   14.9%
  L1 dcache-l+s          85.0%   58.2%   31.2%   18.5%   16.2%
1000 items               76.3%   61.1%   42.3%   34.7%   18.6%
  L1 dcache-loads        82.3%   65.7%   35.4%   22.7%   16.4%
  L1 dcache-stores       76.2%   56.5%   30.9%   22.2%   14.6%
  L1 dcache-l+s          80.1%   62.2%   33.7%   22.5%   15.7%
100 items                41.8%   32.4%   30.5%  -12.8%  -14.7%
  L1 dcache-loads        89.1%   69.9%   35.8%    6.1%    4.2%
  L1 dcache-stores       88.0%   66.4%   33.7%    2.7%    1.7%
  L1 dcache-l+s          88.7%   68.5%   35.0%    4.9%    3.3%
10 items                  0.5%    5.1%   15.0%   -5.2%   -0.2%
  L1 dcache-loads        79.5%   72.6%   42.9%   29.0%   23.5%
  L1 dcache-stores       85.9%   78.5%   43.6%   28.5%   25.5%
  L1 dcache-l+s          81.9%   74.7%   43.2%   28.8%   24.3%
No priming with copy() (not needed as overlap is 100%):
-------------------------------------------------------
overlap between dicts:    100%
------------------------------
10000000 items           89.0%
  L1 dcache-loads        83.1%
  L1 dcache-stores       76.2%
  L1 dcache-l+s          80.8%
1000000 items            87.4%
  L1 dcache-loads        84.7%
  L1 dcache-stores       79.5%
  L1 dcache-l+s          82.9%
100000 items             89.4%
  L1 dcache-loads        87.6%
  L1 dcache-stores       83.9%
  L1 dcache-l+s          86.3%
10000 items             110.3%
  L1 dcache-loads        86.7%
  L1 dcache-stores       83.3%
  L1 dcache-l+s          85.5%
1000 items               78.7%
  L1 dcache-loads        86.0%
  L1 dcache-stores       82.3%
  L1 dcache-l+s          84.7%
100 items                44.6%
  L1 dcache-loads        79.4%
  L1 dcache-stores       77.0%
  L1 dcache-l+s          78.5%
10 items                 19.2%
  L1 dcache-loads        46.4%
  L1 dcache-stores       47.8%
  L1 dcache-l+s          46.9%
extend dict with itself:
------------------------------
10000000 items           98.1%
  L1 dcache-loads        96.3%
  L1 dcache-stores       97.2%
  L1 dcache-l+s          96.6%
1000000 items            96.1%
  L1 dcache-loads        97.9%
  L1 dcache-stores       99.4%
  L1 dcache-l+s          98.4%
100000 items            114.2%
  L1 dcache-loads        98.3%
  L1 dcache-stores      100.1%
  L1 dcache-l+s          98.9%
10000 items             154.1%
  L1 dcache-loads        98.0%
  L1 dcache-stores      100.1%
  L1 dcache-l+s          98.7%
1000 items              109.1%
  L1 dcache-loads        97.6%
  L1 dcache-stores       99.3%
  L1 dcache-l+s          98.2%
100 items                40.0%
  L1 dcache-loads        90.5%
  L1 dcache-stores       91.8%
  L1 dcache-l+s          90.9%
10 items                  0.9%
  L1 dcache-loads        53.7%
  L1 dcache-stores       54.9%
  L1 dcache-l+s          54.1%
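For clarity, the percentages in the tables above are relative slowdowns.
A minimal sketch of how such a figure is derived from two measured
runtimes (illustrative only; the timings here are made-up example
values, assuming both runs time the same extend() calls, once with and
once without the dummy lock set):

    " t_nolock: runtime without any lock on the target dict (seconds)
    " t_lock:   runtime with the dummy item locked, so the lock check
    "           loop is active (seconds)
    let t_nolock = 0.50
    let t_lock   = 0.67
    let overhead = (t_lock - t_nolock) / t_nolock * 100.0
    echo printf('overhead: %.1f%%', overhead)   " -> overhead: 34.0%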
testscripts-for-vim-extend-error-mode.tar
Description: Unix tar archive
