Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-29 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Sep 29, 2020 at 09:56:41AM +, ikod via Digitalmars-d-learn wrote:
> Hello,
> 
> Sorry if I unintentionally hurt anybody in this thread.
> I'll try to implement sane and correct iteration behavior for AA
> without noticeable performance loss, and propose it if I succeed.

No feelings were hurt, don't worry. :-)  I just wanted to point out that
it's quite a tricky issue, and solving it may not be quite as easy as it
appears.  But if you manage to break new ground in this area, I'd be
very interested to hear!


T

-- 
It's amazing how careful choice of punctuation can leave you hanging:


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-29 Thread ikod via Digitalmars-d-learn

Hello,

Sorry if I unintentionally hurt anybody in this thread.
I'll try to implement sane and correct iteration behavior for AA 
without noticeable performance loss, and propose it if I succeed.


Igor


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-29 Thread Mike Parker via Digitalmars-d-learn

On Sunday, 27 September 2020 at 13:02:04 UTC, Per Nordlöw wrote:
Is it safe to remove AA-elements from an `aa` I'm iterating 
over via aa.byKeyValue?


I'm currently doing this:

foreach (ref kv; aa.byKeyValue)
{
if (pred(kv.key))
aa.remove(kv.key); // ok?
}
if (aa.length == 0)
aa = null;

Is there a better way?


If you're okay with the allocation that comes with it:

foreach(k; aa.keys)
{
if(pred(key)) aa.remove(key);
}





Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread Steven Schveighoffer via Digitalmars-d-learn

On 9/28/20 1:18 PM, ikod wrote:

On Monday, 28 September 2020 at 14:58:15 UTC, Steven Schveighoffer wrote:

On 9/27/20 4:43 PM, Per Nordlöw wrote:

On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:
No.  Modifying a container while iterating over it is, in general, a 
bad idea (unless the container is designed to be


So the major problem of removing keys while iterating is that the AA 
can decide to rehash and shrink. If it does this, your iterator is 
invalidated -- you might skip elements that are




How do you "detect" this at compile time?


I think idea was to have some attributes that collection programmer may 
attach to implementation, like enums: SafeToRemoveOnIteration, 
SafeToAddOnIteration and so on, which may be checked at compile time 
when foreach is handled by compiler. Then in the foreach body compiler 
can refuse to call some unsafe methods. I do not know if it is worth of 
implementation, just this this was an idea.


auto sneaky = aa;
foreach(k, v; aa)
{
   sneaky.remove(k); // oops
}

It doesn't work. You can't restrict this at compile time. Possibly 
runtime, but not compile time.






One could write a specific function to iterate and remove. I


This looks like dead end to me, as you may not only remove items from 
arbitrary positions while iterating over collection, but also insert 
items. Also this can happens not only inside foreach body, but in other 
kinds of iteration. And also there can be nested iterations over same 
collection.


What do you mean dead end? You provide what is supported through a 
controlled interface. It works well, I wrote and used it in 
dcollections. Using the standard insertion and removal functions is not 
allowed (at least without invalidating the range).


If you want insertion, then potentially you could expose those methods 
via the specific iteration construct as well. But you must be prepared 
to do some really funky things.


If we are talking about AAs, insertion may end up expanding the bucket 
list, which means rehashing. That leads to iterating over the same keys 
again. Unless you do some kind of copy-on-write, or combination of 
iterated container + true container? I don't know.


However, deletion is possible (either delay the rehash until after, or 
don't do a rehash during that iteration) via the iteration construct 
(range). You still need to disallow removal via the original type, or 
you might get a rehash.


-Steve


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Sep 28, 2020 at 08:04:49PM +, ikod via Digitalmars-d-learn wrote:
> On Monday, 28 September 2020 at 19:18:20 UTC, H. S. Teoh wrote:
[...]
> > The problem with arbitrary, unrestricted modification of a container
> > while iterating over it, is that it inevitably leads to
> > counterintuitive results.
> 
> Thanks for your reply, and sorry if I was not clear, but I meant AA.
> AA is unordered container, so for me intuitive behavior for mutations
> visibility during iteration is next:
> 
> 1) you must not see removed keys. Let our keys are (unordered) A D C E
> F, you stay on D and remove E. Then E must not be seen on any future
> iteration steps.
> 
> 2) you may or may not see any inserted keys. As AA is unordered
> container there is no reason to expect anything about position of key
> you just inserted - it can fall before or after current iteration
> position, so implementation is free to show inserted keys or not. I
> prefer not to see new keys.
> 
> 3) you expect to visit all not removed keys exactly once.
> 
> Is it sane?
[...]

It sounds reasonable, but it does constrain your implementation. For
example, (3) is likely to break during a rehash. But not rehashing may
lead to other problems, depending on your hashing implementation.


T

-- 
Time flies like an arrow. Fruit flies like a banana.


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread ikod via Digitalmars-d-learn

On Monday, 28 September 2020 at 19:18:20 UTC, H. S. Teoh wrote:
On Mon, Sep 28, 2020 at 05:18:41PM +, ikod via 
Digitalmars-d-learn wrote:
On Monday, 28 September 2020 at 14:58:15 UTC, Steven 
Schveighoffer wrote:

[...]

> One could write a specific function to iterate and remove. I

This looks like dead end to me, as you may not only remove 
items from arbitrary positions while iterating over 
collection, but also insert items.  Also this can happens not 
only inside foreach body, but in other kinds of iteration. And 
also there can be nested iterations over same collection.


The problem with arbitrary, unrestricted modification of a 
container while iterating over it, is that it inevitably leads 
to counterintuitive results.


Thanks for your reply, and sorry if I was not clear, but I meant 
AA. AA is unordered container, so for me intuitive behavior for 
mutations visibility during iteration is next:


1) you must not see removed keys. Let our keys are (unordered) A 
D C E F, you stay on D and remove E. Then E must not be seen on 
any future iteration steps.


2) you may or may not see any inserted keys. As AA is unordered 
container there is no reason to expect anything about position of 
key you just inserted - it can fall before or after current 
iteration position, so implementation is free to show inserted 
keys or not. I prefer not to see new keys.


3) you expect to visit all not removed keys exactly once.

Is it sane?



For example, suppose you have a container containing the 
elements A, B, C, D, E, and you're iterating over it in that 
order.


1) Suppose you're at element C, and you decide that you need to 
insert F. Then there's the question: should F be included in

...


2) Suppose you're at element C, and you decide to delete D. 
"Obviously", the current iteration should not include D. Or

...

Sure, everything you said is correct for ordered collection.


Also, what if the act of deleting D requires an internal 
reorganization of the container?  If this reorg changes the 
iteration order, how should the current iteration proceed?


Intuitive behavior should follow mentioned three rules regardless 
of internal implementation.





Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Sep 28, 2020 at 05:18:41PM +, ikod via Digitalmars-d-learn wrote:
> On Monday, 28 September 2020 at 14:58:15 UTC, Steven Schveighoffer wrote:
[...]
> > One could write a specific function to iterate and remove. I
> 
> This looks like dead end to me, as you may not only remove items from
> arbitrary positions while iterating over collection, but also insert
> items.  Also this can happens not only inside foreach body, but in
> other kinds of iteration. And also there can be nested iterations over
> same collection.

The problem with arbitrary, unrestricted modification of a container
while iterating over it, is that it inevitably leads to counterintuitive
results.

For example, suppose you have a container containing the elements A, B,
C, D, E, and you're iterating over it in that order.

1) Suppose you're at element C, and you decide that you need to insert
F. Then there's the question: should F be included in the current
iteration or not?  What if F sorts before C in the current sort order?
What if F sorts after C?  Should the two cases behave similarly (i.e.,
new elements consistently get included in the current iteration, or they
consistently don't get included in the current iteration)?  What if the
act of inserting F requires an internal reordering of elements in the
container?

2) Suppose you're at element C, and you decide to delete D.
"Obviously", the current iteration should not include D. Or should it?
If you delete A while you're at C, you've already iterated over A, so
there's an inconsistency: deleted elements sometimes get included in the
current iteration, sometimes not.  The problem is, which case is the
"correct" one depends on the user's purpose, which is information that
the container may not have.

Also, what if the act of deleting D requires an internal reorganization
of the container?  If this reorg changes the iteration order, how should
the current iteration proceed? According to the old order, or the new
order?  Note that proceeding with the new order may lead to
counterintuitive results: suppose the new order becomes E -> C -> B ->
A.  That means the current iteration will proceed with B and A, which
have already been encountered before, and skips E.

However, proceeding with the old order may be inefficient: it may
require allocating extra storage to preserve the current iteration order
because the reorganized container no longer has that information.  Or it
may require the container itself to maintain extra information in order
to retain the current iteration order.

3) Suppose you're at element A, and you decide to insert F, with the
resulting order A -> B -> C -> D -> E -> F.  So you proceed as usual.
But at B, you decide to delete F, but that causes the container to
reorganize itself to B -> A -> C -> D -> E.  So in the next iteration
you encounter A again, and insert F again, which causes the container to
reorganize itself to A -> B -> C -> D -> E -> F.  So you get stuck in a
loop between A and B, and termination is no longer guaranteed in
iteration.

Basically, the whole concept of iteration is undermined when the
container can mutate while you're iterating.  It leads to inconsistent
behaviour -- sometimes new elements show up in the current iteration,
sometimes they don't; or some elements may get visited multiple times or
missed altogether -- and iteration may not terminate if your sequence of
operations interacts badly with the container's internal
reorganizations.

Furthermore, trying to work around this by postponing container
reorganizations may lead to other problems, like container inconsistency
(some data structures require immediate reorganization in order to
maintain correctness, etc.).

//

The only way to support container mutation while iterating over it, is
if (1) the allowed operations is constrained (e.g., you can only delete
the current element, not any arbitrary element), and (2) the container
itself must explicitly support such an operation (otherwise deleting the
current element may lead to premature termination of the current
iteration, or errors like dangling pointers or some compromise of
container consistency).

Mixing iteration with arbitrary, unrestricted container mutation is a
road that leads to madness.


T

-- 
Recently, our IT department hired a bug-fix engineer. He used to work for 
Volkswagen.


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread ikod via Digitalmars-d-learn
On Monday, 28 September 2020 at 14:58:15 UTC, Steven 
Schveighoffer wrote:

On 9/27/20 4:43 PM, Per Nordlöw wrote:

On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:
No.  Modifying a container while iterating over it is, in 
general, a bad idea (unless the container is designed to be


So the major problem of removing keys while iterating is that 
the AA can decide to rehash and shrink. If it does this, your 
iterator is invalidated -- you might skip elements that are

...


How do you "detect" this at compile time?


I think idea was to have some attributes that collection 
programmer may attach to implementation, like enums: 
SafeToRemoveOnIteration, SafeToAddOnIteration and so on, which 
may be checked at compile time when foreach is handled by 
compiler. Then in the foreach body compiler can refuse to call 
some unsafe methods. I do not know if it is worth of 
implementation, just this this was an idea.




One could write a specific function to iterate and remove. I


This looks like dead end to me, as you may not only remove items 
from arbitrary positions while iterating over collection, but 
also insert items. Also this can happens not only inside foreach 
body, but in other kinds of iteration. And also there can be 
nested iterations over same collection.


did it for dcollections (this was actually a very important 
functionality that I missed from C++ std::map, which is why I 
created dcollections in the first place).


I don't think it's worth worrying about this at compile time, 
or even at runtime. Just identify that if you do it, you are 
subject to peculiarities.


Yes, compile-time check is just nice option.



Then, we can introduce a specific mechanism to do this (total 
strawman, have no idea what the best thing looks like):




I have an idea on how to implement this, trading up memory for 
performance in some rare cases. Just need some time to estimate 
how it works.


Igor


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread Steven Schveighoffer via Digitalmars-d-learn

On 9/27/20 4:43 PM, Per Nordlöw wrote:

On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:
No.  Modifying a container while iterating over it is, in general, a 
bad idea (unless the container is designed to be used that way, but 
even then, such removal is generally restricted), because it often 
leads to highly counterintuitive results.  In the case of AA's, 
removing an element may lead to a rehashing, which reorders elements, 
and your iteration may miss some elements or repeat some elements or 
terminate prematurely. Even without a rehashing, you may encounter 
inconsistent behaviours, like some elements going "missing".


I believe it's high time we start thinking about detecting these 
violations at compile-time. I recall it's in the spec somewhere so we 
should start a deprecation process at least for AAs.


So the major problem of removing keys while iterating is that the AA can 
decide to rehash and shrink. If it does this, your iterator is 
invalidated -- you might skip elements that are actually in the AA, and 
you might encounter elements that have already been iterated. 
Essentially, the ordering has completely been shuffled. Aside from that, 
there's the performance aspect that you have to re-run the hash lookup 
for no reason (you literally have a pointer to the element you are 
iterating).


How do you "detect" this at compile time?

One could write a specific function to iterate and remove. I did it for 
dcollections (this was actually a very important functionality that I 
missed from C++ std::map, which is why I created dcollections in the 
first place).


I don't think it's worth worrying about this at compile time, or even at 
runtime. Just identify that if you do it, you are subject to peculiarities.


Then, we can introduce a specific mechanism to do this (total strawman, 
have no idea what the best thing looks like):


auto r = aa.byKeyValuePurging;
while(!r.empty)
{
   if(shouldPurge(r.front.key, r.front.value)) r.popFrontAndDelete;
   else r.popFront;
}

where popFrontAndDelete will move to the next element, remove the prior 
element, and ensure that a rehash does not occur.


Again, this is not a formal proposal. Just something like this could be 
possible.


In dcollections, I used a trick of opApply to provide the boolean of 
whether to remove as a parameter to foreach:


foreach(k, v, ref doPurge; )
   doPurge = shouldPurge(k, v);

-Steve


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread Per Nordlöw via Digitalmars-d-learn

On Monday, 28 September 2020 at 10:01:23 UTC, ikod wrote:
Is it specific to some types? What if collection supports 
stable "foreach"?


Yes it depends on how collection members (such as insert, find, 
replace, erase, etc) are implemented.


I presume we need attributes on mutating collection members that 
describes if the operation invalidates ranges or not.


It is relatively simple to implement checks for range 
invalidation at run-time. This is basically Rust-style borrow 
checking at run-time instead of compile-time.


In C++ this is called "iterator invalidation". This attributes 
are not formally defined in code but part of the documentation.


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread ikod via Digitalmars-d-learn

On Monday, 28 September 2020 at 10:10:10 UTC, Per Nordlöw wrote:

On Monday, 28 September 2020 at 10:01:23 UTC, ikod wrote:
Is it specific to some types? What if collection supports 
stable "foreach"?


Yes it depends on how collection members (such as insert, find, 
replace, erase, etc) are implemented.


I presume we need attributes on mutating collection members 
that describes if the operation invalidates ranges or not.





Agree, nice idea!



Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread ikod via Digitalmars-d-learn

On Monday, 28 September 2020 at 09:41:02 UTC, Per Nordlöw wrote:

On Monday, 28 September 2020 at 07:15:27 UTC, Imperatorn wrote:

Yes, this should be a compile-time error


Spec here: 
https://dlang.org/spec/statement.html#foreach_restrictions


Is it specific to some types? What if collection supports stable 
"foreach"?




Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread Per Nordlöw via Digitalmars-d-learn

On Monday, 28 September 2020 at 07:15:27 UTC, Imperatorn wrote:

Yes, this should be a compile-time error


Spec here: 
https://dlang.org/spec/statement.html#foreach_restrictions


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread Imperatorn via Digitalmars-d-learn

On Sunday, 27 September 2020 at 20:43:19 UTC, Per Nordlöw wrote:

On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:

[...]


I believe it's high time we start thinking about detecting 
these violations at compile-time. I recall it's in the spec 
somewhere so we should start a deprecation process at least for 
AAs.


Yes, this should be a compile-time error


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-28 Thread ikod via Digitalmars-d-learn

On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:
On Sun, Sep 27, 2020 at 01:02:04PM +, Per Nordlöw via 
Digitalmars-d-learn wrote:
Is it safe to remove AA-elements from an `aa` I'm iterating 
over via aa.byKeyValue?


No.  Modifying a container while iterating over it is, in 
general, a bad idea (unless the container is designed to be 
used that way, but even then, such removal is generally 
restricted), because it often leads to highly counterintuitive 
results.  In the case of AA's, removing an element may lead to 
a rehashing, which reorders elements, and your iteration may 
miss some elements or repeat some elements or terminate 
prematurely. Even without a rehashing, you may encounter 
inconsistent behaviours, like some elements going "missing".


I think sane rule for AA iterator cane be like this:

1. you must not visit keys removed during iteration regardless of 
their position.
2. you must be able to visit all keys available when iteration 
started.
3. you may not visit keys added during iteration (as AA is 
unordered container)


Looks like this can be implemented for open addressing hash map 
without performance loss (in terms of speed). My has hmap 
implementation support rules 2 and 3, I'll try to add rule 1 to 
the list (right now it iterate over immutable copy of keys).




Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-27 Thread Per Nordlöw via Digitalmars-d-learn

On Sunday, 27 September 2020 at 14:23:11 UTC, H. S. Teoh wrote:
No.  Modifying a container while iterating over it is, in 
general, a bad idea (unless the container is designed to be 
used that way, but even then, such removal is generally 
restricted), because it often leads to highly counterintuitive 
results.  In the case of AA's, removing an element may lead to 
a rehashing, which reorders elements, and your iteration may 
miss some elements or repeat some elements or terminate 
prematurely. Even without a rehashing, you may encounter 
inconsistent behaviours, like some elements going "missing".


I believe it's high time we start thinking about detecting these 
violations at compile-time. I recall it's in the spec somewhere 
so we should start a deprecation process at least for AAs.


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-27 Thread H. S. Teoh via Digitalmars-d-learn
On Sun, Sep 27, 2020 at 01:02:04PM +, Per Nordlöw via Digitalmars-d-learn 
wrote:
> Is it safe to remove AA-elements from an `aa` I'm iterating over via
> aa.byKeyValue?

No.  Modifying a container while iterating over it is, in general, a bad
idea (unless the container is designed to be used that way, but even
then, such removal is generally restricted), because it often leads to
highly counterintuitive results.  In the case of AA's, removing an
element may lead to a rehashing, which reorders elements, and your
iteration may miss some elements or repeat some elements or terminate
prematurely. Even without a rehashing, you may encounter inconsistent
behaviours, like some elements going "missing".

You probably want to build a list of keys to remove, then remove them
after the iteration instead.


T

-- 
Why can't you just be a nonconformist like everyone else? -- YHL


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-27 Thread Imperatorn via Digitalmars-d-learn

On Sunday, 27 September 2020 at 13:02:04 UTC, Per Nordlöw wrote:
Is it safe to remove AA-elements from an `aa` I'm iterating 
over via aa.byKeyValue?


I'm currently doing this:

foreach (ref kv; aa.byKeyValue)
{
if (pred(kv.key))
aa.remove(kv.key); // ok?
}
if (aa.length == 0)
aa = null;

Is there a better way?


What you could do is find all matches (pred) and remove range of 
indices


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-27 Thread Anonymouse via Digitalmars-d-learn

On Sunday, 27 September 2020 at 13:02:04 UTC, Per Nordlöw wrote:
Is it safe to remove AA-elements from an `aa` I'm iterating 
over via aa.byKeyValue?


I'm currently doing this:

foreach (ref kv; aa.byKeyValue)
{
if (pred(kv.key))
aa.remove(kv.key); // ok?
}
if (aa.length == 0)
aa = null;

Is there a better way?


The boring way is to store an array of the spent keys and remove 
afterwards.


KeyType!(typeof(aa))[] garbage;
garbage.reserve(aa.length);

foreach (ref kv; aa.byKeyValue)
{
if (pred(kv.key))
garbage ~= kv.key;
}

foreach (const key; garbage)
{
aa.remove(key);
}

This works with normal arrays too (and 
std.algorithm.mutation.remove), if you foreach_reverse the 
garbage array.


Re: Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-27 Thread Imperatorn via Digitalmars-d-learn

On Sunday, 27 September 2020 at 13:02:04 UTC, Per Nordlöw wrote:
Is it safe to remove AA-elements from an `aa` I'm iterating 
over via aa.byKeyValue?


I'm currently doing this:

foreach (ref kv; aa.byKeyValue)
{
if (pred(kv.key))
aa.remove(kv.key); // ok?
}
if (aa.length == 0)
aa = null;

Is there a better way?


Normally this is not advisable since you're modifying the 
iterated source.


Same thing in C#, there it's a compiler error.


Safe to remove AA elements while iterating over it via .byKeyValue?

2020-09-27 Thread Per Nordlöw via Digitalmars-d-learn
Is it safe to remove AA-elements from an `aa` I'm iterating over 
via aa.byKeyValue?


I'm currently doing this:

foreach (ref kv; aa.byKeyValue)
{
if (pred(kv.key))
aa.remove(kv.key); // ok?
}
if (aa.length == 0)
aa = null;

Is there a better way?