Everything you say is true, but I don't consider it nearly as serious a
trap as you say, because I wouldn't generally expect a "user" (that is, a
developer not explicitly working on Pharo itself) to modify the #=/#hash
implementation of a system class (that is, one included in the base image
and instances of which may be expected to exist at all times). A user
working on their own application and modifying a #=/#hash implementation
there has a very straightforward way to resolve the problem—"reboot" their
own application, tearing down and rebuilding any structures it uses. For
the applications I work on this is very easy, because it needs to be in
order for tests to set up a clean environment each run.

For a developer who *is* explicitly working on Pharo, I think appropriate
comments in the #=/#hash implementations of classes critical to the
functioning of the image (most obviously the default implementation in
[Proto]Object) would be enough (and may already exist). I know in Dolphin
Smalltalk there are comments in the #identityHash methods explaining the
related point that a binary-filed collection may need to be rehashed on
load as its contents will not have the same identities and thus
#identityHash-es. Dolphin also changed the number of hash bits similar to
what stephan mentioned, and this definitely involved jumping through some
hoops to perform safely.

More generally, I think this is something anyone with a solid computer
science background and/or experience working on language implementations
would be expected to pick up by osmosis and/or reasoning from first
principles. I don't mean this as an insult to people who just want to write
cool apps, but, those are exactly the people who are unlikely to encounter
the problem, or if they do, will shrug, restart everything, and the problem
will go away. That said, there's certainly nothing wrong with some good
comments.

The thing is that the `Set/Dictionary allSubInstances do: #rehash` approach
is extremely heavy-handed. I suppose it should be largely safe, though to
be certain it would need to be done at very high priority to avoid the
possibility of concurrent modification (and even then there is a
theoretical window of vulnerability). But the cost/benefit analysis feels
pretty marginal IMO for making it a part of the IDE. One refinement, if it
were to be added—if there are no instances of the class which is being
modified, no rehashing can possibly be needed, and in fact in most cases I
would expect this to be true. One could additionally scan Sets for
instances of that class before rehashing them, which might not actually be
vastly more performant but would reduce the heavy-handedness of the
operation.

On Tue, Aug 30, 2022 at 1:01 PM <s...@clipperadams.com> wrote:

> From Pharo Discord:
>
> IIUC when I change the comparison behavior of a class (i.e. reimplement #=
> and #hash), sets containing those objects start acting oddly e.g. happily
> storing duplicates of equal objects and failing to find objects they
> contain. A solution seems to be Set allSubInstances do: #rehash.
>
> Is my understanding correct? If so, I find it odd that I've been using and
> researching Smalltalk for well over a decade and I never remember
> reading/hearing about this. I wonder if it can be baked into the IDE
> somehow e.g. via a critic...
>
> And stephan responded:
>
> The mailing list posts about that I remember were at the time where the
> number of object bits used for the hash were increased.
>
> ------------------------------
>
> Following up on his clue, I found this old post
> <https://lists.pharo.org/empathy/thread/LKCE5UXYPXPLVOGNZYW64BWHYJY5NMHQ?hash=LKCE5UXYPXPLVOGNZYW64BWHYJY5NMHQ#LKCE5UXYPXPLVOGNZYW64BWHYJY5NMHQ>,
> which seems to confirm my suspicion. This seems like a serious trap for
> users to fall into (I’m having flashbacks to my C++ days lol). So,
> can/should we somehow handle this in the IDE? Something like a refactoring
> (that has a confirmation window) that gives you the choice to rehash image
> collections when redefining these methods?
>
> In a 1GB image, `Set allSubInstances do: #rehash` is almost instantaneous
> and `Dictionary allSubInstances do: #rehash` takes several seconds.
>

Reply via email to