Everything you say is true, but I don't consider it nearly as serious a trap as you say, because I wouldn't generally expect a "user" (that is, a developer not explicitly working on Pharo itself) to modify the #=/#hash implementation of a system class (that is, one included in the base image and instances of which may be expected to exist at all times). A user working on their own application and modifying a #=/#hash implementation there has a very straightforward way to resolve the problem—"reboot" their own application, tearing down and rebuilding any structures it uses. For the applications I work on this is very easy, because it needs to be in order for tests to set up a clean environment each run.
For a developer who *is* explicitly working on Pharo, I think appropriate comments in the #=/#hash implementations of classes critical to the functioning of the image (most obviously the default implementation in [Proto]Object) would be enough (and may already exist). I know in Dolphin Smalltalk there are comments in the #identityHash methods explaining the related point that a binary-filed collection may need to be rehashed on load as its contents will not have the same identities and thus #identityHash-es. Dolphin also changed the number of hash bits similar to what stephan mentioned, and this definitely involved jumping through some hoops to perform safely. More generally, I think this is something anyone with a solid computer science background and/or experience working on language implementations would be expected to pick up by osmosis and/or reasoning from first principles. I don't mean this as an insult to people who just want to write cool apps, but, those are exactly the people who are unlikely to encounter the problem, or if they do, will shrug, restart everything, and the problem will go away. That said, there's certainly nothing wrong with some good comments. The thing is that the `Set/Dictionary allSubInstances do: #rehash` approach is extremely heavy-handed. I suppose it should be largely safe, though to be certain it would need to be done at very high priority to avoid the possibility of concurrent modification (and even then there is a theoretical window of vulnerability). But the cost/benefit analysis feels pretty marginal IMO for making it a part of the IDE. One refinement, if it were to be added—if there are no instances of the class which is being modified, no rehashing can possibly be needed, and in fact in most cases I would expect this to be true. One could additionally scan Sets for instances of that class before rehashing them, which might not actually be vastly more performant but would reduce the heavy-handedness of the operation. On Tue, Aug 30, 2022 at 1:01 PM <s...@clipperadams.com> wrote: > From Pharo Discord: > > IIUC when I change the comparison behavior of a class (i.e. reimplement #= > and #hash), sets containing those objects start acting oddly e.g. happily > storing duplicates of equal objects and failing to find objects they > contain. A solution seems to be Set allSubInstances do: #rehash. > > Is my understanding correct? If so, I find it odd that I've been using and > researching Smalltalk for well over a decade and I never remember > reading/hearing about this. I wonder if it can be baked into the IDE > somehow e.g. via a critic... > > And stephan responded: > > The mailing list posts about that I remember were at the time where the > number of object bits used for the hash were increased. > > ------------------------------ > > Following up on his clue, I found this old post > <https://lists.pharo.org/empathy/thread/LKCE5UXYPXPLVOGNZYW64BWHYJY5NMHQ?hash=LKCE5UXYPXPLVOGNZYW64BWHYJY5NMHQ#LKCE5UXYPXPLVOGNZYW64BWHYJY5NMHQ>, > which seems to confirm my suspicion. This seems like a serious trap for > users to fall into (I’m having flashbacks to my C++ days lol). So, > can/should we somehow handle this in the IDE? Something like a refactoring > (that has a confirmation window) that gives you the choice to rehash image > collections when redefining these methods? > > In a 1GB image, `Set allSubInstances do: #rehash` is almost instantaneous > and `Dictionary allSubInstances do: #rehash` takes several seconds. >