[Nick Coghlan <ncogh...@gmail.com>]
> I took Larry's request a slightly different way: he has a use case where
> he wants order preservation (so built in sets aren't good), but combined
> with low cost duplicate identification and elimination and removal of
> arbitrary elements (so lists and collections.deque aren't good). Organising
> a work queue that way seems common enough ...
Is it? I confess I haven't thought of a plausible use case. Larry didn't really explain his problem, just suggested that an ordered set would be "a solution" to it.

The problem: whether there are duplicates "in the queue" is a question about an implementation detail, one that's hard for me to translate into a question about the _task_ to be solved.

For example, suppose "the task" is to make a deep copy of a web site. A "job" consists of following a URL, sucking down the page text, and adding new jobs for the URLs contained in that page. We probably don't want to suck down the page text multiple times for a given URL, but checking whether a URL is currently in the job queue asks about an implementation detail that misses the point: we want to know whether that URL has already been chased, period. Whether the URL is currently in the queue is irrelevant to that. The only logical relationship is that a job that has finished _was_ in the queue at some time before the job finished.

So, ya, I've seen and implemented lots of work queues along these lines - but an OrderedSet would be an "attractive nuisance" (offhandedly appearing to solve a problem it doesn't actually address):

    jobs = some_kind_of_queue()
    finished_jobs = set()
    ...
    while jobs:
        job = jobs.get()
        if job in finished_jobs:
            continue  # already done - skip the duplicate
        try:
            ...  # work on the job, possibly adding (sub)jobs to the `jobs` queue
        except TransientExceptions:
            jobs.put(job)  # try again later
        else:
            finished_jobs.add(job)

Clear? There is a queue here, and a set, but they can't be combined in a useful way: the set contents have nothing to do with what's currently in the queue - the set consists of jobs that have been successfully completed. The queue doesn't care whether it contains duplicates; it merely skips over jobs that have already been completed by the time the queue spits them out (which strikes me as a more robust design than duplicating logic everywhere a job is added to the queue, trying to prevent already-done jobs from being added in the first place).

Similarly, in other places I've used a "finished" flag on the object instead of a set, or a dict mapping jobs to info about job status and results. But in all these cases, the vital info associated with a job really has little to do with whether the job is currently sitting on a queue.

If "is it currently on the queue?" _is_ a common question to ask, perhaps someone could give a semi-realistic example?
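For concreteness, here is a minimal, self-contained sketch of the web-site-copying example above, written in exactly that style: collections.deque stands in for `some_kind_of_queue()`, and a plain set of already-fetched URLs plays the role of `finished_jobs`. The urllib fetch, the regex link extraction, and the retry cap are purely illustrative assumptions (a real copier would stay on the original host, parse HTML properly, and save the pages somewhere); the point is only that "has this URL been chased?" is answered by the set, never by peeking into the queue:

    import collections
    import re
    import urllib.request
    from urllib.error import URLError
    from urllib.parse import urljoin

    def crawl(start_url, max_attempts=3):
        """Work-queue sketch for copying a site: queue + set of finished URLs.

        The queue may hold duplicate URLs; that's fine.  The question that
        matters - "has this URL already been chased?" - is answered by the
        `finished` set, not by looking inside the queue.
        """
        jobs = collections.deque([start_url])   # work queue; duplicates allowed
        finished = set()                        # URLs successfully fetched
        attempts = collections.Counter()        # failure counts, to cap retries

        while jobs:
            url = jobs.popleft()
            if url in finished:
                continue                        # already chased - skip the duplicate
            try:
                with urllib.request.urlopen(url) as resp:
                    page = resp.read().decode("utf-8", errors="replace")
            except URLError:
                attempts[url] += 1
                if attempts[url] < max_attempts:
                    jobs.append(url)            # possibly transient - try again later
            else:
                finished.add(url)
                # Enqueue a sub-job for every link on the page, duplicates and all.
                for href in re.findall(r'href="([^"]+)"', page):
                    link = urljoin(url, href)
                    if link.startswith(("http://", "https://")):
                        jobs.append(link)

        return finished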