[Python-Dev] Re: Should set objects maintain insertion order too?

Tim Peters Fri, 27 Dec 2019 19:50:55 -0800

[Nick Coghlan <ncogh...@gmail.com>]
> I took Larry's request a slightly different way: he has a use case where
> he wants order preservation (so built in sets aren't good), but combined
> with low cost duplicate identification and elimination and removal of
> arbitrary elements (so lists and collections.deque aren't good). Organising
> a work queue that way seems common enough ...


Is it?  I confess I haven't thought of a plausible use case.  Larry
didn't really explain his problem, just suggested that an ordered set
would be "a solution" to it.

The problem:  whether there are duplicates "in the queue" is a
question about an implementation detail, hard for me to translate to a
question about the _task_ to be solved.

For example, suppose "the task" is to make a deep copy of a web site.
A "job" consists of following a URL, sucking down the page text, and
adding new jobs for contained URLs on the page.

We probably don't want to suck down the page text multiple times for a
given URL, but checking whether a URL is currently already in the job
queue is asking a question about an implementation detail that misses
the point:  we want to know whether that URL has already been chased,
period.  Whether the URL is currently in the queue is irrelevant to
that.  The only logical relationship is that a job that has finished
_was_ in the queue at some time before the job finished.

So, ya, I've seen and implemented lots of work queues along these
lines - but an OrderedSet would be an "attractive nuisance"
(offhandedly appearing to solve a problem it doesn't actually
address):

    jobs = some_kind_of_queue()
    finished_jobs = set()
    ...
    while jobs:
        job = jobs.get()
        if job in finished_jobs:
            continue
        try:
            work on the job
            possibly adding (sub)jobs to the `jobs` queue
        except TransientExceptions:
            jobs.put(job)  # try again later
        else:
            finished_jobs.add(job)

Clear?  There is a queue here, and a set, but they can't be combined
in a useful way:  the set contents have nothing to do with what's
currently in the queue - the set consists of jobs that have been
successfully completed; the queue doesn't care whether it contains
duplicates, and merely skips over jobs that have been completed by the
time the queue spits them out (which strikes me as more robust design
than duplicating logic everywhere a job is added to a queue to try to
prevent adding already-done jobs to begin with).

Similarly, in other places I've used a "finished" flag on the object
instead of a set, or a dict mapping jobs to info about job status and
results.  But in all these cases, the vital info associated with a job
really has little to do with whether the job is currently sitting on a
queue.

If "is it currently on the queue?" _is_ a common question to ask,
perhaps someone could give a semi-realistic example?
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2XDEVSO5S5ZLVZTLX43UHBOJWMXYJIIB/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

Reply via email to