[Python-ideas] Re: Set operations with Lists

Andrew Barnert via Python-ideas Thu, 19 Sep 2019 12:59:26 -0700

> On Sep 19, 2019, at 12:07, Richard Higginbotham <higgi...@gmail.com> wrote:
> 
> Its faster than the built in set type.

Not according to timeit. If you want to dispute that, you need a proper
benchmark, not just repeating the claims from your inaccurate one.

I’m not sure what your relationtype is supposed to do (especially given than 1
and 2 produce the same values), and all versions seem like unusual use cases
rather than anything approaching realistic, but I left that all alone and just
replaced your benchmarks with timeit, repeated them enough times to actually be
meaningful, and changed the test to the operation you claim it’s about rather
than a different one. The output pretty clearly shows that a pre-existing set
beats a pre-sorted list, and building a set on the fly beats sorting a list on
the fly, and even comparing blatantly unfairly with building a set on the fly
vs. using a pre-sorted list, the set still wins except in the smallest test.

I didn’t include tests against more realistic datasets where your algorithm
does significantly worse; this shows that even your chosen input is slower with
your algorithm if you measure it properly.

And, while we’re at it, I added a version of your algorithm that works with
arbitrary iterables and also takes a key function (which you always want when
you’re doing anything related to sorting in Python), and it’s about 10-20%
faster than your list version, which shows that you aren’t getting any
performance benefit from restricting yourself to lists only.

See https://pastebin.com/BC2zgtN6 for the code.

> Uniqueness of values isn't a requirement.

Of course. But non-unique values make the set algorithm faster, and they make
your algorithm slower. And nearly-unique values don’t affect the set time, but
they make your algorithm slower. So you can’t just ignore it.

And, again, you have to decide what happens with non-unique values, how many
times they appear in the output.

> The values need to be comparable so that they can be sorted. This allows a
> linear time complexity.

Except that sorting is log-linear, not linear, so it doesn’t.

Meanwhile, set actually _does_ allow linear time complexity.

Again, none of this means your code is useless. I think a step-compare
intersection is useful enough to belong in itertools, at least as a recipe. But
it’s not faster than set, or simpler, or anything else you’re arguing.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/B5NTB5GPZLQDECNZUX2CVHVJIB724MNL/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Set operations with Lists

Reply via email to