New submission from Jack Nguyen <jackyeengu...@gmail.com>:

I noticed that the set.issubset cpython implementation casts its iterable 
argument to a set. In some cases, casting the whole iterable to a set is 
unnecessary (see https://bugs.python.org/issue18032). Although the latter 
suggestion is to perform early termination, my suggestion is to use the 
intersection instead.

    # PyAnySet_Check coming from the cpython source code.
    def issubset(self, other):
        # Intersection suggestion:
        if not PyAnySet_Check(other):
            return len(self.intersection(other)) == len(self)
        # Usual implementation for sets.
        else:
            return ...

The main advantage that this implementation has is its memory performance, 
using only O(min(len(self), len(other))) memory, since it never stores elements 
it does not need.

I'm assuming that set construction costs O(n) set.__contains__ calls. This 
implementation uses len(other) calls to self.__contains__ and tmp.__contains__, 
where tmp = set(other). The current implementation uses len(self) + len(other) 
calls to tmp.__contains__.

Thus, I suspect the current implementation only has a chance at running 
noticeably faster when len(self) << len(other), where it performs fewer calls 
to set.__contains__. This is, however, also where the proposed implementation 
has significantly superior memory performance.

----------
components: Interpreter Core
messages: 412966
nosy: panda1200
priority: normal
severity: normal
status: open
title: Memory optimization for set.issubset
type: performance
versions: Python 3.10, Python 3.11, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46705>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to