On 2 March 2017 at 21:06, Wolfgang Maier < wolfgang.ma...@biologie.uni-freiburg.de> wrote:
> On 02.03.2017 06:46, Nick Coghlan wrote:
>
>> The proposal in this thread then has the significant downside of only
>> covering the "nested side effect" case:
>>
>>     for item in iterable:
>>         if condition(item):
>>             break
>>     except break:
>>         operation(item)
>>     else:
>>         condition_was_never_true(iterable)
>>
>> While being even *less* amenable to being pushed down into a helper
>> function (since converting the "break" to a "return" would bypass the
>> "except break" clause).
>>
>
> I'm actually not quite buying this last argument. If you wanted to
> refactor this to "return" instead of "break", you could simply put the
> return into the except break block. In many real-world situations with
> multiple breaks from a loop this could actually make things easier instead
> of worse.
>

Fair point - so that would be even with the "single nested side effect"
case, but simpler when you had multiple break conditions (and weren't
already combining them with "and").

> Personally, the "nested side effect" form makes me uncomfortable every
> time I use it because the side effects on breaking or not breaking the loop
> don't end up at the same indentation level and not necessarily together.
> However, I'm gathering from the discussion so far that not too many people
> are thinking like me about this point, so maybe I should simply adjust my
> mind-set.
>

This is why I consider the "search only" form of the loop, where the else
clause either sets a default value, or else prevents execution of the code
after the loop body (via raise, return, or continue), to be the preferred
form: there aren't any meaningful side effects hidden away next to the
break statement.
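
A minimal sketch of that "search only" form, showing both variants of the
else clause. The helper names find_first and require_first are invented
for this illustration and don't come from the thread or the stdlib:

```python
def find_first(iterable, condition, default=None):
    """Return the first item satisfying condition, or default if none does."""
    for item in iterable:
        if condition(item):
            break
    else:
        # The loop ran to completion without hitting break: set a default.
        item = default
    return item


def require_first(iterable, condition):
    """Return the first item satisfying condition, or raise LookupError."""
    for item in iterable:
        if condition(item):
            break
    else:
        # Nothing matched: the raise prevents the post-loop code from running.
        raise LookupError("no item satisfied the condition")
    return item
```

In both cases the only thing sitting next to the break is the search
condition itself; everything that acts on the result lives after the loop.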

If I can't do that, I'm more likely to switch to a classic flag variable
that gets checked post-loop execution than I am to push the side effect
inside the loop body:

    search_result = _not_found = object()
    for item in iterable:
        if condition(item):
            search_result = item
            break
    if search_result is _not_found:
        # Handle the "not found" case
        ...
    else:
        # Handle the "found" case
        ...

> All that said, this is a very nice abstract view on things! I really
> learned quite a bit from this, thank you :)
>
> As always though, reality can be expected to be quite a bit more
> complicated than theory so I decided to check the stdlib for real uses of
> break. This is quite a tedious task since break is used in many different
> ways and I couldn't come up with a good automated way of classifying them.
> So what I did is just go through stdlib code (in reverse alphabetical
> order) containing the break keyword and put it into categories manually. I
> only got up to socket.py before losing my enthusiasm, but here's what I
> found:
>
> - overall I looked at 114 code blocks that contain one or more breaks
>

Thanks for doing that research :)

> Of the remaining 19 non-trivial cases
>
> - 9 are variations of your classical search idiom above, i.e., there's an
> else clause there and nothing more is needed
>
> - 6 are variations of your "nested side-effects" form presented above with
> debatable (see above) benefit from except break
>
> - 2 do not use an else clause currently, but have multiple breaks that do
> partly redundant things that could be combined in a single except break
> clause
>

Those 8 cases could also be reviewed to see whether a flag variable might
be clearer than relying on nested side effects or code repetition.

> - 1 is an example of breaking out of two loops; from sre_parse._parse_sub:
>
> [...]
>     # check if all items share a common prefix
>     while True:
>         prefix = None
>         for item in items:
>             if not item:
>                 break
>             if prefix is None:
>                 prefix = item[0]
>             elif item[0] != prefix:
>                 break
>         else:
>             # all subitems start with a common "prefix".
>             # move it out of the branch
>             for item in items:
>                 del item[0]
>             subpatternappend(prefix)
>             continue # check next one
>         break
> [...]
>

This is a case where a flag variable may be easier to read than loop state
manipulations:

    may_have_common_prefix = True
    while may_have_common_prefix:
        prefix = None
        for item in items:
            if not item:
                may_have_common_prefix = False
                break
            if prefix is None:
                prefix = item[0]
            elif item[0] != prefix:
                may_have_common_prefix = False
                break
        else:
            # all subitems start with a common "prefix".
            # move it out of the branch
            for item in items:
                del item[0]
            subpatternappend(prefix)

Although the whole thing could likely be cleaned up even more via
itertools.zip_longest:

    for first_uncommon_idx, aligned_entries in enumerate(itertools.zip_longest(*items)):
        if not all_true_and_same(aligned_entries):
            break
    else:
        # Everything was common, so clear all entries
        first_uncommon_idx = None
    for item in items:
        del item[:first_uncommon_idx]

(Batching the deletes like that may even be slightly faster than deleting
common entries one at a time)

Given the following helper function:

    def all_true_and_same(entries):
        itr = iter(entries)
        try:
            first_entry = next(itr)
        except StopIteration:
            return False
        if not first_entry:
            return False
        for entry in itr:
            if not entry or entry != first_entry:
                return False
        return True

>
> - finally, 1 is a complicated break dance to achieve sth that clearly
> would have been easier with except break; from typing.py:
>
> [...]
>     def __subclasscheck__(self, cls):
>         if cls is Any:
>             return True
>         if isinstance(cls, GenericMeta):
>             # For a class C(Generic[T]) where T is co-variant,
>             # C[X] is a subclass of C[Y] iff X is a subclass of Y.
>             origin = self.__origin__
>             if origin is not None and origin is cls.__origin__:
>                 assert len(self.__args__) == len(origin.__parameters__)
>                 assert len(cls.__args__) == len(origin.__parameters__)
>                 for p_self, p_cls, p_origin in zip(self.__args__,
>                                                    cls.__args__,
>                                                    origin.__parameters__):
>                     if isinstance(p_origin, TypeVar):
>                         if p_origin.__covariant__:
>                             # Covariant -- p_cls must be a subclass of p_self.
>                             if not issubclass(p_cls, p_self):
>                                 break
>                         elif p_origin.__contravariant__:
>                             # Contravariant. I think it's the opposite. :-)
>                             if not issubclass(p_self, p_cls):
>                                 break
>                         else:
>                             # Invariant -- p_cls and p_self must equal.
>                             if p_self != p_cls:
>                                 break
>                     else:
>                         # If the origin's parameter is not a typevar,
>                         # insist on invariance.
>                         if p_self != p_cls:
>                             break
>                 else:
>                     return True
>                 # If we break out of the loop, the superclass gets a chance.
>         if super().__subclasscheck__(cls):
>             return True
>         if self.__extra__ is None or isinstance(cls, GenericMeta):
>             return False
>         return issubclass(cls, self.__extra__)
> [...]
>

I think this is another case that is asking for the inner loop to be
factored out to a named function, not for reasons of re-use, but for
reasons of making the code more readable and self-documenting :)

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
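
As a rough illustration of the factoring suggested for the typing.py
case: the names _param_is_compatible and all_params_compatible below are
hypothetical, and this is only a sketch of the per-parameter variance
check from the quoted loop, not what typing.py actually does:

```python
from typing import TypeVar


def _param_is_compatible(p_self, p_cls, p_origin):
    """Check one argument pair against the variance of the origin parameter.

    Hypothetical helper, mirroring the branches of the quoted loop body.
    """
    if isinstance(p_origin, TypeVar):
        if p_origin.__covariant__:
            # Covariant: p_cls must be a subclass of p_self.
            return issubclass(p_cls, p_self)
        if p_origin.__contravariant__:
            # Contravariant: the opposite direction.
            return issubclass(p_self, p_cls)
    # Invariant typevar, or not a typevar at all: insist on equality.
    return p_self == p_cls


def all_params_compatible(self_args, cls_args, origin_params):
    """Return True if every aligned argument pair passes its variance check."""
    return all(
        _param_is_compatible(p_self, p_cls, p_origin)
        for p_self, p_cls, p_origin in zip(self_args, cls_args, origin_params)
    )
```

With a helper along these lines, the for/else in __subclasscheck__ could
shrink to a single "if all_params_compatible(self.__args__, cls.__args__,
origin.__parameters__): return True", leaving the fall-through to
super().__subclasscheck__(cls) as it is.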