[issue17005] Add a topological sort algorithm
Zahari Dim added the comment:

I would like to suggest a `dependency_resolver` API that I have been using, which is in line with what Tim Peters proposes in https://bugs.python.org/issue17005#msg359702

A DAG would be an object that can be iterated in topological order with __iter__ (for simple sequential usage) or that offers a way of managing all the tasks that can be run in parallel. The latter is done with a generator function:

```
def dependency_resolver(self):
    """Yield the set of nodes that have all dependencies satisfied (which could be an empty set). Send the next completed task."""
```

which is used with something like:

```
deps = dag.dependency_resolver()
pending_tasks = deps.send(None)
if not pending_tasks:
    # Graph empty
    return
# Note: this can be done in parallel/async
while True:
    some_task = pending_tasks.pop()
    complete_task_somehow(some_task)
    try:
        more_tasks = deps.send(some_task)
    except StopIteration:
        # Exit when we have sent in all the nodes in the graph
        break
    else:
        pending_tasks |= more_tasks
```

An implementation I have used for some time is here: https://github.com/NNPDF/reportengine/blob/master/src/reportengine/dag.py although I'd make it simpler now.

In practice I have found that the function I use most of the time to build the graph is:

dag.add_or_update_node(node=something_hashable, inputs={set of existing nodes}, outputs={set of existing nodes})

which adds the node to the graph if it was not there and updates the dependencies to add inputs and outputs, which in my experience matches the way one discovers dependencies for things like packages.

--
nosy: +Zahari.Dim

___
Python tracker <https://bugs.python.org/issue17005>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
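For illustration, the send-based protocol described in this message can be sketched as a standalone generator. This is only a minimal sketch, not the reportengine implementation: the `graph` argument (a dict mapping each node to the set of nodes it depends on, with every node present as a key) and the `run_all` driver are assumptions made for this example.

```python
def dependency_resolver(graph):
    """Yield sets of nodes whose dependencies are all satisfied.

    `graph` is assumed to map every node to the set of nodes it depends on.
    The caller sends back each completed node and receives the (possibly
    empty) set of nodes that became ready as a result.
    """
    remaining = {node: set(deps) for node, deps in graph.items()}
    dependents = {}
    for node, deps in remaining.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    ready = {node for node, deps in remaining.items() if not deps}
    total = len(remaining)
    completed = 0
    finished = yield ready  # first send(None) returns the initial ready set
    while True:
        completed += 1
        newly_ready = set()
        for child in dependents.get(finished, ()):
            remaining[child].discard(finished)
            if not remaining[child]:
                newly_ready.add(child)
        if completed == total:
            return  # every node has been sent in: raises StopIteration
        finished = yield newly_ready


def run_all(graph):
    """Hypothetical sequential driver following the usage shown above."""
    deps = dependency_resolver(graph)
    pending = deps.send(None)
    order = []
    while pending:
        task = pending.pop()
        order.append(task)  # stand-in for actually completing the task
        try:
            pending |= deps.send(task)
        except StopIteration:
            break
    return order


example = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
order = run_all(example)
```

In this sketch a graph containing a cycle never reaches the final `return`; the driver simply stops once nothing is pending, so cycle detection would have to be added separately. The standard library's `graphlib.TopologicalSorter` (Python 3.9 and later) exposes a comparable ready/done cycle through `get_ready()` and `done()`.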
[issue13349] Non-informative error message in index() and remove() functions
Changes by Zahari Dim:

--
pull_requests: +1418

___
Python tracker <http://bugs.python.org/issue13349>
___
[issue34586] collections.ChainMap should have a get_where method
New submission from Zahari Dim:

When using ChainMap I have frequently needed to know which mapping inside the list contains the effective instance of a particular key. I have needed this when using ChainMap to hold a piece of configuration with multiple sources, for example:

```
from mycollections import ChainMap

configsources = ["Command line", "Config file", "Defaults"]
config = ChainMap(config_from_commandline(), config_from_file(), default_config())


class BadConfigError(Exception):
    pass


def get_key(key):
    try:
        index, value = config.get_where(key)
    except KeyError as e:
        raise BadConfigError(f"No such key: '{key}'") from e
    try:
        result = validate(key, value)
    except ValidationError as e:
        raise BadConfigError(f"Key '{key}' defined in {configsources[index]} "
                             f"is invalid: {e}") from e
    return result
```

I have also needed this when implementing custom DSLs (e.g. specifying which context a particular construct is allowed to see).

I think this method would be generally useful for the ChainMap class, and moreover the best way of implementing it I can think of is by copying the `__getitem__` method and retaining the index:

```
import collections


class ChainMap(collections.ChainMap):
    def get_where(self, key):
        for i, mapping in enumerate(self.maps):
            try:
                return i, mapping[key]  # can't use 'key in mapping' with defaultdict
            except KeyError:
                pass
        return self.__missing__(key)  # support subclasses that define __missing__
```

I'd be happy to write a patch that does just this.

--
components: Library (Lib)
messages: 324632
nosy: Zahari.Dim
priority: normal
severity: normal
status: open
title: collections.ChainMap should have a get_where method
type: enhancement
versions: Python 3.8

___
Python tracker <https://bugs.python.org/issue34586>
___
[issue34586] collections.ChainMap should have a get_where method
Zahari Dim added the comment:

I believe an argument for including this functionality in the standard library is that it facilitates writing better error messages and thus better code. Some results that are returned when one searches for *python ChainMap* are:

- <https://stackoverflow.com/questions/23392976/what-is-the-purpose-of-collections-chainmap>
- <http://www.blog.pythonlibrary.org/2016/03/29/python-201-what-is-a-chainmap/>
- <http://rahmonov.me/posts/python-chainmap/>

All of these prominently mention a layered configuration of some kind. I would argue that all of the examples would benefit from error checking done along the lines of the snippet above.

An additional consideration is that the method is best implemented by copying the `__getitem__` method, which, while short, contains a couple of non-trivial details.

One analog could be `re.search`, which returns an object with information on both the value that is found and its location, through the `span` attribute of the Match object. Maybe the method could be called ChainMap.search?

On Thu, Sep 6, 2018 at 6:07 AM Raymond Hettinger wrote:
>
> Raymond Hettinger added the comment:
>
> I haven't run across this requirement before but it does seem plausible that
> a person might want to know which underlying mapping found a match (compare
> with the "which" utility in Bash). On the other hand, we haven't had requests
> for anything like this for other lookup chains such as determining where a
> variable appears in the sequence
> locals-to-nested-scopes-to-globals-to-builtins.
>
> Also, I'm not sure I like the proposed API (the method name and signature).
> Perhaps, this should be a docs recipe for a ChainMap subclass or be an
> example of a standalone search function that takes the *maps* attribute
> as one of its arguments. Will discuss this with the other core devs to get
> their thoughts.

--

___
Python tracker <https://bugs.python.org/issue34586>
___
[issue34586] collections.ChainMap should have a get_where method
Zahari Dim added the comment:

> ISTM that this is the wrong stage to perform validation of allowable values.
> That should occur upstream when the underlying mappings are first created.
> At that earlier stage it is possible to give a more timely response to erroneous
> input and there is access to more information (such as the line and row
> number of an error in a configuration file).
>
> It doesn't make sense to me to defer value validation downstream after a
> ChainMap instance has been formed and after a successful lookup has occurred.
> That just complicates the task of tracing back to the root cause.

This is certainly the case in the situation where the validation only depends on the value of the corresponding configuration entry, as it admittedly does in the example above. However, the example was oversimplified insofar as non-trivial validation depends on the whole ensemble of configuration settings. For example, taking the example described at the top of <http://rahmonov.me/posts/python-chainmap/>, I think it would be useful to have an error message of the form:

f"User '{db_username}', defined in {configsettings[user_index]} is not found in database '{database}', defined in {configsettings[database_index]}"

> > Maybe the method could be called ChainMap.search?
>
> That would be better than get_where().

--

___
Python tracker <https://bugs.python.org/issue34586>
___
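The standalone variant suggested earlier in the thread (a function that takes the *maps* attribute as an argument) could look roughly like the sketch below, together with the kind of cross-source error message described in this comment. The function name `search`, the `config` contents and the `configsettings` labels are all assumptions made for illustration; none of them is an existing standard library API.

```python
from collections import ChainMap


def search(maps, key):
    """Return (index, value) for the first mapping in `maps` that yields `key`.

    Uses the same try/except lookup as ChainMap.__getitem__ rather than
    `key in mapping`, and raises KeyError when no mapping has the key.
    """
    for index, mapping in enumerate(maps):
        try:
            return index, mapping[key]
        except KeyError:
            pass
    raise KeyError(key)


# Hypothetical layered configuration, most specific source first.
configsettings = ["Command line", "Config file", "Defaults"]
config = ChainMap({"database": "prod"}, {"user": "alice"}, {"user": "guest", "database": "dev"})

user_index, db_username = search(config.maps, "user")
database_index, database = search(config.maps, "database")

message = (
    f"User '{db_username}', defined in {configsettings[user_index]} is not found "
    f"in database '{database}', defined in {configsettings[database_index]}"
)
```

Unlike the `get_where` recipe, this sketch never calls `__missing__`, so it always returns a 2-tuple or raises `KeyError`.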
[issue34586] collections.ChainMap should have a get_where method
Zahari Dim added the comment:

On Sat, Sep 8, 2018 at 1:15 PM Serhiy Storchaka wrote:
>
> Serhiy Storchaka added the comment:
>
> I concur with Raymond. The purpose of ChainMap is providing a mapping that
> hides the implementation detail of using several mappings as fallbacks. If
> you don't want to hide the implementation detail, you don't need to use
> ChainMap.
>
> ChainMap exposes underlying mappings as the maps attribute, so you can use
> this implementation detail if you know that it is a ChainMap and not a general
> mapping. It is easy to write a code for searching what mapping contains the
> specified key.

I don't know where the idea that the underlying mappings are an implementation detail comes from. It certainly isn't from the documentation, which mentions uses such as nested scopes and templates, which cannot be attained with a single mapping. It also doesn't match my personal usage, where, as discussed, even the simpler cases benefit from information on the underlying mappings. It is a surprising claim to make given that the entirety of the public interface specific to ChainMap (maps, new_child and parents) deals with the fact that there is more structure than one mapping. I also have a hard time discerning this idea from Raymond's messages.

> for m in cm.maps:
>     if key in m:
>         found = m
>         break
> else:
>     # raise an error or set a default,
>     # what is appropriate for your concrete case

This "trivial snatch of code" contains at least two issues that make it fail in situations where the actual implementation of `__getitem__` would work, opening the door for hard-to-diagnose corner cases. If anything, in my opinion, the fact that this code is being proposed as an alternative reinforces the idea that the implementation of the searching method should be in the standard library.

--

___
Python tracker <https://bugs.python.org/issue34586>
___
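The message does not spell out the two issues, but one difference that is easy to reproduce (and which the comment in the `get_where` recipe hints at) is the behaviour with `defaultdict`: a membership test does not trigger `default_factory`, while the `mapping[key]` lookup used by `ChainMap.__getitem__` does. A small sketch, with `find_with_in` as a hypothetical helper wrapping the quoted loop:

```python
from collections import ChainMap, defaultdict


def find_with_in(cm, key):
    """Membership-based search over cm.maps, as in the quoted snippet."""
    for m in cm.maps:
        if key in m:
            return m
    return None


cm = ChainMap({"a": 1}, defaultdict(int))

# The membership test finds nothing: the defaultdict does not contain the key.
found_before = find_with_in(cm, "missing")

# The real lookup succeeds, because mapping[key] calls default_factory.
value = cm["missing"]

# As a side effect the key now exists, so the same search gives a new answer.
found_after = find_with_in(cm, "missing")
```

So the membership-based loop and `__getitem__` can disagree about whether a key is "found", and the answer can change after a lookup has happened.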
[issue34586] collections.ChainMap should have a get_where method
Zahari Dim added the comment:

> I've discussed this with other core devs and spent a good deal of time
> evaluating this proposal. I'm going to pass on this one but do think it
> was an inspired suggestion. Thank you for the proposal.

Thank you for taking the time to consider it. I understand that there are many proposals.

> --
>
> Note, the original get_where() recipe has an issue. Upon successful lookup,
> it returns a 2-tuple but on failure it calls __missing__ which typically
> returns a scalar (if it doesn't raise an exception).

FWIW this was intended to work when `__missing__` is overridden in a subclass to raise a more specific exception. The case where it is made to return a value clearly doesn't play well with the proposed method, which would likely need to be overridden as well. I consider this an acceptable trade-off because I find this use case rather esoteric: the same functionality could be achieved in an arguably clearer way by passing a mapping with the desired missing semantics as the outermost scope.

--

___
Python tracker <https://bugs.python.org/issue34586>
___
[issue19737] Documentation of globals() and locals() should be improved
New submission from Zahari Dim:

The globals() documentation states:

Return a dictionary representing the current global symbol table.[...]

This doc and the fact that globals() is called as a function made me think that globals() returns a copy of the global namespace dict, rather than an object that can be used to actually modify the namespace. I don't find the meaning of "representing" obvious in this context. This of course led to a very nasty and sneaky bug in my code.

The docs of locals() don't seem clear to me either, though at least they seem to imply that it is actually modifying the namespace.

--
assignee: docs@python
components: Documentation
messages: 204052
nosy: Zahari.Dim, docs@python
priority: normal
severity: normal
status: open
title: Documentation of globals() and locals() should be improved

___
Python tracker <http://bugs.python.org/issue19737>
___
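A short sketch of the behaviour that caused the confusion: globals() hands back the module's actual namespace dictionary, not a copy, so writing to it creates or rebinds module-level names. The helper names `define_global` and `read_global` are made up for this example.

```python
def define_global(name, value):
    """Write into the module namespace through the dict returned by globals()."""
    globals()[name] = value


def read_global(name):
    """Read a module-level name back through globals()."""
    return globals()[name]


# Two calls return the very same dictionary object, not independent copies.
same_object = globals() is globals()

define_global("created_via_globals", 42)
value_seen = read_global("created_via_globals")
```

Taking `dict(globals())` is one way to get an independent snapshot when a copy is what is actually wanted.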
[issue19737] Documentation of globals() and locals() should be improved
Zahari Dim added the comment:

I am looking at the docs of the built-in functions: http://docs.python.org/2/library/functions.html

--

___
Python tracker <http://bugs.python.org/issue19737>
___
[issue24500] provide context manager to redirect C output
Zahari Dim added the comment:

Considering that Python is often used to interact with lower-level languages, it seems worthwhile to have the ability to control the "real" standard output and error that those languages use. Note that redirecting to /dev/null is only one possible application of this feature. Others would be, for example, linking stdout to the logging module.

Specifically regarding redirecting to /dev/null, in my experience this would be fairly useful in scientific software, where low-level code tends to be chosen on scientific merits rather than on how much control it gives over verbosity.

On Sun, May 8, 2016 at 12:04 AM, Martin Panter wrote:
>
> Martin Panter added the comment:
>
> Is it really common to have a C wrapper with undesirable output? I suspect
> there is not much demand for this feature. Maybe this would be better outside
> of Python’s standard library.
>
> --
> nosy: +martin.panter
> status: open -> languishing

--

___
Python tracker <http://bugs.python.org/issue24500>
___
[issue27399] ChainMap.keys() is broken
New submission from Zahari Dim:

When trying to see if the keys() of a collections.ChainMap object are empty, it tries to compute the hash of the dicts that compose the ChainMap, giving rise to an error:

    In [1]: from collections import ChainMap

    In [2]: m = ChainMap([{'a':1}, {'b':2}])

    In [3]: bool(m.keys())
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
     in ()
    ----> 1 bool(m.keys())

    /home/zah/anaconda3/lib/python3.5/_collections_abc.py in __len__(self)
        633
        634     def __len__(self):
    --> 635         return len(self._mapping)
        636
        637     def __repr__(self):

    /home/zah/anaconda3/lib/python3.5/collections/__init__.py in __len__(self)
        865
        866     def __len__(self):
    --> 867         return len(set().union(*self.maps))     # reuses stored hash values if possible
        868
        869     def __iter__(self):

    TypeError: unhashable type: 'dict'

Also, I can't ask if 'a' is in keys:

    In [6]: m.keys()
    Out[6]: KeysView(ChainMap([{'a': 1}, {'b': 2}]))

    In [9]: ks = m.keys()

    In [17]: 'a' in ks
    Out[17]: False

--
components: Library (Lib)
messages: 269370
nosy: Zahari.Dim
priority: normal
severity: normal
status: open
title: ChainMap.keys() is broken
versions: Python 3.5

___
Python tracker <http://bugs.python.org/issue27399>
___
[issue27399] ChainMap.keys() is broken
Changes by Zahari Dim:

--
resolution: -> not a bug
status: open -> closed

___
Python tracker <http://bugs.python.org/issue27399>
___
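The thread does not record why the report was closed, but the session in the report is consistent with a usage error: ChainMap takes its mappings as separate positional arguments, so passing a single list makes that list the only "mapping". A brief sketch of the difference (variable names are illustrative):

```python
from collections import ChainMap

# As in the report: one positional argument, which happens to be a list.
wrapped_list = ChainMap([{'a': 1}, {'b': 2}])

# Mappings passed as separate arguments.
proper = ChainMap({'a': 1}, {'b': 2})

maps_in_wrapped = len(wrapped_list.maps)       # the list itself is the single "map"
maps_in_proper = len(proper.maps)

has_keys = bool(proper.keys())
a_in_proper = 'a' in proper.keys()
a_in_wrapped = 'a' in wrapped_list.keys()      # tests 'a' against the list of dicts
```

With the list form, len() ends up building a set from the list's elements, which are dicts, hence the "unhashable type" error shown in the traceback.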
[issue24475] The docs never define what a pool "task" is
New submission from Zahari Dim:

See: http://stackoverflow.com/questions/30943161/multiprocessing-pool-with-maxtasksperchild-produces-equal-pids

The documentation never makes clear what a "task" is in the context of Pool.map. At best, it says:

"This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer."

in the map documentation. However, it does not say how these chunks are calculated by default, making the maxtasksperchild argument not very useful. The fact that a single call of the function evaluated by map is not a "task" should be much clearer in the documentation.

Also, in the examples, such as:

    with multiprocessing.Pool(PROCESSES) as pool:
        #
        # Tests
        #

        TASKS = [(mul, (i, 7)) for i in range(10)] + \
                [(plus, (i, 8)) for i in range(10)]

        results = [pool.apply_async(calculate, t) for t in TASKS]
        imap_it = pool.imap(calculatestar, TASKS)
        imap_unordered_it = pool.imap_unordered(calculatestar, TASKS)

TASKS are not actually "tasks" but rather "task groups".

--
assignee: docs@python
components: Documentation
messages: 245509
nosy: Zahari.Dim, docs@python
priority: normal
severity: normal
status: open
title: The docs never define what a pool "task" is
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6

___
Python tracker <http://bugs.python.org/issue24475>
___
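To make the relationship between items, chunks and tasks concrete, the sketch below mirrors the default chunk size calculation that CPython's Pool.map appears to use when chunksize is not given (roughly four chunks per worker). This is an undocumented implementation detail and an assumption of this example, so it may differ between versions; the function names are made up here.

```python
import math


def default_chunksize(n_items, n_workers):
    """Assumed default: split the input into about 4 chunks per worker."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def number_of_tasks(n_items, n_workers, chunksize=None):
    """How many chunks (that is, pool "tasks") a map call would submit."""
    if n_items == 0:
        return 0
    if chunksize is None:
        chunksize = default_chunksize(n_items, n_workers)
    return math.ceil(n_items / chunksize)


# 100 items on 4 workers: chunks of 7 items, so 15 tasks rather than 100.
chunk = default_chunksize(100, 4)
tasks = number_of_tasks(100, 4)
```

Under this reading, maxtasksperchild counts chunks, which is why passing chunksize=1 is needed if each function call should count as one task.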
[issue24500] xontextlib.redirect_stdout should redirect C output
New submission from Zahari Dim:

It is common to have an inflexible C wrapper with lots of undesired output. However, it is not so trivial to suppress (or redirect) that output from Python in a selective way. contextlib.redirect_stdout doesn't help, since it only changes sys.stdout, without touching the actual file descriptor. The following worked for my use case, which I adapted from here http://eli.thegreenplace.net/2015/redirecting-all-kinds-of-stdout-in-python/:

    import sys
    import os
    from contextlib import contextmanager, redirect_stdout

    @contextmanager
    def supress_stdout():
        devnull = open(os.devnull, 'wb')
        try:
            stdout_flieno = sys.stdout.fileno()
        except ValueError:
            redirect = False
        else:
            redirect = True
            sys.stdout.flush()
            #sys.stdout.close()
            devnull_fileno = devnull.fileno()
            saved_stdout_fd = os.dup(stdout_flieno)
            os.dup2(devnull_fileno, stdout_flieno)

        with redirect_stdout(devnull):
            yield
        if redirect:
            os.dup2(stdout_flieno, saved_stdout_fd)

--
components: Extension Modules, Library (Lib)
messages: 245760
nosy: Zahari.Dim
priority: normal
severity: normal
status: open
title: xontextlib.redirect_stdout should redirect C output
type: enhancement
versions: Python 3.4

___
Python tracker <http://bugs.python.org/issue24500>
___
[issue24500] contextlib.redirect_stdout should redirect C output
Changes by Zahari Dim:

--
title: xontextlib.redirect_stdout should redirect C output -> contextlib.redirect_stdout should redirect C output

___
Python tracker <http://bugs.python.org/issue24500>
___
[issue24500] provide context manager to redirect C output
Zahari Dim added the comment:

Well, the simple-minded example I posted has so many bugs (many of which I don't understand, for example why it permanently destroys the stdout of an interpreter) that I really think this feature is necessary.

--

___
Python tracker <http://bugs.python.org/issue24500>
___
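One plausible cause of the permanently lost stdout is the final call in the posted example: os.dup2(fd, fd2) copies fd onto fd2, so os.dup2(stdout_fileno, saved_fd) overwrites the saved descriptor instead of restoring stdout from it. The following is a minimal sketch of a file-descriptor-level redirect with the arguments in the restoring order; the name `suppress_c_stdout` is hypothetical, and the sketch makes no attempt to handle threads or stderr.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def suppress_c_stdout():
    """Temporarily point the process-level stdout descriptor at os.devnull."""
    try:
        stdout_fd = sys.stdout.fileno()
    except (AttributeError, ValueError, OSError):
        # sys.stdout has no real descriptor (e.g. it was replaced): do nothing.
        yield
        return

    sys.stdout.flush()                      # don't lose output buffered so far
    saved_fd = os.dup(stdout_fd)            # keep a handle on the real stdout
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, stdout_fd)      # stdout now goes to devnull
        yield
    finally:
        sys.stdout.flush()                  # discard what was written meanwhile
        os.dup2(saved_fd, stdout_fd)        # restore: saved copy back onto stdout
        os.close(saved_fd)
        os.close(devnull_fd)


with suppress_c_stdout():
    print("this line is discarded")
print("stdout works again")
```

Because the redirection happens at the descriptor level, output written by C extensions through the same descriptor is discarded as well, which is the point of the request in this issue.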
[issue24519] multiprocessing.Pool with maxtasksperchild starts too many processes
New submission from Zahari Dim:

The following example should start two processes, but instead it starts three, even though only two of them run do_work(). A third process is incorrectly started after the first one finishes.

    import os
    import time
    from multiprocessing import Pool

    def initprocess():
        print("Starting PID: %d" % os.getpid())

    def do_work(x):
        print("Doing work in %d" % os.getpid())
        time.sleep(x**2)

    if __name__ == '__main__':
        p = Pool(2, initializer=initprocess, maxtasksperchild=1)
        results = p.map(do_work, (1, 2), chunksize=1)

--
components: Library (Lib)
messages: 245878
nosy: Zahari.Dim
priority: normal
severity: normal
status: open
title: multiprocessing.Pool with maxtasksperchild starts too many processes
type: resource usage
versions: Python 3.4

___
Python tracker <http://bugs.python.org/issue24519>
___
[issue24519] multiprocessing.Pool with maxtasksperchild starts too many processes
Changes by Zahari Dim:

--
status: open -> closed

___
Python tracker <http://bugs.python.org/issue24519>
___