David K. Hess added the comment:

Ok, I followed @r.david.murray's advice and decided to take a shot at this.

First, I noticed that I couldn't reproduce the non-deterministic behavior I 
reported above on the latest code (i.e. pre-3.7). After some research, it 
appears this was the sequence of events:

1) Before 3.3, string hashing was stable between runs, so this wasn't a problem.
2) Hash randomization became the default in 3.3, and this non-determinism 
showed up.
3) A new dict implementation was introduced in 3.6; key order became stable 
between runs, and this non-determinism went away. However, as the notes on the 
new dict implementation indicate, this ordering should not be relied upon.
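A small sketch of the mechanism behind steps 2 and 3: with hash randomization, 
the iteration order of a hash-ordered container of strings can change from one 
interpreter run to the next. Since CPython 3.6 dicts keep insertion order, so 
the sketch uses a set as a stand-in for a pre-3.6 dict. The extension list and 
the iteration_order() helper are illustrative assumptions, not mimetypes code.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a set of extension strings.
# Set order depends on string hashes, much like dict order did before 3.6.
CHILD = "print(','.join({'.jpg', '.jpeg', '.jpe', '.jfif', '.pjpeg'}))"


def iteration_order(seed):
    """Run CHILD in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().split(",")


if __name__ == "__main__":
    orders = {seed: iteration_order(seed) for seed in range(5)}
    for seed, order in orders.items():
        print(seed, order)  # the raw order may differ between seeds
    # Sorting makes the result independent of the hash seed.
    assert len({tuple(sorted(order)) for order in orders.values()}) == 1
```

Sorting the entries before they go into an ordered container is what removes 
the dependence on the seed, which is the idea the patch relies on below.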

I also looked at some other issues:

* 6626 - The patch there essentially rewrote the module. I agree with the last 
comment on that issue that a rewrite probably isn't needed.
* 24527 - Related to the init() problems covered here by r.david.murray's 
excellent analysis of the init behavior.
* 1043134 - Addresses the preferred extension issue via a proposed new map.

This patch addresses three problems: the init behavior, the non-determinism, 
and the preferred extension issue.

For the init, I made two changes:

1) I added references to the initial values of the maps so they are retained 
across init() calls. I also modified MimeTypes.__init__ to use these saved 
initial values.

2) I modified the init() function to check the files argument, as 
r.david.murray suggested. If files is supplied, the existing database is kept 
and the files are added to it. If it is not supplied, the module reinitializes 
from scratch. I'll update the documentation to reflect this if the commit 
passes muster.

For the non-determinism and preferred extension, I changed the two extension 
type maps to be OrderedDicts. I then sorted the entries passed to the 
OrderedDict constructor by MIME type and placed each type's preferred 
extension first. This guarantees that the preferred extension is the one 
returned by guess_extension. The OrderedDict also guarantees that 
guess_all_extensions always builds and returns the same value.
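A minimal sketch of the ordering idea, assuming a toy extension-to-type table. 
RAW_TYPES, PREFERRED and the simplified guess_extension/guess_all_extensions 
below are hypothetical stand-ins; the real mimetypes functions also take a 
strict argument and consult two maps.

```python
from collections import OrderedDict

# Hypothetical (extension, MIME type) pairs in arbitrary order.
RAW_TYPES = [
    (".jpe", "image/jpeg"),
    (".html", "text/html"),
    (".jpeg", "image/jpeg"),
    (".htm", "text/html"),
    (".jpg", "image/jpeg"),
]

# Hypothetical table of preferred extensions, one per MIME type.
PREFERRED = {"image/jpeg": ".jpg", "text/html": ".html"}


def _sort_key(item):
    ext, mime = item
    # Group by MIME type; within a type the preferred extension sorts first
    # (False < True), and the rest fall back to alphabetical order.
    return (mime, ext != PREFERRED.get(mime), ext)


# Extension -> type map whose iteration order is fixed by construction.
types_map = OrderedDict(sorted(RAW_TYPES, key=_sort_key))


def guess_all_extensions(mime):
    """Simplified: every extension mapped to mime, in map order."""
    return [ext for ext, m in types_map.items() if m == mime]


def guess_extension(mime):
    """Simplified: the first extension for mime, or None."""
    exts = guess_all_extensions(mime)
    return exts[0] if exts else None


print(guess_all_extensions("image/jpeg"))  # ['.jpg', '.jpe', '.jpeg']
print(guess_extension("text/html"))        # .html
```

On Python 3.7 and later a plain dict preserves insertion order as a language 
guarantee and would give the same result; OrderedDict matches the approach 
described above and also works on older versions.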

The commit can be reviewed here:

https://github.com/davidkhess/cpython/commit/ecabb1cb57e7e066a693653f485f2f687dcc7f6b

I'll open a PR if and when this approach gets enough positive feedback.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4963>
_______________________________________