[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2021-08-12 Thread Andrei Kulakov


Change by Andrei Kulakov:


--
keywords: +patch
nosy: +andrei.avk
nosy_count: 8.0 -> 9.0
pull_requests: +26227
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/27750

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2020-07-17 Thread David K. Hess


David K. Hess added the comment:

@michael-lazar a documentation change seems the path of least resistance given 
the complicated history of this module. +1 from me.

--




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2020-07-16 Thread Michael Lazar

Michael Lazar added the comment:

Greetings,

I just encountered this issue [0] and I agree with the sentiment that the 
documentation is currently misleading.

Particularly,

> By default, it provides access to the same database as the rest of this 
> module. The initial database is a copy of that provided by the module, and 
> may be extended by loading additional mime.types-style files into the 
> database using the read() or readfp() methods. The mapping dictionaries may 
> also be cleared before loading additional data if the default data is not 
> desired.

“as the rest of the module” implies to me that it should behave the same way as 
mimetypes.guess_type() does. The documentation only has one other reference to 
this built-in list of mimetypes, and the default values are hidden behind 
underscored variable names. I would re-word this as

"By default, it provides access to a database of well-known values defined 
internally by the python module. Unlike the other mimetypes convenience 
functions, it does not include definitions from the list of 
mimetypes.knownfiles. The initial database may be extended by loading 
additional mime.types-style files into the database using the read() or 
readfp() methods. The mapping dictionaries may also be cleared before loading 
additional data if the default data is not desired."
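For readers unfamiliar with the methods this wording mentions, here is a minimal 
sketch of extending an instance with readfp(). The in-memory mime.types line is 
made up for illustration; on a real system the data would normally come from 
one of the files listed in mimetypes.knownfiles.

```python
import io
import mimetypes

# A fresh instance starts from the table hardcoded in the module.
mt = mimetypes.MimeTypes()

# Illustrative mime.types-style data: a MIME type followed by its suffixes.
extra = io.StringIO("video/x-matroska mkv\n")
mt.readfp(extra)

guess = mt.guess_type("E01.mkv")
print(guess)
```

Entries read this way are layered on top of whatever the instance already 
contains, so an existing mapping for the same suffix would be overwritten.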

I would be happy to submit a PR if others agree.

[0] https://github.com/michael-lazar/jetforce/issues/38

--
nosy: +michael-lazar




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread David K. Hess

David K. Hess added the comment:

The documentation you quoted does read to me as compatible? The database it is 
referring to is the one hardcoded in the module – not the one assembled from 
that and the host OS. But, maybe this is just the vagaries of language and 
perspective at play.

Anyway I do agree it is an unexpected behavior change from the perspective of a 
user of the MimeTypes class directly. To get the best context for this change, 
it's useful to run through the long history of the issue that drove it:

https://bugs.python.org/issue4963

Note that the discussion never touched on the use case of instantiating a 
MimeTypes class directly, and there are apparently no test cases covering this 
particular scenario either. With no awareness of this use case, it didn't get 
directly addressed.

Perhaps all MimeTypes instances should auto-load system files unless a new 
__init__ param selects for this new "clean" behavior?
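To make that suggestion concrete, here is a rough sketch of the idea as a 
subclass. Both the class name SystemMimeTypes and the load_system parameter are 
hypothetical; nothing like them exists in the stdlib, and this is only one way 
such an option might behave.

```python
import mimetypes
import os


class SystemMimeTypes(mimetypes.MimeTypes):
    """Hypothetical subclass; 'load_system' is not a real stdlib parameter."""

    def __init__(self, filenames=(), strict=True, load_system=True):
        # Start from the hardcoded defaults only.
        super().__init__((), strict)
        if load_system:
            # Returns immediately on platforms without a registry.
            self.read_windows_registry(strict)
            for path in mimetypes.knownfiles:
                if os.path.isfile(path):
                    self.read(path, strict)
        # Caller-supplied files go last so they win over system entries.
        for name in filenames:
            self.read(name, strict)


clean = SystemMimeTypes(load_system=False)  # hardcoded table only
full = SystemMimeTypes()                    # hardcoded table plus host OS data
```

Reading the caller's files last mirrors the order used by mimetypes.init(), 
where extra files are appended after knownfiles.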

--




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread Florian Bruhin


Change by Florian Bruhin:


--
nosy: +r.david.murray




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread Florian Bruhin

Florian Bruhin added the comment:

Ah, I only saw dhess' comment after already submitting mine.

> By historical design, instantiating a MimeTypes class instance directly will 
> not use host OS system mime type files.

Yet that wasn't what happened before that commit, and it's also not the 
behaviour which was (and is) documented - from 
https://docs.python.org/3.6/library/mimetypes.html#mimetypes.MimeTypes

By default, it provides access to the same database as the rest of this 
module. The initial database is a copy of that provided by the module, and may 
be extended by loading additional mime.types-style files into the database 
using the read() or readfp() methods. The mapping dictionaries may also be 
cleared before loading additional data if the default data is not desired.

The optional filenames parameter can be used to cause additional files to 
be loaded “on top” of the default database.
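As an illustration of the filenames parameter described in that excerpt, the 
following sketch writes a throwaway mime.types-style file and passes it to the 
constructor. The file name and its single entry are invented for the example.

```python
import mimetypes
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "extra.types")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        # mime.types format: MIME type, then one or more suffixes.
        fh.write("video/x-matroska mkv\n")
    # The file is read during construction, on top of the default table.
    mt = mimetypes.MimeTypes(filenames=[path])

mkv_guess = mt.guess_type("E01.mkv")
txt_guess = mt.guess_type("notes.txt")
print(mkv_guess, txt_guess)
```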

You might be right in that the new behaviour is in some way more correct - but 
it's wildly backwards-incompatible, and it's contrary to everything the 
documentation says.

I've only skimmed over bpo-4963 though - maybe I'm missing something?

--




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread Florian Bruhin


Florian Bruhin added the comment:

Ah, I think I see what's happening now.

Before that commit, when doing "mt = mimetypes.MimeTypes()", its self.types_map 
is populated as follows:

- Its __init__ method calls the mimetypes.init() function.
- That then reads all the files in mimetypes.knownfiles into a temporary 
MimeTypes object
- The resulting types_map is saved as a module global (mimetypes.types_map).
- The __init__ of our "mt" object continues and picks up all the types from 
that global types_map.

After the change, instead this happens:

- Its __init__ method calls the mimetypes.init() function.
- Like above, mimetypes.init() populates mimetypes.types_map
- However, MimeTypes.__init__ now uses _types_map_default instead of the (now 
reassigned) types_map, i.e. it never reads the entries from knownfiles.

In other words, it only picks up the hardcoded types in the module, but never 
reads the files it's (according to the documentation) supposed to read - thus 
the difference between using "mimetypes.guess_type('E01.mkv')" (which uses the 
correctly initialized global object) and using 
"mimetypes.MimeTypes().guess_type('E01.mkv')" (which doesn't know about mkv, as 
it's defined in one of the mime.types files, not hardcoded in the module).
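The two flows above can be mimicked with a toy model. This is not the real 
mimetypes code: every name below is a stand-in chosen for the illustration, and 
the tables hold one entry each.

```python
# Toy model only: none of these names exist in the mimetypes module.
HARDCODED = {".txt": "text/plain"}           # stands in for _types_map_default
SYSTEM_FILES = {".mkv": "video/x-matroska"}  # stands in for knownfiles entries

types_map = dict(HARDCODED)                  # stands in for mimetypes.types_map


def init():
    """Rebuild the module-global table from defaults plus system entries."""
    global types_map
    merged = dict(HARDCODED)
    merged.update(SYSTEM_FILES)
    types_map = merged


class BeforeChange:
    def __init__(self):
        init()
        self.types = dict(types_map)  # copies the reassigned global


class AfterChange:
    def __init__(self):
        init()
        self.types = dict(HARDCODED)  # copies only the hardcoded defaults


before = BeforeChange().types.get(".mkv")
after = AfterChange().types.get(".mkv")
print(before, after)
```

In the model, only the first class ever sees the system entry, which matches 
the difference observed between 3.6 and 3.7.5.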

As a workaround, this results in the same behavior as before:

import mimetypes
import os

mt = mimetypes.MimeTypes()
for fn in mimetypes.knownfiles:
    if os.path.isfile(fn):
        mt.read(fn)

--




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread David K. Hess


David K. Hess added the comment:

Hi, I'm the author of the commit that's been fingered. Some comments about the 
behavior being reported:

First, as pointed out by @xtreak, indeed the mimetypes module uses mimetypes 
files present on the platform to add to the built-in list of mimetypes. In this 
case, "video/x-matroska" and ".mkv" are not found in the mimetypes module and 
were never there - they are coming from the host OS.

Also, for better or worse, the mimetypes module has an internal "init" method 
that does more than just instantiate a MimeTypes instance for default use:

https://github.com/python/cpython/blob/5c0c325453a175350e3c18ebb10cc10c37f9595c/Lib/mimetypes.py#L345

It also loads in these system files (and also Windows Registry entries on 
Win32) into a fresh MimeTypes instance. So, addressing what @The Compiler is 
seeing, properly resetting the mimetypes module really involves calling 
mimetypes.init(). By historical design, instantiating a MimeTypes class 
instance directly will not use host OS system mime type files.

As to why this commit is causing a change in the observed behavior, the problem 
that was corrected in this commit was that the mimetypes module had 
non-deterministic behavior related to initialization. In the original init 
code, the module level mime types tables are changed (really corrupted) after 
first load and you can never reinitialize the module back to a known good state 
(i.e. to original module defaults without information from the host OS system).
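A minimal sketch of what reinitializing looks like on an interpreter that 
includes this commit. The extension and MIME type are invented for the demo and 
are assumed not to appear in any system mime.types file.

```python
import mimetypes

EXT = ".issue38656demo"  # invented extension, assumed absent from system files

mimetypes.init()
mimetypes.add_type("application/x-demo", EXT)
before_reset = mimetypes.guess_type("sample" + EXT)[0]

# Calling init() again rebuilds the module-level tables from the hardcoded
# defaults plus the system files, so the ad-hoc entry should disappear.
mimetypes.init()
after_reset = mimetypes.guess_type("sample" + EXT)[0]

print(before_reset, after_reset)
```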

So, realistically, the behavior currently observed is the correct behavior 
given the presence and historical nature of the init function. The fact that, 
prior to this commit, a fresh MimeTypes instance returned an OS entry without 
having been init()'d or given any filenames was really part of the 
initialization bug that was fixed.

Regarding the ranger bug, the main thing is you should not use a MimeTypes 
instance directly unless you run it through the same initializations that the 
init code does.

Anyway, that's my perspective having waded through all of that during the 
original BPO. I don't claim it's the correct one but that's where we are at.

--




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread Florian Bruhin


Florian Bruhin added the comment:

I now bisected this with the following script:

#!/bin/bash
git clean -dxf
./configure || exit 125
make -j2 || exit 125
output=$(./python -c "import mimetypes; mt = mimetypes.MimeTypes(); print(mt.guess_type('E01.mkv')[0])")
echo "$output"
echo "$(git describe) $output" >> ../bisect-results.txt
[[ $output == None ]] && exit 1 || exit 0

This shows 9fc720e5e4f772598013ea48a3f0d22b2b6b04fa as the commit which broke 
this (bpo-4963, GH-3062).

--
nosy: +dhess, steve.dower




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread Florian Bruhin


Florian Bruhin added the comment:

I'm seeing the same in ranger and I'm currently trying to debug this - I'm still
not quite sure what I'm seeing as there seem to be various issues/weirdnesses
which overlap each other.

This strikes me as odd:

>>> import mimetypes
>>> mimetypes.guess_type('E01.mkv')
('video/x-matroska', None)
>>> mimetypes.types_map['.mkv']
'video/x-matroska'

>>> mt = mimetypes.MimeTypes()
>>> mt.guess_type('E01.mkv')
(None, None)
>>> mt.types_map
({'.rtf': 'application/rtf', [redacted for brevity]}, {'.js': 
'application/javascript', [redacted for brevity]})
>>> mt.types_map[0]['.mkv']
Traceback (most recent call last):
  File "", line 1, in 
KeyError: '.mkv'
>>> mt.types_map[1]['.mkv']
Traceback (most recent call last):
  File "", line 1, in 
KeyError: '.mkv'

The Python documentation claims: "This class represents a MIME-types database.
By default, it provides access to the same database as the rest of this module.
The initial database is a copy of that provided by the module" - yet that
apparently isn't the case.

I see this with both Python 3.7.5 and 3.8.0, but with 3.6.9 I get the correct
output for both module- and class-level access.
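To see how far the two databases diverge on a given machine, something like the
following sketch could be used. It only compares suffixes, and the outcome
depends on which mime.types files the host OS provides, so no particular output
is implied.

```python
import mimetypes

# Module-level database: hardcoded defaults plus host OS files.
mimetypes.init()
module_exts = set(mimetypes.types_map) | set(mimetypes.common_types)

# Class-level database as seen by a directly created instance.
fresh = mimetypes.MimeTypes()
instance_exts = set(fresh.types_map[True]) | set(fresh.types_map[False])

only_in_module = sorted(module_exts - instance_exts)
print(len(only_in_module), "suffixes known only to the module-level database")
print(".mkv" in only_in_module)
```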

--
nosy: +The Compiler




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-06 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan added the comment:

It seems that there is a list of files from which the mime types are also added 
at 
https://github.com/python/cpython/blob/5c0c325453a175350e3c18ebb10cc10c37f9595c/Lib/mimetypes.py#L42.
"video/x-matroska" is not present in the CPython repo's list of suffixes, so it 
should be inferred from the list of known files. Can you please run the script 
below on 3.7.4 and 3.7.5 on the same machine? I am using a Mac, and both 3.7.4 
and 3.7.5 report video/x-matroska correctly.

import mimetypes
print(mimetypes.guess_type('E01.mkv'))
print(mimetypes.types_map['.mkv'])

--




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-01 Thread toonn


toonn added the comment:

The result is the same for 3.7.4, on my mac.

--




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-10-31 Thread Ammar Askar


Ammar Askar added the comment:

This is what I get on master; I will try 3.7.5+ as noted in the GitHub issue:


Python 3.9.0a0 (heads/noopt-dirty:f3b170812d, Oct  1 2019, 20:15:53) [MSC 
v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mimetypes
>>> print(mimetypes.guess_type('E01.mkv'))
('video/x-matroska', None)

--
nosy: +ammar2




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-10-31 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan added the comment:

I couldn't find mkv in mimetypes with search. Can you please post the output of 
the mimetypes query in 3.7.4 and 3.7.5 for the regression? In the attached 
GitHub issue the user reports mkv returns None and mp4 is detected.

--




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-10-31 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan:


--
nosy: +xtreak




[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-10-31 Thread toonn


New submission from toonn:

A user reported an error to us which seems to derive from the ``mimetypes`` 
library failing to guess the mime type for ``.mkv`` matroska video files:
https://github.com/ranger/ranger/issues/1744#issuecomment-548514373

This is a regression because the same query successfully identifies the 
filename as being of the ``video/x-matroska`` mime type with earlier Python 
versions.

--
components: Library (Lib)
messages: 355763
nosy: toonn
priority: normal
severity: normal
status: open
title: mimetypes for python 3.7.5 fails to detect matroska video
type: behavior
versions: Python 3.7
