[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-27 Thread Ethan Furman


Ethan Furman  added the comment:

Cool.  I appreciate all the work in this area, thank you!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-27 Thread Brett Cannon


Brett Cannon  added the comment:

> When was this change made?

It has always been this way, so no change here. We have actually improved the 
situation over the years with attempts at better locking in importlib itself.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-25 Thread Xavier de Gaye


Xavier de Gaye  added the comment:

This fixed the bug.

To reproduce the bug, substitute the "from _lzma import *" statement with 
"raise ImportError"
in Lib/lzma.py to simulate that the _lzma module is missing and run the test.py 
script written and uploaded by Maciej.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-25 Thread Xavier de Gaye


Xavier de Gaye  added the comment:

In msg360620 Serhiy wrote:
> When the interpreter encounters "import foo" in bar.py, the import machinery 
> takes the module foo from sys.modules.
> So you break an infinite cycle and can import modules with cyclic 
> dependencies.

The following pdb session that is run when executing the foo.py script, shows 
that this is not quite accurate.  When the interpreter encounters "import foo" 
in bar.py, the import machinery instead starts executing foo.py once again and 
detects that at line 9 of foo.py the 'import bar' statement does not have to be 
executed since the import machinery is already in the process of importing this 
module.  So it is at that point that the infinite cycle is broken.
Later on line 12 in foo.py the import machinery detects that the AttributeError 
is raised because of a circular import and warns that "partially initialized 
module 'bar' has no attribute 'BAR' (most likely due to a circular import)".

$ cat -n foo.py
 1  import pdb
 2  debug = 1
 3
 4  # Prevent starting a new pdb session when foo is imported by bar.
 5  if debug and not hasattr(pdb, 'is_pdb_started'):
 6  pdb.Pdb(skip=['importlib*']).set_trace()
 7  pdb.is_pdb_started = True
 8
 9  import bar
10  FOO = 'foo'
11  try:
12  print(bar.BAR)
13  except AttributeError as e:
14  # The exception is raised with an explicit reason when foo is 
imported by
15  # bar due to partially initialized module 'bar'.
16  print(e)

$ cat -n bar.py
 1  import foo
 2  BAR = 'bar'
 3  print(foo.FOO)

$ python foo.py
> /tmp/foo.py(7)()
-> pdb.is_pdb_started = True
(Pdb) step
> /tmp/foo.py(9)()
-> import bar
(Pdb) step
--Call--
> /tmp/bar.py(1)()
-> import foo
(Pdb) step
> /tmp/bar.py(1)()
-> import foo
(Pdb) step
--Call--
> /tmp/foo.py(1)()
-> import pdb
(Pdb) step
> /tmp/foo.py(1)()
-> import pdb
(Pdb) step
> /tmp/foo.py(2)()
-> debug = 1
(Pdb) step
> /tmp/foo.py(5)()
-> if debug and not hasattr(pdb, 'is_pdb_started'):
(Pdb) step
> /tmp/foo.py(9)()
-> import bar
(Pdb) step
> /tmp/foo.py(10)()
-> FOO = 'foo'
(Pdb) step
> /tmp/foo.py(11)()
-> try:
(Pdb) step
> /tmp/foo.py(12)()
-> print(bar.BAR)
(Pdb) step
AttributeError: partially initialized module 'bar' has no attribute 'BAR' (most 
likely due to a circular import)
> /tmp/foo.py(12)()
-> print(bar.BAR)
(Pdb) step
> /tmp/foo.py(13)()
-> except AttributeError as e:
(Pdb) step
> /tmp/foo.py(16)()
-> print(e)
(Pdb) step
partially initialized module 'bar' has no attribute 'BAR' (most likely due to a 
circular import)
--Return--
> /tmp/foo.py(16)()->None
-> print(e)
(Pdb) step
> /tmp/bar.py(2)()
-> BAR = 'bar'
(Pdb) step
> /tmp/bar.py(3)()
-> print(foo.FOO)
(Pdb) step
foo
--Return--
> /tmp/bar.py(3)()->None
-> print(foo.FOO)
(Pdb) step
> /tmp/foo.py(10)()
-> FOO = 'foo'
(Pdb) step
> /tmp/foo.py(11)()
-> try:
(Pdb) step
> /tmp/foo.py(12)()
-> print(bar.BAR)
(Pdb) step
bar
--Return--
> /tmp/foo.py(12)()->None
-> print(bar.BAR)
(Pdb) step
$

--
nosy: +xdegaye

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

I suppose it fixed the bug. But I cannot confirm because I cannot reproduce the 
original bug.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread miss-islington


miss-islington  added the comment:


New changeset 1a274359283d3d1f4f60dd527843f72e0368caf3 by Miss Islington (bot) 
in branch '3.7':
bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161)
https://github.com/python/cpython/commit/1a274359283d3d1f4f60dd527843f72e0368caf3


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread miss-islington


miss-islington  added the comment:


New changeset ea4a61fec842c94107eef46e5030b89a086f94bb by Miss Islington (bot) 
in branch '3.8':
bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161)
https://github.com/python/cpython/commit/ea4a61fec842c94107eef46e5030b89a086f94bb


--
nosy: +miss-islington

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Ethan Furman


Ethan Furman  added the comment:

Brett:
--
> It's covered in the language reference where import semantics are
> explained. It isn't explicitly called out in higher-level docs that
> I'm aware of because it only really comes up in edge cases like
> importing in a thread which we actively discourage unless there's a
> compelling reason not to.

When was this change made?  Assuming it is, in fact, a change.  It seems like 
most of the stdlib could be called in a thread, so the practice of lazily 
importing a module wholesale might need to be audited and proactively fixed.  
Maybe that could be a newcomer's easy(ish) project?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Ethan Furman


Ethan Furman  added the comment:


New changeset 9017e0bd5e124ae6d2ed94b9e9cacb2e86270980 by Ethan Furman (Serhiy 
Storchaka) in branch 'master':
bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161)
https://github.com/python/cpython/commit/9017e0bd5e124ae6d2ed94b9e9cacb2e86270980


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread miss-islington


Change by miss-islington :


--
pull_requests: +17556
pull_request: https://github.com/python/cpython/pull/18171

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread miss-islington


Change by miss-islington :


--
pull_requests: +17557
pull_request: https://github.com/python/cpython/pull/18172

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Ethan Furman


Ethan Furman  added the comment:

Thanks, Serhiy!  I had no idea of those changes to import.

This does raise the question, though, of whether accessing an empty module's 
attributes should either:
- wait for the module to finish initializing; or
- raise an exception immediately with better information
?

It seems like this is a trap waiting for anybody using threaded code and 
on-demand imports.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Brett Cannon


Brett Cannon  added the comment:

It's covered in the language reference where import semantics are explained. It 
isn't explicitly called out in higher-level docs that I'm aware of because it 
only really comes up in edge cases like importing in a thread which we actively 
discourage unless there's a compelling reason not to.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Maciej Gol

Maciej Gol  added the comment:

This is a HUGE eye opener! Didn't know of that 'import' vs 'from x import
y' difference. Thanks a lot! Is it documented somewhere ?

pt., 24 sty 2020, 15:08 użytkownik Serhiy Storchaka 
napisał:

>
> Serhiy Storchaka  added the comment:
>
> It is intended to support circular imports. Let foo.py contains "import
> bar" and bar.py contains "import foo". When you execute "import foo", the
> import machinery first creates an empty module foo, adds it to sys.modules,
> reads foo.py and executes it in the namespace of module foo. When the
> interpreter encounters "import bar" in foo.py, the import machinery creates
> an empty module bar, adds it to sys.modules, reads bar.py and executes it
> in the namespace of module bar. When the interpreter encounters "import
> foo" in bar.py, the import machinery takes the module foo from sys.modules.
> So you break an infinite cycle and can import modules with cyclic
> dependencies.
>
> You can argue that cyclic import does not look as a good practice, but
> actually it is pretty common case when you import a submodule in a package.
> If foo/__init__.py contains "from .bar import Bar", the foo module must be
> imported before you import foo.bar, but is not completely initialized at
> that time yet.
>
> --
>
> ___
> Python tracker 
> 
> ___
>

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

It is intended to support circular imports. Let foo.py contains "import bar" 
and bar.py contains "import foo". When you execute "import foo", the import 
machinery first creates an empty module foo, adds it to sys.modules, reads 
foo.py and executes it in the namespace of module foo. When the interpreter 
encounters "import bar" in foo.py, the import machinery creates an empty module 
bar, adds it to sys.modules, reads bar.py and executes it in the namespace of 
module bar. When the interpreter encounters "import foo" in bar.py, the import 
machinery takes the module foo from sys.modules. So you break an infinite cycle 
and can import modules with cyclic dependencies.

You can argue that cyclic import does not look as a good practice, but actually 
it is pretty common case when you import a submodule in a package. If 
foo/__init__.py contains "from .bar import Bar", the foo module must be 
imported before you import foo.bar, but is not completely initialized at that 
time yet.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Maciej Gol


Maciej Gol  added the comment:

By the way, thanks a lot for the fix <3

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Maciej Gol


Maciej Gol  added the comment:

> PR 18161 fixes race condition by using "from ... import ..."
> which waits until the module be completely initialized if the specified
> names are not set.

Correct me if I'm wrong, but I think the behavior of 'import lzma' in
this case (vulnerable to race conditions) is not as intended? Shouldn't we also 
fix
the 'import' statement itself?

In general, I understand that due to how dynamic Python is, it might not be
possible to load every single name at import time using `import` and using
`from x import y` brings more determinism (because we have a safeguard now).

But, looking at the stacktrace from ipython the problem lies in a sequence of
import statements, not dynamic python coding. Shouldn't the importing mechanism
be more deterministic in such case? For sure, it should not return an empty 
module
(this is the case when race condition occurs).

I think a race condition caused by simply using `import` statements
(not `from x import y`) is a big caveat in the statement itself and how python
imports work.

Haven't checked if the cause is isolated to how tarfile works, or works in 
general, though :-(

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +ethan.furman

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
nosy: +brett.cannon, eric.snow, lars.gustaebel, ncoghlan

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

PR 18161 fixes race condition by using "from ... import ..." which waits until 
the module be completely initialized if the specified names are not set.

--
components: +Library (Lib)
nosy: +serhiy.storchaka
type: crash -> behavior
versions: +Python 3.8, Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-24 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +17548
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18161

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-23 Thread Maciej Gol


Maciej Gol  added the comment:

Uploading fixed file (the former had a typo)

--
Added file: https://bugs.python.org/file48861/test.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-23 Thread Maciej Gol


Change by Maciej Gol :


Removed file: https://bugs.python.org/file48860/test.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39430] tarfile.open(mode="r") race condition when importing lzma

2020-01-23 Thread Maciej Gol


New submission from Maciej Gol :

Hey guys,

We have a component that archives and unarchives multiple files in separate 
threads that started to
misbehave recently.

We have noticed a bunch of `AttributeError: module 'lzma' has no attribute 
'LZMAFile'` errors, which are
unexpected because our python is not compiled with LZMA support.

What is unfortunate, is that given the traceback:

Traceback (most recent call last):
  File "test.py", line 18, in 
list(pool.map(test_lzma, range(100)))
  File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 
598, in result_iterator
yield fs.pop().result()
  File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 
428, in result
return self.__get_result()
  File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 
384, in __get_result
raise self._exception
  File "/opt/lang/python37/lib/python3.7/concurrent/futures/thread.py", 
line 57, in run
result = self.fn(*self.args, **self.kwargs)
  File "test.py", line 14, in test_lzma
tarfile.open(fileobj=buf, mode="r")
  File "/opt/lang/python37/lib/python3.7/tarfile.py", line 1573, in open
return func(name, "r", fileobj, **kwargs)
  File "/opt/lang/python37/lib/python3.7/tarfile.py", line 1699, in xzopen
fileobj = lzma.LZMAFile(fileobj or name, mode, preset=preset)
AttributeError: module 'lzma' has no attribute 'LZMAFile'


the last line of the traceback is right AFTER this block (tarfile.py:1694):

try:
import lzma
except ImportError:
raise CompressionError("lzma module is not available")


Importing lzma in ipython fails properly:

In [2]: import lzma 
  
---
ModuleNotFoundError   Traceback (most recent call last)
 in 
> 1 import lzma

/opt/lang/python37/lib/python3.7/lzma.py in 
25 import io
26 import os
---> 27 from _lzma import *
28 from _lzma import _encode_filter_properties, 
_decode_filter_properties
29 import _compression

ModuleNotFoundError: No module named '_lzma'

When trying to debug the problem, we have noticed it's not deterministic. In 
order to reproduce it,
we have created a test python that repeatedly writes an archive to BytesIO and 
then reads from it.
Using it with 5 threads and 100 calls, gives very good chances of reproducing 
the issue. For us it
was almost every time.

Race condition occurs both on Python 3.7.3 and 3.7.6.
Test script used to reproduce it attached.

I know that the test script writes uncompressed archives and during opening 
tries to guess the compression.
But I guess this is a legitimate scenario and should not matter in this case.

--
files: test.py
messages: 360551
nosy: Maciej Gol
priority: normal
severity: normal
status: open
title: tarfile.open(mode="r") race condition when importing lzma
type: crash
versions: Python 3.7
Added file: https://bugs.python.org/file48860/test.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com