Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19665#discussion_r149030699
--- Diff: dev/run-tests.py ---
@@ -289,7 +289,7 @@ def exec_sbt(sbt_args=()):
stdin=echo_proc.stdout,
stdout=subprocess.PIPE)
echo_proc.wait()
- for line in iter(sbt_proc.stdout.readline, ''):
+ for line in iter(sbt_proc.stdout.readline, b''):
--- End diff --
The previous code causes an infinite loop in Python 3 because the sentinel `''` is a `str`, while `sbt_proc.stdout.readline()` returns `bytes`. At the end of the stream it returns `b''`, which never equals `''`, so the loop never stops.
This can be checked as follows:
```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
print(type(sbt_proc.stdout.readline()))
```
In Python 2:
```
>>> import subprocess
>>> sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
>>> print(type(sbt_proc.stdout.readline()))
<type 'str'>
```
In Python 3:
```
>>> import subprocess
>>> sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
>>> print(type(sbt_proc.stdout.readline()))
<class 'bytes'>
```
However, `b''` and `''` compare differently in the two versions.
In Python 2:
```python
>>> b'' == ''
True
>>> print(type(b''), type(''))
(<type 'str'>, <type 'str'>)
```
In Python 3:
```python
>>> b'' == ''
False
>>> print(type(b''), type(''))
<class 'bytes'> <class 'str'>
```
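To show the mechanism without spawning a process, here is a minimal Python 3 sketch. It assumes `io.BytesIO` is a fair stand-in for the binary pipe `sbt_proc.stdout`, and it uses `itertools.islice` to cap the buggy loop at five items so the demo finishes instead of spinning forever.

```python
import io
from itertools import islice

# io.BytesIO stands in for the binary pipe sbt_proc.stdout (an assumption
# for this demo; the real code reads from a subprocess).
buf = io.BytesIO(b"first\nsecond\n")

# With the str sentinel '', iter() never sees a match: after the data runs
# out, readline() keeps returning b'', and b'' != '' in Python 3.
# islice() bounds the loop to 5 items; without it this would never end.
lines = list(islice(iter(buf.readline, ''), 5))
print(lines)  # [b'first\n', b'second\n', b'', b'', b'']

# Rewind and retry with the bytes sentinel b'': iteration stops at EOF.
buf.seek(0)
fixed = list(iter(buf.readline, b''))
print(fixed)  # [b'first\n', b'second\n']
```

The trailing `b''` entries in the first list are exactly what the original `dev/run-tests.py` loop kept receiving at end of stream.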
The infinite loop can be reproduced in Python 3 as follows (this snippet never terminates):
```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
for line in iter(sbt_proc.stdout.readline, ''):
    print(line)
```
In Python 2, the code above does not loop forever. Using `b''` as the sentinel is also fine there, because `bytes` is an alias for `str` in Python 2, so the fixed loop below terminates in both versions:
```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
for line in iter(sbt_proc.stdout.readline, b''):
    print(line)
```
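A runnable sketch of the fixed loop with predictable output: it assumes the current interpreter (`sys.executable`) as the child process instead of `ls`, so the result does not depend on the contents of the working directory, and it strips line endings before collecting the lines.

```python
import subprocess
import sys

# Child process that prints two known lines (sys.executable replaces "ls"
# here only so the output is deterministic; this is an assumption of the demo).
proc = subprocess.Popen(
    [sys.executable, "-c", "print('a'); print('b')"],
    stdout=subprocess.PIPE,
)

# b'' is what readline() returns at EOF on Python 3 (and equals '' on
# Python 2), so this loop ends in both versions.
lines = [line.rstrip() for line in iter(proc.stdout.readline, b'')]
proc.stdout.close()
proc.wait()
print(lines)  # [b'a', b'b'] on Python 3
```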