GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/19835
[SPARK-21866][ML][PYTHON] Few cleanups and fix test case for Python 3.6.0 /
NumPy 1.13.3
## What changes were proposed in this pull request?
The image test seems to fail with Python 3.6.0 / NumPy 1.13.3. I manually tested
it and got the following:
```
======================================================================
ERROR: test_read_images (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../spark/python/pyspark/ml/tests.py", line 1831, in test_read_images
    self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), first_row)
  File "/.../spark/python/pyspark/ml/image.py", line 149, in toImage
    data = bytearray(array.astype(dtype=np.uint8).ravel())
TypeError: only integer scalar arrays can be converted to a scalar index
----------------------------------------------------------------------
Ran 1 test in 7.606s
```
To be clear, I think the error comes from NumPy:
https://github.com/numpy/numpy/blob/75b2d5d427afdb1392f2a0b2092e0767e4bab53d/numpy/core/src/multiarray/number.c#L947
but it seems to be triggered together with some other changes.
A smaller reproduction:
```python
>>> import numpy as np
>>> bytearray(np.array([1]).astype(dtype=np.uint8))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: only integer scalar arrays can be converted to a scalar index
```
In Python 2.7 / NumPy 1.13.1, it prints:
```
bytearray(b'\x01')
```
So here I simply worked around it by wrapping the array in `iter`, as below:
```python
>>> bytearray(iter(np.array([1]).astype(dtype=np.uint8)))
bytearray(b'\x01')
```
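As a minimal sketch, the same workaround can be written as a standalone helper. The name `ndarray_to_bytearray` is hypothetical and only for illustration; it is not the code in `image.py`. The `tobytes()` comparison is an assumption about an equivalent alternative, not something this patch uses.

```python
import numpy as np


def ndarray_to_bytearray(array):
    """Hypothetical helper: flatten an array to uint8 and return its bytes."""
    flat = array.astype(dtype=np.uint8).ravel()
    # Wrapping in iter() makes bytearray consume the elements one by one
    # rather than trying to interpret the array object itself.
    return bytearray(iter(flat))


pixels = np.array([[1, 2], [3, 255]])
data = ndarray_to_bytearray(pixels)

# Assumed alternative (not used in this patch): ndarray.tobytes()
# should yield the same bytes for a uint8 array.
same = data == bytearray(pixels.astype(np.uint8).ravel().tobytes())
print(data, same)
```

Going through `iter` is slower than a buffer copy for large images, but it avoids depending on how a particular Python / NumPy combination converts the array.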
Also, while looking into it again, I realised that a few arguments could be quite
confusing, for example, `Row`, which needs some specific attributes, and
`numpy.ndarray`. I added a few type checks and some tests accordingly.
An invalid argument now produces an error message like the one below:
```
TypeError: array argument should be numpy.ndarray; however, it got [<class 'str'>].
```
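A rough sketch of what such a check could look like is below. The function name `check_array_argument` is hypothetical, and the actual check in the patch may be structured differently; only the message text is taken from the output above.

```python
import numpy as np


def check_array_argument(array):
    """Hypothetical type check; the real code in the patch may differ."""
    if not isinstance(array, np.ndarray):
        raise TypeError(
            "array argument should be numpy.ndarray; however, it got [%s]."
            % type(array))
    return array


try:
    check_array_argument("not an array")
except TypeError as e:
    message = str(e)
    print(message)
```

Failing early with the expected and actual types makes it clear which argument was wrong, instead of surfacing a less obvious error later in the conversion.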
## How was this patch tested?
Manually tested with `./python/run-tests`.
And also:
```
PYSPARK_PYTHON=python3 SPARK_TESTING=1 bin/pyspark pyspark.ml.tests ImageReaderTest
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-21866-followup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19835.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19835
----
commit cbff0fc10a9915f964b99862e8a801c61210dd31
Author: hyukjinkwon <[email protected]>
Date: 2017-11-28T16:31:47Z
Clean up and fix tests for Python 3.6.0
----