GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/19835

    [SPARK-21866][ML][PYTHON] Few cleanups and fix test case for Python 3.6.0 / NumPy 1.13.3

    ## What changes were proposed in this pull request?
    
    The image test appears to fail with Python 3.6.0 / NumPy 1.13.3. I tested
    it manually, as below:
    
    ```
    ======================================================================
    ERROR: test_read_images (pyspark.ml.tests.ImageReaderTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/.../spark/python/pyspark/ml/tests.py", line 1831, in test_read_images
        self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), first_row)
      File "/.../spark/python/pyspark/ml/image.py", line 149, in toImage
        data = bytearray(array.astype(dtype=np.uint8).ravel())
    TypeError: only integer scalar arrays can be converted to a scalar index
    
    ----------------------------------------------------------------------
    Ran 1 test in 7.606s
    ```
    
    To be clear, I think the error comes from NumPy, around
    https://github.com/numpy/numpy/blob/75b2d5d427afdb1392f2a0b2092e0767e4bab53d/numpy/core/src/multiarray/number.c#L947
    but with some other changes.
    
    A smaller reproduction:
    
    ```python
    >>> import numpy as np
    >>> bytearray(np.array([1]).astype(dtype=np.uint8))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: only integer scalar arrays can be converted to a scalar index
    ```
    
    In Python 2.7 / NumPy 1.13.1, it prints:
    
    ```
    bytearray(b'\x01')
    ```
    
    So here I simply worked around it by wrapping the array in `iter`, as below:
    
    ```python
    >>> bytearray(iter(np.array([1]).astype(dtype=np.uint8)))
    bytearray(b'\x01')
    ```
    
    Also, while looking into it again, I realised that a few arguments could be
    quite confusing, for example, the `Row` that needs some specific attributes,
    and the `numpy.ndarray`. I added a few type checks and some tests
    accordingly. It now shows an error message as below:
    
    ```
    TypeError: array argument should be numpy.ndarray; however, it got [<class 'str'>].
    ```
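    
    For illustration only, a minimal sketch of this kind of check. The helper
    name `require_type` is hypothetical and is not the code in this patch; it
    only mirrors the message format shown above. The example checks against
    `bytearray` just to stay free of the NumPy dependency, whereas in the patch
    the expected type is `numpy.ndarray`.
    
    ```python
    def require_type(arg_name, value, expected_type, expected_name):
        """Return `value` unchanged, or raise TypeError if it has the wrong type.

        Hypothetical helper for illustration; not part of pyspark.
        """
        if not isinstance(value, expected_type):
            # Report the argument name, the expected type, and the actual type.
            raise TypeError(
                "%s argument should be %s; however, it got [%s]."
                % (arg_name, expected_name, type(value)))
        return value


    # A correctly typed argument passes through unchanged.
    data = require_type("array", bytearray(b"\x01"), bytearray, "bytearray")

    # A wrongly typed argument fails early with a descriptive message.
    try:
        require_type("array", "not an array", bytearray, "bytearray")
    except TypeError as e:
        message = str(e)
    ```
    
    Checking up front like this means the caller sees which argument was wrong
    and what type was passed, rather than an unrelated error from deeper in the
    conversion.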
    
    ## How was this patch tested?
    
    Manually tested with `./python/run-tests`.
    
    And also:
    
    ```
    PYSPARK_PYTHON=python3 SPARK_TESTING=1 bin/pyspark pyspark.ml.tests ImageReaderTest
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-21866-followup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19835.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19835
    
----
commit cbff0fc10a9915f964b99862e8a801c61210dd31
Author: hyukjinkwon <[email protected]>
Date:   2017-11-28T16:31:47Z

    Clean up and fix tests for Python 3.6.0

----

