Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18277
  
    I wanted to clarify for myself what we will change here, because it's quite 
confusing to me.
    
    In Python 3, `basestring = unicode = str` is declared above, so this won't 
change anything. I think this is not our concern.
    
    In Python 2,
    
    ### Before:
    
    ```
    str(obj).encode("utf8")
    ```
    
    **When `obj` is `unicode`**:
    
    1. `str(obj)`: encoded to bytes by the system default encoding (`ascii`), 
which fails for non-ASCII characters
    
    2. `.encode("utf-8")`: decoded to unicode by the system default (`ascii`) and 
then encoded to bytes as UTF-8
    
    
    **When `obj` is `str`**:
    
    1. `str(obj)`: bytes returned as-is
    
    2. `.encode("utf-8")`: decoded to unicode by the system default (`ascii`) and 
then encoded to bytes as UTF-8
    
    
    **When `obj` is other types**:
    
    1. `str(obj)`: call `__str__()`
    
    2. `.encode("utf-8")`: decoded to unicode by the system default (`ascii`) and 
then encoded to bytes as UTF-8
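
The "Before" steps can be approximated in Python 3 by spelling out the implicit ASCII conversions that Python 2 performs. This is only a rough sketch: `py2_str_then_encode` is a hypothetical helper name, Python 3 `str` stands in for Python 2 `unicode`, Python 3 `bytes` stands in for Python 2 `str`, and the default encoding is assumed to be `ascii`.

```python
def py2_str_then_encode(obj):
    """Emulate Python 2's str(obj).encode("utf-8") (hypothetical helper)."""
    if isinstance(obj, str):
        # Python 2 unicode: str(obj) implicitly encodes with ASCII.
        raw = obj.encode("ascii")
    elif isinstance(obj, bytes):
        # Python 2 str: the bytes are returned as-is.
        raw = obj
    else:
        # Other types: __str__() is called; assumed to yield ASCII text here.
        raw = str(obj).encode("ascii")
    # Calling .encode("utf-8") on Python 2 bytes first decodes with ASCII.
    return raw.decode("ascii").encode("utf-8")


print(py2_str_then_encode("abc"))
try:
    py2_str_then_encode("caf\u00e9")  # non-ASCII text fails at the str() step
except UnicodeEncodeError as exc:
    print("str() step failed:", exc.reason)
```

Under these assumptions, any non-ASCII input fails in one of the implicit ASCII steps before the UTF-8 encoding is ever reached.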
    
    
    ### After:
    
    ```
    unicode(obj).encode("utf8")
    ```
    
    **When `obj` is `unicode`**:
    
    1. `unicode(obj)`: unicode returned as-is
    
    2. `.encode("utf-8")`: encoded to bytes as UTF-8
    
    
    **When `obj` is `str`**:
    
    1. `unicode(obj)`: decoded to unicode by the system default (`ascii`)
    
    2. `.encode("utf-8")`: encoded to bytes as UTF-8
    
    
    **When `obj` is other types**:
    
    1. `unicode(obj)`: call `__unicode__()`. It falls back to `__str__()` if 
`__unicode__()` is not defined.
    
    2. `.encode("utf-8")`: encoded to bytes as UTF-8
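
The "After" steps can be sketched the same way in Python 3, with the implicit conversion written out explicitly. Again this is an approximation under stated assumptions: `py2_unicode_then_encode` is a hypothetical helper name, Python 3 `str` stands in for Python 2 `unicode`, Python 3 `bytes` stands in for Python 2 `str`, and the default encoding is assumed to be `ascii`.

```python
def py2_unicode_then_encode(obj):
    """Emulate Python 2's unicode(obj).encode("utf-8") (hypothetical helper)."""
    if isinstance(obj, str):
        # Python 2 unicode: returned as-is.
        text = obj
    elif isinstance(obj, bytes):
        # Python 2 str: implicitly decoded with ASCII.
        text = obj.decode("ascii")
    else:
        # Other types: stands in for __unicode__() with the __str__() fallback.
        text = str(obj)
    # Text is encoded straight to UTF-8 with no implicit ASCII round trip.
    return text.encode("utf-8")


print(py2_unicode_then_encode("caf\u00e9"))
try:
    py2_unicode_then_encode("caf\u00e9".encode("utf-8"))  # non-ASCII bytes
except UnicodeDecodeError as exc:
    print("unicode() step failed:", exc.reason)
```

In this sketch, non-ASCII text is encoded without error, while non-ASCII byte strings still fail at the implicit ASCII decode.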

