Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18277
I want to clarify for myself what this change actually does, because it is
quite confusing to me.
In Python 3, `basestring = unicode = str` is declared above, so this change
has no effect there. I don't think Python 3 is a concern here.
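
A quick sanity check of the Python 3 case, as a sketch only. It assumes the module-level alias quoted above; `to_utf8_before` and `to_utf8_after` are hypothetical names used for illustration and do not come from the PR.

```python
# Python 3 sketch: mirror the alias that is declared at the top of the module.
basestring = unicode = str


def to_utf8_before(obj):
    # Expression used before the change.
    return str(obj).encode("utf8")


def to_utf8_after(obj):
    # Expression used after the change; `unicode` is just `str` here.
    return unicode(obj).encode("utf8")


# Both expressions give identical bytes for text and non-text objects alike.
for sample in ["abc", "caf\u00e9", 42, None]:
    assert to_utf8_before(sample) == to_utf8_after(sample)
```

Because `unicode` is only another name for `str` on Python 3, the two expressions are the same call.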
In Python 2:
### Before:
```
str(obj).encode("utf8")
```
**When `obj` is `unicode`**:
1. `str(obj)`: encoded to bytes with the system default encoding (`ascii`)
2. `.encode("utf-8")`: decoded back to unicode with the system default
encoding (`ascii`), then encoded to bytes as UTF-8
**When `obj` is `str`**:
1. `str(obj)`: the bytes are returned as they are
2. `.encode("utf-8")`: decoded to unicode with the system default encoding
(`ascii`), then encoded to bytes as UTF-8
**When `obj` is any other type**:
1. `str(obj)`: calls `__str__()`
2. `.encode("utf-8")`: decoded to unicode with the system default encoding
(`ascii`), then encoded to bytes as UTF-8
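
The "before" steps can be sketched as follows. This is a Python 3 emulation, not Python 2 code: the implicit conversions that Python 2 performs are written out explicitly, Python 3 `str` stands in for Python 2 `unicode`, and Python 3 `bytes` stands in for Python 2 `str`. The names `PY2_DEFAULT`, `before_when_unicode` and `before_when_str` are hypothetical, and the sketch assumes the Python 2 default encoding is `ascii`.

```python
# Python 3 emulation of the Python 2 expression: str(obj).encode("utf8").
# ASSUMPTION: sys.getdefaultencoding() on Python 2 is "ascii".
PY2_DEFAULT = "ascii"


def before_when_unicode(text):
    # `text` is a Python 3 str standing in for a Python 2 `unicode` object.
    raw = text.encode(PY2_DEFAULT)      # step 1: str(obj) encodes implicitly
    decoded = raw.decode(PY2_DEFAULT)   # step 2a: implicit decode before encoding
    return decoded.encode("utf-8")      # step 2b: the requested UTF-8 encode


def before_when_str(raw):
    # `raw` is Python 3 bytes standing in for a Python 2 `str` object.
    # Step 1: str(obj) returns the bytes unchanged.
    decoded = raw.decode(PY2_DEFAULT)   # step 2a: implicit decode before encoding
    return decoded.encode("utf-8")      # step 2b: the requested UTF-8 encode


print(before_when_unicode("abc"))  # b'abc'

try:
    before_when_unicode("caf\u00e9")
except UnicodeEncodeError:
    print("unicode input with non-ASCII characters fails at step 1")

try:
    before_when_str("caf\u00e9".encode("utf-8"))
except UnicodeDecodeError:
    print("str input with non-ASCII bytes fails at step 2")
```

Under these assumptions, pure ASCII input round-trips, while any non-ASCII input fails in one of the implicit `ascii` conversions.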
### After:
```
unicode(obj).encode("utf8")
```
**When `obj` is `unicode`**:
1. `unicode(obj)`: the unicode object is returned as it is
2. `.encode("utf-8")`: encoded to bytes as UTF-8
**When `obj` is `str`**:
1. `unicode(obj)`: decoded to unicode with the system default encoding (`ascii`)
2. `.encode("utf-8")`: encoded to bytes as UTF-8
**When `obj` is any other type**:
1. `unicode(obj)`: calls `__unicode__()`, falling back to `__str__()` if
`__unicode__()` is not defined
2. `.encode("utf-8")`: encoded to bytes as UTF-8
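
The "after" steps can be sketched the same way. Again this is a Python 3 emulation under the assumption that the Python 2 default encoding is `ascii`; Python 3 `str` stands in for Python 2 `unicode` and Python 3 `bytes` for Python 2 `str`. All names here (`PY2_DEFAULT`, `after_when_unicode`, `after_when_str`, `after_when_other`, `Label`) are hypothetical.

```python
# Python 3 emulation of the Python 2 expression: unicode(obj).encode("utf8").
# ASSUMPTION: sys.getdefaultencoding() on Python 2 is "ascii".
PY2_DEFAULT = "ascii"


def after_when_unicode(text):
    # Step 1: unicode(obj) returns the unicode object unchanged.
    return text.encode("utf-8")         # step 2: plain UTF-8 encode


def after_when_str(raw):
    decoded = raw.decode(PY2_DEFAULT)   # step 1: unicode(obj) decodes implicitly
    return decoded.encode("utf-8")      # step 2: plain UTF-8 encode


def after_when_other(obj):
    # Step 1: prefer __unicode__(), fall back to __str__() if it is missing.
    # Simplification: on Python 3, __str__() already returns text, so the
    # ascii decode Python 2 applies to the fallback result is not modelled.
    to_unicode = getattr(obj, "__unicode__", None)
    text = to_unicode() if to_unicode is not None else obj.__str__()
    return text.encode("utf-8")         # step 2: plain UTF-8 encode


class Label(object):
    """Hypothetical type that defines __unicode__()."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        return self.name


print(after_when_unicode("caf\u00e9"))       # b'caf\xc3\xa9'
print(after_when_other(Label("caf\u00e9")))  # b'caf\xc3\xa9'
print(after_when_other(42))                  # b'42'
```

In this emulation, unicode input with non-ASCII characters now encodes cleanly, and the only remaining implicit `ascii` step is the decode of `str` input.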