[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86450/


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18277
  
**[Test build #86450 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86450/testReport)**
 for PR 18277 at commit 
[`8c88595`](https://github.com/apache/spark/commit/8c88595125fbd328a3ed2383a9e96db7ad96f0e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18277
  
**[Test build #86450 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86450/testReport)**
 for PR 18277 at commit 
[`8c88595`](https://github.com/apache/spark/commit/8c88595125fbd328a3ed2383a9e96db7ad96f0e9).


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
retest this please


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
Let me merge this one only into master considering the concerns - 
https://github.com/apache/spark/pull/18277#pullrequestreview-90007120 and 
https://github.com/apache/spark/pull/18277#issuecomment-358876719. Adding a 
note could be fine. I don't feel strongly about it.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
Let me merge this one in a few days if there are no more comments.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
cc @ueshin too. I think we have both been involved in several PRs related to 
encoding / decoding.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
So, after this change, we get rid of the system-default round trip in the 
**When `obj` is `unicode`** and **When `obj` is other types** cases.

In the **When `obj` is other types** case, we _might_ have a behaviour change 
if `__unicode__()` is defined differently from `__str__()`, but I believe that 
is quite rare.

So, LGTM, but I want a double check from you, @holdenk and @viirya, in case I 
missed anything.
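
To make the "other types" caveat concrete, here is a minimal sketch. It runs on Python 3, which has no `unicode()` builtin, so `py2_unicode` is a hypothetical stand-in that only imitates the lookup order described in this thread (`__unicode__()` first, then `__str__()`). The classes `Same` and `Different` are likewise invented for illustration and are not part of the patch.

```python
def py2_unicode(obj):
    """Hypothetical stand-in for Python 2 unicode(obj).

    Prefers __unicode__() when the type defines it, otherwise falls back
    to __str__(), mirroring the lookup order described in the thread.
    """
    method = getattr(type(obj), "__unicode__", None)
    if method is not None:
        return method(obj)
    return str(obj)


class Same:
    # Only __str__ is defined, so both code paths see the same text.
    def __str__(self):
        return "same"


class Different:
    # __unicode__ disagrees with __str__: the rare case that changes output.
    def __str__(self):
        return "from __str__"

    def __unicode__(self):
        return "from __unicode__"


print(str(Same()), "|", py2_unicode(Same()))
print(str(Different()), "|", py2_unicode(Different()))
```

Only a type like `Different` would produce different piped output after the change, which is why the behaviour change is expected to be rare.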


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
I wanted to clarify for myself what this change actually does, because it is 
quite confusing to me.

In Python 3, `basestring = unicode = str` is declared above, so this won't 
change anything there. I think Python 3 is not our concern.

In Python 2,

### Before:

```
str(obj).encode("utf8")
```

**When `obj` is `unicode`**:

1. `str(obj)`: encoded to bytes with the system default encoding (`ascii`)

2. `.encode("utf8")`: implicitly decoded back to unicode with the system 
default encoding (`ascii`), then encoded to bytes as UTF-8


**When `obj` is `str`**:

1. `str(obj)`: bytes as they are

2. `.encode("utf8")`: implicitly decoded to unicode with the system default 
encoding (`ascii`), then encoded to bytes as UTF-8


**When `obj` is other types**:

1. `str(obj)`: calls `__str__()`

2. `.encode("utf8")`: implicitly decoded to unicode with the system default 
encoding (`ascii`), then encoded to bytes as UTF-8


### After:

```
unicode(obj).encode("utf8")
```

**When `obj` is `unicode`**:

1. `unicode(obj)`: unicode as it is

2. `.encode("utf8")`: encoded to bytes as UTF-8


**When `obj` is `str`**:

1. `unicode(obj)`: decoded to unicode with the system default encoding (`ascii`)

2. `.encode("utf8")`: encoded to bytes as UTF-8


**When `obj` is other types**:

1. `unicode(obj)`: calls `__unicode__()`. It falls back to `__str__()` if 
`__unicode__()` is not defined.

2. `.encode("utf8")`: encoded to bytes as UTF-8
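
The `unicode` case above can be imitated on Python 3 by spelling out the conversions that Python 2 performed implicitly. This is only a sketch under stated assumptions: `py2_before`, `py2_after` and `try_call` are hypothetical helper names, and `DEFAULT_ENCODING` assumes the Python 2 system default encoding is `ascii`, as in the breakdown above.

```python
# Assumption: the Python 2 system default encoding is ASCII.
DEFAULT_ENCODING = "ascii"


def py2_before(text):
    """Imitate Python 2 str(obj).encode("utf8") for a unicode obj."""
    as_bytes = text.encode(DEFAULT_ENCODING)         # str(obj): implicit encode
    round_trip = as_bytes.decode(DEFAULT_ENCODING)   # implicit decode in .encode()
    return round_trip.encode("utf-8")                # explicit UTF-8 encode


def py2_after(text):
    """Imitate Python 2 unicode(obj).encode("utf8") for a unicode obj."""
    return text.encode("utf-8")                      # no default-encoding round trip


def try_call(func, text):
    """Return the result, or the name of the Unicode error that was raised."""
    try:
        return func(text)
    except UnicodeError as exc:
        return type(exc).__name__


sample = "\u6d4b\u8bd5"  # the non-ASCII string used elsewhere in this thread
print(try_call(py2_before, "abc"), try_call(py2_after, "abc"))
print(try_call(py2_before, sample), try_call(py2_after, sample))
```

ASCII-only input behaves the same on both paths. Non-ASCII input fails in the old path at the implicit ASCII encode, while the new path encodes it directly as UTF-8.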


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86375/


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18277
  
**[Test build #86375 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86375/testReport)**
 for PR 18277 at commit 
[`8c88595`](https://github.com/apache/spark/commit/8c88595125fbd328a3ed2383a9e96db7ad96f0e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18277
  
This change looks reasonable to me for now, but I'm also concerned about 
the behavior change. A note in the release notes should be good, or maybe we 
need a note in the migration guide of the `RDD Programming Guide`.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18277
  
**[Test build #86375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86375/testReport)**
 for PR 18277 at commit 
[`8c88595`](https://github.com/apache/spark/commit/8c88595125fbd328a3ed2383a9e96db7ad96f0e9).


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18277
  
retest this please.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18277
  
Jenkins OK to test.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85574/


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-12-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18277
  
**[Test build #85574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85574/testReport)**
 for PR 18277 at commit 
[`8c88595`](https://github.com/apache/spark/commit/8c88595125fbd328a3ed2383a9e96db7ad96f0e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-12-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18277
  
**[Test build #85574 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85574/testReport)**
 for PR 18277 at commit 
[`8c88595`](https://github.com/apache/spark/commit/8c88595125fbd328a3ed2383a9e96db7ad96f0e9).


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
ok to test


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
It seems okay at first glance, though I have not looked closely yet. Let me 
take a closer look soon.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18277
  
What do you think, @HyukjinKwon? I think this is probably a reasonable fix, 
but we might break the code of people who have been depending on the bug.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-10-15 Thread sasameti
Github user sasameti commented on the issue:

https://github.com/apache/spark/pull/18277
  
how do I apply the patch?




---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-06-15 Thread chaoslawful
Github user chaoslawful commented on the issue:

https://github.com/apache/spark/pull/18277
  
Well, the difference comes from the divergent default behavior of repr() 
between Python 2 and Python 3. The previous code does no better than the 
patched code here, and it additionally causes trouble when processing unicode 
strings.

On the other hand, the pipe() action by definition involves implicit 
serialization from any type to bytes, so IMHO the application itself should 
take care of consistently serializing/deserializing data before/after the 
pipe() action, IF it wants to always get the same behavior in different 
environments.
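
One way an application could do that is sketched below. It assumes JSON as the wire format, which is a choice made for this example and not something proposed in the thread, and `to_line` and `from_line` are hypothetical helper names.

```python
import json


def to_line(record):
    """Serialize one record to a single ASCII-only line before pipe()."""
    # ensure_ascii=True escapes non-ASCII characters, so the text written to
    # the external process does not depend on the interpreter's defaults.
    return json.dumps(record, ensure_ascii=True)


def from_line(line):
    """Deserialize one line read back from pipe()."""
    return json.loads(line)


record = ["\u6d4b\u8bd5", "1"]
line = to_line(record)
print(line)
print(from_line(line) == record)
```

With an RDD this would be used as `rdd.map(to_line).pipe('cat').map(from_line)`, so the representation that reaches the external command no longer depends on how the interpreter renders `str()` or `repr()` of a list.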



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-06-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18277
  

When you try this on an RDD of arrays of unicode strings, the result in 
Python 2 looks a bit weird.

Using Python version 2.7.12 (default, Jul  1 2016 15:12:24)
SparkSession available as 'spark'.
>>> data = [u'\u6d4b\u8bd5', '1']
>>> rdd = sc.parallelize(data)
>>> result = rdd.pipe('cat').collect()
>>> result
[u'\u6d4b\u8bd5', u'1']
>>> data = [[u'\u6d4b\u8bd5', '1'], ['1', '2']]
>>> rdd = sc.parallelize(data)
>>> rdd.collect()
[[u'\u6d4b\u8bd5', '1'], ['1', '2']]
>>> result = rdd.pipe('cat').collect()
>>> result
[u"[u'\\u6d4b\\u8bd5', '1']", u"['1', '2']"]  # looks weird and different from Python 3
>>>

Using Python version 3.5.2 (default, Nov 17 2016 17:05:23)
SparkSession available as 'spark'.
>>> data = [u'\u6d4b\u8bd5', '1']
>>> rdd = sc.parallelize(data)
>>> result = rdd.pipe('cat').collect()
>>> result
['\u6d4b\u8bd5', '1']
>>> data = [[u'\u6d4b\u8bd5', '1'], ['1', '2']]
>>> rdd = sc.parallelize(data)
>>> rdd.collect()
[['\u6d4b\u8bd5', '1'], ['1', '2']]
>>> result = rdd.pipe('cat').collect()
>>> result
["['\u6d4b\u8bd5', '1']", "['1', '2']"]
>>>
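
The difference between the two sessions seems to come down to how each interpreter renders the elements of a list when the list is turned into a string. A rough Python 3 illustration, using the builtin `ascii()` as an approximation of the escaping that Python 2's `repr()` applied (Python 2 also added a `u''` prefix, which `ascii()` does not reproduce):

```python
data = ["\u6d4b\u8bd5", "1"]

# Python 3: str() of a list uses repr() on each element, and repr() keeps
# printable non-ASCII characters as they are.
py3_style = str(data)

# ascii() escapes non-ASCII characters with backslash sequences, which is
# close to the text Python 2 produced for the same list.
py2_like = ascii(data)

print(py2_like)
print(py3_style == py2_like)
```

The two strings differ, so piping a list element through an external command yields different text on each interpreter unless the application serializes the element explicitly.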



---



[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Can one of the admins verify this patch?


---