And are they running with the same Python version? What is the Python version?
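To answer this concretely, one option is to run the same check in a %pyspark paragraph and in a %python paragraph on both Zeppelin installations and compare the output. This is a minimal sketch that uses only the standard library. The helper name encoding_report is hypothetical, and the assumption that these settings explain the difference is untested. On Python 2, sys.getdefaultencoding() is normally 'ascii', while on Python 3 it is 'utf-8'.

```python
import locale
import os
import sys


def encoding_report():
    """Collect settings that commonly influence text decoding (hypothetical helper)."""
    return {
        "python_version": sys.version.split()[0],
        # Codec used for implicit str/unicode conversions ('ascii' on Python 2).
        "default_encoding": sys.getdefaultencoding(),
        # Encoding derived from the process locale (LANG / LC_ALL).
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "LANG": os.environ.get("LANG"),
    }


# Print the settings in a stable order so two runs can be diffed easily.
for key, value in sorted(encoding_report().items()):
    print("%s: %s" % (key, value))
```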

_____________________________
From: moon soo Lee <m...@apache.org>
Sent: Thursday, April 20, 2017 11:53 AM
Subject: Re: UnicodeDecodeError in zeppelin 0.7.1
To: <users@zeppelin.apache.org>


Hi,

As far as I know, 0.7.1 didn't change any encoding type.
One difference is that the official 0.7.1 artifact was built with JDK8, while 0.7.0
was built with JDK7 (we'll use JDK7 to build the upcoming 0.7.2 binary). But I'm not
sure that this could change the encoding used by pyspark and Spark.
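One way to check whether anything differs on the JVM side is to ask the driver JVM for its Java version and default charset from a %pyspark paragraph in each installation. This sketch relies on sc._jvm, an internal, underscore-prefixed py4j view that PySpark's SparkContext exposes; it is not a public API and could change between Spark versions. The function name jvm_charset_info is hypothetical.

```python
def jvm_charset_info(sc):
    """Return the driver JVM's Java version, file.encoding and default charset.

    Assumes `sc` is a PySpark SparkContext; `sc._jvm` is internal py4j access,
    not a public PySpark API.
    """
    jvm = sc._jvm
    system = jvm.java.lang.System
    return {
        "java.version": system.getProperty("java.version"),
        "file.encoding": system.getProperty("file.encoding"),
        # Charset the JVM uses when no encoding is specified explicitly.
        "default_charset": jvm.java.nio.charset.Charset.defaultCharset().name(),
    }
```

In a %pyspark paragraph this would be called as print(jvm_charset_info(sc)). Identical output on 0.7.0 and 0.7.1 would suggest the build JDK is not the cause.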

Do you have exactly the same interpreter setting in 0.7.1 and 0.7.0?

Thanks,
moon

On Wed, Apr 19, 2017 at 5:30 AM Meethu Mathew 
<meethu.mat...@flytxt.com> wrote:
Hi,

I just migrated from Zeppelin 0.7.0 to Zeppelin 0.7.1, and I am facing this 
error while creating an RDD (in pyspark):

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

I was able to create the RDD without any error after adding use_unicode=False, 
as follows:
sc.textFile("file.csv", use_unicode=False)
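With use_unicode=False, textFile hands each line back as raw bytes instead of decoding it as UTF-8, so decoding becomes the caller's job. This is a minimal sketch that assumes the file is not UTF-8 throughout. The hypothetical helper decode_line tries UTF-8 first and then falls back to cp1252, where byte 0x80 is the euro sign. The cp1252 fallback is only a guess; the file's real encoding should be used if it is known.

```python
def decode_line(raw, encodings=("utf-8", "cp1252")):
    """Decode one raw line by trying each candidate encoding in order.

    The candidate list is an assumption; cp1252 is only a guess for a file
    whose first byte (0x80) is not valid UTF-8.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: keep the line but mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", "replace")


# Assumed usage in a %pyspark paragraph:
# lines = sc.textFile("file.csv", use_unicode=False).map(decode_line)
```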

But it fails when I try to stem the text. I get a similar error when applying 
stemming to the text using the python interpreter:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
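This second error looks like Python 2 implicitly decoding a byte string with the 'ascii' codec. 0xc3 is the lead byte of UTF-8 encoded characters such as 'é'. This is a minimal sketch that assumes the stemmer expects text (unicode) rather than bytes. The hypothetical helper stem_tokens decodes bytes explicitly before stemming. The stemmer is passed in as a callable, and NLTK's SnowballStemmer appears in the comments only as an assumed example, since the thread does not say which stemmer is used.

```python
def stem_tokens(text, stem, encoding="utf-8"):
    """Decode bytes explicitly, then stem each whitespace-separated token.

    `stem` is any callable taking and returning text. `encoding` is an
    assumption about the input file and should match its real encoding.
    """
    if isinstance(text, bytes):
        # Explicit decode instead of Python 2's implicit 'ascii' decode.
        text = text.decode(encoding)
    return [stem(token) for token in text.split()]


# Assumed usage in a %pyspark paragraph (NLTK is an assumption, not from the thread):
# from nltk.stem.snowball import SnowballStemmer
# stemmer = SnowballStemmer("english")
# stemmed = sc.textFile("file.csv", use_unicode=False).map(
#     lambda line: stem_tokens(line, stemmer.stem))
```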

All of this code works in version 0.7.0, and neither the dataset nor the code 
has changed. Is there any change in the encoding type in the new version of 
Zeppelin?


Regards,

Meethu Mathew


