Hi,

0.7.1 didn't change any encoding defaults, as far as I know.
One difference is that the official 0.7.1 artifact was built with JDK8, while
0.7.0 was built with JDK7 (we'll use JDK7 to build the upcoming 0.7.2 binary).
But I'm not sure that would change encoding behavior in pyspark or spark.
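One way to rule that out is to print the Python-side encoding defaults in both the 0.7.0 and 0.7.1 pyspark interpreters and compare (a quick diagnostic sketch, nothing Zeppelin-specific):

```python
import sys
import locale

# The encodings Python falls back to when none is given explicitly.
# If these differ between your 0.7.0 and 0.7.1 interpreters, that
# would explain the behavior change.
print("default:  ", sys.getdefaultencoding())
print("preferred:", locale.getpreferredencoding())
print("stdout:   ", sys.stdout.encoding)
```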

Do you have exactly the same interpreter settings in 0.7.1 and 0.7.0?
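Also note that with use_unicode=False, textFile returns byte strings, so a stemmer that expects unicode will raise the 'ascii' codec error on the first non-ASCII byte. Decoding explicitly before stemming usually avoids that (a minimal sketch; 'utf-8' is an assumption about your file's actual encoding):

```python
# Byte string, like the records returned with use_unicode=False.
raw = b'caf\xc3\xa9 cr\xc3\xa8me'

# Decode with an explicit codec instead of relying on the implicit
# 'ascii' decode, which fails at the first non-ASCII byte.
text = raw.decode('utf-8')
print(text)
```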

Thanks,
moon

On Wed, Apr 19, 2017 at 5:30 AM Meethu Mathew <meethu.mat...@flytxt.com>
wrote:

> Hi,
>
> I just migrated from Zeppelin 0.7.0 to Zeppelin 0.7.1 and I am hitting this
> error while creating an RDD (in pyspark).
>
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
>> invalid start byte
>
>
> I was able to create the RDD without any error after adding
> use_unicode=False as follows
>
>> sc.textFile("file.csv",use_unicode=False)
>
>
> But it fails when I try to stem the text. I get a similar error when
> trying to apply stemming with the python interpreter.
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
>> ordinal not in range(128)
>
> All this code works in version 0.7.0. There is no change in the dataset
> or the code. Is there any change in the encoding defaults in the new
> version of Zeppelin?
>
> Regards,
>
>
> Meethu Mathew
>
>
