Try to get the encoding right.
E.g., if you read from `csv` or other sources, specify the encoding, which is
most probably `cp1251`:

df = sqlContext.read.csv(filePath, encoding="cp1251")

On the Linux command line, the encoding of a file can be detected with the
`chardet` utility.
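
A minimal pure-Python sketch (outside Spark) of why the encoding matters. The
sample word uses the exact code points from the question below
(`\u0413\u041e\u0420\u041e...` is the Cyrillic "ГОРО..."); note that the
`\uXXXX` form is just the ASCII-escaped representation of the same string:

```python
# "ГОРОД" (Russian for "city"), stored on disk as cp1251 bytes.
raw = "ГОРОД".encode("cp1251")

# Decoding with the correct codec recovers the Cyrillic text.
good = raw.decode("cp1251")
assert good == "ГОРОД"

# The escaped form seen in the question is only the ASCII repr of
# that same (correct) string, not corruption:
assert ascii(good) == "'\\u0413\\u041e\\u0420\\u041e\\u0414'"

# Decoding with the wrong codec (e.g. latin-1) silently produces mojibake.
bad = raw.decode("latin-1")
assert bad != good
```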

On Wed, Jan 18, 2017 at 3:53 PM, AlexModestov <aleksandrmodes...@gmail.com>
wrote:

> I want to use Apache Spark for working with text data. There are some
> Russian symbols, but Apache Spark shows me strings that look like
> "...\u0413\u041e\u0420\u041e...". What should I do to correct them?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/apache-spark-doesn-t-work-correktly-with-russian-alphabet-tp28316.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
