Try to get the encoding right. E.g., if you read from CSV or other sources, specify the encoding explicitly, which for Russian text is most probably `cp1251`:
df = sqlContext.read.csv(filePath, encoding="cp1251")

On a Linux CLI the encoding of a file can be detected with the `chardet` utility.

On Wed, Jan 18, 2017 at 3:53 PM, AlexModestov <aleksandrmodes...@gmail.com> wrote:
> I want to use Apache Spark for working with text data. There are some
> Russian symbols, but Apache Spark shows me strings which look like
> "...\u0413\u041e\u0420\u041e...". What should I do to correct them?
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/apache-spark-doesn-t-work-correktly-with-russian-alphabet-tp28316.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
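For what it's worth, here is a minimal plain-Python sketch (no Spark involved, the string "ГОРОД" is just an example word) of why text stored as cp1251 comes out garbled when the reader guesses the wrong encoding:

```python
# "ГОРОД" ("CITY" in Russian), written out as cp1251 bytes.
raw = "ГОРОД".encode("cp1251")

# Decoding with the correct codec recovers the Cyrillic text.
print(raw.decode("cp1251"))

# Decoding with a wrong guess (e.g. latin-1) produces mojibake instead.
print(raw.decode("latin-1"))
```

The same mismatch is what makes Spark show `\u0413\u041e\u0420\u041e...`-style characters: the bytes are fine, they are just being interpreted with the wrong codec until `encoding="cp1251"` is passed to the reader.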