Re: Spark Utf 8 encoding
My terminal can display UTF-8 encoded characters; I already verified that, but I will double-check.

Thanks!

--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: Spark Utf 8 encoding
Is the original file actually UTF-8? Windows environments in particular tend to mangle files (e.g. Java on Windows does not use UTF-8 by default). The software that processed the data earlier could also have modified it.

> On 10 Nov 2018, at 02:17, lsn24 wrote:
>
> val ds = spark.read.textFile("a .bz2 file from hdfs");
> ds.show();
>
> The string "KøBENHAVN" gets displayed as "K�BENHAVN"
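Whether the bytes are really UTF-8 can be checked directly rather than guessed. The following is a minimal sketch, not a Spark API: `Utf8Check`, `isValidUtf8`, and `jvmDefaultCharset` are hypothetical names introduced here for illustration. It decodes strictly, so malformed input is reported instead of being silently replaced with U+FFFD.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, Charset, CodingErrorAction, StandardCharsets}

// Hypothetical helper, not part of Spark.
object Utf8Check {
  // Strictly decode the bytes as UTF-8. Malformed input raises an error
  // instead of being replaced with U+FFFD, so invalid files are detected.
  def isValidUtf8(bytes: Array[Byte]): Boolean = {
    val decoder = StandardCharsets.UTF_8
      .newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try {
      decoder.decode(ByteBuffer.wrap(bytes))
      true
    } catch {
      case _: CharacterCodingException => false
    }
  }

  // Name of the charset this JVM uses when none is given explicitly.
  def jvmDefaultCharset: String = Charset.defaultCharset().name()
}
```

Running `Utf8Check.isValidUtf8` on a sample of the decompressed file's raw bytes should return `false` if the file was written in a single-byte encoding such as ISO-8859-1 and contains a character like "ø". `Utf8Check.jvmDefaultCharset` shows what the JVM on a given machine falls back to.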
Re: Spark Utf 8 encoding
That doesn't necessarily look like a Spark-related issue. Maybe your terminal is displaying the glyph as a question mark because the font lacks that symbol?

On Fri, Nov 9, 2018 at 7:17 PM lsn24 wrote:
>
> val ds = spark.read.textFile("a .bz2 file from hdfs");
> ds.show();
>
> The string "KøBENHAVN" gets displayed as "K�BENHAVN"
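A font or terminal problem can be told apart from a decoding problem by looking at the code points instead of the rendered glyphs. This is a small sketch under that assumption; `CodePoints`, `describe`, and `hasReplacementChar` are hypothetical names, not anything Spark provides.

```scala
// Hypothetical helper, not part of Spark.
object CodePoints {
  // Render each code point as U+XXXX so the result does not depend on the
  // terminal's encoding or on which glyphs the font contains.
  def describe(s: String): String =
    s.codePoints().toArray.map(cp => f"U+$cp%04X").mkString(" ")

  // True when the string already contains the replacement character U+FFFD,
  // meaning information was lost while decoding, before anything was displayed.
  def hasReplacementChar(s: String): Boolean = s.indexOf(0xFFFD) >= 0
}
```

In the Spark shell this could be applied to a few rows, for example `ds.take(5).foreach(s => println(CodePoints.describe(s)))`. If the output contains U+00F8, the data is intact and the display is at fault; if it contains U+FFFD, the character was already lost when the bytes were decoded.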
Spark Utf 8 encoding
Hello,

Per the documentation, the default character encoding of Spark is UTF-8. But when I try to read non-ASCII characters, Spark tends to read them as question marks. What am I doing wrong? Below is my syntax:

val ds = spark.read.textFile("a .bz2 file from hdfs");
ds.show();

The string "KøBENHAVN" gets displayed as "K�BENHAVN".

I tested this in the Spark shell and also ran the same command as part of a Spark job. Both yield the same result.

I don't know what I am missing. I read the documentation but couldn't find any explicit config for this.

Any pointers will be greatly appreciated!

Thanks
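One way the symptom above could arise is sketched below, outside Spark. It assumes, without confirmation from the thread, that the file was written in ISO-8859-1, where "ø" is the single byte 0xF8. `DecodeDemo`, `latin1Bytes`, and `decode` are hypothetical names used only for this illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo, not part of Spark.
object DecodeDemo {
  // "KøBENHAVN" as it would be stored in ISO-8859-1: ø is the single byte 0xF8.
  // ASSUMPTION: this source encoding is a guess; the thread does not confirm it.
  val latin1Bytes: Array[Byte] =
    Array(0x4B, 0xF8, 0x42, 0x45, 0x4E, 0x48, 0x41, 0x56, 0x4E).map(_.toByte)

  // Decode raw bytes with an explicitly chosen charset.
  def decode(bytes: Array[Byte], charset: Charset): String =
    new String(bytes, charset)
}
```

Decoding `latin1Bytes` with `StandardCharsets.UTF_8` turns the invalid byte into U+FFFD, which matches the reported output, while decoding with `StandardCharsets.ISO_8859_1` recovers the original string. As far as I know, `spark.read.textFile` always decodes input as UTF-8, so a file in another encoding would need its raw bytes decoded explicitly, for example by reading through `sc.hadoopFile` with Hadoop's `TextInputFormat` and applying a decode like the one above to each record's bytes. That wiring is not shown here and is untested.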