Re: Spark UTF-8 encoding

2018-11-12 Thread lsn24
My terminal can display UTF-8 encoded characters; I have already verified that.
But I will double-check again.
Thanks!



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Spark UTF-8 encoding

2018-11-10 Thread Jörn Franke
Is the original file really UTF-8? Windows environments in particular tend to mess
up files (e.g., Java on Windows does not use UTF-8 by default). However, the
software that processed the data earlier could also have modified it.
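One quick way to answer the first question is to decode the raw bytes strictly instead of leniently. The Scala sketch below uses only the JDK; the object name `Utf8Check` is made up for illustration, and it assumes the non-UTF-8 case is a single-byte encoding such as ISO-8859-1 or windows-1252, which is only one possibility. It shows that 'ø' is two bytes (0xC3 0xB8) in UTF-8 but a single 0xF8 in Latin-1, and that a lenient UTF-8 decoder turns that lone 0xF8 into U+FFFD, the � glyph. That would be consistent with the "K�BENHAVN" in the original post, though it does not prove it.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, Charset, CodingErrorAction, StandardCharsets}

object Utf8Check {
  // Strict check: REPORT makes the decoder throw on malformed input instead of
  // silently substituting U+FFFD, so false means "these bytes are not valid UTF-8".
  def isValidUtf8(bytes: Array[Byte]): Boolean = {
    val decoder = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try {
      decoder.decode(ByteBuffer.wrap(bytes))
      true
    } catch {
      case _: CharacterCodingException => false
    }
  }

  def main(args: Array[String]): Unit = {
    val word = "K\u00f8BENHAVN"                                  // "KøBENHAVN"
    val utf8Bytes = word.getBytes(StandardCharsets.UTF_8)        // 'ø' -> 0xC3 0xB8
    val latin1Bytes = word.getBytes(StandardCharsets.ISO_8859_1) // 'ø' -> 0xF8

    // Before Java 18 this follows the OS locale (e.g. windows-1252 on many Windows machines).
    println(s"JVM default charset: ${Charset.defaultCharset()}")
    println(s"UTF-8 bytes valid:   ${isValidUtf8(utf8Bytes)}")   // true
    println(s"Latin-1 bytes valid: ${isValidUtf8(latin1Bytes)}") // false
    // Lenient decoding, as new String(bytes, UTF_8) does, replaces 0xF8 with U+FFFD.
    println(new String(latin1Bytes, StandardCharsets.UTF_8))     // K�BENHAVN
  }
}
```

To look at the actual input rather than a sample word, dumping a few raw lines with something like `hdfs dfs -cat <file> | bunzip2 | head | xxd` would show whether 'ø' is stored as `c3 b8` (UTF-8) or as a single `f8`.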

> On 10.11.2018 at 02:17, lsn24 wrote:



Re: Spark UTF-8 encoding

2018-11-09 Thread Sean Owen
That doesn't necessarily look like a Spark-related issue. Maybe your
terminal is displaying the glyph as a question mark because its font
lacks that symbol?
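A font problem and a decoding problem look the same on screen, so it can help to print code points instead of glyphs. The Scala sketch below uses only `String.codePoints` from the JDK; `CodePointDump` and `describe` are hypothetical names. Its output is plain ASCII, so the terminal and font no longer matter: intact data shows U+00F8 for 'ø', while data that was damaged before or during decoding shows U+FFFD.

```scala
object CodePointDump {
  // Render every code point as U+XXXX. The result is pure ASCII, so it reads
  // the same on any terminal: U+00F8 means 'ø' survived decoding, U+FFFD means
  // the bytes were already replaced by the time the string was built.
  def describe(s: String): String =
    s.codePoints().toArray().map(cp => f"U+$cp%04X").mkString(" ")

  def main(args: Array[String]): Unit = {
    println(describe("K\u00f8B")) // U+004B U+00F8 U+0042
    println(describe("K\uFFFDB")) // U+004B U+FFFD U+0042
  }
}
```

Assuming `ds` is the `Dataset[String]` from the original post, running `ds.take(5).foreach(s => println(CodePointDump.describe(s)))` in spark-shell would indicate which case applies: U+00F8 points at the terminal or font, U+FFFD points at the input bytes or their encoding.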
On Fri, Nov 9, 2018 at 7:17 PM, lsn24 wrote:



Spark UTF-8 encoding

2018-11-09 Thread lsn24
Hello,

Per the documentation, Spark's default character encoding is UTF-8. But
when I try to read non-ASCII characters, Spark tends to read them as question
marks. What am I doing wrong? Below is my code:

val ds = spark.read.textFile("a .bz2 file from hdfs")
ds.show()

The string "KøBENHAVN" is displayed as "K�BENHAVN".

I tested this in spark-shell and also ran the same command as part of a Spark
job. Both yield the same result.

I don't know what I am missing. I read the documentation but couldn't find
any explicit configuration for this.

Any pointers would be greatly appreciated!

Thanks



