Printing UTF-8 in C

2014-01-12 Thread Ori Idan
I need to print several Hebrew characters (UTF-8) to the terminal.
My locale is set to he_IL.UTF-8 so it shows Hebrew on the terminal, however
printing from C gives me Chinese characters.
My question is how to print one character such as 'א' to the terminal.

-- 
Ori Idan
___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Printing UTF-8 in C

2014-01-12 Thread Eli Zaretskii
 From: Ori Idan o...@helicontech.co.il
 Date: Sun, 12 Jan 2014 20:34:07 +0200
 
 I need to print several Hebrew characters (UTF-8) to the terminal.
 My locale is set to he_IL.UTF-8 so it shows Hebrew on the terminal, however
 printing from C gives me Chinese characters.
 My question is how to print one character such as 'א' to the terminal.

Is the C source stored on disk in UTF-8 encoding?



Re: Printing UTF-8 in C

2014-01-12 Thread Vassilii Khachaturov

On 12.01.2014 20:34, Ori Idan wrote:

I need to print several Hebrew characters (UTF-8) to the terminal.
My locale is set to he_IL.UTF-8 so it shows Hebrew on the terminal, 
however printing from C gives me Chinese characters.

My question is how to print one character such as 'א' to the terminal.

Where does the character come from? Is it a verbatim literal in the 
source? Unfortunately, that is not portable, even though gcc 
supports it. See the GNU CPP manual, section "Implementation Details", 
"Implementation-defined behavior". If you want a portable solution, you 
must escape the characters, best done with something like 
#define ALEPH "\x..." so that it can be concatenated into a larger 
string literal.


Here is a nice Stack Overflow thread with sample code that reads and 
outputs UTF-8 from C, without any literals in it:
http://stackoverflow.com/questions/1373463/handling-special-characters-in-c-utf-8-encoding


V.



Re: Printing UTF-8 in C

2014-01-12 Thread Dov Grobgeld
Writing Hebrew to the terminal is a bad idea because terminals do not
support BiDi reordering.

That said, doing cat small-hello.utf8[1] works for me in gnome-terminal
(though the text is reversed). No special environment variables were defined.

Regards,
Dov

[1] http://paps.sourceforge.net/small-hello.utf8





Re: Printing UTF-8 in C

2014-01-12 Thread Baruch Siach
Hi Dov,

On Sun, Jan 12, 2014 at 08:53:38PM +0200, Dov Grobgeld wrote:
 Writing hebrew to the terminal is a bad idea because terminals do not
 support BiDi reordering.
 
 That said, doing cat small-hello.utf8[1] works for me in gnome-term
 (though it is reversed). No special environment variables were defined.

But Ori has specifically asked about sending just one character to the
terminal. cat treats everything as binary data.

baruch


-- 
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -



Re: Printing UTF-8 in C

2014-01-12 Thread Ori Idan
On Sun, Jan 12, 2014 at 9:02 PM, Baruch Siach bar...@tkos.co.il wrote:

 Hi Dov,

 On Sun, Jan 12, 2014 at 08:53:38PM +0200, Dov Grobgeld wrote:
  Writing hebrew to the terminal is a bad idea because terminals do not
  support BiDi reordering.
 
  That said, doing cat small-hello.utf8[1] works for me in gnome-term
  (though it is reversed). No special environment variables were defined.

 But Ori has specifically asked about sending just one character to
 terminal.
 cat treats everything like binary data.

 baruch

I don't care about BiDi at this stage.
I still could not find out how to print even one character; I tried printf
and putwchar.

-- 
Ori Idan





Re: Printing UTF-8 in C

2014-01-12 Thread Dov Grobgeld
The most unixy way is to treat everything as binary UTF-8 and then forget
about encodings. The following program works just fine:

#include <stdio.h>

int main(void)
{
  printf("Hello שלום!\n");
  return 0;
}

Compile and run:

cc -o hello hello.c
./hello
Hello שלום!

(Though שלום is reversed in the terminal.)






Re: Printing UTF-8 in C

2014-01-12 Thread Ori Idan
On Sun, Jan 12, 2014 at 9:26 PM, Dov Grobgeld dov.grobg...@gmail.com wrote:

 The most unixy way is to treat everything as binary UTF-8 and then forget
 about encodings. The following program works just fine:

 #include <stdio.h>

 int main(void)
 {
   printf("Hello שלום!\n");
   return 0;
 }

 Compile and run:

 cc -o hello hello.c
 ./hello
 Hello שלום!

 (Though שלום is reversed in the terminal.)


That works, but I need a single character such as 'א' to be printed, and to
be able to print 'ב' as 'א' + 1.
Does anyone have an idea how to do it?

-- 
Ori Idan








Re: Printing UTF-8 in C

2014-01-12 Thread Dov Grobgeld
Create an array of all the Hebrew letters and index into it by the
position of the letter you want.

const char *alefbet[] = {
  "\327\220",  /* א */
  "\327\221",  /* ב */
  /* ... */
};

printf("%s\n", alefbet[index]);  // For index in 0..26

Am I missing something?

Dov





Re: Printing UTF-8 in C

2014-01-12 Thread Omer Zak
You may want to review the following Stack Overflow question:
http://stackoverflow.com/questions/4607413/c-library-to-convert-unicode-code-points-to-utf8
One answer describes how to do the conversion yourself.
Another answer uses the iconv library.



-- 
cal 09 1752
My own blog is at http://www.zak.co.il/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS:  at http://www.zak.co.il/spamwarning.html




Re: Printing UTF-8 in C

2014-01-12 Thread Eli Zaretskii
 From: Ori Idan o...@helicontech.co.il
 Date: Sun, 12 Jan 2014 20:46:50 +0200
 
  Is the C source stored on disk in UTF-8 encoding?
 
 Yes, but what's the difference? Latin characters are the same in
 Latin-1 encoding and UTF-8.

No, Latin-1 and UTF-8 encode Latin characters differently; only the
ASCII range (bytes below 0x80) is identical in both. You are confusing
the UTF-8 encoding with the Unicode codepoints that UTF-8 encodes.
