I've included the man page for the tcs utility from the Plan 9 OS. Would
something like this do what you want? If so, I'll post instructions to
this list on how to get the sources.

Tcs has been ported to POSIX environments such as BSD, Linux, Cygwin (for


     TCS(1)                                                     TCS(1)

          tcs - translate character sets

          tcs [ -slcv ] [ -f ics ] [ -t ocs ] [ file ... ]

          Tcs interprets the named file(s) (standard input default) as
          a stream of characters from the ics character set or format,
          converts them to runes, and then converts them into a stream
          of characters from the ocs character set or format on the
          standard output.  The default value for ics and ocs is utf,
          the UTF encoding described in utf(6). The -l option lists
          the character sets known to tcs. Processing continues in the
          face of conversion errors (the -s option prevents reporting
          of these errors).  The -c option forces the output to
          contain only correctly converted characters; otherwise, 0x80
          characters will be substituted for UTF encoding errors and
          0xFFFD characters will be substituted for unknown characters.

          The -v option generates various diagnostic and summary
          information on standard error, or makes the -l output more
          verbose.

          Tcs recognizes an ever-changing list of character sets.  In
          particular, it supports a variety of Russian and Japanese
          encodings.  Some of the supported encodings are

          utf        The Plan 9 UTF encoding, known by ISO as UTF-8
          utf1       The deprecated original UTF encoding from ISO 10646
          ascii      7-bit ASCII
          8859-1     Latin-1 (Western European)
          8859-2     Latin-2 (Czech .. Slovak)
          8859-3     Latin-3 (Dutch .. Turkish)
          8859-4     Latin-4 (Scandinavian)
          8859-5     Part 5 (Cyrillic)
          8859-6     Part 6 (Arabic)
          8859-7     Part 7 (Greek)
          8859-8     Part 8 (Hebrew)
          8859-9     Latin-5 (Finnish .. Portuguese)
          koi8       KOI-8 (GOST 19769-74)
          jis-kanji  ISO 2022-JP
          ujis       EUC-JX: JIS 0208
          ms-kanji   Microsoft, or Shift-JIS
          jis        (from only) guesses between ISO 2022-JP, EUC or Shift-JIS
          gb         Chinese national standard (GB2312-80)
          big5       Big 5 (HKU version)
          unicode    Unicode Standard 1.0
          tis        Thai character set plus ASCII (TIS 620-1986)
          msdos      IBM PC: CP 437
          atari      Atari-ST character set

          tcs -f 8859-1
               Convert 8859-1 (Latin-1) characters into UTF format.

          tcs -s -f jis
               Convert characters encoded in one of several JIS
               encodings (ISO 2022-JP, EUC, or Shift-JIS) into UTF
               format, without reporting errors.  Unknown Kanji will be
               converted into 0xFFFD characters.

          tcs -lv
               Print an up-to-date list of the supported character
               sets.


          ascii(1), rune(2), utf(6).

     Page 2                       Plan 9              (printed 1/3/04)
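
The conversion model described in the man page is decode ics into runes, then encode those runes as ocs. Here is a minimal Python sketch of that model using the standard codecs. It is not tcs itself. The charset-name mapping is an assumption: for example, tcs's koi8 is approximated by Python's KOI8-R. The parameters only_correct and silent (hypothetical names) only loosely imitate -c and -s. Python's "replace" handler also does not reproduce the 0x80 substitution tcs uses for UTF encoding errors.

```python
import sys

# Hypothetical mapping from some tcs(1) charset names to Python codec names.
# These are approximations, not tcs's own tables: e.g. tcs "koi8"
# (GOST 19769-74) is mapped to Python's KOI8-R.
TCS_TO_PYTHON = {
    "utf": "utf_8",
    "ascii": "ascii",
    "8859-1": "iso8859_1",
    "8859-2": "iso8859_2",
    "8859-5": "iso8859_5",
    "koi8": "koi8_r",
    "jis-kanji": "iso2022_jp",
    "ujis": "euc_jp",
    "ms-kanji": "shift_jis",
    "gb": "gb2312",
    "big5": "big5",
    "msdos": "cp437",
}


def translate(data: bytes, ics: str = "utf", ocs: str = "utf",
              only_correct: bool = False, silent: bool = False) -> bytes:
    """Convert bytes from charset `ics` to charset `ocs` (both default to utf).

    only_correct loosely imitates tcs -c by dropping anything that fails;
    otherwise undecodable input becomes U+FFFD and characters the output
    set cannot hold become "?" (Python's "replace" handler).  silent
    loosely imitates tcs -s by not reporting errors on stderr.
    """
    errors = "ignore" if only_correct else "replace"
    in_codec, out_codec = TCS_TO_PYTHON[ics], TCS_TO_PYTHON[ocs]

    # Step 1: bytes in the input character set -> code points ("runes").
    try:
        text = data.decode(in_codec)
    except UnicodeDecodeError as exc:
        if not silent:
            print(f"conversion error: {exc}", file=sys.stderr)
        text = data.decode(in_codec, errors=errors)

    # Step 2: code points -> bytes in the output character set.
    try:
        return text.encode(out_codec)
    except UnicodeEncodeError as exc:
        if not silent:
            print(f"conversion error: {exc}", file=sys.stderr)
        return text.encode(out_codec, errors=errors)
```

Writing translate(sys.stdin.buffer.read(), ics="8859-1") to sys.stdout.buffer would roughly correspond to `tcs -f 8859-1`.
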
--- Begin Message ---
I'm looking for an IranSystem to Unicode (UTF-8) converter. If you have one, or are interested in developing one for me, please tell me.
Ebadat A.R.
PersianComputing mailing list

--- End Message ---
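
Assume IranSystem is a single-byte DOS-era code like the msdos (CP 437) set above. Converting it to runes would then be a 256-entry lookup, and here is a minimal Python sketch of that table-driven approach. The function names build_table and decode_single_byte are hypothetical and not part of tcs, and the lower half is assumed to be ASCII. The three sample entries are real CP 437 mappings (0x80 to U+00C7, 0x81 to U+00FC, 0x82 to U+00E9), used only to show the table's shape. An IranSystem table would have to be filled in from an authoritative chart, which is not included here. If the encoding also stores contextual letter forms in visual order, as is often reported, a plain table would not be enough on its own.

```python
from __future__ import annotations

# Sample upper-half entries from IBM PC code page 437, for demonstration
# only; a real table (CP 437, IranSystem, ...) needs all 128 entries.
CP437_SAMPLE = {0x80: 0x00C7, 0x81: 0x00FC, 0x82: 0x00E9}

REPLACEMENT = 0xFFFD  # what tcs substitutes for unknown characters


def build_table(high_half: dict[int, int]) -> list[int | None]:
    """Return a 256-entry byte -> code point table.

    Bytes 0x00-0x7F are assumed to be ASCII; bytes 0x80-0xFF come from
    `high_half`, and any byte missing from it stays unmapped (None).
    """
    table: list[int | None] = list(range(0x80)) + [None] * 0x80
    for byte, rune in high_half.items():
        table[byte] = rune
    return table


def decode_single_byte(data: bytes, table: list[int | None],
                       only_correct: bool = False) -> str:
    """Map each byte through `table`; unmapped bytes become U+FFFD,
    or are dropped when only_correct is set (loosely like tcs -c)."""
    out = []
    for b in data:
        rune = table[b]
        if rune is None:
            if only_correct:
                continue
            rune = REPLACEMENT
        out.append(chr(rune))
    return "".join(out)
```

Encoding the result with .encode("utf-8") gives the UTF-8 output that was asked for.
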