I've included the man page for the tcs utility from the Plan 9 OS. Would
something like this do what you want? If so, I'll post instructions on
how to get the sources to this list.
Tcs has been ported to POSIX environments such as BSD, Linux, and Cygwin
(for Windows).
-Fariborz
TCS(1) TCS(1)
NAME
tcs - translate character sets
SYNOPSIS
tcs [ -slcv ] [ -f ics ] [ -t ocs ] [ file ... ]
DESCRIPTION
Tcs interprets the named file(s) (standard input default) as
a stream of characters from the ics character set or format,
converts them to runes, and then converts them into a stream
of characters from the ocs character set or format on the
standard output. The default value for ics and ocs is utf,
the UTF encoding described in utf(6). The -l option lists
the character sets known to tcs. Processing continues in the
face of conversion errors (the -s option prevents reporting
of these errors). The -c option forces the output to contain
only correctly converted characters; otherwise, 0x80
characters will be substituted for UTF encoding errors and
0xFFFD characters will be substituted for unknown characters.
The -v option generates various diagnostic and summary
information on standard error, or makes the -l output more
verbose.
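The two-step conversion described above (input bytes to runes, runes to output bytes) can be sketched in a few lines of Python. This is only an illustration of the idea, not how tcs is implemented: the function name translate and its parameters are made up for this example, and the encoding names are Python codec names rather than tcs names. Python's "replace" error handler also differs from tcs: it substitutes U+FFFD when decoding fails and "?" when a character cannot be encoded, whereas the man page describes 0x80 and 0xFFFD substitutions.

```python
def translate(data: bytes, ics: str = "utf-8", ocs: str = "utf-8") -> bytes:
    """Hypothetical helper: convert bytes from encoding ics to encoding ocs."""
    # Step 1: interpret the input bytes as characters of the input set.
    # Bytes that cannot be decoded become U+FFFD instead of stopping the run.
    runes = data.decode(ics, errors="replace")
    # Step 2: write the code points out in the output set.
    # Characters the output set cannot represent become "?".
    return runes.encode(ocs, errors="replace")


# Roughly what "tcs -f 8859-1" does: Latin-1 in, UTF-8 out.
print(translate(b"caf\xe9", ics="latin-1"))  # b'caf\xc3\xa9'
```

Processing continues past bad input in this sketch as well, which mirrors the "continues in the face of conversion errors" behaviour, but nothing is reported on standard error.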
Tcs recognizes an ever changing list of character sets. In
particular, it supports a variety of Russian and Japanese
encodings. Some of the supported encodings are
utf        The Plan 9 UTF encoding, known by ISO as UTF-8
utf1       The deprecated original UTF encoding from ISO 10646
ascii      7-bit ASCII
8859-1     Latin-1 (Central European)
8859-2     Latin-2 (Czech .. Slovak)
8859-3     Latin-3 (Dutch .. Turkish)
8859-4     Latin-4 (Scandinavian)
8859-5     Part 5 (Cyrillic)
8859-6     Part 6 (Arabic)
8859-7     Part 7 (Greek)
8859-8     Part 8 (Hebrew)
8859-9     Latin-5 (Finnish .. Portuguese)
koi8       KOI-8 (GOST 19769-74)
jis-kanji  ISO 2022-JP
ujis       EUC-JX: JIS 0208
ms-kanji   Microsoft, or Shift-JIS
jis        (from only) guesses between ISO 2022-JP, EUC or Shift-Jis
gb         Chinese national standard (GB2312-80)
big5       Big 5 (HKU version)
unicode    Unicode Standard 1.0
tis        Thai character set plus ASCII (TIS 620-1986)
msdos      IBM PC: CP 437
atari      Atari-ST character set
EXAMPLES
tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into UTF format.
tcs -s -f jis
Convert characters encoded in one of several shift JIS
encodings into UTF format. Unknown Kanji will be converted
into 0xFFFD characters.
tcs -lv
Print an up to date list of the supported character
sets.
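The "-f jis" example relies on guessing which Japanese encoding the input uses. The man page does not say how tcs makes that guess, so the following Python sketch is only one plausible approach under that assumption: try each candidate with strict decoding and keep the first that succeeds. The names CANDIDATES and guess_and_decode are invented for this example, and the codec names are Python's, not tcs's.

```python
from typing import Tuple

# Order matters: ISO 2022-JP is pure 7-bit with escape sequences, so it is
# tried first; otherwise its bytes would also pass as ASCII in the others.
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def guess_and_decode(data: bytes) -> Tuple[str, str]:
    """Hypothetical helper: return (codec name, decoded text) for Japanese input."""
    for name in CANDIDATES:
        try:
            # Strict decoding raises on any byte sequence invalid for this codec.
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to Shift-JIS and mark bad bytes
    # with U+FFFD rather than failing.
    return "shift_jis", data.decode("shift_jis", errors="replace")


sample = "\u65e5\u672c\u8a9e".encode("shift_jis")  # "Japanese" written in kanji
codec, text = guess_and_decode(sample)
utf8_bytes = text.encode("utf-8")  # the UTF output that tcs would produce
```

A trial-decoding guess like this can be wrong on short or ambiguous input, since some byte sequences are valid in more than one encoding.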
SOURCE
/sys/src/cmd/tcs
SEE ALSO
ascii(1), rune(2), utf(6).
Page 2 Plan 9 (printed 1/3/04)
--- Begin Message ---
Salam,
I'm looking for an IranSystem to Unicode (UTF-8) converter. If you
have one, or are interested in developing one for me, please tell me.
Regards,
Ebadat A.R.
_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
--- End Message ---