"Jörg Knappen" <[email protected]> wrote:
| Is there a ready made tool that decodes "UTF-8 twice" while keeping
| UTF-8 proper in place?
Isn't a shell script with a truly validating iconv(1) enough?
This works for me if in utf8.1 there is 'ÄEIÖÜ' in UTF-8 and i run
?0[steffen@sherwood tmp]$ iconv -f latin1 -t utf8 < utf8.1 > utf8.2
As in
for i in utf8.1 utf8.2; do
if iconv -f utf8 -t latin1 < ${i} |
iconv -f utf8 -t utf8 >/dev/null 2>&1; then
echo ${i}: bummer, going home by one
iconv -f utf8 -t latin1 < ${i} > ${i}.new 2>&1
else
echo ${i}: valid UTF-8
fi
done
i'll end up as
?0[steffen@sherwood tmp]$ sh utf8dec.sh
utf8.1: valid UTF-8
utf8.2: bummer, going home by one
?0[steffen@sherwood tmp]$
Ciao,
| --Jörg Knappen
--steffen
--- Begin Message ---
I have a database with broken encoding, containing a lot of "UTF-8 twice"
(that infamous encoding that arises when UTF-8 is interpreted as latin-1 and
converted to UTF-8 again) encoding besides ASCII and UTF-8 proper.
Is there a ready made tool that decodes "UTF-8 twice" while keeping UTF-8 proper in place?
--Jörg Knappen
--- End Message ---