Jörg, by any chance would this do what you need? http://www.kreativekorp.com/software/recode/#reinterpret
-- Rebecca Bettencourt

On Mon, Oct 28, 2013 at 9:48 AM, Buck Golemon <[email protected]> wrote:
>
> On Mon, Oct 28, 2013 at 6:06 AM, "Jörg Knappen" <[email protected]> wrote:
>>
>> Hi Steffen,
>>
>> data aren't that easy. There are non-latin1-characters encoded in the UTF8
>> part. I expect among others typographic apostrophes, Polish characters,
>> some mediaevalist characters like ũ (u with tilde). Maybe there is also
>> some Greek inside, but I am not sure about that.
>>
>> --Jörg Knappen
>>
>> Sent: Monday, 28 October 2013 at 12:34
>> From: "Steffen \"Daode\" Nurpmeso" <[email protected]>
>> To: "Jörg Knappen" <[email protected]>
>> Cc: [email protected]
>> Subject: Re: Do you know a tool to decode "UTF-8 twice"
>> "Jörg Knappen" <[email protected]> wrote:
>> | Is there a ready made tool that decodes "UTF-8 twice" while keeping
>> | UTF-8 proper in place?
>>
>> Isn't a shell script with a truly validating iconv(1) enough?
>> This works for me if in utf8.1 there is 'ÄEIÖÜ' in UTF-8 and i run
>>
>> ?0[steffen@sherwood tmp]$ iconv -f latin1 -t utf8 < utf8.1 > utf8.2
>>
>> As in
>>
>> for i in utf8.1 utf8.2; do
>>   if iconv -f utf8 -t latin1 < ${i} |
>>       iconv -f utf8 -t utf8 >/dev/null 2>&1; then
>>     echo ${i}: bummer, going home by one
>>     iconv -f utf8 -t latin1 < ${i} > ${i}.new 2>&1
>>   else
>>     echo ${i}: valid UTF-8
>>   fi
>> done
>>
>> i'll end up as
>>
>> ?0[steffen@sherwood tmp]$ sh utf8dec.sh
>> utf8.1: valid UTF-8
>> utf8.2: bummer, going home by one
>> ?0[steffen@sherwood tmp]$
>>
>> Ciao,
>>
>> | --Jörg Knappen
>>
>> --steffen
>
> Jörg: There's no ready-made tool, but it's easy to write in Python.
> I'll provide you a well-tested function in a few minutes.
>
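
For readers following the thread, here is a minimal Python 3 sketch of the kind of function discussed above. It is not Buck's function (that one is not part of this message), and the names `fix_double_utf8`, `_repair` and `_SUSPECT` are made up for illustration. It assumes the whole input is valid UTF-8 and that the doubly encoded parts went through Latin-1 (ISO 8859-1); text that was misread as Windows-1252 instead would need a different byte mapping. It only rewrites runs of Latin-1 characters that look exactly like a UTF-8 multi-byte sequence, so properly encoded Polish, Greek or ũ are left in place.

```python
import re

# Characters in U+0080..U+00FF that, read as bytes, form a UTF-8 lead byte
# followed by the matching number of continuation bytes.
_SUSPECT = re.compile(
    "[\u00c2-\u00df][\u0080-\u00bf]"        # 2-byte sequence
    "|[\u00e0-\u00ef][\u0080-\u00bf]{2}"    # 3-byte sequence
    "|[\u00f0-\u00f4][\u0080-\u00bf]{3}"    # 4-byte sequence
)


def _repair(match):
    """Undo one level of encoding for a suspect run, if it decodes cleanly."""
    chunk = match.group(0)
    try:
        return chunk.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        # Looked like UTF-8 but was not valid (e.g. an overlong form):
        # keep the original characters untouched.
        return chunk


def fix_double_utf8(data: bytes) -> str:
    """Decode UTF-8 and repair parts that were encoded as UTF-8 twice."""
    text = data.decode("utf-8")
    return _SUSPECT.sub(_repair, text)
```

Typical use would be along the lines of `fix_double_utf8(open(path, "rb").read())`. The heuristic is inherently ambiguous: a text that legitimately contains a sequence such as "Ã" followed by "¶" will be rewritten too, so it is worth comparing input and output before overwriting anything.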

