Hi old friends (and new),
I'm quite enjoying getting back to scripting, and like Perl a lot,
especially with Affrus. While I'm probably inefficient, it's nice to
have a language actually designed for text processing (search engine
logs, in my case). However, I've run into some Unicode issues that
seem to be platform-specific, so I thought I'd ask here.
I've done enough research to know that I should avoid hardcoded
byte positions and use Perl's built-in string functions, which
handle UTF-8 characters properly once the input is decoded. That's
cool. I'm pretty sure I'm reading in UTF-8, and comparisons seem to work.
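For concreteness, here is a minimal sketch of the byte-versus-character distinction, assuming the log data really is UTF-8. The sample string is made up for illustration and is not from the actual logs; with a real file, opening it with an encoding layer such as `open(my $fh, '<:encoding(UTF-8)', $path)` should have the same effect as the explicit `decode` call.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes, as they would arrive from a file read without an encoding layer.
# Illustrative sample: the UTF-8 byte sequence for "löschen".
my $bytes = "l\xC3\xB6schen";

# Decode the bytes into Perl characters. After this, length, substr,
# and regexes operate on characters rather than bytes.
my $text = decode('UTF-8', $bytes);

my $byte_count = length $bytes;        # counts bytes
my $char_count = length $text;         # counts characters
my $first_two  = substr $text, 0, 2;   # "l" plus o-umlaut, not half a character

# Encode back to UTF-8 bytes explicitly before printing.
print encode('UTF-8', "$text: $char_count characters, $byte_count bytes\n");
```

If the two counts differ, the data contained multi-byte characters, and any hardcoded byte offsets would have landed in the wrong place.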
What I can't do is generate readable cross-platform output to show my
clients. Even opening the output in BBEdit as UTF-8 doesn't turn
the codes into properly rendered extended characters, and by the time
the file gets into Excel on their Windows workstations, all hope is
pretty much gone.
The stuff that looks like HTML entities is fine when viewed in a browser:
&#1575;&#1604;&#1578;&#1593;&#1575;&#1585;&#1601;
s&#305;emens
And if necessary, I can deliver in HTML.
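As an illustration of what the browser is doing with those codes, here is a rough sketch that turns numeric character references into actual characters. It assumes the references are in the standard `&#NNN;` (decimal) or `&#xHH;` (hex) form; `decode_numeric_refs` is a hypothetical helper name, not something from an existing module.

```perl
use strict;
use warnings;

# Hypothetical helper: replace numeric character references with the
# characters they stand for. Named entities such as &amp; are left alone.
sub decode_numeric_refs {
    my ($s) = @_;
    $s =~ s/&#(\d+);/chr($1)/ge;                  # decimal, e.g. &#305;
    $s =~ s/&#x([0-9A-Fa-f]+);/chr(hex $1)/ge;    # hex, e.g. &#x131;
    return $s;
}

my $decoded = decode_numeric_refs('s&#305;emens');

# Emit UTF-8 on output so the wide character is written correctly.
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
```

Whether the result displays correctly still depends on the viewer being told the file is UTF-8, so this only moves the problem from the codes to the output encoding.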
But my logs have characters like this in them:
(from BBEdit as UTF-8:)
áááááááááááááâ á°üì ¶è¨ áî¶ùâ
atualiza§£o
carreo
(from BBEdit as Mac Roman:)
É íáßÓ Ô¯É
atualizaɬßɬ£o
torunn tømmervold
löschen
I can tell they mean something, but I can't figure out how to make
them readable. Help?
TIA,
Avi
--
Complete Guide to Search Engines for Web Sites and Intranets
http://www.searchtools.com