2008/5/12 Yannick Warnier <[EMAIL PROTECTED]>:
>> Why are you removing the accents? Why not store/process the data as
>> UTF-8, which supports all the accents in all the languages, and even
>> non-Latin languages. You mention Arabic, which does not use accented
>> Latin characters (maybe you are thinking of Turkish, Uzbek or Tajik).
>> UTF-8 supports Arabic, Russian, Greek, Latin including modified
>> accented letters, and almost everything else save CJK.
>>
>> What is your end goal? Why are you removing the accents?
>
> Hi Dotan,
>
> I'm trying to give a universally-manageable directory name to an item
> using a free-text title. I want to avoid every type of accented
> character and everything outside of pure ASCII to make it as
> portable as possible.
> Generating a random hash is not acceptable as we want to be as
> user-friendly as possible.
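
For reference, the goal quoted above (a pure-ASCII directory code derived from a free-text title) can be roughly sketched as follows. This is only an illustration in Python, not the code used in the application under discussion: the function name `title_to_code` and the `max_len` parameter are made up for the example. It relies on Unicode NFKD decomposition from the standard `unicodedata` module, which splits an accented letter into a base letter plus a combining mark, so the mark can be dropped.

```python
import re
import unicodedata


def title_to_code(title: str, max_len: int = 20) -> str:
    """Derive an uppercase ASCII code from a free-text title.

    Hypothetical helper for illustration; max_len is an assumed limit.
    """
    # Split accented letters into base letter + combining mark (NFKD).
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Remove everything that is not an ASCII letter or digit.
    ascii_only = re.sub(r"[^A-Za-z0-9]", "", stripped)
    return ascii_only.upper()[:max_len]


if __name__ == "__main__":
    print(title_to_code("été"))
    print(title_to_code("Crème brûlée 2008"))
    print(repr(title_to_code("שלום")))
```

Two limits of this approach are worth keeping in mind. Letters that have no Unicode decomposition (for example "ø" or "ß") are simply dropped rather than mapped to "o" or "ss", and a title written entirely in a non-Latin script yields an empty string, so a caller would need a fallback. Handling those cases is a transliteration problem, not an accent-stripping one.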

I suppose that is a good reason. I actually tried to come up with a
use case that justifies the removal of Latin accents, and couldn't.
I'll remember that. Tell me, what are you doing with Hebrew, Russian,
Arabic, and other non-Latin scripts? If you want, I have some code
that roughly transliterates Hebrew <-> Latin on the
http://gibberish.co.il website.

> I'm talking about Arabic not to remove accented characters, but in
> case there would be a transliteration function that allows me to turn an
> Arabic character into something similar in terms of pronunciation but in
> ASCII.

If it needs to be transliterated back to Arabic you will have fun with
the letter forms! I can give you code that does it for Hebrew, but
Hebrew only has 5 final letters, and no explicit initial or medial
forms.

> So the goal is to create a directory name that is both intuitive *and*
> portable for the user and the admin. The title is kept for the user, but
> there is a generic shortened code that is generated following the given
> title.
> We used to ask for a title in a webform, but realised our users liked it
> much better when we give them the possibility to generate the code
> themselves, while still generating one ourselves by default.
> I just realised that the developer who did it seems to have built it
> using HTML codes directly, so we end up with codes like "EACUTETEACUTE"
> for an item called "été", while "ETE" would be far better.
>
> Yannick
>

Dotan Cohen

http://what-is-what.com
http://gibberish.co.il
א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-נ-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?