----- Original Message ----- From: "Zoltán Németh" <[EMAIL PROTECTED]>


In formal English, contractions such as 've and 'm are not allowed; I'm
should be written as I am. So that's not going to work, I think.
But words like "and" are really English, I think :)
Keep in mind that this is quite a hard way to do it, I think, but I don't
have a better solution.
Just for example, Dutch and Afrikaans are not very different, so it's
really hard to see which of the two a text is written in.

Tijnema

PS: If you can't tell the difference between Dutch and Afrikaans, guess
Dutch :) It's used a lot more than Afrikaans.

Yeah, looking for very frequently used words seems a better idea.

greets
Zoltán Németh

In Spanish, as in many languages that use diacritical marks, people often skip them in informal chatting. This has a long tradition on the internet: years ago, support for those extra characters was non-existent, and today it is still somewhat patchy. I used to have two modes of writing in Spanish: formal writing, with all the proper accents, tildes and umlauts, and email mode, without any of those. Nowadays, with support for languages using the Roman alphabet widely available, there is no need to omit diacritical marks, but you will often find them missing, particularly in blog comments and other informal writing. The cause may be laziness, carelessness or simply a lack of formal education, and in that I include foreigners who more or less handle the language but not its minor details. If English had accents, I would probably skip them.

So, using a spelling dictionary is not a good idea unless you can count on your input being properly written. A text in Spanish with its accents missing will give you lots of errors, and we use just one sort of accent (acute) plus the tilde and the umlaut. French uses three sorts of accents, so there is a far higher chance of getting misspellings. I don't know how abundant accents are in Magyar; for me Zoltan Nemeth is the same as Zoltán Németh, but the first is a misspelling.

This problem also affects the frequency of individual letters. Should you first convert accented vowels to their plain versions? If you find accented letters, it is a strong sign that the text is not English, but if there are none, it doesn't mean it is English; it might be some non-English text written without the correct accents. Should you count 'a' and 'á' separately, or add them together because people often omit the accent?
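
To make the 'a' versus 'á' question concrete, here is a minimal sketch in Python (my choice for illustration; nobody in this thread posted code). The helper names strip_accents and letter_counts are made up for this example. It assumes that decomposing the text with Unicode NFD and dropping the combining marks is an acceptable way to get the "plain" letters.

```python
import unicodedata
from collections import Counter

def strip_accents(text):
    """Return text with combining marks removed, e.g. 'á' becomes 'a'.

    Note this is lossy: 'ñ' becomes 'n' and 'ü' becomes 'u' as well.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def letter_counts(text, fold=True):
    """Count letters, either folding accents into the base letter or not."""
    if fold:
        text = strip_accents(text)
    else:
        # Compose first so an accented letter is always one code point.
        text = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in text.lower() if ch.isalpha())

sample = "m\u00e1s mas"  # 'más mas': the same word with and without the accent
folded = letter_counts(sample, fold=True)
separate = letter_counts(sample, fold=False)
```

With fold=True both spellings add to the same 'a' bucket; with fold=False the accented and the plain vowel are counted apart, so a text typed in "email mode" would look like a different language profile.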

So, I also vote for the frequently used words approach and against the lowest number of misspellings. And I would first convert everything to plain letters, with no accents, both for the needle (the word lists) and the haystack (the text being checked).
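
A rough sketch of that approach, again in Python and only as an illustration. The word lists are tiny hand-picked samples, not real frequency data, and the names fold, COMMON_WORDS and guess_language are invented for this example. Both the word lists and the input text are stripped of accents before comparing.

```python
import re
import unicodedata

def fold(text):
    """Lowercase the text and drop accents (combining marks)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Illustrative samples only; a real list would come from a frequency corpus.
COMMON_WORDS = {
    "english": ["the", "and", "of", "to", "is", "that", "it"],
    "spanish": ["el", "la", "de", "que", "y", "en", "m\u00e1s", "est\u00e1"],
    "dutch": ["de", "het", "een", "en", "van", "is", "dat"],
}

# Fold the needles once, so accented and unaccented input match alike.
FOLDED = {lang: {fold(w) for w in words} for lang, words in COMMON_WORDS.items()}

def guess_language(text):
    """Return (best_language, scores), scoring by hits on common words."""
    words = re.findall(r"[^\W\d_]+", fold(text))
    scores = {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in FOLDED.items()
    }
    # On a tie, max() keeps the language listed first in COMMON_WORDS.
    return max(scores, key=scores.get), scores

lang, scores = guess_language("La casa esta en el centro de la ciudad y que mas")
```

Short words such as "de", "en" and "is" appear in more than one list, so a short input can easily tie or be misjudged; longer texts and longer word lists should separate the languages better.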

Satyam

PS: Also, it is accepted practice to omit accents on uppercase letters, such as in headings. It is not grammatically correct, but it is a typographical convention which the printing industry has been using for ages: the accents simply don't fit nicely.
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
