Hello Alastair,

On Tue, 20 Nov 2001 15:12:25 -0000 GMT (20/11/2001, 23:12 +0800 GMT), Alastair Scott wrote:
AS> The frequency analysis is actually very subtle - two other languages which
AS> have lots of "z"s that come to mind are German and Polish. The huge mass of
AS> rules needed to differentiate one language from another would probably be
AS> just as slow as the dictionary lookup.

I just came across some "language guessers" on the internet:

http://www.xrce.xerox.com/research/mltt/tools/guesser/

This one identified the text Jan originally posted as Turkish_iso9.

http://odur.let.rug.nl/~vannoord/TextCat/Demo/textcat.html

This one identified the language as "unknown", even though Turkish is in its list of supported languages. However, it is open source and (yes!) a Perl script, so you can run it in TB v2.

Oh, and I just noticed that the author gives a comprehensive list of "competitors", i.e. links to other language identifiers.

-- 
Cheers,
Thomas.
Moderator of the German The Bat! Beginner list.

It was so hot during football practice that a lot of kids keeled over from nervous prostitution.

Message reply created with The Bat! 1.54/10
under Chinese Windows 98 4.10 Build 2222 A
using an AMD Athlon K7 1.2GHz, 128MB RAM

-- 
________________________________________________________
Archives   : http://tbudl.thebat.dutaint.com
Moderators : mailto:[EMAIL PROTECTED]
TBTech List: mailto:[EMAIL PROTECTED]
Unsubscribe: mailto:[EMAIL PROTECTED]
Latest Vers: 1.53d
FAQ        : http://faq.thebat.dutaint.com

