Re: Language Detection Library/Code

2010-12-28 Thread Katie T
On Tue, Dec 28, 2010 at 12:42 AM, Shashwat Anand
anand.shash...@gmail.com wrote:
 Regarding the dictionary lookup + n-gram approach, I didn't quite understand
 what you meant.

Run the trigram analysis first. If it identifies multiple languages as
matches within the error margin, split the text into words and look up
each word in the respective dictionaries to get a second opinion.
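A minimal sketch of that two-stage idea, assuming some existing trigram-based detector has already produced a similarity score per language (higher is better). The function name `second_opinion`, the `margin` of 0.05, the language codes and the tiny word lists are all hypothetical, chosen only for illustration.

```python
import string


def second_opinion(text, trigram_scores, lexicons, margin=0.05):
    """Choose a language, consulting word dictionaries only when the
    trigram scores of several languages are within `margin` of the best.

    trigram_scores: language -> similarity score (higher is better), assumed
        to come from some existing trigram-based detector.
    lexicons: language -> set of lowercase words (hypothetical word lists).
    """
    best = max(trigram_scores.values())
    candidates = [lang for lang, score in trigram_scores.items()
                  if best - score <= margin]
    if len(candidates) == 1:
        return candidates[0]

    # Split into words, dropping surrounding ASCII punctuation.
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    # Count how many words each candidate's dictionary recognises.
    hits = {lang: sum(w in lexicons.get(lang, set()) for w in words)
            for lang in candidates}
    # Most dictionary hits wins; the trigram score breaks remaining ties.
    return max(candidates, key=lambda lang: (hits[lang], trigram_scores[lang]))


# Hypothetical scores and word lists for illustration only.
lexicons = {
    "it": {"il", "della", "sono", "non", "che"},
    "pt": {"o", "da", "sou", "que", "de"},
}
scores = {"it": 0.41, "pt": 0.39, "de": 0.12}
result = second_opinion("o que sou", scores, lexicons)
print(result)  # pt
```

Here the trigram scores narrowly favour "it", but both "it" and "pt" fall within the margin, so the dictionary lookup decides, and all three words are found only in the "pt" list.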

Katie
-- 
CoderStack
http://www.coderstack.co.uk/python-jobs
The Software Developer Job Board
-- 
http://mail.python.org/mailman/listinfo/python-list


Language Detection Library/Code

2010-12-27 Thread Shashwat Anand
Can anyone suggest a *language detection library* in Python that works on a
phrase of, say, 2-5 words?


-- 
~l0nwlf


Re: Language Detection Library/Code

2010-12-27 Thread Katie T
On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand
anand.shash...@gmail.com wrote:
 Can anyone suggest a language detection library in python which works on a
 phrase of say 2-5 words.

Generally such libraries work by bi/trigram frequency analysis, which
means you're going to have a fairly high error rate with such small
phrases. If you're only dealing with a handful of languages it may
make more sense to combine an existing library with a simple
dictionary lookup model to improve accuracy.
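A minimal sketch of trigram frequency analysis, assuming character trigrams compared by cosine similarity (real detectors differ; some use a rank-order distance between n-gram profiles instead). The function names and the sample texts are hypothetical, and the samples are far too small for real use.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count character trigrams, padding each word with spaces so word
    boundaries contribute trigrams such as " da" and "as "."""
    counts = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_languages(text, profiles):
    """Return (language, score) pairs, best match first."""
    query = trigram_profile(text)
    scores = {lang: cosine(query, prof) for lang, prof in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, far-too-small training samples; a real profile would be
# built from a large corpus per language.
samples = {
    "de": "der die und ist nicht das ein zu den mit sich auf",
    "fr": "le la les et est une des pas que qui dans pour",
    "it": "il la che non del una della sono per con gli",
}
profiles = {lang: trigram_profile(t) for lang, t in samples.items()}

ranking = rank_languages("das ist nicht", profiles)
print(ranking[0][0])  # de
```

A three-word phrase like "das ist nicht" yields only 11 trigrams, and a few of them (" da", "as ", "st ") also occur in the French sample, so with short input a handful of shared fragments can noticeably shift the scores. This is the error-rate problem described above.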

Katie


Re: Language Detection Library/Code

2010-12-27 Thread Shashwat Anand
On Tue, Dec 28, 2010 at 6:03 AM, Katie T ka...@coderstack.co.uk wrote:

 On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand
 anand.shash...@gmail.com wrote:
  Can anyone suggest a language detection library in python which works on a
  phrase of say 2-5 words.

 Generally such libraries work by bi/trigram frequency analysis, which
 means you're going to have a fairly high error rate with such small
 phrases. If you're only dealing with a handful of languages it may
 make more sense to combine an existing library with a simple
 dictionary lookup model to improve accuracy.

 Katie


In fact, I'm dealing with very few languages: German, French, Italian,
Portuguese and Russian.
I have read papers on bigram/trigram frequency analysis but was unable to
find any library implementing it.
'guess-language' doesn't perform well at all. The cld (Compact Language
Detection) module of Google Chrome performs well, but it is not a
standalone library (I hope someone ports it).

Regarding the dictionary lookup + n-gram approach, I didn't quite understand
what you meant.


Re: Language Detection Library/Code

2010-12-27 Thread Santhosh Kumar
Hi, I have already developed a language detection tool in Python. Here is the link:



With Regards,
Santhosh V.Kumar


Re: Language Detection Library/Code

2010-12-27 Thread Santhosh Kumar
 Hi, I have already developed a language detection tool in Python. Here is the link:
 http://code.google.com/p/langdet/


 
 With Regards,
 Santhosh V.Kumar

