Could you tell me where Marathi is used and what script (writing system)
is used to write it? Does Marathi use spaces to separate words?

If so, I don't see much of a problem from the architectural point of
view. You just write the analyzer plugin (not very easy for some
languages, but doable).
 
But if it doesn't use spaces, like Japanese (also Korean and Chinese?),
then you'd have a problem. Currently, the query expression analysis
assumes that words are separated by spaces for non-CJK (Chinese,
Japanese and Korean) characters, and that a single CJK character forms
a word, which is an invalid assumption. The analysis part of the query
expression is not pluggable yet. (I'm trying to come up with a
proposal.)
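
To make that assumption concrete, here is a rough sketch in Java of the
behavior described above. It is not the actual Nutch query parser code;
the class name QueryTokenSketch, the methods, and the choice of which
Unicode scripts count as "CJK" are all my own assumptions, for
illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Nutch code: mimics the described assumption that
// non-CJK words are space-separated and every CJK character is its own word.
public class QueryTokenSketch {

    // Assumption: "CJK" here means Han, Hiragana, Katakana and Hangul.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript s = Character.UnicodeScript.of(codePoint);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < query.length(); ) {
            int cp = query.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                // A single CJK character is emitted as a complete token.
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isWhitespace(cp)) {
                // Whitespace ends the current non-CJK word.
                flush(word, tokens);
            } else {
                word.appendCodePoint(cp);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Moves any pending non-CJK word into the token list.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }
}
```

Under this scheme a space-separated query comes out as whole words,
while a run of CJK text is cut into one token per character regardless
of where the real word boundaries are.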

Oh, by the way, you'd need a dev version of Nutch to use the pluggable
language analyzer. The stable version has the generic analyzer
hard-coded.

-kuro

> -----Original Message-----
> From: Sameer Tamsekar [mailto:[EMAIL PROTECTED] 
> Sent: 2006-1-08 2:40
> To: [email protected]
> Subject: Help on language
> 
> Hello,
> 
> I am working on building a custom analyzer and language detector
> for a native language ("Marathi"). Does anybody have an idea how to
> extend Nutch to use this language?
> 
> Regards,
> 
> Sameer
> 


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
