Could you tell me where Marathi is used and what script (a set of letters) is used to write it? Does Marathi use spaces to separate words?
If so, I don't see much of a problem from an architectural point of view. You just write the analyzer plugin (not very easy for some languages, but doable). But if it doesn't use spaces, like Japanese (also Korean and Chinese?), then you'd have a problem.

Currently, the query expression analysis assumes that words are separated by spaces for non-CJK (Chinese, Japanese and Korean) characters, and that a single CJK character forms a word, which is an invalid assumption. The analysis part of the query expression is not pluggable yet. (I'm trying to come up with a proposal.)

Oh, by the way, you'd need a dev version of Nutch to use the pluggable language analyzer. The stable version has the generic analyzer hard-coded.

-kuro

> -----Original Message-----
> From: Sameer Tamsekar [mailto:[EMAIL PROTECTED]
> Sent: 2006-1-08 2:40
> To: [email protected]
> Subject: Help on language
>
> Hello,
>
> I am working on building custom analyzer and language detector
> for native language ("Marathi"), does anybody have idea how to extend
> nutch for using this language.
>
> Regards,
>
> Sameer

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
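
To make the query-analysis assumption in the reply above concrete, here is a minimal Python sketch of that kind of tokenization: non-CJK text is split on whitespace, and every CJK character becomes its own term. This is only an illustration, not Nutch's actual code. The function names `is_cjk` and `naive_query_terms` and the Unicode ranges are assumptions made for the example.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """Rough CJK test over a few common Unicode blocks.

    The ranges are an assumption for this sketch, not the ranges Nutch uses.
    """
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def naive_query_terms(query: str) -> List[str]:
    """Split a query the way the reply describes (hypothetical helper).

    Non-CJK characters are grouped into space-separated words;
    each CJK character is emitted as a separate one-character term.
    """
    terms: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Emit the non-CJK word collected so far, if any.
        if current:
            terms.append("".join(current))
            current.clear()

    for ch in query:
        if is_cjk(ch):
            flush()
            terms.append(ch)
        elif ch.isspace():
            flush()
        else:
            current.append(ch)
    flush()
    return terms


if __name__ == "__main__":
    print(naive_query_terms("मराठी भाषा"))
    print(naive_query_terms("nutch 検索エンジン"))
```

As far as I know, Marathi is written in Devanagari with spaces between words, so a space-separated query would come out as whole words under this scheme. The Japanese query, by contrast, is broken into single characters regardless of where the real word boundaries are, which is the invalid assumption mentioned above.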
