Thank you for the help. I'll surely give feedback once I am done.

On 1/12/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
>
> > Would you tell me where I can get a help document on how to use NGramProfile
> > to train the language identifier and how to detect it.
>
> Unfortunately, there's no help document.
> Here is how to use the NGramProfile:
> java org.apache.nutch.analysis.lang.NGramProfile -create <profile-name> <filename> <encoding>
> Where:
> * profile-name is the ISO-639 language code (en, fr, de, ...) of the
>   language profile you want to create (mr for Marathi)
> * filename is the name of the file you want to use to create the profile.
> * encoding is the encoding of the file named filename
>
> Once your profile is created, the detection part is done.
> Just add the languageidentifier plugin in your Nutch conf.
> Perform a crawl, and if all is working fine you should see a trace with
> something like:
> Analysis .... with analyzer ..... (language-code)
>
> Since you don't provide a specific analyzer associated with your new language
> code (mr), the default NutchAnalyzer will be used.
>
> Then create an Analyzer for Marathi by creating a new plugin (see for
> instance the analysis-de or analysis-fr plugins provided in the Nutch source).
> Here is what your plugin must provide:
> * An analyzer extension that implements the
>   org.apache.nutch.analysis.NutchAnalyzer interface.
> * The plugin.xml descriptor of your plugin must declare the association
>   between your analyzer and the language it should be used for. Something
>   like:
>   <implementation id="org.apache.nutch.analysis.mr.MarathiAnalyzer"
>                   class="org.apache.nutch.analysis.mr.MarathiAnalyzer" lang="mr"/>
>
> Once this plugin is finished, just add it to the list of activated plugins
> in your configuration. Then the next time you perform a crawl, this analyzer
> will be used for documents identified as Marathi documents.
>
> > Will it be OK if I use Stop Analyzer instead of NutchDocumentAnalyzer with
> > my custom stopwords?
>
> It's a first step to a language-specific analyzer.
>
> > Where do I have to make changes in the Nutch code?
>
> As you can see, there are no changes to make in the Nutch code.
> Just provide some additional code to plug into Nutch.
>
> If you can provide us feedback on integrating Marathi in Nutch, it will be
> very much appreciated.
>
> Regards
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
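
For anyone following the quoted steps, here is a rough sketch of the general idea behind a character n-gram language profile: build a frequency table of short character sequences from a training text, then pick the language whose table is closest to that of the document. This is only an illustration and is not the code of org.apache.nutch.analysis.lang.NGramProfile. The class name NGramSketch, its methods, the n-gram lengths (1 to 3) and the distance measure are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Simplified character n-gram language guesser. Hypothetical; NOT Nutch's NGramProfile. */
public class NGramSketch {

    /** Builds relative frequencies of character n-grams (lengths 1..3) for a text. */
    public static Map<String, Double> profile(String text) {
        // Lower-case the text and mark word boundaries with '_' (an assumption of this sketch).
        String s = "_" + text.toLowerCase(Locale.ROOT).replaceAll("\\s+", "_") + "_";
        Map<String, Double> counts = new HashMap<>();
        double total = 0;
        for (int n = 1; n <= 3; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1.0, Double::sum);
                total++;
            }
        }
        final double t = total;
        counts.replaceAll((gram, c) -> c / t); // raw counts become relative frequencies
        return counts;
    }

    /** Sum of absolute frequency differences over all n-grams; smaller means more similar. */
    public static double distance(Map<String, Double> a, Map<String, Double> b) {
        double d = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            d += Math.abs(e.getValue() - b.getOrDefault(e.getKey(), 0.0));
        }
        for (Map.Entry<String, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                d += e.getValue(); // n-gram present only in b
            }
        }
        return d;
    }

    /** Returns the language code whose profile is closest to the given text. */
    public static String identify(String text, Map<String, Map<String, Double>> profiles) {
        Map<String, Double> p = profile(text);
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (Map.Entry<String, Map<String, Double>> e : profiles.entrySet()) {
            double d = distance(p, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny toy training texts; a real profile needs a much larger corpus.
        Map<String, Map<String, Double>> profiles = new HashMap<>();
        profiles.put("en", profile("the quick brown fox jumps over the lazy dog"));
        profiles.put("fr", profile("le renard brun saute par-dessus le chien paresseux"));
        System.out.println(identify("the dog and the fox", profiles));
    }
}
```

In the terms of the quoted message, the training text plays the role of <filename> and the map key plays the role of <profile-name> (for example mr for Marathi). The real tool's profile format and scoring may well differ from this sketch.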
