I think it's a very good idea. It would be even better if one could create a
separate crawl script just for ngram creation, where one could add one's own
URLs, for example the URLs of national libraries. My thinking is a
"bin/nutch ngram" command that is similar to the one-shot "crawl" command for
intranet searching, but only for ngram creation. Instead of using
crawl-urlfilter, we would use crawl-ngram or something similar.

Just my two cents :-)

Cheers

On 3/6/06, Ivan Sekulovic <[EMAIL PROTECTED]> wrote:
> Hi Jerome!
>
> Would it be possible to generate ngram profiles for LanguageIdentifier
> plugin from crawled content and not from file? What is my idea? The best
> source for content in one language could be wikipedia.org. We would
> just crawl the wikipedia in desired language and then create ngram
> profile from it. What are your thoughts about this idea?
>
> Best Regards,
> Ivan
>
> Jérôme Charron wrote:
>
> >>What is the good strategy to adopt for multilingualism sites ?
> >>
> >>I want nutch to index a site in the different languages and
> >>then, the search only prints results that are in the user language.
> >
> >Hi Laurent,
> >
> >What I can suggest is to :
> >1. use the languageidentifier plugin while crawling in order to guess the
> >language of the content
> >2. automatically filters the results by adding the lang:<user_agent_lang>
> >clause to the query (could be done in the jsp).
> >
> >Jérôme
> >
> >--
> >http://motrech.free.fr/
> >http://www.frutch.org/

--
Best Regards
Zaheed Haque
Phone : +46 735 000006
E.mail: [EMAIL PROTECTED]
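
For illustration only, the idea discussed in this thread (build an ngram
profile from a body of text in one language, then use it to guess the language
of other text) can be sketched roughly as below. This is not the
LanguageIdentifier plugin's code: the function names, the "_" word-boundary
marker, the 1 to 4 character ngram range, the 300-entry cutoff and the
rank-distance measure are all assumptions chosen for the sketch, and the input
is assumed to be plain text that has already been extracted from crawled pages.

```python
from collections import Counter
import re


def ngram_profile(text, min_n=1, max_n=4, top=300):
    """Return the `top` most frequent character ngrams, most frequent first."""
    counts = Counter()
    # Lower-case and keep only runs of letters, so that punctuation and
    # digits do not leak into the profile.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + word + "_"  # assumed word-boundary marker
        for n in range(min_n, max_n + 1):
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                if gram != "_":  # a lone boundary marker carries no information
                    counts[gram] += 1
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(profile, reference):
    """Rank-distance between two profiles: lower means more similar."""
    rank = {gram: i for i, gram in enumerate(reference)}
    penalty = len(reference)  # cost for ngrams missing from the reference
    return sum(abs(i - rank[g]) if g in rank else penalty
               for i, g in enumerate(profile))


def guess_language(text, profiles):
    """Pick the language whose reference profile is closest to the text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```

In use, one would call ngram_profile() once per language on the text gathered
for that language, keep the results in a dict keyed by language code, and pass
that dict to guess_language() together with the text to classify.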
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
