I think it's a very good idea. It would be even better if one could
create a separate crawl script just for ngram creation, where one could
add their own URLs, for example a national library's URL. My
thinking is something like

bin/nutch ngram

which would be similar to the one-shot crawl used for intranet searching, but only for
ngram creation. Instead of using crawl-urlfilter we would use
crawl-ngram or something similar.
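As a rough illustration of what such an ngram step would do with the crawled text, here is a minimal, hypothetical Java sketch. It counts character n-grams (lengths minN to maxN, each word padded with '_') and keeps the most frequent ones. The class and method names are made up for this example; it is not Nutch's NGramProfile API, and the profile format the LanguageIdentifier plugin actually uses may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch (not Nutch code): builds a ranked character n-gram
 * profile from plain text, e.g. text extracted from crawled pages.
 */
public class NGramProfileSketch {

    /** Counts character n-grams of length minN..maxN in each word, padding words with '_'. */
    public static Map<String, Integer> countNGrams(String text, int minN, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter; lower-case so "The" and "the" match.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue; // leading separators produce an empty first token
            }
            String padded = "_" + word + "_";
            for (int n = minN; n <= maxN; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Returns the topK most frequent n-grams, ties broken alphabetically. */
    public static List<String> topNGrams(Map<String, Integer> counts, int topK) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topK, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // In a real tool this text would come from the crawled pages.
        String crawledText = "the cat and the hat";
        Map<String, Integer> counts = countNGrams(crawledText, 1, 3);
        System.out.println(topNGrams(counts, 5));
    }
}
```

Ranking by frequency and keeping only the top entries follows the general idea of rank-based n-gram text categorization; the cut-off (topK) is an arbitrary parameter here.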

Just my two cents :-)

Cheers

On 3/6/06, Ivan Sekulovic <[EMAIL PROTECTED]> wrote:
> Hi Jérôme!
>
> Would it be possible to generate ngram profiles for the LanguageIdentifier
> plugin from crawled content rather than from a file? My idea is this: the best
> source of content in a single language could be wikipedia.org. We would
> just crawl Wikipedia in the desired language and then create an ngram
> profile from it. What are your thoughts about this idea?
>
> Best Regards,
> Ivan
>
>
>
> Jérôme Charron wrote:
>
> >>What is a good strategy to adopt for multilingual sites?
> >>I want Nutch to index a site in different languages and
> >>then have the search only show results that are in the user's language.
> >
> >Hi Laurent,
> >
> >What I can suggest is:
> >1. use the languageidentifier plugin while crawling in order to guess the
> >language of the content
> >2. automatically filter the results by adding the lang:<user_agent_lang>
> >clause to the query (this could be done in the JSP).
> >
> >Jérôme
> >
> >--
> >http://motrech.free.fr/
> >http://www.frutch.org/
>
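Jérôme's step 2 (adding a lang: clause to the query) could look roughly like the following minimal Java sketch. It assumes the user's language is taken from the browser's Accept-Language header and that the first listed language is the preferred one; the class and method names are hypothetical and not part of Nutch.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not Nutch code): appends a lang:<code> clause to a
 * user query, based on the browser's Accept-Language header.
 */
public class LangQueryFilter {

    /**
     * Extracts the first language code from an Accept-Language header,
     * e.g. "fr-FR,fr;q=0.9" gives "fr". Assumes the first entry is the
     * preferred one; q-weights are not compared.
     */
    public static String primaryLanguage(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return null;
        }
        // Take the first entry, drop any ";q=..." weight, then drop any region subtag.
        String first = acceptLanguage.split(",")[0].split(";")[0].trim();
        String lang = first.split("-")[0].toLowerCase(Locale.ROOT);
        // "*" means "any language", so no filter is added for it.
        return lang.isEmpty() || lang.equals("*") ? null : lang;
    }

    /** Returns the query with " lang:<code>" appended, or unchanged if no language is known. */
    public static String addLangClause(String query, String acceptLanguage) {
        String lang = primaryLanguage(acceptLanguage);
        return lang == null ? query : query + " lang:" + lang;
    }

    public static void main(String[] args) {
        System.out.println(addLangClause("multilingual search", "fr-FR,fr;q=0.9,en;q=0.8"));
    }
}
```

In a JSP the header value could come from request.getHeader("Accept-Language") (standard Servlet API), and the rewritten query would then be passed to the search as usual.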


--
Best Regards
Zaheed Haque
Phone : +46 735 000006
E.mail: [EMAIL PROTECTED]


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
