I set this property to -1, but Nutch doesn't crawl. I have a problem with Arabic sites: I can crawl an Arabic site like http://www.sahafa.com/ but I can't crawl another site like http://www.aljazeera.net/Portal/. Please help me.

On 1/31/12, Julien Nioche-4 [via Lucene] wrote:
> Try changing the value of this parameter in nutch-site.xml:
>
> <property>
>   <name>db.max.outlinks.per.page</name>
>   <value>100</value>
>   <description>The maximum number of outlinks that we'll process for a page.
>   If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
>   will be processed for a page; otherwise, all outlinks will be processed.
>   </description>
> </property>
>
> Julien
>
> On 31 January 2012 02:56, mina wrote:
>> I crawled a site with Nutch 1.4, but Nutch doesn't crawl all the links on
>> this site. The language of this site is not English. For example, Nutch
>> doesn't crawl this link:
>>
>> http://www.irna.ir/News/30786427/سوء-استفاده-از-نام-كمیته-امداد-برای-جمع-آوری-رای-در-مناطق-محروم/سياسي/
>>
>> How can I solve this problem? What configuration should I change?

--
View this message in context: http://lucene.472066.n3.nabble.com/error-in-crawl-all-link-in-no-English-language-sites-tp3702014p3702796.html
Sent from the Nutch - User mailing list archive at Nabble.com.
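
For reference, the override discussed in this thread would look roughly like the fragment below. This is only a minimal sketch: it assumes the property is placed in conf/nutch-site.xml (which takes precedence over nutch-default.xml) and that no other configuration file overrides it again. The shortened description text is illustrative, not the stock wording.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: this file is conf/nutch-site.xml in the Nutch 1.4 install being run -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- A negative value means all outlinks on a page are processed -->
    <value>-1</value>
    <description>Process every outlink found on a page (no per-page limit).</description>
  </property>
</configuration>
```

If pages are still skipped with this value in place, the per-page outlink limit is probably not what is excluding them.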

