Hi,

Your fetch interval is very short. The fetchInterval unit is ms, and 2592000
ms is approximately 43 minutes. When do you start your second depth? If it
starts more than 43 minutes after the first one, this is normal.
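To check the arithmetic, the sketch below converts the two fetchInterval values reported in this thread under both possible units, and turns the seed URL's stored fetchTime (1402360203750, an epoch timestamp in milliseconds) into a UTC date. It is a minimal Python sketch. The function names are hypothetical helpers, not Nutch APIs. Under the milliseconds reading, 2592000 gives the 43.2 minutes mentioned above. However, nutch-default.xml describes db.fetch.interval.default (default 2592000) as a number of seconds (30 days), so comparing both readings against the stored fetchTime is a quick way to confirm which one applies.

```python
from datetime import datetime, timezone

# fetchInterval values reported in the thread (unit is the open question)
INTERVALS = [2592000, 3888000]

# fetchTime of the seed URL as stored in MySQL (epoch milliseconds)
SEED_FETCH_TIME_MS = 1402360203750


def interval_as_minutes_if_ms(value):
    """Hypothetical helper: read the value as milliseconds, return minutes."""
    return value / 1000 / 60


def interval_as_days_if_s(value):
    """Hypothetical helper: read the value as seconds, return days."""
    return value / 86400


def epoch_ms_to_utc(ms):
    """Convert an epoch timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    for value in INTERVALS:
        print(f"{value}: {interval_as_minutes_if_ms(value):.1f} min if ms, "
              f"{interval_as_days_if_s(value):.1f} days if s")
    print("seed fetchTime:", epoch_ms_to_utc(SEED_FETCH_TIME_MS).isoformat())
```

Comparing the printed fetchTime with the date of the crawl (this thread is from 11 May 2014) shows which reading matches the schedule Nutch actually stored.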

Talat

2014-05-11 19:20 GMT+03:00 Vangelis karv <[email protected]>:
> XML:
> <property>
>   <name>http.agent.name</name>
>   <value>RiSpider</value>
> </property>
> <property>
>   <name>http.robots.agents</name>
>   <value>RiSpider,*</value>
> </property>
> <property>
>   <name>http.content.limit</name>
>   <value>-1</value>
> </property>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-(domain|regex)|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|urlnormalizer-(pass|regex|basic)|scoring-opic|microformats-reltag</value>
> </property>
> <property>
>   <name>fetcher.queue.mode</name>
>   <value>byDomain</value>
> </property>
> <property>
>   <name>http.accept.language</name>
>   <value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
> </property>
> <property>
>   <name>db.update.max.inlinks</name>
>   <value>20000</value>
> </property>
> <property>
>   <name>parser.character.encoding.default</name>
>   <value>utf-8</value>
> </property>
> <property>
>   <name>storage.data.store.class</name>
>   <value>org.apache.gora.sql.store.SqlStore</value>
> </property>
> <property>
>   <name>moreIndexingFilter.indexMimeTypeParts</name>
>   <value>false</value>
> </property>
> <property>
>   <name>fetcher.server.delay</name>
>   <value>0.0</value>
> </property>
> <property>
>   <name>parser.timeout</name>
>   <value>-1</value>
> </property>
> <property>
>   <name>gora.buffer.read.limit</name>
>   <value>5000</value>
> </property>
> <property>
>   <name>gora.buffer.write.limit</name>
>   <value>5000</value>
> </property>
> <property>
>   <name>index.parse.md</name>
>   <value>*</value>
> </property>
> <property>
>   <name>metatags.names</name>
>   <value>*</value>
> </property>
>
> MySQL fields at depth=10, topN=500:
>
> Seed URL: uk.co.dailymail.www:http/home/index.html
> uk.co.dailymail.www:http/home/index.html, ..., Home | Mail Online ,status: 2, 
> ..., ..., , , score: 1.0198, typ: application/xhtml+xml, batchID: 
> 1399744393-1426553032, http://www.dailymail.co.uk/home/index.html , ..., Home 
> | Mail Online, , fetchInterval:2592000, prevfetchTime: 1402353412236, ..., 
> ..., ..., fetchTime: 1402360203750, , ..., ..., ...
>
>
> uk.co.dailymail.www:http/terms, ..., , 3, ..., , , , 0.0197911, text/html, 
> 1399744408-1706414367, http://www.dailymail.co.uk/terms, , , 
> http://www.dailymail.co.uk/terms, 3888000, 1402353098818, ..., , ..., 
> 1403656233583, , ..., , ...
>
> All the other urls in the database have either fetchInterval 2592000 or 
> 3888000.
>
> Any ideas?
>
>> Date: Sun, 11 May 2014 13:46:27 +0300
>> Subject: Re: Fetcher-Parser Nutch 2.2.1
>> From: [email protected]
>> To: [email protected]
>>
>> Hi Vangelis,
>>
>> Maybe your fetch interval is very short. That would cause the page to be fetched at
>> every depth. Can you share your nutch-site.xml and the URL's f column fields and values?
>>
>> Talat
>> On 11 May 2014 02:30, "Vangelis karv" <[email protected]> wrote:
>>
>> > Hi everyone!
>> >
>> > Let's say we start a crawl with depth 5, topN 500, and the seed www.something.com,
>> > using the domain (www.something.com) and regex URL filters.
>> > I have noticed that the URL www.something.com is fetched, parsed, and
>> > updated at every depth. Why is that happening?
>> > In my opinion, that URL should be fetched and parsed only at the
>> > first depth and updated at every depth.
>> >
>> > Thank you in advance,
>> > Vangelis
>> >
>



-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
