Hi,
I am not sure I understood "no regular expression filter" correctly, but I
removed the urlfilter plugin from my nutch-site.xml.
This is my nutch-site.xml configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>plugin.includes</name>
<value>protocol-file|parse-(text|msword|msexcel|mspowerpoint|rtf|xml|html|js|pdf|oo)|index-basic|query-basic|summary-basic|scoring-opic</value>
</property>
<property>
<name>file.content.ignored</name>
<value>false</value>
</property>
<property>
<name>file.content.limit</name>
<value>-1</value>
</property>
<property>
<name>db.ignore.external.links</name>
<value>true</value>
</property>
<property>
<name>fetcher.threads.fetch</name>
<value>1000</value>
</property>
<property>
<name>fetcher.threads.per.host</name>
<value>1000</value>
<description>This number is the maximum number of threads that
should be allowed to access a host at one time.</description>
</property>
<property>
<name>fetcher.verbose</name>
<value>true</value>
<description>If true, fetcher will log more verbosely.</description>
</property>
<property>
<name>fetcher.server.delay</name>
<value>5.0</value>
<description>The number of seconds the fetcher will delay between
successive requests to the same server.</description>
</property>
<property>
<name>fetcher.max.crawl.delay</name>
<value>30</value>
</property>
<property>
<name>indexer.max.tokens</name>
<value>Integer.MAX_VALUE</value>
</property>
<property>
<name>db.max.outlinks.per.page</name>
<value>10000</value>
</property>
<property>
<name>db.max.anchor.length</name>
<value>200</value>
<description>The maximum number of characters permitted in an anchor.
</description>
</property>
</configuration>
The fetcher still freezes after 2 hours.
As I said, the logs don't give any useful information: each time I run it,
the freeze occurs on a different directory or file.
Do I have to change something in my configuration?
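For example, I am wondering whether something like the following would be a
safer starting point. This is only a sketch of what I could try, not something
I have tested: the lower thread counts are arbitrary values I picked for
illustration, and 2147483647 is simply the numeric value of Java's
Integer.MAX_VALUE, on the assumption that indexer.max.tokens has to be a plain
integer rather than the symbolic name.

```xml
<?xml version="1.0"?>
<configuration>
<!-- Hypothetical values, chosen only for illustration (untested). -->
<property>
<name>fetcher.threads.fetch</name>
<value>10</value>
</property>
<property>
<name>fetcher.threads.per.host</name>
<value>10</value>
</property>
<!-- Numeric literal for Integer.MAX_VALUE (2^31 - 1); assumes the property is read as an int. -->
<property>
<name>indexer.max.tokens</name>
<value>2147483647</value>
</property>
</configuration>
```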
Thanks in advance,
Aïcha
Stefan Groschupf-2 wrote:
>
> Hi,
>
> Try running with no regular expression filter and check if this helps.
> Let me know if this solves the problem.
> You may want to do a thread dump and send the log to the list to
> check where exactly the fetcher freezes.
>
> Stefan
>
> On 03.11.2006 at 15:53, Aisha wrote:
>
>>
>> Hi,
>>
>> I don't know why, but I have had no answer on the 3 forums where I posted
>> my problem.
>> Since the fetcher freezes every time I try to fetch my file system, I
>> can't imagine that I am the only one who has this problem. As I said in
>> my last e-mail, I found many mails about this problem, but no solution
>> seems to have been found.
>> It is a big problem, so I don't understand why nobody seems interested
>> in it.
>>
>> I try to crawl my file system, but the crawl never finishes; it aborts
>> with the message "Aborting with 3 hung threads".
>>
>> The number of hung threads is not the same if I retry.
>>
>> I modified the configuration, increasing the number of threads, but it
>> doesn't solve the problem.
>>
>> Please could somebody help me,
>> I can't crawl my file system.
>>
>> thanks in advance.
>> Aïcha
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Fetcher-freezes-tf2568287.html#a7158776
>> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>>
>>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 101tec Inc.
> search tech for web 2.1
> Menlo Park, California
> http://www.101tec.com
>
--
View this message in context:
http://www.nabble.com/Fetcher-freezes-tf2568287.html#a7199731
Sent from the Nutch - Dev mailing list archive at Nabble.com.