I already suggested adding a kind of timeout mechanism here and had
implemented it for my installation; however, the patch was rejected
because the problem was 'non-reproducible'.
:-/
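For the record, the idea was roughly the following, as a minimal
sketch only: Nutch 0.7 uses a different regex package internally, so
this assumes the filter were matched with java.util.regex, and all
names here are hypothetical, not the rejected patch itself.

// Wraps the input so that a runaway match aborts once a deadline
// passes. A backtracking engine calls charAt() on every step, so
// the check fires even when the match itself would never terminate.
public class TimeoutCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadline; // absolute, from currentTimeMillis()

    public TimeoutCharSequence(CharSequence inner, long timeoutMillis) {
        this.inner = inner;
        this.deadline = System.currentTimeMillis() + timeoutMillis;
    }

    public char charAt(int index) {
        if (System.currentTimeMillis() > deadline) {
            throw new RuntimeException("regex match timed out");
        }
        return inner.charAt(index);
    }

    public int length() {
        return inner.length();
    }

    public CharSequence subSequence(int start, int end) {
        // carry the remaining time over to the sub-sequence
        long remaining = deadline - System.currentTimeMillis();
        return new TimeoutCharSequence(inner.subSequence(start, end),
                Math.max(0L, remaining));
    }

    public String toString() {
        return inner.toString();
    }
}

Filtering would then look something like
pattern.matcher(new TimeoutCharSequence(url, 500)).find()
and a timed-out URL can simply be skipped instead of hanging the
whole crawl.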
On 07.04.2006 at 21:55, Rajesh Munavalli wrote:
Hi Piotr,
Thanks for the help. I think I found the source of the error: it was
in "crawl-urlfilter.txt". I had the following regular expression to
grab all the URLs:
+^http://([a-z0-9]*\.)*(a-z0-9*)*
The regex engine must have run into runaway backtracking on it,
effectively an infinite loop.
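(For anyone curious: the blow-up is easy to reproduce with an
analogous nested-quantifier pattern. The demo below uses
java.util.regex and an illustrative pattern, not my exact filter
line; also note the second group in my line above seems to be
missing its square brackets.)

public class BacktrackDemo {
    public static void main(String[] args) {
        StringBuffer input = new StringBuffer();
        for (int i = 0; i < 40; i++) {
            input.append('a');
        }
        input.append('?'); // the trailing '?' guarantees a failed match
        // Nested quantifiers: before giving up, the engine retries
        // exponentially many ways of splitting the a's between the
        // inner and the outer '+'.
        boolean matched = java.util.regex.Pattern.compile("(a+)+!")
                .matcher(input)
                .matches();
        System.out.println("matched = " + matched); // unreachable in practice
    }
}

A safer version of my filter line would require at least one
character per segment, something like
+^http://([a-z0-9]+\.)*[a-z0-9]+ (untested).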
Thanks,
Rajesh
On 4/7/06, Piotr Kosiorowski <[EMAIL PROTECTED]> wrote:
Hello Rajesh,
I have run bin/nutch crawl urls -dir crawl.test -depth 3
on a standard nutch-0.7.2 setup.
The urls file contains only http://www.math.psu.edu/MathLists/Contents.html.
In crawl-urlfilter I have changed the URL pattern to:
# accept hosts in MY.DOMAIN.NAME
+^http://
JVM: java version "1.4.2_06"
OS: Linux
It runs without problems.
Please reinstall from the distribution, make only the required
changes, and retest. If it still fails we will try to track it down
again.
Regards
Piotr
Rajesh Munavalli wrote:
Forgot to mention one more parameter: modify the crawl-urlfilter to
accept any URL.
On 4/6/06, Rajesh Munavalli <[EMAIL PROTECTED]> wrote:
Java version: J2SDK 1.4.2_08
URL Seed: http://www.math.psu.edu/MathLists/Contents.html
I even tried allocating more stack memory with the "-Xss" option and
more process memory with "-Xms". However, if I run the individual
tools (fetchlisttool, fetcher, updatedb, etc.) separately from the
shell, everything works fine.
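In case it is useful: -Xss raises the default stack for new threads,
but a single thread can also be given a larger stack directly through
the four-argument Thread constructor, without touching the rest of
the JVM. A minimal sketch (hypothetical names, not Nutch code):

public class BigStackRunner {
    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                // the deep-recursion-prone code would run here
            }
        };
        // The fourth argument is the requested stack size in bytes
        // (8 MB here); the VM may treat it only as a hint.
        Thread t = new Thread(null, work, "deep-work", 8L * 1024 * 1024);
        t.start();
        t.join();
    }
}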
Thanks,
--Rajesh
On 4/6/06, Piotr Kosiorowski <[EMAIL PROTECTED]> wrote:
Which Java version do you use?
Is it the same for all URLs or only for a specific one?
If the URL you are trying to crawl is public, you can send it to me
(off list if you wish) and I can check it on my machine.
Regards
Piotr
Rajesh Munavalli wrote:
I had earlier posted this message to the list but haven't received
any response. Here are more details.
Nutch version: nutch-0.7.2
URL File: contains a single URL. File name: "urls"
crawl-urlfilter: set to accept all URLs
Command: bin/nutch crawl urls -dir crawl.test -depth 3
Error: java.lang.StackOverflowError
The error occurs while it executes the "UpdateDatabaseTool".
One solution I can think of is to provide more stack memory, but is
there a better solution?
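(A quick way to check whether stack size is really the limiting
factor is a tiny probe like the one below; a hypothetical helper,
nothing to do with Nutch:)

public class StackDepthProbe {
    private static int depth = 0;

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("overflowed at depth " + depth);
        }
    }

    private static void recurse() {
        depth++;
        recurse();
    }
}

Running it as java -Xss256k StackDepthProbe and again as
java -Xss8m StackDepthProbe should show the reachable depth growing
with the setting (exact behaviour varies by VM and platform).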
Thanks,
Rajesh
---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com