Hello Rajesh,
I have run bin/nutch crawl urls -dir crawl.test -depth 3
on standard nutch-0.7.2 setup.
The urls file contains http://www.math.psu.edu/MathLists/Contents.html only.
In crawl-urlfilter I have changed the URL pattern to:
# accept hosts in MY.DOMAIN.NAME
+^http://
JVM: java version "1.4.2_06"
Linux
It runs without problems.
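
For illustration, here is a minimal Java sketch of how a filter line such as "+^http://" might be evaluated. It is not Nutch's own filter class. The class name UrlFilterSketch and its methods are hypothetical, and it assumes the behaviour described in the filter file's comments: each rule is a regular expression prefixed by '+' or '-', the first rule that matches decides, and a URL that matches no rule is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a regex URL filter; not the Nutch implementation.
public class UrlFilterSketch {

    // One rule: a sign ('+' accept, '-' reject) and a regular expression.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses filter lines, skipping blank lines and '#' comments.
    public UrlFilterSketch(String... lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + trimmed);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    // Assumption: first matching rule wins; no match means the URL is rejected.
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(
                "# accept hosts in MY.DOMAIN.NAME",
                "+^http://");
        // The seed URL starts with http://, so the single rule accepts it.
        System.out.println(filter.accepts("http://www.math.psu.edu/MathLists/Contents.html"));
        // No rule matches a non-http URL, so it is rejected.
        System.out.println(filter.accepts("ftp://example.org/file"));
    }
}
```

Under these assumptions the pattern accepts every http:// URL, so a depth-3 crawl can follow links to any host reachable from the seed page.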
Please reinstall from the distribution, make only the required changes, and
retest. If it fails, we will try to track it down again.
Regards
Piotr
Rajesh Munavalli wrote:
Forgot to mention one more parameter. Modify the crawl-urlfilter to accept
any URL.
On 4/6/06, Rajesh Munavalli <[EMAIL PROTECTED]> wrote:
Java version: JSDK 1.4.2_08
URL Seed: http://www.math.psu.edu/MathLists/Contents.html
I even tried allocating more stack memory with the "-Xss" option and more
process memory with the "-Xms" option. However, if I run the individual tools
(fetchlisttool, fetcher, updatedb, etc.) separately from the shell, it works fine.
Thanks,
--Rajesh
On 4/6/06, Piotr Kosiorowski <[EMAIL PROTECTED]> wrote:
Which Java version do you use?
Is it the same for all URLs or only for a specific one?
If the URL you are trying to crawl is public, you can send it to me (off list
if you wish) and I can check it on my machine.
Regards
Piotr
Rajesh Munavalli wrote:
I had earlier posted this message to the list but haven't got any
response.
Here are more details.
Nutch version: nutch-0.7.2
URL File: contains a single URL. File name: "urls"
Crawl-url-filter: is set to grab all URLs
Command: bin/nutch crawl urls -dir crawl.test -depth 3
Error: java.lang.StackOverflowError
The error occurs while it executes the "UpdateDatabaseTool".
One solution I can think of is to provide more stack memory. But is
there a better solution to this?
Thanks,
Rajesh
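
As background on the stack-memory idea: "-Xss" sets the per-thread stack size, while "-Xms" sets the initial heap size, so only the former bears on a StackOverflowError. The following Java sketch is not taken from Nutch and does not diagnose the UpdateDatabaseTool failure. It only shows one way to give a single deeply recursive task a larger stack without changing the JVM-wide setting. The class StackSizeSketch and its methods are hypothetical, and the stack size passed to the Thread constructor is only a hint that some platforms may ignore.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of per-thread stack sizing; unrelated to Nutch code.
public class StackSizeSketch {

    // Deliberately recursive: uses one stack frame per unit of depth.
    static long sumTo(long n) {
        return n == 0 ? 0 : n + sumTo(n - 1);
    }

    // Runs sumTo(depth) on a thread that requests stackBytes of stack.
    // Returns the sum, or -1 if the thread ran out of stack.
    static long sumOnThread(final long depth, long stackBytes) throws InterruptedException {
        final AtomicReference<Long> result = new AtomicReference<>(-1L);
        Runnable task = () -> {
            try {
                result.set(sumTo(depth));
            } catch (StackOverflowError e) {
                // Leave the result at -1 to signal the overflow.
            }
        };
        // The fourth argument is the requested stack size in bytes (a hint to the JVM).
        Thread worker = new Thread(null, task, "deep-recursion", stackBytes);
        worker.start();
        worker.join();
        return result.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Request a 256 MB stack for a recursion 200,000 frames deep.
        long sum = sumOnThread(200_000L, 256L * 1024 * 1024);
        System.out.println(sum == -1 ? "stack overflow" : "sum = " + sum);
    }
}
```

A larger stack only raises the depth at which recursion fails. If the depth grows with the input, replacing the recursion with an explicit loop or work queue is usually the more durable fix.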
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general