Hi Piotr,
Thanks for the help. I think I found the source of the error. It
was in "crawl-urlfilter.txt".
I had the following regular expression to grab all the URLs:
+^http://([a-z0-9]*\.)*(a-z0-9*)*
The regex engine must have run into an infinite loop (runaway backtracking) on that pattern.
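For the record, here is a quick standalone check of what I think I meant to write. This is plain java.util.regex, not the actual Nutch filter code; the class name, the test URLs, and the corrected character class [a-z0-9] in the second group are just my own guesses at the intent:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class UrlFilterRegexCheck {
        public static void main(String[] args) {
            // Corrected pattern: the second group gets its character class
            // brackets back, so the expression no longer repeats a group
            // that can match the empty string.
            Pattern fixed = Pattern.compile("^http://([a-z0-9]*\\.)*[a-z0-9]+");

            String[] urls = {
                "http://www.math.psu.edu/MathLists/Contents.html",
                "ftp://example.org/file.txt"
            };
            for (int i = 0; i < urls.length; i++) {
                Matcher m = fixed.matcher(urls[i]);
                System.out.println(urls[i] + " -> "
                        + (m.find() ? "accept" : "reject"));
            }
        }
    }

With that, the corresponding accept line in crawl-urlfilter.txt would presumably look something like:

    +^http://([a-z0-9]*\.)*[a-z0-9]+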
Thanks,
Rajesh
On 4/7/06, Piotr Kosiorowski <[EMAIL PROTECTED]> wrote:
>
> Hello Rajesh,
> I have run bin/nutch crawl urls -dir crawl.test -depth 3
> on standard nutch-0.7.2 setup.
> The urls file contains http://www.math.psu.edu/MathLists/Contents.html only.
> In crawl-urlfilter I have changed the URL pattern to:
> # accept hosts in MY.DOMAIN.NAME
> +^http://
>
> JVM: java version "1.4.2_06"
> Linux
>
> It runs without problems.
> Please reinstall from the distribution, make only the required changes, and
> retest. If it fails, we will try to track it down again.
> Regards
> Piotr
>
>
>
> Rajesh Munavalli wrote:
> > Forgot to mention one more parameter. Modify the crawl-urlfilter to
> > accept any URL.
> >
> > On 4/6/06, Rajesh Munavalli <[EMAIL PROTECTED]> wrote:
> >> Java version: JSDK 1.4.2_08
> >> URL Seed: http://www.math.psu.edu/MathLists/Contents.html
> >>
> >> I even tried allocating more stack memory with the "-Xss" option and more
> >> heap memory with "-Xms". However, if I run the individual tools
> >> (fetchlisttool, fetcher, updatedb, etc.) separately from the shell, it
> >> works fine.
> >>
> >> Thanks,
> >> --Rajesh
> >>
> >>
> >>
> >> On 4/6/06, Piotr Kosiorowski <[EMAIL PROTECTED]> wrote:
> >>> Which Java version do you use?
> >>> Is it the same for all URLs or only for a specific one?
> >>> If the URL you are trying to crawl is public you can send it to me (off
> >>> list, if you wish) and I can check it on my machine.
> >>> Regards
> >>> Piotr
> >>>
> >>> Rajesh Munavalli wrote:
> >>>> I had earlier posted this message to the list but haven't received any
> >>>> response.
> >>>> Here are more details.
> >>>>
> >>>> Nutch version: nutch-0.7.2
> >>>> URL File: contains a single URL. File name: "urls"
> >>>> Crawl-url-filter: is set to grab all URLs
> >>>>
> >>>> Command: bin/nutch crawl urls -dir crawl.test -depth 3
> >>>> Error: java.lang.StackOverflowError
> >>>>
> >>>> The error occurs while it executes the "UpdateDatabaseTool".
> >>>>
> >>>> One solution I can think of is to provide more stack memory. But is
> >>>> there a better solution to this?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Rajesh
> >>>>
> >>>
> >>>
> >>>
> >
>
>