Otis, that's a great fix! Unfortunately, I cannot do the same with the HTML
parse plugin :(

The StackOverflowError goes away if I increase the stack size to 1024k.
I'm specifying this by including the following in conf/hadoop-env.sh:

export HADOOP_OPTS="-server -Xss1024k"

But this works only when using the 'local' JobTracker. Don't the tasks
on the slaves pick up the environment from conf/hadoop-env.sh? I stopped
and restarted all the daemons after making the change, and I have verified
that the above line exists in conf/hadoop-env.sh on all the slaves.
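
On the other question from my first mail (is there any solution apart from
a bigger stack?): one option might be to walk the DOM with an explicit stack
instead of recursion, so the depth of the page no longer matters. A minimal
sketch of the idea, not the actual Nutch code; the class and method names
(IterativeDomWalk, countNodes, buildDeepDocument) are made up for
illustration, and the deeply nested test document is only an assumption about
what the offending pages look like:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration only; not part of Nutch.
class IterativeDomWalk {

    // Visits every node under root in document order using a heap-allocated
    // stack, so the Java call stack stays flat however deep the tree is.
    static int countNodes(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        int visited = 0;
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            visited++; // a real filter would inspect the node here
            // Push children last-to-first so the first child is popped first.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return visited;
    }

    // Builds a document that is a single chain of nested <div> elements.
    // The chain is assembled from the innermost element outwards.
    static Document buildDeepDocument(int depth) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element inner = doc.createElement("div");
        for (int i = 1; i < depth; i++) {
            Element outer = doc.createElement("div");
            outer.appendChild(inner);
            inner = outer;
        }
        doc.appendChild(inner);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildDeepDocument(50_000);
        // Prints the number of elements visited, which equals the depth.
        System.out.println(countNodes(doc.getDocumentElement()));
    }
}
```

The same loop shape could replace a recursive walk(node) method: whatever
the recursive version does on entering a node would go where the counter is
incremented. Work that has to happen after all of a node's children are
processed would need extra bookkeeping that this sketch does not show.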

Thanks,
Siddhartha

On Fri, Jun 13, 2008 at 9:43 AM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:

> Removed the plugin from the config :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> > From: Siddhartha Reddy <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Thursday, June 12, 2008 11:41:17 PM
> > Subject: Re: java.lang.StackOverflowError in HTMLMetaProcessor.getMetaTagsHelper
> >
> > Hi Otis,
> >
> > Thanks for the reply. I just noticed that I'm also getting
> > StackOverflowErrors due to the JavaScript parsing plugin. Can you please
> > tell me how you worked around this problem?
> >
> > The stack trace:
> > java.lang.StackOverflowError
> >         at org.apache.xerces.dom.ParentNode.getLength(Unknown Source)
> >         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:147)
> >         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >         ....
> >
> > Thanks,
> > Siddhartha
> >
> > On Thu, Jun 12, 2008 at 9:22 AM, wrote:
> >
> > > I've seen stack overflow errors, but I believe they were due to the
> > > JavaScript parsing plugin.
> > >
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > >
> > > ----- Original Message ----
> > > > From: Siddhartha Reddy
> > > > To: [email protected]
> > > > Sent: Wednesday, June 11, 2008 11:32:16 PM
> > > > Subject: java.lang.StackOverflowError in HTMLMetaProcessor.getMetaTagsHelper
> > > >
> > > > Hi,
> > > >
> > > > While parsing some pages, I am getting a java.lang.StackOverflowError
> > > > exception due to the recursion in HTMLMetaProcessor.getMetaTagsHelper.
> > > > I'm pasting part of the stack trace below. Unfortunately, I have logic
> > > > that deletes the segment if fetch/parse fails, so I do not know which
> > > > particular web page caused this problem; I'll recrawl the same pages
> > > > with modified logic (that does not delete the segment on failed
> > > > parsing) and try to find the offending URL.
> > > >
> > > > Has anyone encountered such a problem before? Apart from increasing
> > > > the stack size for Java, is there any other possible solution?
> > > >
> > > > java.lang.StackOverflowError
> > > >         at java.lang.Character.toUpperCase(Character.java:4278)
> > > >         at java.lang.String.regionMatches(String.java:1384)
> > > >         at java.lang.String.equalsIgnoreCase(String.java:1120)
> > > >         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:55)
> > > >         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >         ....
> > > >
> > > > Thanks,
> > > > Siddhartha
> > >
> > >
> >
> >
> > --
> > http://sids.in
> > "If you are not having fun, you are not doing it right."
>
>


-- 
http://sids.in
"If you are not having fun, you are not doing it right."
