Yes, I see that. But in fact I still see JavaScript in my summaries and don't
know how to remove it :)

-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 15, 2006 11:14 AM
To: [email protected]
Subject: Re: javascript in summaries [nutch-0.7.1]

On 3/15/06, Ilia S. Yatsenko <[EMAIL PROTECTED]> wrote:
>
> This script is present in the HTML page inside <script>//<!-- code //--></script>
Really?
In the HTML parser, I think DOMContentUtils skips that element.

private static final boolean getTextHelper(StringBuffer sb, Node node,
                                             boolean abortOnNestedAnchors,
                                             int anchorDepth) {
    if ("script".equalsIgnoreCase(node.getNodeName())) {
      return false;
    }
    if ("style".equalsIgnoreCase(node.getNodeName())) {
      return false;
    }
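
To make the idea in the excerpt above concrete, here is a minimal standalone
sketch of the same technique: walk the DOM and refuse to descend into script
and style elements. It is not the Nutch code. The class and method names are
hypothetical, and it assumes well-formed markup because it uses the JDK's XML
parser rather than the HTML parser Nutch is configured with.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of Nutch.
class ScriptSkippingTextExtractor {

    // Recursively appends text content to sb, skipping whole
    // <script> and <style> subtrees.
    static void collectText(StringBuilder sb, Node node) {
        String name = node.getNodeName();
        if ("script".equalsIgnoreCase(name) || "style".equalsIgnoreCase(name)) {
            return; // do not descend into this element
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(sb, children.item(i));
        }
    }

    // Assumption: the input is well-formed XHTML, since the JDK XML parser
    // is used here in place of a lenient HTML parser.
    static String extractText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            StringBuilder sb = new StringBuilder();
            collectText(sb, doc.getDocumentElement());
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Hello</p>"
                + "<script>document.write('x');</script>"
                + "<p>world</p></body></html>";
        System.out.println(extractText(page)); // prints "Hello world"
    }
}
```

If script text still reaches the output of a walker like this, one thing worth
checking is whether the parser actually produced a script element for that
part of the page.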


>
>
> -----Original Message-----
> From: Jack Tang [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 15, 2006 10:58 AM
> To: [email protected]
> Subject: Re: javascript in summaries [nutch-0.7.1]
>
> Maybe you can filter out JavaScript files (*.js) using the URL filter.
>
> /Jack
>
> On 3/15/06, Ilia S. Yatsenko <[EMAIL PROTECTED]> wrote:
> > Hello
> >
> >
> >
> > Sorry for my limited English
> >
> >
> >
> > I use nutch-0.7.1 and have an issue with the HTML parser
> >
> >
> >
> > I get JavaScript code in the summaries and don't know how to remove it.
> > For example:
> >
> >
> >
> > . \n'); } if (plugin) { document.write(' '); document.write(' ');
> > document.write(' '); document.write(' '); document.write(' ');
> > document.write ...
> >
> >
> >
> > Or http://62.141.52.208:8080/dual/search.jsp?query=document.write :)
> >
> >
> >
> > This is my nutch-site.plugin line:
> >
> > <property>
> >   <value>nutch-extensionpoints|protocol-(http|httpclient)|urlfilter-regex|parse-html|index-(basic|more)|query-(more|stemmer|site|url)</value>
> > </property>
> >
> >
> >
> > Can anybody help me?
> >
> >
> >
>
>
> --
> Keep Discovering ... ...
> http://www.jroller.com/page/jmars
>
>


--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
