JSParseFilter produces weird URL
--------------------------------

                 Key: NUTCH-807
                 URL: https://issues.apache.org/jira/browse/NUTCH-807
             Project: Nutch
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.0.0
         Environment: Redhat 2.6.18-128.1.6.el5PAE  i686 i686 i386 GNU/Linux
            Reporter: Minyao Zhu


This was found when crawling the site http://zhidao.baidu.com/ (a 
Chinese-language site).

It appears this page contains JavaScript that confuses JSParseFilter, which 
then produces URLs like this:

http://zhidao.baidu.com/){if(A===46){baidu.hide(

I am not sure of the impact or scope of this issue in general.  For this 
specific site, the observed effect is that far fewer pages get crawled.

Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
