JSParseFilter produces weird URLs

                 Key: NUTCH-807
                 URL: https://issues.apache.org/jira/browse/NUTCH-807
             Project: Nutch
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.0.0
         Environment: Redhat 2.6.18-128.1.6.el5PAE  i686 i686 i386 GNU/Linux
            Reporter: Minyao Zhu

This was found when crawling the site http://zhidao.baidu.com/ (a Chinese-language
site).

It appears that this page contains JavaScript that confuses JSParseFilter, which
then produces URLs like this:


I am not sure of the impact or scope of this issue in general. For this specific
site, far fewer pages were crawled.

