parse-(html) does follow links of full html page, parse-(tika) does not follow any 
links and stops at level 1

                 Key: NUTCH-817
             Project: Nutch
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.1
         Environment: Suse linux 11.1, java version "1.6.0_13"
            Reporter: matthew a. grisius

Submitted per Julien Nioche. I did not see where to attach a file, so I pasted 
it here. BTW: the Tika command line returns an empty HTML body for this file.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" 




<!-- Generated by javadoc on Fri Mar 28 17:23:42 EDT 2008-->


Matrix Application Development Kit


<SCRIPT type="text/javascript">

    targetPage = "" +;

    if (targetPage != "" && targetPage != "undefined")

       targetPage = targetPage.substring(1);

    function loadFrames() {

        if (targetPage != "" && targetPage != "undefined")

             top.classFrame.location = top.targetPage;






<FRAMESET cols="20%,80%" title="" onLoad="top.loadFrames()">

<FRAMESET rows="30%,70%" title="" onLoad="top.loadFrames()">

<FRAME src="overview-frame.html" name="packageListFrame" title="All Packages">

<FRAME src="allclasses-frame.html" name="packageFrame" title="All classes and 
interfaces (except non-static nested types)">


<FRAME src="overview-summary.html" name="classFrame" title="Package, class and 
interface descriptions" scrolling="yes">



<H2>Frame Alert</H2>


This document is designed to be viewed using the frames feature. If you see 
this message, you are using a non-frame-capable web client.


Link to<A HREF="overview-summary.html">Non-frame version.</A>
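
For reference, the links a parser would be expected to find in the page above can be listed with a short script. This is only an illustrative sketch using Python's standard html.parser module. It is not Nutch or Tika code, and the names OutlinkCollector and extract_outlinks are made up for this example. It treats FRAME/IFRAME src and A href attributes as outlinks, which is an assumption about the desired behaviour rather than a description of what either plugin does.

```python
from html.parser import HTMLParser

# The link-bearing lines of the frameset page pasted in this report.
PAGE = """
<FRAMESET cols="20%,80%" title="" onLoad="top.loadFrames()">
<FRAMESET rows="30%,70%" title="" onLoad="top.loadFrames()">
<FRAME src="overview-frame.html" name="packageListFrame" title="All Packages">
<FRAME src="allclasses-frame.html" name="packageFrame" title="All classes and interfaces (except non-static nested types)">
<FRAME src="overview-summary.html" name="classFrame" title="Package, class and interface descriptions" scrolling="yes">
Link to<A HREF="overview-summary.html">Non-frame version.</A>
"""


class OutlinkCollector(HTMLParser):
    """Hypothetical helper: collects targets of FRAME/IFRAME src and A href."""

    # Assumption: these tag/attribute pairs are the ones that count as outlinks.
    LINK_ATTRS = {"frame": "src", "iframe": "src", "a": "href"}

    def __init__(self):
        super().__init__()
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lowercased.
        wanted = self.LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # Keep first-seen order and skip duplicates.
            if name == wanted and value and value not in self.outlinks:
                self.outlinks.append(value)


def extract_outlinks(html):
    """Return the de-duplicated link targets found in the given HTML text."""
    collector = OutlinkCollector()
    collector.feed(html)
    collector.close()
    return collector.outlinks


if __name__ == "__main__":
    for link in extract_outlinks(PAGE):
        print(link)
```

If a parser returns none of these three targets for this page, the crawl has nothing to follow past the start URL, which would match the "stops at level 1" symptom described in this report.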




This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
