[ https://issues.apache.org/jira/browse/NUTCH-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche reassigned NUTCH-817:
-----------------------------------

    Assignee: Julien Nioche

> parse-(html) does follow links of full html page, parse-(tika) does follow any links and stops at level 1
> --------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-817
>                 URL: https://issues.apache.org/jira/browse/NUTCH-817
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>   Affects Versions: 1.1
>         Environment: Suse linux 11.1, java version "1.6.0_13"
>            Reporter: matthew a. grisius
>            Assignee: Julien Nioche
>         Attachments: sample-javadoc.html
>
>
> submitted per Julien Nioche. I did not see where to attach a file so I pasted it here.
> btw: Tika command line returns empty html body for this file.
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
> <!--NewPage-->
> <HTML>
> <HEAD>
> <!-- Generated by javadoc on Fri Mar 28 17:23:42 EDT 2008-->
> <TITLE>
> Matrix Application Development Kit
> </TITLE>
> <SCRIPT type="text/javascript">
>     targetPage = "" + window.location.search;
>     if (targetPage != "" && targetPage != "undefined")
>         targetPage = targetPage.substring(1);
>     function loadFrames() {
>         if (targetPage != "" && targetPage != "undefined")
>             top.classFrame.location = top.targetPage;
>     }
> </SCRIPT>
> <NOSCRIPT>
> </NOSCRIPT>
> </HEAD>
> <FRAMESET cols="20%,80%" title="" onLoad="top.loadFrames()">
> <FRAMESET rows="30%,70%" title="" onLoad="top.loadFrames()">
> <FRAME src="overview-frame.html" name="packageListFrame" title="All Packages">
> <FRAME src="allclasses-frame.html" name="packageFrame" title="All classes and interfaces (except non-static nested types)">
> </FRAMESET>
> <FRAME src="overview-summary.html" name="classFrame" title="Package, class and interface descriptions" scrolling="yes">
> <NOFRAMES>
> <H2>
> Frame Alert</H2>
> <P>
> This document is designed to be viewed using the frames feature. If you see this message, you are using a non-frame-capable web client.
> <BR>
> Link to<A HREF="overview-summary.html">Non-frame version.</A>
> </NOFRAMES>
> </FRAMESET>
> </HTML>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
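
For reference, the links a crawler would be expected to pick up from the frameset in the pasted sample can be listed with a short sketch. This is only an illustration using Python's standard html.parser module. It is not how parse-html or parse-tika work internally, and the names OutlinkCollector, LINK_ATTRS and collect_outlinks are hypothetical helpers made up for this example. The SAMPLE string is a trimmed copy of the frameset portion of the attachment.

```python
from html.parser import HTMLParser


class OutlinkCollector(HTMLParser):
    """Hypothetical helper (not Nutch or Tika code): records link targets."""

    # Assumption for this sketch: which attribute carries the link for each tag.
    # HTMLParser lowercases tag and attribute names before calling the handlers.
    LINK_ATTRS = {"a": "href", "frame": "src", "iframe": "src"}

    def __init__(self):
        super().__init__()
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.outlinks.append((tag, value))


def collect_outlinks(html_text):
    """Return (tag, url) pairs in document order."""
    parser = OutlinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.outlinks


# Trimmed copy of the frameset from the sample attached to the issue.
SAMPLE = """<FRAMESET cols="20%,80%" title="" onLoad="top.loadFrames()">
<FRAMESET rows="30%,70%" title="" onLoad="top.loadFrames()">
<FRAME src="overview-frame.html" name="packageListFrame" title="All Packages">
<FRAME src="allclasses-frame.html" name="packageFrame">
</FRAMESET>
<FRAME src="overview-summary.html" name="classFrame" scrolling="yes">
</FRAMESET>"""

if __name__ == "__main__":
    for tag, url in collect_outlinks(SAMPLE):
        print(tag, url)
```

A parser that only reports A HREF anchors would return nothing for this fragment, since every link in it is a FRAME src attribute.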