Sorry, the output of nutch "readseg -get" in my previous post was wrong. Here is the actual output:
-Corrado

SegmentReader: get 'http://testmachine.test.net/index.html'

Content::
Version: 2
url: http://testmachine.test.net/index.html
base: http://testmachine.test.net/index.html
contentType: text/html
metadata: Content-Length=345 Connection=close ETag="2f4ac-159-421166c12a140" nutch.segment.name=20061108113703 nutch.crawl.score=1.0 Recommended=plugins nutch.content.digest=82e307c71d7476ce729a8e6d3b0de50a Accept-Ranges=bytes Server=Apache/2.2.0 (Fedora) Content-Type=text/html; charset=UTF-8 date=Wed, 08 Nov 2006 10:37:57 GMT Last-Modified=Tue, 31 Oct 2006 07:34:53 GMT
Content:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<HTML>
<HEAD>
<TITLE> PLUG-IN TEST </TITLE>
</HEAD>
<meta name="recommended" content="plugins">
<A HREF="http://testmachine.test.net/omniORB/index.html">omniORB</A>
<BR>
<A HREF="http://testmachine.test.net/nutch/index.html">Nutch</A>
</HTML>

Crawl Generate::
Version: 4
Status: 1 (DB_unfetched)
Fetch time: Wed Nov 08 11:36:31 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 30.0 days
Score: 1.0
Signature: null
Metadata: null

Crawl Fetch::
Version: 4
Status: 5 (fetch_success)
Fetch time: Wed Nov 08 11:37:58 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 30.0 days
Score: 1.0
Signature: 82e307c71d7476ce729a8e6d3b0de50a
Metadata: null

Crawl Parse::
Version: 4
Status: 4 (linked)
Fetch time: Wed Nov 08 11:38:05 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 30.0 days
Score: 0.5
Signature: null
Metadata: null

ParseData::
Version: 5
Status: success(1,0)
Title: PLUG-IN TEST
Outlinks: 2
  outlink: toUrl: http://testmachine.test.net/omniORB/index.html anchor: omniORB
  outlink: toUrl: http://testmachine.test.net/nutch/index.html anchor: Nutch
Content Metadata: Connection=close Content-Length=345 nutch.crawl.score=1.0 nutch.segment.name=20061108113703 ETag="2f4ac-159-421166c12a140" Recommended=plugins nutch.content.digest=82e307c71d7476ce729a8e6d3b0de50a Accept-Ranges=bytes Content-Type=text/html; charset=UTF-8 Server=Apache/2.2.0 (Fedora) Last-Modified=Tue, 31 Oct 2006 07:34:53 GMT date=Wed, 08 Nov 2006 10:37:57 GMT
Parse Metadata: OriginalCharEncoding=UTF-8 CharEncodingForConversion=UTF-8

ParseText::
PLUG-IN TEST omniORB Nutch

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
