Piotr Kosiorowski wrote:
Hello,

I am attaching the patch in "svn diff" format. I hope it is ok - I do
[...]

Index: src/java/org/apache/nutch/analysis/NutchDocumentAnalyzer.java
===================================================================
--- src/java/org/apache/nutch/analysis/NutchDocumentAnalyzer.java       (revision 158818)
+++ src/java/org/apache/nutch/analysis/NutchDocumentAnalyzer.java       (working copy)
@@ -77,8 +77,9 @@
   /** Returns a new token stream for text from the named field. */
   public TokenStream tokenStream(String fieldName, Reader reader) {
     Analyzer analyzer;
-    if ("url".equals(fieldName) || ("anchor".equals(fieldName)))
-      analyzer = ANCHOR_ANALYZER;
+    if ("url".equals(fieldName) || ("anchor".equals(fieldName))
+                || ("host".equals(fieldName)) || ("title".equals(fieldName)))
+            analyzer = ANCHOR_ANALYZER;
     else
       analyzer = CONTENT_ANALYZER;

Could somebody confirm/deny my analysis in the previous post, that the use of ANCHOR_ANALYZER for "url" is wrong, and the CONTENT_ANALYZER should be used instead?
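
To make the question concrete, here is a minimal sketch of the field dispatch as it would stand after the quoted patch. It is not the Nutch code: the class FieldDispatchSketch and the method analyzerFor are hypothetical, and the method returns the analyzer's name as a string instead of building a Lucene TokenStream, so it runs without Nutch or Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the dispatch in NutchDocumentAnalyzer.tokenStream().
// It only reports which analyzer a field would get; it does no tokenizing.
public class FieldDispatchSketch {
    // Fields routed to ANCHOR_ANALYZER once the quoted patch is applied.
    private static final Set<String> ANCHOR_FIELDS =
        new HashSet<>(Arrays.asList("url", "anchor", "host", "title"));

    // Mirrors the if/else in the patch: listed fields get the anchor
    // analyzer, every other field (including null) gets the content analyzer.
    static String analyzerFor(String fieldName) {
        return ANCHOR_FIELDS.contains(fieldName) ? "ANCHOR_ANALYZER" : "CONTENT_ANALYZER";
    }

    public static void main(String[] args) {
        for (String field : new String[] {"url", "anchor", "host", "title", "content"}) {
            System.out.println(field + " -> " + analyzerFor(field));
        }
    }
}
```

If the analysis from the previous post holds, the change would amount to dropping "url" from the anchor set so that it falls through to CONTENT_ANALYZER.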


--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers