Author: snagel
Date: Fri Aug 22 22:28:12 2014
New Revision: 1619944

URL: http://svn.apache.org/r1619944
Log:
NUTCH-1775 IndexingFilter: document origin of passed CrawlDatum

Modified:
    nutch/trunk/CHANGES.txt
    nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilter.java

Modified: nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1619944&r1=1619943&r2=1619944&view=diff
==============================================================================
--- nutch/trunk/CHANGES.txt (original)
+++ nutch/trunk/CHANGES.txt Fri Aug 22 22:28:12 2014
@@ -2,6 +2,8 @@ Nutch Change Log
 
 Nutch Current Development
 
+* NUTCH-1775 IndexingFilter: document origin of passed CrawlDatum (snagel)
+
 * NUTCH-1693 TextMD5Signature computed on textual content (Tien Nguyen Manh, markus via snagel)
 
 * NUTCH-1409 remove deprecated properties db.{default,max}.fetch.interval, generate.max.per.host.by.ip (Matthias Agethle via snagel)

Modified: nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilter.java
URL: http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilter.java?rev=1619944&r1=1619943&r2=1619944&view=diff
==============================================================================
--- nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilter.java (original)
+++ nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilter.java Fri Aug 22 22:28:12 2014
@@ -39,15 +39,22 @@ public interface IndexingFilter extends 
 
   /**
    * Adds fields or otherwise modifies the document that will be indexed for a
-   * parse. Unwanted documents can be removed from indexing by returning a null value.
+   * parse. Unwanted documents can be removed from indexing by returning a null
+   * value.
    * 
-   * @param doc document instance for collecting fields
-   * @param parse parse data instance
-   * @param url page url
-   * @param datum crawl datum for the page
-   * @param inlinks page inlinks
-   * @return modified (or a new) document instance, or null (meaning the document
-   * should be discarded)
+   * @param doc
+   *          document instance for collecting fields
+   * @param parse
+   *          parse data instance
+   * @param url
+   *          page url
+   * @param datum
+   *          crawl datum for the page (fetch datum from segment containing
+   *          fetch status and fetch time)
+   * @param inlinks
+   *          page inlinks
+   * @return modified (or a new) document instance, or null (meaning the
+   *         document should be discarded)
    * @throws IndexingException
    */
   NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)

