Author: snagel
Date: Fri Dec  5 19:53:35 2014
New Revision: 1643412

URL: http://svn.apache.org/r1643412
Log:
NUTCH-1877 Suffix URL filter to ignore query string by default

Modified:
    nutch/branches/2.x/CHANGES.txt
    nutch/branches/2.x/conf/suffix-urlfilter.txt.template
    nutch/trunk/CHANGES.txt
    nutch/trunk/conf/suffix-urlfilter.txt.template

Modified: nutch/branches/2.x/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/CHANGES.txt?rev=1643412&r1=1643411&r2=1643412&view=diff
==============================================================================
--- nutch/branches/2.x/CHANGES.txt (original)
+++ nutch/branches/2.x/CHANGES.txt Fri Dec  5 19:53:35 2014
@@ -2,6 +2,8 @@ Nutch Change Log
 
 Current Development 2.3-SNAPSHOT
 
+* NUTCH-1877 Suffix URL filter to ignore query string by default (markus via snagel)
+
 * NUTCH-1825 protocol-http may hang for certain web pages (Phu Kieu via snagel)
 
 * NUTCH-1483 Can't crawl filesystem with protocol-file plugin (Rogério Pereira Araújo, Mengying Wang, snagel)

Modified: nutch/branches/2.x/conf/suffix-urlfilter.txt.template
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/conf/suffix-urlfilter.txt.template?rev=1643412&r1=1643411&r2=1643412&view=diff
==============================================================================
--- nutch/branches/2.x/conf/suffix-urlfilter.txt.template (original)
+++ nutch/branches/2.x/conf/suffix-urlfilter.txt.template Fri Dec  5 19:53:35 2014
@@ -16,8 +16,19 @@
 
 # case-insensitive, allow unknown suffixes
 +I
-# uncomment the line below to filter on url path
-#+P
+
+# filter on URL path only
++P
+# comment out to filter on complete URL
+# but be aware that the pattern
+#    .com
+#  will then reject
+#    http://xyz.com
+#    http://xyz.com/search?q=foo.com
+#  while the pattern
+#    .mp3
+#  will not apply to (URLs will pass)
+#    http://xyz.com/music.mp3?q=abc
 
 ### prohibit these
 # pictures

Modified: nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1643412&r1=1643411&r2=1643412&view=diff
==============================================================================
--- nutch/trunk/CHANGES.txt (original)
+++ nutch/trunk/CHANGES.txt Fri Dec  5 19:53:35 2014
@@ -2,6 +2,8 @@ Nutch Change Log
 
 Nutch Current Development 1.10-SNAPSHOT
 
+* NUTCH-1877 Suffix URL filter to ignore query string by default (markus via snagel)
+
 * NUTCH-1890 Major Typo in Documentation for Integrating Nutch and Solr (Boadu Akoto Charles Jnr, mattmann)
 
 * NUTCH-1887 Specify HTMLMapper to use in TikaParser (jnioche)

Modified: nutch/trunk/conf/suffix-urlfilter.txt.template
URL: http://svn.apache.org/viewvc/nutch/trunk/conf/suffix-urlfilter.txt.template?rev=1643412&r1=1643411&r2=1643412&view=diff
==============================================================================
--- nutch/trunk/conf/suffix-urlfilter.txt.template (original)
+++ nutch/trunk/conf/suffix-urlfilter.txt.template Fri Dec  5 19:53:35 2014
@@ -2,8 +2,19 @@
 
 # case-insensitive, allow unknown suffixes
 +I
-# uncomment the line below to filter on url path
-#+P
+
+# filter on URL path only
++P
+# comment out to filter on complete URL
+# but be aware that the pattern
+#    .com
+#  will then reject
+#    http://xyz.com
+#    http://xyz.com/search?q=foo.com
+#  while the pattern
+#    .mp3
+#  will not apply to (URLs will pass)
+#    http://xyz.com/music.mp3?q=abc
 
 ### prohibit these
 # pictures
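
The template comments above describe how the same suffix patterns behave when matched against the complete URL versus the URL path only. A minimal sketch of that difference follows. It is not the Nutch plugin code: the class SuffixFilterSketch and the method rejected are hypothetical names, the path is extracted with java.net.URI for illustration, and matching is case-insensitive as with the +I setting.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not Nutch's suffix URL filter implementation.
public class SuffixFilterSketch {

    // Returns true if the URL would be rejected by any of the given suffixes.
    // pathOnly = true mimics "+P" (match against the URL path only);
    // pathOnly = false matches against the complete URL string.
    static boolean rejected(String url, List<String> suffixes, boolean pathOnly) {
        String target = pathOnly ? URI.create(url).getPath() : url;
        if (target == null) {
            target = "";
        }
        String lower = target.toLowerCase(Locale.ROOT);
        for (String suffix : suffixes) {
            if (lower.endsWith(suffix.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> suffixes = Arrays.asList(".com", ".mp3");
        String[] urls = {
            "http://xyz.com",
            "http://xyz.com/search?q=foo.com",
            "http://xyz.com/music.mp3?q=abc"
        };
        for (String url : urls) {
            System.out.println(url
                + " completeUrl.rejected=" + rejected(url, suffixes, false)
                + " pathOnly.rejected=" + rejected(url, suffixes, true));
        }
    }
}
```

Under these assumptions, complete-URL matching rejects the first two URLs through the .com pattern and lets the .mp3 URL pass because of its query string, while path-only matching passes the first two and rejects the .mp3 URL.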

