Hi list,

I'm working on configuring Nutch with ElasticSearch to provide website search functionality. I've been reading the Nutch documentation and the NutchTutorial. In the NutchTutorial on the Wiki, the section "Configure Regular Expression Filters" gives this example:

+^http://([a-z0-9]*\.)*nutch.apache.org/

However, I am a bit confused by this.

Firstly, do the forward slashes (/) not need to be escaped, as is usual in a regular expression? That is, ^http:\/\/(a-z..... instead of ^http://(a-z....

Also, I notice that the first period is escaped, but the two periods in "nutch.apache.org" are not. Periods are normally wildcards in regular expressions, hence my confusion. Is this an error in the documentation?

Finally, are these regexes PCRE or POSIX?

Thank you!
Steve
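
P.S. To make the question concrete, here is a minimal sketch of what I mean. It assumes the filter is evaluated with Java's java.util.regex and a "find" style match, which is my understanding of the regex URL filter but which I have not confirmed. It also assumes the leading "+" is an include marker and not part of the regex. The class name, constant names, and test URLs are my own, made up for illustration.

```java
import java.util.regex.Pattern;

public class UrlFilterRegexDemo {
    // The tutorial's pattern, minus the leading '+' (assumed to be an include marker).
    // The slashes are left unescaped; only the first period is escaped.
    static final Pattern AS_WRITTEN =
            Pattern.compile("^http://([a-z0-9]*\\.)*nutch.apache.org/");

    // The same pattern with the two periods in the host name escaped as well.
    static final Pattern DOTS_ESCAPED =
            Pattern.compile("^http://([a-z0-9]*\\.)*nutch\\.apache\\.org/");

    // Assumption: the filter accepts a URL if the pattern is found anywhere in it.
    static boolean accepts(Pattern pattern, String url) {
        return pattern.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://nutch.apache.org/",
            "http://wiki.nutch.apache.org/docs",
            "http://nutch-apache-org/"   // hypothetical host with no periods at all
        };
        for (String url : urls) {
            System.out.println(url
                    + "  as written: " + accepts(AS_WRITTEN, url)
                    + "  dots escaped: " + accepts(DOTS_ESCAPED, url));
        }
    }
}
```

If those assumptions hold, the unescaped slashes compile without complaint, and the only difference between the two patterns is that the tutorial's version also lets through hosts where some other character stands in for a period.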

