Hi,

Markus was referring to the -filter flag that you can add to your solrindex command. Please take a look at the relevant wiki entry [0].

You should be able to point this to a specific regex or automaton urlfilter file and achieve what you want, hopefully without dabbling in Java and indexing filters.

hth

Lewis

[0] http://wiki.apache.org/nutch/bin/nutch%20solrindex

On Sat, Nov 3, 2012 at 3:57 AM, Joe Zhang <[email protected]> wrote:

> Markus gave me a little hint, but he's not available today. And this is an
> urgent issue.
>
> The question is simple (nutch 1.5.1 and solr 3.6.1 working together):
>
> - The URL patterns in regex-urlfilter.txt control the behavior of crawling,
>   i.e., which pages to visit (or not to visit)
> - What I need to do is to specify **which pages to be indexed by solr**
>   (this is a subset of the pages visited) --> I wonder whether there is a
>   place to specify such URL patterns.

--
Lewis
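
To make the idea of a separate, stricter filter file for indexing more concrete, here is a rough Python sketch of how regex urlfilter rules select a subset of the crawled URLs. This is not Nutch code. It only approximates the rule-file semantics (each rule is a "+" or "-" followed by a regex, the first matching rule decides, and a URL that matches no rule is rejected). The host www.example.com, both rule sets, and the names parse_rules, accepts, CRAWL_RULES and INDEX_RULES are all made up for illustration, and Python's regex dialect is assumed to be close enough to Java's for patterns this simple.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose regex is found in the URL decides; no match rejects."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Hypothetical crawl-time rules: follow every page on the example host.
CRAWL_RULES = parse_rules(r"""
+^http://www\.example\.com/
""")

# Hypothetical index-time rules: keep only the /docs/ pages, drop the rest.
INDEX_RULES = parse_rules(r"""
+^http://www\.example\.com/docs/
-.
""")

crawled = [
    "http://www.example.com/",
    "http://www.example.com/docs/intro.html",
    "http://www.example.com/about.html",
]

# Only URLs that pass the stricter rule set would be sent on for indexing.
indexed = [url for url in crawled if accepts(INDEX_RULES, url)]
```

With these made-up rules, every URL in crawled passes CRAWL_RULES, but only the /docs/ page survives INDEX_RULES, which is the "subset of the pages visited" behaviour asked about above.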

