-filter is just a binary flag, right? How do I specify the actual pattern file, then?

On Sat, Nov 3, 2012 at 4:16 AM, Lewis John Mcgibbney <[email protected]> wrote:
> Hi,
>
> Markus was referring to the -filter flag you can add to your solrindex
> command. Please take a look at the relevant wiki entry [0]
>
> You should be able to point this to a specific regex or automaton
> urlfilter file and achieve what you want... hopefully without dabbling
> in Java and indexing filters.
>
> hth
>
> Lewis
>
> [0] http://wiki.apache.org/nutch/bin/nutch%20solrindex
>
> On Sat, Nov 3, 2012 at 3:57 AM, Joe Zhang <[email protected]> wrote:
> > Markus gave me a little hint, but he's not available today. And this is
> > an urgent issue.
> >
> > The question is simple (nutch 1.5.1 and solr 3.6.1 working together):
> >
> > - The URL patterns in regex-urlfilter.txt control the behavior of
> > crawling, i.e., which pages to visit (or not to visit)
> > - What I need to do is to specify **which pages to be indexed by solr**
> > (this is a subset of the pages visited) --> I wonder whether there is a
> > place to specify such URL patterns.
> >
>
> --
> Lewis

