Hi,

Markus was referring to the -filter flag you can add to your solrindex
command. Please take a look at the relevant wiki entry [0].

You should be able to point this to a specific regex or automaton
urlfilter file and achieve what you want, hopefully without dabbling
in Java and indexing filters.
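
If it helps, here is a rough way to sanity-check a set of rules before
re-running the index. It is only a sketch in Python, not Nutch code: it
assumes the usual regex-urlfilter.txt conventions (a leading + accepts,
a leading - rejects, the first matching rule wins, and a URL that matches
nothing is dropped). The rules and the example.com URLs are made up for
illustration, and Python's regex dialect is not identical to Java's, so
treat the result as an approximation.

```python
import re


def load_rules(text):
    """Parse regex-urlfilter style rules: '+regex' accepts, '-regex' rejects."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first matching rule decides; a URL with no match is rejected."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Hypothetical rules: keep only pages under /docs/, and never images.
RULES = load_rules(r"""
# skip images
-\.(gif|jpg|png)$
# accept the documentation pages
+^https?://www\.example\.com/docs/
""")

if __name__ == "__main__":
    for url in (
        "http://www.example.com/docs/a.html",
        "http://www.example.com/blog/b.html",
        "http://www.example.com/docs/logo.png",
    ):
        print(url, "->", "index" if accepts(RULES, url) else "skip")
```

Any URL reported as "skip" here would be one you expect the filter to
keep out of Solr; the real check is still to run solrindex with -filter
and look at what ends up in the index.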

hth

Lewis

[0] http://wiki.apache.org/nutch/bin/nutch%20solrindex

On Sat, Nov 3, 2012 at 3:57 AM, Joe Zhang <[email protected]> wrote:
> Markus gave me a little hint, but he's not available today, and this is an
> urgent issue.
>
> The question is simple (nutch 1.5.1 and solr 3.6.1 working together):
>
> - The URL patterns in regex-urlfilter.txt control the behavior of crawling,
> i.e., which pages to visit (or not to visit)
> - What I need to do is to specify **which pages are to be indexed by solr**
> (this is a subset of the pages visited) --> I wonder whether there is a
> place to specify such URL patterns.



-- 
Lewis
