I was reading through the FAQ and have a follow-up to one of the
questions there.  Here's what the FAQ says:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Is it possible to fetch only pages from some specific domains?

Please have a look on PrefixURLFilter. Adding some regular expressions
to the urlfilter.regex.file might work, but adding a list with thousands
of regular expressions would slow down your system excessively.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        I see the urlfilter.prefix.file entry in conf/nutch-default.xml,
but don't see any corresponding file (regex-urlfilter.txt).  Am I just
missing it, or does it need to be created from scratch?  If the latter,
what is the format?  I'll update the FAQ with the answers.

Thanks,
Jake.
