How exactly should I write it? This is what it looks like now:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>

On Sun, Nov 4, 2012 at 2:07 PM, Lewis John Mcgibbney <[email protected]> wrote:

> In your nutch-default.xml... which should be overridden in
> nutch-site.xml prior to compiling if using the src distribution.
>
> On Sun, Nov 4, 2012 at 6:10 PM, Joe Zhang <[email protected]> wrote:
>
> > the plugin.includes property is where?
> >
> > On Sun, Nov 4, 2012 at 4:53 AM, Lewis John Mcgibbney <[email protected]> wrote:
> >
> >> Please ensure you have the correct spacing and string formatting when
> >> executing command line tasks.
> >>
> >> you don't seem to have a space between your crawldb directory and the
> >> solr core. It is my understanding that the -filter command does not
> >> take a parameter... this will pick up your urlfilter as specified in
> >> plugin.includes property...
> >>
> >> On Sun, Nov 4, 2012 at 6:10 AM, Joe Zhang <[email protected]> wrote:
> >>
> >> Lewis
> >>
> >
> > --
> Lewis
>

