Hi Dave,

I'm by no means an expert in the JEXL syntax (cf.
http://commons.apache.org/proper/commons-jexl/reference/syntax.html),
but after a few trials I found that the expression must be

 doc.getFieldValue('url')=~'.*/englishnews/.*'
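
My guess as to why the leading and trailing '.*' are needed: the =~ operator
seems to require the regex to match the whole field value, the way Java's
Matcher.matches() does, rather than just a part of it. That is an assumption
based on my trials, not something I verified in the JEXL source. Here is a
minimal plain-Java sketch of the difference. It does not use JEXL or Nutch,
and the class and method names are only for illustration.

```java
import java.util.regex.Pattern;

// Illustration only: plain java.util.regex, no JEXL or Nutch involved.
public class RegexMatchDemo {

    // True only if the regex matches the ENTIRE input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches ANY substring of the input ("contains").
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String url =
            "http://www.somedomain.com/news/englishnews/2018/this-is-my-news-article.cfm";

        // Bare pattern: found inside the URL, but not a match of the whole URL.
        System.out.println(partialMatch("/englishnews/", url));
        System.out.println(fullMatch("/englishnews/", url));

        // Pattern padded with '.*' on both sides: matches the whole URL.
        System.out.println(fullMatch(".*/englishnews/.*", url));
    }
}
```

If =~ does behave like fullMatch above, then '/englishnews/' alone can never
match a complete URL, which would explain why your expression did not trigger.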

It's easy to test using the indexchecker, e.g.
 % bin/nutch indexchecker \
     -Dplugin.includes='exchange-jexl|protocol-okhttp|parse-html|indexer-solr|index-(basic|more)' \
     -DdoIndex=true http://...

If you want to improve the Wiki page
   https://wiki.apache.org/nutch/Exchanges
we're happy to grant you write access to the wiki, see
   https://wiki.apache.org/nutch/

Best,
Sebastian


On 3/5/19 4:06 PM, Dave Beckstrom wrote:
> Ryan and Roannel,
> 
> Thank you guys so much for your replies.  I didn't realize it but I was not
> seeing all of the emails from you.
> 
> Roannel you sent some really helpful replies that never came in as an
> email.  I found your replies when I browsed the web-based archives on the
> apache site.   I wanted to make sure I thanked you for your help!!!
> 
> I can't find one example of an exchanges.xml other than what ships with
> Nutch.   I'm really in the blind trying to get the exchanges to work.  I
> believe this may be the last item I need help with and then I'll have Nutch
> working the way I need it to.  Any help you can offer would be GREATLY
> appreciated.
> 
> Let's say I have a document that was crawled and the URL for the document
> was as follows:
> 
> http://www.somedomain.com/news/englishnews/2018/this-is-my-news-article.cfm
> 
> Here is the expression I have coded in exchanges.xml:
> 
> <param name="expr" value="doc.getFieldValue('url')=~'/englishnews/'" />
> 
> That expression is not triggering.  As near as I can tell the "=~" is the
> "contains" expression.  The idea being if the url contains "englishnews"
> then this expression should trigger.  I believe the slashes around
> "englishnews" makes it function as a regular expression, which should
> evaluate to true, rather than a string compare.
> 
> If anyone can help get me past this final road block I would greatly
> appreciate the help!  I spent an entire day on this yesterday and got
> nowhere.
> 
> Thank you!
> 
> Dave
> 
