Hi Sebastian,

Thank you sir!

Two things you provided solved the problem for me! One was the correct
syntax for the regex, and the other was the info on the
indexchecker command. Part of what I was dealing with was not having much
to go on when debugging, and that command helped a lot!
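
For anyone who finds this thread later: as far as I can tell, my original
expression never fired because JEXL's =~ seems to require the regex to match
the whole string (the way Java's String.matches does) rather than just a part
of it. I have not checked the JEXL source, so treat that as an assumption. The
plain-Java sketch below shows only the full-match versus substring-match
difference, using the URL and the two patterns from this thread. The class and
method names are made up for illustration and are not part of Nutch or JEXL.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (not Nutch or JEXL code).
public class UrlMatchSketch {

    // Full match: the regex must cover the entire input string.
    // This is the behavior JEXL's =~ is ASSUMED to have here.
    static boolean fullMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Substring match: the regex may match anywhere inside the input.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String url =
            "http://www.somedomain.com/news/englishnews/2018/this-is-my-news-article.cfm";

        // Pattern from my original exchanges.xml: no full match.
        System.out.println(fullMatch(url, "/englishnews/"));       // false
        // Pattern Sebastian suggested: leading/trailing .* cover the rest.
        System.out.println(fullMatch(url, ".*/englishnews/.*"));   // true
        // A "contains" style search would have accepted the short pattern.
        System.out.println(containsMatch(url, "/englishnews/"));   // true
    }
}
```

So the slashes did not turn the value into a "contains" test. The leading and
trailing .* are what let the pattern cover the full URL.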

In addition, the following line gave me an important clue:

-Dplugin.includes='exchange-jexl|protocol-okhttp|parse-html|indexer-solr|index-(basic|more)'

I realized that I did not have exchange-jexl listed as a plug-in to
include in my nutch-site.xml config file.  I would never have figured
that out without the clue you provided.

The exchanges are working, content is going into the right collections,
life is good!

Thank you again!

Best,

Dave Beckstrom
*Fig Leaf Software* <http://www.figleaf.com/> | "We've Got You Covered"
*Service-Disabled Veteran-Owned Small Business (SDVOSB)*
763-323-3499
dbeckst...@figleaf.com


On Tue, Mar 5, 2019 at 12:44 PM Sebastian Nagel
<wastl.na...@googlemail.com.invalid> wrote:

> Hi Dave,
>
> I'm by no means an expert in the JEXL syntax (cf.
> http://commons.apache.org/proper/commons-jexl/reference/syntax.html),
> but after a few trials it turns out the expression must be
>
>  doc.getFieldValue('url')=~'.*/englishnews/.*'
>
> It's easy to test using the indexchecker, e.g.
>  % bin/nutch indexchecker \
>      -Dplugin.includes='exchange-jexl|protocol-okhttp|parse-html|indexer-solr|index-(basic|more)' \
>      -DdoIndex=true   http://...
>
> If you want to improve the Wiki page
>    https://wiki.apache.org/nutch/Exchanges
> we're happy to grant you write access to the wiki, see
>    https://wiki.apache.org/nutch/
>
> Best,
> Sebastian
>
>
> On 3/5/19 4:06 PM, Dave Beckstrom wrote:
> > Ryan and Roannel,
> >
> > Thank you guys so much for your replies.  I didn't realize it but I was not
> > seeing all of the emails from you.
> >
> > Roannel you sent some really helpful replies that never came in as an
> > email.  I found your replies when I browsed the web-based archives on the
> > apache site.   I wanted to make sure I thanked you for your help!!!
> >
> > I can't find one example of an exchanges.xml other than what ships with
> > Nutch.   I'm really in the blind trying to get the exchanges to work.  I
> > believe this may be the last item I need help with and then I'll have Nutch
> > working the way I need it to.  Any help you can offer would be GREATLY
> > appreciated.
> >
> > Let's say I have a document that was crawled and the URL for the document
> > was as follows:
> >
> > http://www.somedomain.com/news/englishnews/2018/this-is-my-news-article.cfm
> >
> > Here is the expression I have coded in exchanges.xml:
> >
> > <param name="expr" value="doc.getFieldValue('url')=~'/englishnews/'" />
> >
> > That expression is not triggering.  As near as I can tell the "=~" is the
> > "contains" expression.  The idea being if the url contains "englishnews"
> > then this expression should trigger.  I believe the slashes around
> > "englishnews" make it function as a regular expression, which should
> > evaluate to true, rather than a string compare.
> >
> > If anyone can help get me past this final road block I would greatly
> > appreciate the help!  I spent an entire day on this yesterday and got
> > nowhere.
> >
> > Thank you!
> >
> > Dave
> >
>
>
