Not that I know of. You would have to create a custom parse filter plugin, 
which is straightforward. Then, in your implementation of filter(), you would 
run XPath queries against the passed DocumentFragment, set the values you get 
in the parse metadata, and return the parse result. Check the source of the 
headings plugin: it looks for h elements by walking Node objects, but it's 
similar to what you would need.
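
To make the XPath part concrete, here is a minimal sketch using only the JDK's javax.xml.xpath classes. It is not a Nutch plugin: the class name XPathExtractSketch, the extract() and parseFragment() helpers, and the field names are all made up for illustration, and a plain Map stands in for the parse metadata. In a real plugin the same loop would sit inside filter() and write to the parse metadata instead. Note that the element names your HTML parser produces may differ in case or namespace, so the expressions below are assumptions too.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-alone sketch, not part of Nutch.
class XPathExtractSketch {

    /** Runs each XPath expression against the fragment; returns field name -> matched text values. */
    static Map<String, List<String>> extract(DocumentFragment doc, Map<String, String> fieldToXPath) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, List<String>> values = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> field : fieldToXPath.entrySet()) {
                // Expressions are relative (".//h2") so they start at the fragment itself.
                NodeList nodes = (NodeList) xpath.evaluate(field.getValue(), doc, XPathConstants.NODESET);
                List<String> texts = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    String text = nodes.item(i).getTextContent().trim();
                    if (!text.isEmpty()) {
                        texts.add(text);
                    }
                }
                // In a plugin, this is where the values would go into the parse metadata.
                values.put(field.getKey(), texts);
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
        return values;
    }

    /** Demo helper only: builds a DocumentFragment from well-formed markup. */
    static DocumentFragment parseFragment(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            DocumentFragment fragment = document.createDocumentFragment();
            fragment.appendChild(document.getDocumentElement()); // moves the root into the fragment
            return fragment;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse demo markup", e);
        }
    }

    public static void main(String[] args) {
        DocumentFragment doc = parseFragment(
                "<html><body><h2>Admissions</h2><p>First paragraph.</p><p>Second.</p></body></html>");
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("h2", ".//h2");          // every h2
        rules.put("first_p", "(.//p)[1]"); // only the first p in document order
        System.out.println(extract(doc, rules));
    }
}
```

The field-to-expression map is just one way to keep the rules out of the code; in a plugin you would probably read them from configuration.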

 
 
-----Original message-----
> From:Reyes, Mark <[email protected]>
> Sent: Tuesday 12th November 2013 18:41
> To: [email protected]
> Subject: Re: Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC
> 
> Any GOOD documentation on how to pursue the Xpath route for making the
> Nutch crawl more atomic/specific?
> 
> Thanks again,
> Mark
> ---------------------------------------------------------------------------
> ------------
> 
> P. 866.475.0317 x 3244
> Bridgepoint Education
> INNOVATIVE SOLUTIONS THAT ADVANCE LEARNING SM
> 
> 
> 
> 
> On 11/11/13, 11:16 AM, "Markus Jelsma" <[email protected]> wrote:
> 
> >Ah yes, this is probably about extracting it from pages, not returning
> >it. Headings can be extracted using the headings plugin which is
> >available in 1.7. You can also use Xpath for extraction but there's not a
> >plugin available yet plus it won't work with parse-tika.
> > 
> >-----Original message-----
> >> From:Olle Romo <[email protected]>
> >> Sent: Monday 11th November 2013 19:50
> >> To: [email protected]
> >> Subject: Re: Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC
> >> 
> >> Hi Mark,
> >> 
> >> Not sure if this is exactly what you're looking for but maybe try the
> >>whitelist_blacklist_plugin from NUTCH-585
> >>https://issues.apache.org/jira/browse/NUTCH-585
> >> 
> >> Best,
> >> Olle
> >> 
> >> On Nov 11, 2013, at 7:01 PM, "Reyes, Mark" <[email protected]>
> >>wrote:
> >> 
> >> > Hi:
> >> > 
> >> > I'm using Nutch 1.7 to crawl/index the pages of my domain to Solr and
> >>JavaScript library AJAX Solr to capture that index as JSON, which would
> >>then print that to the front-end.
> >> > 
> >> > My question is, if it's possible to have specific content return
> >>(i.e. An H2 tag and a p tag) on the search results page versus all
> >>contents of that page?
> >> > 
> >> > Thank you,
> >> > Mark
> >> > 
> >> > 
> >> > IMPORTANT NOTICE: This e-mail message is intended to be received only
> >>by persons entitled to receive the confidential information it may
> >>contain. E-mail messages sent from Bridgepoint Education may contain
> >>information that is confidential and may be legally privileged. Please
> >>do not read, copy, forward or store this message unless you are an
> >>intended recipient of it. If you received this transmission in error,
> >>please notify the sender by reply e-mail and delete the message and any
> >>attachments.
> >> 
> >> 
> 
> 
