Hi Vicky,

What's the exact use case?
- process any XML
- or extract content from HTML and index it as separate fields

> Is https://issues.apache.org/jira/browse/NUTCH-1644 going to make it into
> the next release?

Difficult to predict: it will only happen if someone takes on the work of
finalizing it.
There are also 2 competing solutions referenced in NUTCH-1644.

I did some work last year on
  https://issues.apache.org/jira/browse/NUTCH-1870
see
  https://github.com/sebastian-nagel/nutch/tree/NUTCH-1870
but it's still not finished or fully tested ;(

> In the above implementation the configuration is done statically, we need an
> implementation that will pick up the url,xpath configurations, message
> schema and destination from the storage.

Does that mean a different configuration per document/page, or per host/domain?

Btw., NUTCH-1870 allows XSLs to be applied only if the URL matches a pattern.
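
To illustrate the general idea of choosing extraction rules by URL pattern,
here is a minimal sketch. It is not the NUTCH-1870 code and does not use any
Nutch API: the class UrlXPathRules and its methods are hypothetical, it uses
plain XPath from the JDK rather than XSL, and the rules are kept in memory.
In your case they would presumably be loaded from your storage instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: picks a set of XPath expressions by URL pattern. */
public class UrlXPathRules {

  /** One rule: a URL regex plus a map of field name -> XPath expression. */
  private static final class Rule {
    final Pattern urlPattern;
    final Map<String, String> fieldXPaths;

    Rule(Pattern urlPattern, Map<String, String> fieldXPaths) {
      this.urlPattern = urlPattern;
      this.fieldXPaths = fieldXPaths;
    }
  }

  // Rules are checked in insertion order; the first match wins.
  private final List<Rule> rules = new ArrayList<>();

  public void addRule(String urlRegex, Map<String, String> fieldXPaths) {
    rules.add(new Rule(Pattern.compile(urlRegex), fieldXPaths));
  }

  /** Returns the XPaths of the first rule matching the whole URL, or an empty map. */
  public Map<String, String> rulesFor(String url) {
    for (Rule rule : rules) {
      if (rule.urlPattern.matcher(url).matches()) {
        return rule.fieldXPaths;
      }
    }
    return Collections.emptyMap();
  }

  /** Evaluates every XPath configured for the URL against the XML content. */
  public Map<String, String> extract(String url, String xml)
      throws XPathExpressionException {
    Map<String, String> fields = new LinkedHashMap<>();
    XPath xpath = XPathFactory.newInstance().newXPath();
    for (Map.Entry<String, String> e : rulesFor(url).entrySet()) {
      // The content is re-parsed per expression; fine for a sketch, slow for real use.
      InputSource source = new InputSource(new StringReader(xml));
      fields.put(e.getKey(), xpath.evaluate(e.getValue(), source));
    }
    return fields;
  }
}
```

A URL that matches no rule yields an empty map, so such documents would simply
get no extra fields.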


Best,
Sebastian


On 01/18/2017 07:19 AM, vickyk wrote:
> I have been looking for XPath integration in Nutch. I can see that some
> work has been done in this area, but it is not exactly what we require.
> This is the Nutch JIRA issue:
> https://issues.apache.org/jira/browse/NUTCH-1644
> In the above implementation the configuration is done statically, we need an
> implementation that will pick up the url,xpath configurations, message
> schema and destination from the storage.
> 
> I can start with the static configuration and test it, and afterwards write
> our own custom plug-in that gets the details from the database.
> Is https://issues.apache.org/jira/browse/NUTCH-1644 going to make it into
> the next release?
> 
> Thanks,
> Vicky
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Dymanic-Xpath-plugin-tp4314525.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
