Hi Vicky,

What's the exact use case?
- process any XML, or
- extract content from HTML and index it as separate fields?

> Is https://issues.apache.org/jira/browse/NUTCH-1644 is going to make in next
> release?

Difficult to predict. It will only happen if someone takes on the work to finalize it. There are also 2 competing solutions referenced in NUTCH-1644.

Last year I did some work on https://issues.apache.org/jira/browse/NUTCH-1870
(see https://github.com/sebastian-nagel/nutch/tree/NUTCH-1870), but it's still
not ready and not fully tested ;(

> In the above implementation the configuration is done statically, we need an
> implementation that will pick up the url,xpath configurations, message
> schema and destination from the storage.

Does that mean a different configuration per document/page, or per host/domain?
Btw., NUTCH-1870 allows applying XSLs only if the URL matches a pattern.

Best,
Sebastian

On 01/18/2017 07:19 AM, vickyk wrote:
> I have been looking for the xpath integration in the nutch, I was able to see
> that some work have been done in this area, however that is not exactly what
> we require.
> This one is the nutch JIRA
> https://issues.apache.org/jira/browse/NUTCH-1644
> In the above implementation the configuration is done statically, we need an
> implementation that will pick up the url,xpath configurations, message
> schema and destination from the storage.
>
> I can start with the static and test the stuff, and thereafter have our
> custom plug-in implementation which will get the details from the Database.
> Is https://issues.apache.org/jira/browse/NUTCH-1644 is going to make in next
> release?
>
> Thanks,
> Vicky
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Dymanic-Xpath-plugin-tp4314525.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
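To make the "apply only if the URL matches a pattern" idea concrete, here is a minimal plain-Java sketch that selects a set of XPath rules by URL regex and extracts each match into a named field. It is not Nutch code and not the NUTCH-1644 or NUTCH-1870 implementation: the class `UrlXPathExtractor`, the `Rule` record and the example URLs are hypothetical. The rules are hard-coded here, whereas in the scenario from the thread they would be loaded from a database. The sketch also assumes the page is well-formed XHTML without a namespace declaration; a real plugin would more likely evaluate the XPaths against the DOM already produced by Nutch's HTML parser instead of re-parsing the page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class UrlXPathExtractor {

    // Hypothetical rule type: a URL pattern plus field name -> XPath expression.
    // In a dynamic setup these rules would come from storage (e.g. a database).
    record Rule(Pattern urlPattern, Map<String, String> fieldXPaths) {}

    private final List<Rule> rules;

    UrlXPathExtractor(List<Rule> rules) {
        this.rules = rules;
    }

    // Applies the first rule whose URL pattern matches; returns an empty map otherwise.
    // Assumes well-formed XHTML without a namespace declaration.
    Map<String, String> extract(String url, String xhtml) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Rule rule : rules) {
            if (!rule.urlPattern().matcher(url).find()) {
                continue; // rule does not apply to this URL
            }
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Map.Entry<String, String> entry : rule.fieldXPaths().entrySet()) {
                // evaluate(String, Object) returns the string value of the result
                fields.put(entry.getKey(), xpath.evaluate(entry.getValue(), doc).trim());
            }
            break; // first matching rule wins
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        UrlXPathExtractor extractor = new UrlXPathExtractor(List.of(
                new Rule(Pattern.compile("^https?://shop\\.example\\.com/product/"),
                        Map.of("title", "/html/head/title",
                               "price", "//span[@class='price']"))));
        String page = "<html><head><title>Widget</title></head>"
                + "<body><span class=\"price\">9.99</span></body></html>";
        // Matching URL: both fields are extracted.
        System.out.println(extractor.extract("https://shop.example.com/product/42", page));
        // Non-matching URL: no rule applies, so the result is empty.
        System.out.println(extractor.extract("https://other.example.org/", page));
    }
}
```

Keying rules by URL pattern covers both "per page" and "per host/domain" configurations, since a pattern can be as narrow or as broad as needed; the first-match-wins policy is just one simple choice for resolving overlapping patterns.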

