Markus,

Thanks, I will try that too. I am a newbie, so I will have to read through the source of index-metatags to understand it better.

On Tue, May 27, 2014 at 7:23 PM, Markus Jelsma <[email protected]> wrote:

> Hi - I think I would implement a custom parser filter that looks for
> specific tags and attributes and adds them to the parse metadata. Using the
> index-metatags plugin, I would then have those newly added fields indexed.
>
> Markus
>
> -----Original message-----
> From: Alan Francis <[email protected]>
> Sent: Tue 27-05-2014 15:47
> Subject: Identifying Video Links in Pages
> To: [email protected];
>
> I have a use case in which we want to single out pages that contain an
> iframe embed tag from YouTube, and add the embed URL as an additional
> field for indexing.
>
> I am using Apache Nutch 1.8 with Solr 4.8.
>
> What I have done so far is to override the "parse-html" plugin and
> identify iframe tags with YouTube URLs in ComContentUtils.getTextHelper()
> and append them to "content" with some special tags.
>
> I then receive the content in a custom indexing filter plugin to extract
> the URLs from the content and add them as a new field in NutchDocument.
>
> Is there a better way to do this?
>
> --
> -Alan Francis

--
-Alan Francis
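
For reference, here is a rough sketch of the tag-matching step that a custom parser filter along the lines Markus describes might perform. It uses only the JDK's org.w3c.dom classes, so it can be tried outside Nutch. The class name YoutubeIframeFinder, the helper names, and the list of host names treated as YouTube are all assumptions made for illustration, not part of Nutch or of either plugin mentioned above. Wiring it into a Nutch parse filter (which is handed the parsed DOM) and adding the result to the parse metadata is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and API are illustrative only.
class YoutubeIframeFinder {

    // Assumed set of YouTube hosts: youtube.com, youtube-nocookie.com, youtu.be
    // (with optional subdomains), over http, https, or protocol-relative URLs.
    private static final Pattern YOUTUBE_SRC = Pattern.compile(
        "^(?:https?:)?//(?:[a-z0-9-]+\\.)*"
            + "(?:youtube\\.com|youtube-nocookie\\.com|youtu\\.be)(?:[/:?#].*)?$",
        Pattern.CASE_INSENSITIVE);

    // Returns the src of every <iframe> under root whose host looks like YouTube,
    // in document order.
    static List<String> findYoutubeEmbeds(Node root) {
        List<String> urls = new ArrayList<>();
        collect(root, urls);
        return urls;
    }

    // Depth-first walk; tag names are compared case-insensitively because
    // HTML parsers often report them in upper case.
    private static void collect(Node node, List<String> urls) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "iframe".equalsIgnoreCase(node.getNodeName())) {
            // getAttribute returns "" when src is absent, which never matches.
            String src = ((Element) node).getAttribute("src").trim();
            if (isYoutubeUrl(src)) {
                urls.add(src);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), urls);
        }
    }

    static boolean isYoutubeUrl(String src) {
        return YOUTUBE_SRC.matcher(src).matches();
    }

    // Test-only convenience: parses well-formed XHTML into a DOM tree.
    static Node parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Node doc = parseXml(
            "<html><body>"
                + "<iframe src=\"https://www.youtube.com/embed/abc123\"></iframe>"
                + "<iframe src=\"https://example.com/player\"></iframe>"
                + "<iframe src=\"//www.youtube-nocookie.com/embed/xyz789\"></iframe>"
                + "</body></html>");
        System.out.println(findYoutubeEmbeds(doc));
    }
}
```

Matching on the DOM rather than on text appended to "content" keeps the page text clean and avoids re-parsing special tags in the indexing filter. The host pattern is deliberately strict so that a URL which merely mentions youtube.com in its query string is not picked up.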

