Yakn wrote:
> I am using the SegmentReader, iterating over content. I have the Content,
> ParseData, and ParseText objects, and I am looking for a way to get access
> to the meta tags in the header of the HTML in my Content object. Is there
> any way to get access to these meta tags? I do not want to have to use the
> HtmlParseFilter. The only way I have seen for this to work is to use the
> HTMLMetaTags.

That's the whole purpose of HtmlParseFilters. Nutch doesn't store all meta tags in ParseData: it would take too much space, and in the general case it isn't very useful, because we already handle all the critical information (robots directives, redirects). If you want to use meta tags in any other way, you should implement a simple HtmlParseFilter that puts all meta tags into ParseData.

> Can I get what I need from Content? Please help, thanks.

If you parse it again, then yes. Otherwise you need to use an HtmlParseFilter.

-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
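
To illustrate the "parse it again" route from the reply: a minimal sketch, assuming you already hold the raw page bytes (in Nutch these would come from Content.getContent()) and know or can guess the character encoding. MetaTagExtractor and its extract method are hypothetical names, not part of Nutch, and the regex scan here is only a stand-in for a real HTML parser: it will also match a <meta> tag inside comments or scripts, and it breaks on attribute values that contain ">".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Nutch class): re-parses raw HTML bytes
// and collects <meta name=... content=...> / <meta http-equiv=...> pairs.
class MetaTagExtractor {

    // Matches one <meta ...> tag and captures its attribute text.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches attr="v", attr='v', or attr=v inside a tag.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([a-zA-Z][a-zA-Z0-9-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    static Map<String, String> extract(byte[] rawHtml, Charset charset) {
        String html = new String(rawHtml, charset);
        Map<String, String> metaTags = new LinkedHashMap<>();

        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            // Collect this tag's attributes, keyed by lower-cased attribute name.
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2)
                        : attr.group(3) != null ? attr.group(3)
                        : attr.group(4);
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), value);
            }

            // Prefer "name"; fall back to "http-equiv".
            String key = attrs.containsKey("name")
                    ? attrs.get("name") : attrs.get("http-equiv");
            String content = attrs.get("content");
            if (key != null && content != null) {
                metaTags.put(key.toLowerCase(Locale.ROOT), content);
            }
        }
        return metaTags;
    }

    public static void main(String[] args) {
        // Stand-in for the bytes you would read from a Content object.
        byte[] raw = ("<html><head>"
                + "<meta name=\"Description\" content=\"Nutch demo page\">"
                + "<META HTTP-EQUIV='refresh' CONTENT='5; url=/next'/>"
                + "</head><body>hi</body></html>").getBytes(StandardCharsets.UTF_8);

        extract(raw, StandardCharsets.UTF_8)
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Doing this for every record while iterating a segment repeats work the parser has already done at parse time, which is why a small HtmlParseFilter that copies the tags into ParseData is the suggested approach if you need them regularly.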