Yakn wrote:
> I am using the SegmentReader, iterating over content. I have the Content,
> ParseData, and ParseText objects, and I am looking for a way to get access
> to the meta tags in the header of the HTML in my Content object. Is there
> any way to get access to these meta tags? I do not want to have to use the
> HtmlParseFilter. The only way I have seen for this to work is to use the
> HTMLMetaTags.

That's the whole purpose of HtmlParseFilters. Nutch doesn't store all
meta tags in ParseData: it would take too much space, and in the general
case it isn't that useful, because we already handle all the critical
information (robot directives, redirects). If you want to use meta tags
in any other way, you should implement a simple HtmlParseFilter that
puts all meta tags into ParseData.
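
As a rough illustration of the core of such a filter, here is a minimal
sketch that walks a DOM tree and collects name/content pairs from every
meta element. It uses only the JDK's org.w3c.dom types and assumes the
filter is handed the parsed page as a DOM node. The class
MetaTagCollector and its method names are hypothetical, not part of
Nutch, and the wiring into the HtmlParseFilter interface and into
ParseData is left out because it depends on your Nutch version.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a Nutch class: gathers <meta name=... content=...>
// pairs from a DOM subtree such as the one an HTML parser produces.
public class MetaTagCollector {

    /** Returns meta names (lower-cased) mapped to their content values. */
    public static Map<String, String> collect(Node root) {
        Map<String, String> tags = new LinkedHashMap<>();
        walk(root, tags);
        return tags;
    }

    // Depth-first walk; element names are compared case-insensitively
    // because HTML parsers differ in how they normalise tag names.
    private static void walk(Node node, Map<String, String> tags) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "meta".equalsIgnoreCase(node.getNodeName())) {
            String name = attr(node, "name");
            String content = attr(node, "content");
            if (name != null && content != null) {
                tags.put(name.toLowerCase(Locale.ROOT), content);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), tags);
        }
    }

    // Case-insensitive attribute lookup; only called on element nodes.
    private static String attr(Node element, String wanted) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            if (wanted.equalsIgnoreCase(a.getNodeName())) {
                return a.getNodeValue();
            }
        }
        return null;
    }
}
```

Inside a real filter you would call something like collect(doc) on the
DOM you are given and copy the resulting map into the parse metadata.
Meta tags that carry http-equiv instead of name are skipped by this
sketch.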
> 
> Can I get what I need from Content? Please help, thanks.

If you parse it again, then yes. Otherwise you need to use an HtmlParseFilter.
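
To make "parse it again" concrete, here is a minimal sketch that scans
raw HTML bytes for meta tags. It assumes you already have the page
bytes (for example from Content.getContent()) and that they are UTF-8,
which a real crawl will not guarantee. It uses regular expressions
rather than a real HTML parser, so treat it as a quick approximation.
RawMetaTagScanner is a hypothetical name, not a Nutch class.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Nutch class: a regex-based scan of raw HTML.
public class RawMetaTagScanner {

    // Matches one whole <meta ...> tag, in any letter case.
    private static final Pattern META =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" or attr='value'; unquoted values are ignored.
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns meta names (lower-cased) mapped to their content values. */
    public static Map<String, String> scan(byte[] rawHtml) {
        // Assumption: the bytes are UTF-8 encoded.
        String html = new String(rawHtml, StandardCharsets.UTF_8);
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher meta = META.matcher(html);
        while (meta.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTR.matcher(meta.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                if (key.equals("name")) {
                    name = value;
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            // Tags without a name attribute (e.g. http-equiv) are skipped.
            if (name != null && content != null) {
                tags.put(name.toLowerCase(Locale.ROOT), content);
            }
        }
        return tags;
    }
}
```

This repeats work the parser has already done for every page, which is
why a parse filter is usually the better place for it.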


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

