rubdabadub wrote:
> On 3/2/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>> Dennis Kubes wrote:
>> > Believe it or not I don't think that meta tags are currently stored.
>> > I looked through the html parsing code and didn't see anywhere that it
>> > could be storing it except in html filters. I see that meta tags are
>> > parsed and passed to the html filters but I didn't see any default
>> > filter that was storing them.
>> >
>> > If there isn't a reason why we shouldn't be storing meta tags, if we
>> > aren't currently storing them (I could be missing where this is
>> > happening :) ), and this is something that people want then I can
>> > create an html filter that will store the meta-tags in the Parse
>> > MetaData.
>
> Yes!! Please that would be nice. Maybe we can do metatag-parse,
> metatag-index
> metatag-query?? no?? This way those who want this can turn it on as a
> plugin?? no??
>
>> The reason is simple - space. Storing additional data consumes space,
>> and if someone just occasionally needs this info from one or two pages
>> it's less costly to re-parse the page again.
>
> Oh I see. Now I understand. But I wonder what is the MetaData parser
> doing really? is it being used anywhere in the crawl-index life cycle
> at all?
> Just wondering...

We need to parse metatags in order to determine the robot settings and possible redirects, so passing them on to HtmlParseFilters costs nothing extra. Now, you are free to implement your own HtmlParseFilter that uses these metatags in any way you wish; for example, you may store all metatags in ParseData and/or in the index, keeping in mind that this will cost you some disk space.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
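Both Dennis's proposal and Andrzej's suggestion come down to a custom HtmlParseFilter that copies meta tags into the parse metadata. A minimal sketch of that core step, using only the JDK, might look like the code below. The class name MetaTagCollector, the "metatag." key prefix and the lower-casing of tag names are hypothetical choices, not Nutch code, and the actual HtmlParseFilter interface and plugin wiring are deliberately not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch: collects <meta name=... content=...>
// pairs from a DOM tree into a flat map that a custom filter could then store.
class MetaTagCollector {

    // Hypothetical key prefix so stored tags don't clash with other metadata keys.
    static final String PREFIX = "metatag.";

    // Walk the whole tree below root and return name -> content, in document order.
    static Map<String, String> collect(Node root) {
        Map<String, String> out = new LinkedHashMap<>();
        walk(root, out);
        return out;
    }

    private static void walk(Node node, Map<String, String> out) {
        // equalsIgnoreCase in case the HTML parser reports upper-case element names.
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "meta".equalsIgnoreCase(node.getNodeName())) {
            Element meta = (Element) node;
            String name = meta.getAttribute("name");       // "" if the attribute is absent
            String content = meta.getAttribute("content");
            if (!name.isEmpty()) {
                // Lower-case names so "Keywords" and "keywords" end up under one key.
                out.put(PREFIX + name.toLowerCase(Locale.ROOT), content);
            }
            // http-equiv tags (e.g. refresh) have no name and are skipped here.
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), out);
        }
    }

    // Convenience wrapper for well-formed XHTML only; real-world pages need a
    // lenient HTML parser instead of the JDK XML parser used here.
    static Map<String, String> collectFromXhtml(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return collect(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head>"
                + "<meta name=\"Keywords\" content=\"nutch, search\"/>"
                + "<meta name=\"description\" content=\"A test page\"/>"
                + "<meta http-equiv=\"refresh\" content=\"5\"/>"
                + "</head><body/></html>";
        // prints {metatag.keywords=nutch, search, metatag.description=A test page}
        System.out.println(collectFromXhtml(xhtml));
    }
}
```

Every entry in the returned map is extra data stored per page, which is exactly the disk-space cost Andrzej mentions; a filter like this is therefore something to enable only when the tags are really needed.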
