rubdabadub wrote:
> On 3/2/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>> Dennis Kubes wrote:
>> > Believe it or not I don't think that meta tags are currently stored.
>> > I looked through the html parsing code and didn't see anywhere that it
>> > could be storing it except in html filters.  I see that meta tags are
>> > parsed and passed to the html filters but I didn't see any default
>> > filter that was storing them.
>> >
>> > If there isn't a reason why we shouldn't be storing meta tags, if we
>> > aren't currently storing them (I could be missing where this is
>> > happening :) ), and this is something that people want then I can
>> > create an html filter that will store the meta-tags in the Parse
>> > MetaData.
>
> Yes!! Please that would be nice.  Maybe we can do metatag-parse,
> metatag-index, metatag-query?? no?? This way those who want this can
> turn it on as a plugin?? no??
>
>> The reason is simple - space. Storing additional data consumes space,
>> and if someone just occasionally needs this info from one or two pages
>> it's less costly to re-parse the page again.
>
> Oh I see. Now I understand. But I wonder what the MetaData parser is
> really doing? Is it being used anywhere in the crawl-index life cycle
> at all? Just wondering...

We need to parse metatags anyway in order to determine the robots
settings and possible redirects, so passing them on to HtmlParseFilters
costs nothing extra. You are free to implement your own HtmlParseFilter
that uses these metatags in any way you wish. Among other things, you
may store all metatags in ParseData and/or the index, keeping in mind
that this will cost you some disk space ...
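
To make the idea concrete, here is a rough sketch of the copying step
such a filter would perform. It deliberately does not use the Nutch
classes: the class name MetaTagCopier, the "metatag." key prefix and the
plain Map types are all hypothetical stand-ins for the parsed metatags
and the parse metadata. A real plugin would do the equivalent inside its
HtmlParseFilter implementation, against whatever signature your Nutch
version defines.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative stand-in only; this is NOT the Nutch HtmlParseFilter API. */
final class MetaTagCopier {

    /** Hypothetical prefix that keeps copied tags apart from other metadata keys. */
    static final String PREFIX = "metatag.";

    private MetaTagCopier() {}

    /**
     * Builds the metadata entries to store for a page: one entry per
     * non-empty meta tag, with a lower-cased, prefixed key and a trimmed value.
     * Tags with a null name, a null value, or a blank value are skipped,
     * since storing them would only cost space.
     */
    static Map<String, String> toParseMetadata(Map<String, String> metaTags) {
        Map<String, String> parseMeta = new LinkedHashMap<>();
        for (Map.Entry<String, String> tag : metaTags.entrySet()) {
            String name = tag.getKey();
            String value = tag.getValue();
            if (name == null || value == null) {
                continue;
            }
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String key = PREFIX + name.trim().toLowerCase(Locale.ROOT);
            parseMeta.put(key, trimmed);
        }
        return parseMeta;
    }
}
```

Every entry produced this way is stored once per page, which is where
the disk space goes; restricting the copy to the few tag names you
actually need keeps that cost down.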

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
