Thanks for getting back to me, Jérôme,

Would you suggest I jump into the Tokenizer? Would we need to
differentiate between indexing, summaries, and/or anchors (as Google
claims to do)? Should I target 0.7.2 or 0.8-dev?

As an aside, perhaps we should add the modified date field (as NutchWax
and others do).

Alex

But since there is no specification for this, you should probably
support the most widely used markers:
* <!-- robots content="none" -->
* <noindex>
* <!-- googleon ... -->  <!-- googleoff ... -->
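
To make the three markers above concrete, here is a rough sketch of
stripping the excluded regions from raw HTML before it is tokenized. It
is only an illustration in Python, not Nutch code: the function name
strip_unindexed is hypothetical, the regular expressions are a guess at
reasonable matching rules, and treating the robots "none" comment as
"skip the whole page" is an assumption, since no specification defines
it.

```python
import re

# Assumption: a robots comment with content="none" excludes the whole page.
ROBOTS_NONE = re.compile(
    r'<!--\s*robots\s+content\s*=\s*"none"\s*-->', re.IGNORECASE)

# Everything between <noindex> and </noindex>, tags included.
NOINDEX = re.compile(
    r'<noindex\b[^>]*>.*?</noindex\s*>', re.IGNORECASE | re.DOTALL)

# Everything between a googleoff comment and the next googleon comment.
GOOGLE_OFF_ON = re.compile(
    r'<!--\s*googleoff\b.*?-->.*?<!--\s*googleon\b.*?-->',
    re.IGNORECASE | re.DOTALL)


def strip_unindexed(html: str) -> str:
    """Return html with the regions marked as not-to-be-indexed removed."""
    if ROBOTS_NONE.search(html):
        # Assumed semantics: nothing on this page should be indexed.
        return ""
    # Replace each excluded region with a space so neighbouring words
    # do not get glued together.
    html = NOINDEX.sub(" ", html)
    html = GOOGLE_OFF_ON.sub(" ", html)
    return html
```

Note that an unclosed <noindex> or a googleoff without a matching
googleon is left untouched in this sketch; a real parser plugin would
have to decide how to treat those cases.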

--
55.67N 12.588E
CCC7 D19D D107 F079 2F3D BF97 8443 DB5A 6DB8 9CE1
--


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
