Thanks for getting back to me, Jérôme. Would you suggest I jump into the Tokenizer? Would we need to differentiate between indexing, summaries, and/or anchors (as Google claims to do)? Should I target 0.7.2 or 0.8-dev?
As an aside, perhaps we should also add the modified-date field (as NutchWax and others do).
Alex
But since there is no specification for this, you should probably support the most commonly used markers:
* <!-- robots content="none" -->
* <noindex>
* <!-- googleon ... --> <!-- googleoff ... -->
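To make the three markers concrete, here is a minimal standalone sketch in Java of stripping "do not index" content before text reaches the indexer. It is not wired into Nutch, and the class `NoIndexFilter` and its `strip` method are hypothetical names. It assumes the robots marker means the standard `<meta name="robots" content="none">` tag (with `name` before `content`) and that the Google markers use the `<!--googleoff: index-->` / `<!--googleon: index-->` form. That form also accepts `snippet` and `anchor`, which is where the index/summary/anchor distinction would come in, but only `index` and `all` are handled here.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: removes regions that page authors marked as
 * "do not index". Names are illustrative, not part of the Nutch API.
 */
public class NoIndexFilter {

    // <meta name="robots" content="none"> (or "noindex"): exclude the whole page.
    // Assumes the name attribute appears before the content attribute.
    private static final Pattern ROBOTS_NONE = Pattern.compile(
        "<meta\\s+name\\s*=\\s*[\"']robots[\"']\\s+content\\s*=\\s*[\"'][^\"']*\\b(?:none|noindex)\\b[^\"']*[\"']",
        Pattern.CASE_INSENSITIVE);

    // <noindex> ... </noindex>: exclude only the enclosed region.
    private static final Pattern NOINDEX_TAG = Pattern.compile(
        "<noindex>.*?</noindex>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // <!--googleoff: index|all--> ... <!--googleon: index|all-->;
    // an unterminated googleoff region runs to the end of the document.
    private static final Pattern GOOGLE_OFF = Pattern.compile(
        "<!--\\s*googleoff:\\s*(?:index|all)\\s*-->.*?(?:<!--\\s*googleon:\\s*(?:index|all)\\s*-->|\\z)",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the HTML with excluded regions replaced by a space, or "" if the page is excluded. */
    public static String strip(String html) {
        if (ROBOTS_NONE.matcher(html).find()) {
            return "";
        }
        String out = NOINDEX_TAG.matcher(html).replaceAll(" ");
        return GOOGLE_OFF.matcher(out).replaceAll(" ");
    }

    public static void main(String[] args) {
        String page = "<p>keep</p><noindex>menu</noindex>"
            + "<!--googleoff: index-->ads<!--googleon: index--><p>more</p>";
        System.out.println(strip(page));
    }
}
```

Regular expressions keep the sketch short, but they are fragile on real-world HTML (attribute order, nested or malformed markers). A real implementation would more likely act on the parser's DOM or token stream, which is where the Tokenizer question comes in.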
-- 
55.67N 12.588E CCC7 D19D D107 F079 2F3D BF97 8443 DB5A 6DB8 9CE1
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
