[ http://issues.apache.org/jira/browse/NUTCH-34?page=comments#action_63142 ]

Stephan Strittmatter commented on NUTCH-34:
-------------------------------------------

Andrzej, the boolean flag is a good point. You are right, it is better to provide this flag as well. The fetcher can then be limited to a maximum file size (the parser plugin cannot handle partial content), and files bigger than the defined value are completely ignored.

> Parsing different content formats
> ---------------------------------
>
>          Key: NUTCH-34
>          URL: http://issues.apache.org/jira/browse/NUTCH-34
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Reporter: Stephan Strittmatter
>     Priority: Trivial
>
> At the moment Nutch is set up to filter content via the XML config file.
> The number of bytes to load is also set globally there.
> I think it would be better to let the parser plugins "register" themselves in
> some registry where every plugin could tell the fetcher that:
> 1. this document type is wanted (because the parser plugin is
> installed and activated)
> 2. how much of the content is required (some plugins need the whole
> content and some do not)

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
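
To make the proposal concrete, here is a rough sketch of how such a registry could look, including the boolean flag discussed above. Everything in it is an assumption for illustration: ParserRegistry, Requirement, Decision, register(), decide() and WHOLE_DOCUMENT are hypothetical names, not existing Nutch classes or methods, and the real plugin wiring may well differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Nutch.
class ParserRegistry {

    // Outcome the fetcher acts on for one document.
    enum Decision { SKIP, FETCH_ALL, FETCH_TRUNCATED }

    // Marker for plugins that need the whole content.
    static final long WHOLE_DOCUMENT = -1L;

    // What one parser plugin declares for a content type.
    static final class Requirement {
        final long maxBytes;        // bytes wanted; WHOLE_DOCUMENT means no limit
        final boolean skipIfLarger; // true: ignore oversized files, never truncate

        Requirement(long maxBytes, boolean skipIfLarger) {
            this.maxBytes = maxBytes;
            this.skipIfLarger = skipIfLarger;
        }
    }

    private final Map<String, Requirement> byType = new HashMap<>();

    // Called by a plugin once it is installed and activated.
    void register(String contentType, long maxBytes, boolean skipIfLarger) {
        byType.put(contentType, new Requirement(maxBytes, skipIfLarger));
    }

    // Decide what to do with a document of the given type and size in bytes.
    Decision decide(String contentType, long contentLength) {
        Requirement req = byType.get(contentType);
        if (req == null) {
            return Decision.SKIP; // no active plugin wants this type
        }
        if (req.maxBytes == WHOLE_DOCUMENT || contentLength <= req.maxBytes) {
            return Decision.FETCH_ALL;
        }
        // Oversized: the flag chooses between ignoring and truncating.
        return req.skipIfLarger ? Decision.SKIP : Decision.FETCH_TRUNCATED;
    }
}
```

With this shape, a plugin that cannot handle partial content would register with skipIfLarger set to true, so oversized files are ignored rather than handed over truncated, while a plugin that only needs the beginning of a document would register with the flag set to false.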
