[ http://issues.apache.org/jira/browse/NUTCH-33?page=comments#action_62341 ]

Jerome Charron commented on NUTCH-33:
-------------------------------------

[John] Though not ideal, a system-wide property is probably the easiest way to ensure consistent behavior among tools and plugins.

[Jerome] OK, I will add a system-wide property in nutch-default.xml (the calling code can choose to do magic resolution by calling the right method).
What is your opinion about this point:
1. Is it the calling code that checks the mime.magic property and calls getMimeType(String) or getMimeType(String, byte[]) depending on the value,
2. or is it the getMimeType(String, byte[]) method that must check the mime.magic property and use the magic resolution if the flag is true?
The second one is better for consistency, but it is also strange: you call getMimeType(String, byte[]) instead of getMimeType(String), so the developer expects the magic analysis to be performed...

[John] Yes, it's better to follow jaf's api.

[Jerome] That's done:
* Refactoring to org.apache.nutch.util.mime
* Uses a public MimeType object with parsing capabilities (using Hari Kodungallu's code)
* New patch version for the protocol-file and protocol-ftp plugins
* New patch for the protocol-http and index-more plugins (index-more no longer needs jaf)
* Unit regression tests are OK

Todo:
* Add the mime.magic property
* Perform some functional tests

> MIME content type detector (using magic char sequences)
> -------------------------------------------------------
>
>          Key: NUTCH-33
>          URL: http://issues.apache.org/jira/browse/NUTCH-33
>      Project: Nutch
>         Type: New Feature
>     Reporter: Jerome Charron
>     Assignee: John Xing
>     Priority: Minor
>  Attachments: NUTCH-33.patch, mime-types.tar.gz
>
> Extension-based content-type detection is not sufficient in some cases.
> The solution is to add a content type detector based on some magic char
> sequences, like in apache httpd for instance.
> (Note: I created this issue only to keep a trace, but I'm currently working
> on it)

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
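
For readers of the archive, the second option raised in the comment (the getMimeType(String, byte[]) method itself consulting the mime.magic flag) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code in NUTCH-33.patch: the class name MimeResolverSketch, the constructor flag standing in for the mime.magic property, the small extension table, and the two magic prefixes are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of "option 2"; none of these names come from the Nutch patch.
class MimeResolverSketch {
    // Assumption: stands in for the proposed mime.magic property from nutch-default.xml.
    private final boolean magicEnabled;
    // Assumption: a tiny illustrative extension table, not the real mime-types data.
    private final Map<String, String> byExtension = new HashMap<>();

    MimeResolverSketch(boolean magicEnabled) {
        this.magicEnabled = magicEnabled;
        byExtension.put("pdf", "application/pdf");
        byExtension.put("gif", "image/gif");
        byExtension.put("txt", "text/plain");
    }

    // Extension-only resolution; returns null when the extension is missing or unknown.
    String getMimeType(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null;
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return byExtension.get(ext);
    }

    // Option 2: the method itself checks the flag, then falls back to the extension.
    String getMimeType(String name, byte[] data) {
        if (magicEnabled && data != null) {
            String byMagic = detectByMagic(data);
            if (byMagic != null) {
                return byMagic;
            }
        }
        return getMimeType(name);
    }

    // Two well-known magic prefixes, used here only as examples.
    private static String detectByMagic(byte[] data) {
        if (startsWith(data, "%PDF-")) {
            return "application/pdf";
        }
        if (startsWith(data, "GIF8")) {
            return "image/gif";
        }
        return null;
    }

    private static boolean startsWith(byte[] data, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
        if (data.length < p.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (data[i] != p[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the flag off, a call such as getMimeType("report.txt", pdfBytes) quietly resolves by extension only, which is the "strange" behavior noted in the comment. Under option 1 the caller would read the property and pick the overload itself, so the two-argument method would always run magic detection.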