[ http://issues.apache.org/jira/browse/NUTCH-135?page=all ]
     
Jerome Charron resolved NUTCH-135:
----------------------------------

    Fix Version:     (was: 0.7.2-dev)
     Resolution: Fixed

Committed to trunk (to be merged into branch 0.7?)
Thanks Stefan.

I have run unit and functional tests, but I don't have the resources for 
wide, intensive testing.
If someone can run such tests, it would be greatly appreciated.

Note: During my tests, I noticed some strange content-types returned by 
de.yahoo.com and all de.yahoo related files. The content-type passed by the 
protocol layer to the Content constructor is always text/plain, but when 
fetching these sites with wget, the content-type in the headers is text/html. 
Sorry, I don't have time to investigate further.


> http header meta data are case insensitive in the real world (e.g. 
> Content-Type or content-type)
> ------------------------------------------------------------------------------------------------
>
>          Key: NUTCH-135
>          URL: http://issues.apache.org/jira/browse/NUTCH-135
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.7, 0.7.1
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: cached.jsp.patch, contentProperties_patch.txt, 
> contentProperties_patch_WithContentProperties.txt
>
> As described in issue NUTCH-133, some web servers return HTTP header names 
> in non-standard case (for example, content-type instead of Content-Type).
> This has many negative side effects. For example, querying the content type 
> from the metadata returns null even though the web server returned a content 
> type, because the key is in non-standard case, e.g. lower case. This also 
> affects the PDF parser, which queries the content length, etc.
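
To illustrate the lookup problem described in the issue, here is a minimal
sketch of a header store whose keys compare case-insensitively. The class
CaseInsensitiveHeaders and its set/get methods are hypothetical names used
only for this example; this is not the committed patch and not the Nutch
ContentProperties API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Nutch.
class CaseInsensitiveHeaders {
    // A TreeMap ordered by String.CASE_INSENSITIVE_ORDER treats
    // "Content-Type" and "content-type" as the same key.
    private final Map<String, String> headers =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Stores a header; a later value replaces an earlier one whose
    // name differs only in case. Null names are not supported.
    public void set(String name, String value) {
        headers.put(name, value);
    }

    // Returns the value for the header name regardless of case,
    // or null if no such header was stored.
    public String get(String name) {
        return headers.get(name);
    }

    public static void main(String[] args) {
        CaseInsensitiveHeaders h = new CaseInsensitiveHeaders();
        h.set("content-type", "text/html");          // lower case, as some servers send it
        System.out.println(h.get("Content-Type"));   // prints "text/html"
    }
}
```

With a plain HashMap the lookup in main would return null, which is the
symptom reported above for the content type and the content length.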

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
