I believe that the String.CASE_INSENSITIVE_ORDER
comparator only affects the way the keys are ordered
internally within the TreeMap. It would not affect
lookups, so headers.get(key) would still be case
sensitive.
Perhaps subclassing Properties to make all gets and
puts case insensitive is the best solution.
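
To make the subclassing idea concrete, here is a rough sketch of what I have in mind. The class name CaseInsensitiveProperties is my own invention, not anything in the Nutch tree, and lower-casing the keys is just one possible way to normalize them. Whatever the TreeMap does on lookup, getHeader() reads from the Properties copy, and a plain Hashtable lookup is an exact match, so the normalization has to happen there.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper (not part of Nutch): Properties with case-insensitive String keys. */
public class CaseInsensitiveProperties extends Properties {

    // Lower-case String keys; leave any other key type untouched.
    private static Object normalize(Object key) {
        return (key instanceof String) ? ((String) key).toLowerCase(Locale.ROOT) : key;
    }

    @Override
    public synchronized Object put(Object key, Object value) {
        return super.put(normalize(key), value);
    }

    @Override
    public synchronized Object get(Object key) {
        return super.get(normalize(key));
    }

    // Route every entry through put() so copied keys are normalized too.
    @Override
    public synchronized void putAll(Map<?, ?> map) {
        for (Map.Entry<?, ?> e : map.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    // getProperty() does not go through get() in the JDK, so override it as well.
    @Override
    public String getProperty(String key) {
        Object v = get(key);
        if (v instanceof String) {
            return (String) v;
        }
        return (defaults != null) ? defaults.getProperty(key) : null;
    }

    public static void main(String[] args) {
        Properties headers = new CaseInsensitiveProperties();
        headers.put("content-type", "application/pdf");
        System.out.println(headers.get("Content-Type")); // prints "application/pdf"
    }
}
```

This only covers put, get, putAll and getProperty. Methods such as remove() and containsKey() would need the same treatment before relying on it, and the original spelling of the header name is lost once the key is lower-cased.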
--- Sven Wende <[EMAIL PROTECTED]> wrote:
> I think it's only case insensitive in the TreeMap that
> parseHeaders() produces!
>
> But in the constructor this map is copied into a
> Properties object.
>
> ********************************************
> // parse headers
> headers.putAll(parseHeaders(in, line));
> ********************************************
>
> Look at the following snippet, which does the same thing as
> HttpResponse.class does:
>
> ********************************************
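> import java.util.Properties;
> import java.util.TreeMap;
>
> public class HeaderCaseTest {  // wrapper class name is illustrative only
>   public static void main(String[] args) {
>     // Keys are compared case-insensitively inside the TreeMap.
>     TreeMap headers = new TreeMap(String.CASE_INSENSITIVE_ORDER);
>     headers.put("content-type", "text");
>
>     // Properties is a Hashtable, so the comparator is lost on copy.
>     Properties headers2 = new Properties();
>     headers2.putAll(headers);
>
>     System.out.println(headers.get("Content-Type"));  // = "text"
>     System.out.println(headers2.get("Content-Type")); // = null
>   }
> }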
> ********************************************
>
> You can use the following url for your tests:
>
> http://www.verdi.de/0x0ac80f2b_0x0069a759
>
> It is a PDF file and the server sends "Content-type:
> application/pdf" !
>
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chirag Chaman
> > Sent: Wednesday, 29 December 2004 16:22
> > To: [EMAIL PROTECTED]
> > Subject: RE: [Nutch-dev] Fetch / Parse errors and a Bug
> >
> > That is strange, because I would expect it to be case
> > insensitive, but then again I have not tested it; I am just
> > looking at the code.
> >
> > You see how the TreeMap is initialized with
> > String.CASE_INSENSITIVE_ORDER:
> >
> > private Map parseHeaders(PushbackInputStream in, StringBuffer line)
> >   throws IOException, HttpException {
> >   TreeMap headers = new TreeMap(String.CASE_INSENSITIVE_ORDER);
> >   return parseHeaders(in, line, headers);
> > }
> >
> > So I would imagine that a lookup for Content-Type is case
> > insensitive as well.
> >
> >
> > Can you send me the link to a page that has this problem --
> > I'll run some tests to see what's causing this.
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sven Wende
> > Sent: Wednesday, December 29, 2004 9:16 AM
> > To: [EMAIL PROTECTED]
> > Subject: RE: [Nutch-dev] Fetch / Parse errors and a Bug
> >
> > Chirag:
> >
> > > I looked at where you mention that the content type is being
> > > looked up and is Case Sensitive -- that is not correct. The HTTP
> > > protocol is adding the Content-type to the TreeMap which is
> > > initialized with the String.CASE_INSENSITIVE_ORDER comparator.
> > > Thus it internally will do a case-insensitive match.
> >
> > Which code do you refer to?
> >
> > I described a problem in the protocol-http plugin. Just take a look
> > at the following code snippet from the CVS. As you can see, the
> > headers are read in and stored in a simple Hashtable. The problem
> > with case sensitive headers for content-type occurs in the
> > toContent() method (for example).
> >
> >
> > ******************************************************************
> > package net.nutch.protocol.http;
> >
> > /** An HTTP response. */
> >
> > public class HttpResponse {
> >   private Properties headers = new Properties();
> >
> >   /** Returns the value of a named header. */
> >   public String getHeader(String name) {
> >     return (String)headers.get(name);
> >   }
> >
> >   public Content toContent() {
> >     String contentType = getHeader("Content-Type");
> >     if (contentType == null)
> >       contentType = "";
> >     return new Content(orig, base, content, contentType, headers);
> >   }
> >
> >   private void processHeaderLine(StringBuffer line, TreeMap headers)
> >     throws IOException, HttpException {
> >     int colonIndex = line.indexOf(":");           // key is up to colon
> >     if (colonIndex == -1) {
> >       int i;
> >       for (i= 0; i < line.length(); i++)
> >         if (!Character.isWhitespace(line.charAt(i)))
> >           break;
> >       if (i == line.length())
> >         return;
> >       throw new HttpException("No colon in header:" + line);
> >     }
> >     String key = line.substring(0, colonIndex);
> >
> >     int valueStart = colonIndex+1;                // skip whitespace
> >     while (valueStart < line.length()) {
> >       int c = line.charAt(valueStart);
> >       if (c != ' ' && c != '\t')
> >         break;
> >       valueStart++;
> >     }
> >     String value = line.substring(valueStart);
> >
> >     headers.put(key, value);
> >   }
> > }
> >
> > ******************************************************************
> >
> > > I think the problem is that no "content-type" was ever on the
> > > page -- this leaves both the content type and the
> > > extension/suffix to be blank and that causes a problem. Also, if
> > > a character-set is also not specified then the fetcher fails as
> > > well (as it cannot write to disk).
> >
> > I tested it and there was a "content-type" header. If its name was
> > "Content-Type", everything was ok, but if its name was
> > "content-type" Nutch internally loses the information about the
> > content-type through the use of the code above.
> >
>
=== message truncated ===
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers