Xin-Yi Liu:

> I believe that the String.CASE_INSENSITIVE_ORDER comparator
> only affects the way the keys are ordered internally within
> the TreeMap. It would not affect lookups, so
> headers.get(key) would still be case sensitive.

No - it does also affect lookups. Just try the main() method I provided. The problem is the copy to a Properties object.

> Perhaps subclassing Properties to make all gets and puts case
> insensitive is the best solution.

That's exactly what I have done in a local patch on my system.

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Xin-Yi Liu
> Sent: Thursday, December 30, 2004 23:55
> To: [EMAIL PROTECTED]
> Subject: RE: [Nutch-dev] Fetch / Parse errors and a Bug
>
> I believe that the String.CASE_INSENSITIVE_ORDER comparator
> only affects the way the keys are ordered internally within
> the TreeMap. It would not affect lookups, so
> headers.get(key) would still be case sensitive.
>
> Perhaps subclassing Properties to make all gets and puts case
> insensitive is the best solution.
>
> --- Sven Wende <[EMAIL PROTECTED]> wrote:
>
> > I think it's only case insensitive in the TreeMap that
> > parseHeaders() produces!
> >
> > But in the constructor this map is copied into a Properties object.
> >
> > ********************************************
> > // parse headers
> > headers.putAll(parseHeaders(in, line));
> > ********************************************
> >
> > Look at the following snippet, which does the same thing as
> > HttpResponse.class does:
> >
> > ********************************************
> > public static void main(String[] args) {
> >   TreeMap headers = new TreeMap(String.CASE_INSENSITIVE_ORDER);
> >   headers.put("content-type", "text");
> >
> >   Properties headers2 = new Properties();
> >   headers2.putAll(headers);
> >
> >   System.out.println(headers.get("Content-Type"));  // = "text"
> >   System.out.println(headers2.get("Content-Type")); // = null
> > }
> > ********************************************
> >
> > You can use the following url for your tests:
> >
> > http://www.verdi.de/0x0ac80f2b_0x0069a759
> >
> > It is a PDF file and the server sends "Content-type:
> > application/pdf"!
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On
> > > Behalf Of Chirag Chaman
> > > Sent: Wednesday, December 29, 2004 16:22
> > > To: [EMAIL PROTECTED]
> > > Subject: RE: [Nutch-dev] Fetch / Parse errors and a Bug
> > >
> > > That is strange, coz I would expect it to be case insensitive, but
> > > then again I have not tested, just looking at the code.
> > >
> > > You see how the TreeMap is initialized with
> > > String.CASE_INSENSITIVE_ORDER
> > >
> > > private Map parseHeaders(PushbackInputStream in, StringBuffer line)
> > >   throws IOException, HttpException {
> > >   TreeMap headers = new TreeMap(String.CASE_INSENSITIVE_ORDER);
> > >   return parseHeaders(in, line, headers);
> > > }
> > >
> > > So I would imagine that a lookup for Content-Type is case
> > > insensitive as well.
> > >
> > > Can you send me the link to a page that has this problem --
> > > I'll run some tests to see what's causing this.
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On
> > > Behalf Of Sven Wende
> > > Sent: Wednesday, December 29, 2004 9:16 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: RE: [Nutch-dev] Fetch / Parse errors and a Bug
> > >
> > > Chirag:
> > >
> > > > I looked at where you mention that the content type is
> > > > being looked up and is Case Sensitive -- that is not correct.
> > > > The HTTP protocol is adding the Content-type to the TreeMap
> > > > which is initialized with the String.CASE_INSENSITIVE_ORDER
> > > > comparator. Thus it internally will do a case-insensitive match.
> > >
> > > Which code do you refer to?
> > >
> > > I described a problem in the protocol-http plugin. Just take
> > > a look at the following code snippet from the CVS. As you can
> > > see, the headers are read in and stored in a simple Hashtable.
> > > The problem with case sensitive headers for content-type occurs in the
> > > toContent() method (for example).
> > >
> > > ********************************************************************************
> > > package net.nutch.protocol.http;
> > >
> > > /** An HTTP response. */
> > > public class HttpResponse {
> > >   private Properties headers = new Properties();
> > >
> > >   /** Returns the value of a named header. */
> > >   public String getHeader(String name) {
> > >     return (String)headers.get(name);
> > >   }
> > >
> > >   public Content toContent() {
> > >     String contentType = getHeader("Content-Type");
> > >     if (contentType == null)
> > >       contentType = "";
> > >     return new Content(orig, base, content, contentType, headers);
> > >   }
> > >
> > >   private void processHeaderLine(StringBuffer line, TreeMap headers)
> > >     throws IOException, HttpException {
> > >     int colonIndex = line.indexOf(":");  // key is up to colon
> > >     if (colonIndex == -1) {
> > >       int i;
> > >       for (i = 0; i < line.length(); i++)
> > >         if (!Character.isWhitespace(line.charAt(i)))
> > >           break;
> > >       if (i == line.length())
> > >         return;
> > >       throw new HttpException("No colon in header:" + line);
> > >     }
> > >     String key = line.substring(0, colonIndex);
> > >
> > >     int valueStart = colonIndex+1;  // skip whitespace
> > >     while (valueStart < line.length()) {
> > >       int c = line.charAt(valueStart);
> > >       if (c != ' ' && c != '\t')
> > >         break;
> > >       valueStart++;
> > >     }
> > >     String value = line.substring(valueStart);
> > >
> > >     headers.put(key, value);
> > >   }
> > > }
> > > ********************************************************************************
> > >
> > > > I think the problem is that no "content-type" was ever on the page --
> > > > this leaves both the content type and the extension/suffix to be blank
> > > > and that causes a problem.
> > > > Also, if a character-set is also not
> > > > specified then the fetcher fails as well (as it cannot write to disk).
> > >
> > > I tested it and there was a "content-type" header. If its
> > > name was "Content-Type", everything was ok, but if its name
> > > was "content-type" Nutch internally loses the information
> > > about the content-type through the use of the code above.
> > >
> > === message truncated ===

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
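
For anyone who wants to try the subclassing idea discussed in this thread, here is a minimal sketch of a case-insensitive Properties. It is not the local patch mentioned above: the class name CaseInsensitiveProperties and the choice to lower-case all String keys are assumptions made for illustration only. putAll() and getProperty() are overridden explicitly because, depending on the JDK version, the inherited versions may not go through the overridden put() and get().

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical sketch: a Properties whose String keys are matched case-insensitively. */
class CaseInsensitiveProperties extends Properties {
    private static final long serialVersionUID = 1L;

    /** Lower-cases String keys; any other key type is passed through unchanged. */
    private static Object normalize(Object key) {
        return (key instanceof String) ? ((String) key).toLowerCase(Locale.ROOT) : key;
    }

    @Override
    public synchronized Object put(Object key, Object value) {
        return super.put(normalize(key), value);
    }

    @Override
    public synchronized Object get(Object key) {
        return super.get(normalize(key));
    }

    @Override
    public synchronized boolean containsKey(Object key) {
        return super.containsKey(normalize(key));
    }

    @Override
    public synchronized Object remove(Object key) {
        return super.remove(normalize(key));
    }

    /** Copies entry by entry so that every key goes through the normalizing put(). */
    @Override
    public synchronized void putAll(Map<?, ?> t) {
        for (Map.Entry<?, ?> e : t.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    /** Looks up via the normalizing get(), then falls back to the defaults list if one is set. */
    @Override
    public String getProperty(String key) {
        Object value = get(key);
        if (value instanceof String) {
            return (String) value;
        }
        return (defaults != null) ? defaults.getProperty(key) : null;
    }

    public static void main(String[] args) {
        // Same setup as the snippet quoted in this thread: a case-insensitive TreeMap
        // holding a header sent in lower case.
        TreeMap<String, String> parsed = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        parsed.put("content-type", "application/pdf");

        Properties plain = new Properties();
        plain.putAll(parsed);

        CaseInsensitiveProperties fixed = new CaseInsensitiveProperties();
        fixed.putAll(parsed);

        System.out.println(plain.get("Content-Type")); // prints: null
        System.out.println(fixed.get("Content-Type")); // prints: application/pdf
    }
}
```

One consequence of this approach is that the original spelling of the header names is lost, since only the lower-cased key is stored. If the original spelling matters, a different design would be needed.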
