> the following code would fail in case the meta tags are in upper case
>
> Node nameNode = attrs.getNamedItem("name");
> Node equivNode = attrs.getNamedItem("http-equiv");
> Node contentNode = attrs.getNamedItem("content");
This code works as it stands, because the Nutch HTML parser uses the Xerces
HTMLDocumentImpl implementation, which lowercases attribute names (element
names, by contrast, are uppercased).
For consistency, and to decouple the Nutch HTML parser a little from the
Xerces implementation, I suggest replacing these lines with something like:
Node nameNode = null;
Node equivNode = null;
Node contentNode = null;
for (int i = 0; i < attrs.getLength(); i++) {
  Node attr = attrs.item(i);
  String attrName = attr.getNodeName().toLowerCase();
  if (attrName.equals("name")) {
    nameNode = attr;
  } else if (attrName.equals("http-equiv")) {
    equivNode = attr;
  } else if (attrName.equals("content")) {
    contentNode = attr;
  }
}
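
The loop above could also be factored into a small helper, so that each
lookup stays a one-liner. The following is only a sketch: the class name
MetaAttrLookup and the methods getNamedItemIgnoreCase and
sampleMetaAttributes are hypothetical names of mine, not part of Nutch or
Xerces, and the sample document is built with the plain JAXP DOM API rather
than with HTMLDocumentImpl. It compares names with equalsIgnoreCase instead
of toLowerCase(), which avoids depending on the JVM default locale.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of Nutch or Xerces.
public class MetaAttrLookup {

  /** Returns the first attribute whose name matches ignoring case, or null. */
  public static Node getNamedItemIgnoreCase(NamedNodeMap attrs, String name) {
    for (int i = 0; i < attrs.getLength(); i++) {
      Node attr = attrs.item(i);
      // equalsIgnoreCase compares character by character and does not
      // consult the default locale, unlike String.toLowerCase().
      if (attr.getNodeName().equalsIgnoreCase(name)) {
        return attr;
      }
    }
    return null;
  }

  /** Builds the attributes of a sample META element with mixed-case names. */
  public static NamedNodeMap sampleMetaAttributes() {
    try {
      Document doc =
          DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element meta = doc.createElement("META");
      meta.setAttribute("HTTP-EQUIV", "refresh");
      meta.setAttribute("Content", "5; url=http://example.com/");
      return meta.getAttributes();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    NamedNodeMap attrs = sampleMetaAttributes();
    Node nameNode = getNamedItemIgnoreCase(attrs, "name");
    Node equivNode = getNamedItemIgnoreCase(attrs, "http-equiv");
    Node contentNode = getNamedItemIgnoreCase(attrs, "content");
    System.out.println("name present: " + (nameNode != null));
    System.out.println("http-equiv: " + equivNode.getNodeValue());
    System.out.println("content: " + contentNode.getNodeValue());
  }
}
```

With a helper like this, the three lookups in the parser would read, for
example, Node equivNode = getNamedItemIgnoreCase(attrs, "http-equiv"); at
the cost of three passes over the attribute list instead of one, which
should not matter for the handful of attributes a meta tag carries.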
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/