[Nutch-dev] Re: HTMLMetaProcessor a bug?

Gal Nitzan Tue, 10 Jan 2006 05:16:00 -0800

Thanks, I was checking something with the default from jdk...

On Tue, 2006-01-10 at 11:06 +0100, Jérôme Charron wrote:
> > The following code would fail if the meta tags are in upper case:
> >
> >         Node nameNode = attrs.getNamedItem("name");
> >         Node equivNode = attrs.getNamedItem("http-equiv");
> >         Node contentNode = attrs.getNamedItem("content");
> 
> This code works well, because the Nutch HTML parser uses the Xerces
> HTMLDocumentImpl implementation, which lowercases attribute names (whereas
> element names are uppercased).
> For consistency, and to decouple the Nutch HTML parser a little from the
> Xerces implementation, I suggest replacing these lines with something like:
> Node nameNode = null;
> Node equivNode = null;
> Node contentNode = null;
> for (int i=0; i<attrs.getLength(); i++) {
>   Node attr = attrs.item(i);
>   String attrName = attr.getNodeName().toLowerCase();
>   if (attrName.equals("name")) {
>     nameNode = attr;
>   } else if (attrName.equals("http-equiv")) {
>     equivNode = attr;
>   } else if (attrName.equals("content")) {
>     contentNode = attr;
>   }
> }
> 
> 
> Jérôme
> 
> 
> --
> http://motrech.free.fr/
> http://www.frutch.org/
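
For anyone who wants to reproduce the difference discussed above, here is a minimal, self-contained sketch. It does not use Nutch or the Xerces HTMLDocumentImpl; it assumes the JDK's default XML DocumentBuilder, which keeps attribute names exactly as written, so an exact-case getNamedItem("name") misses an upper-case NAME attribute. The class name MetaAttrLookup and the helpers findAttr and rootAttrs are hypothetical names chosen for illustration. The loop follows the suggestion in the quoted message, except that it compares with equalsIgnoreCase instead of toLowerCase(); that swap is my own assumption, made to avoid locale-dependent lowercasing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Nutch.
public class MetaAttrLookup {

  /** Case-insensitive attribute lookup; returns null when no attribute matches. */
  public static Node findAttr(NamedNodeMap attrs, String wanted) {
    for (int i = 0; i < attrs.getLength(); i++) {
      Node attr = attrs.item(i);
      // equalsIgnoreCase does not depend on the default locale
      if (attr.getNodeName().equalsIgnoreCase(wanted)) {
        return attr;
      }
    }
    return null;
  }

  /** Parses a small XML snippet with the JDK default parser and returns the root element's attributes. */
  public static NamedNodeMap rootAttrs(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement()
          .getAttributes();
    } catch (Exception e) {
      // Wrap checked parser exceptions to keep the demo short
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    NamedNodeMap attrs = rootAttrs("<META NAME=\"robots\" CONTENT=\"noindex\"/>");
    // Exact-case lookup: the attribute is stored as "NAME", so "name" is not found
    System.out.println("getNamedItem: " + attrs.getNamedItem("name"));
    // Case-insensitive loop finds it regardless of how the page wrote it
    System.out.println("findAttr: " + findAttr(attrs, "name").getNodeValue());
  }
}
```

With a parser that already lowercases attribute names, both lookups behave the same, so the loop only matters when the DOM implementation preserves the original case.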




