Hi,

I am using ManifoldCF 2.6.

The RSS connector does not crawl the feed http://rss.orf.at/news.xml.
The following line appears in manifoldcf.log:
org.apache.manifoldcf.crawler.connectors.rss.RSSConnector$OuterContextClass DEBUG 2017-03-23 14:29:54,718 (Worker thread '1') - RSS: RSS document 'http://rss.orf.at/news.xml' does not have rss, feed, or rdf:RDF tag - not valid feed

I tried the following change in RSSConnector (on branch release-2.6-branch),
and the feed is now crawled.
This may be a bug in the RSSConnector.
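
To illustrate the idea outside of ManifoldCF, here is a minimal standalone sketch of a case-insensitive root tag check. The class FeedRootCheck and the enum FeedKind are hypothetical names used only for this example; they are not part of the connector. The sketch assumes the three root tags named in the log message (rss, feed, RDF) and lower-cases with Locale.ROOT so the comparison does not depend on the JVM's default locale.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not ManifoldCF code.
// Classifies a feed by the local name of its root element, ignoring case.
final class FeedRootCheck {
    enum FeedKind { RSS, ATOM, RDF, UNKNOWN }

    static FeedKind classify(String localName) {
        if (localName == null) {
            return FeedKind.UNKNOWN;
        }
        // Locale.ROOT keeps the result independent of the default locale.
        switch (localName.toLowerCase(Locale.ROOT)) {
            case "rss":
                return FeedKind.RSS;
            case "feed":
                return FeedKind.ATOM;
            case "rdf":
                return FeedKind.RDF;
            default:
                return FeedKind.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        // Both spellings of the RDF root tag map to the same kind.
        System.out.println(classify("RDF"));
        System.out.println(classify("rdf"));
        System.out.println(classify("html"));
    }
}
```

With a check like this, "RDF" and "rdf" are treated the same, which is the effect the patch below aims for.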

Kind Regards,
Joachim

--- a/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
+++ b/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
@@ -3311,7 +3311,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
           Logging.connectors.debug("RSS: Parsed bottom-level XML for RSS document '"+documentIdentifier+"'");
         return new RSSContextClass(theStream,namespace,localName,qName,atts,documentIdentifier,activities,filter);
       }
-      else if (localName.equals("RDF"))
+      else if (localName.toUpperCase().equals("RDF"))
       {
         // RDF/Atom feed detected
         outerTagCount++;
@@ -3345,7 +3345,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
       {
         rescanTimeSet = ((RSSContextClass)context).process();
       }
-      else if (tagName.equals("RDF"))
+      else if (tagName.toUpperCase().equals("RDF"))
       {
         rescanTimeSet = ((RDFContextClass)context).process();
       }

_______________________________________________

Dipl.-Ing. Joachim Butz
Software Developer

HC SOLUTIONS GesmbH
A - 4030 Linz, Dauphinestraße 5
Phone:   +43 (0)732 / 9394 0
Mobile:
Fax:     +43 (0)732 / 9394 800
E-mail:  [email protected]
Home:   http://www.hcsolutions.at/
            http://www.tomo-base.at/

Company register number: FN 115314 F
Register court: Landesgericht Linz
Legal form: GesmbH
VAT ID: ATU 36898407
_______________________________________________