Hi, I am using ManifoldCF 2.6.
The RSS connector does not crawl the feed http://rss.orf.at/news.xml. The following line appears in manifoldcf.log:

org.apache.manifoldcf.crawler.connectors.rss.RSSConnector$OuterContextClass DEBUG 2017-03-23 14:29:54,718 (Worker thread '1') - RSS: RSS document 'http://rss.orf.at/news.xml' does not have rss, feed, or rdf:RDF tag - not valid feed

I tried the following change in RSSConnector (on branch release-2.6-branch), and the feed is now crawled. This may be a bug in the RSSConnector.

Kind Regards,
Joachim

--- a/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
+++ b/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/RSSConnector.java
@@ -3311,7 +3311,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
             Logging.connectors.debug("RSS: Parsed bottom-level XML for RSS document '"+documentIdentifier+"'");
           return new RSSContextClass(theStream,namespace,localName,qName,atts,documentIdentifier,activities,filter);
         }
-        else if (localName.equals("RDF"))
+        else if (localName.toUpperCase().equals("RDF"))
         {
           // RDF/Atom feed detected
           outerTagCount++;
@@ -3345,7 +3345,7 @@ public class RSSConnector extends org.apache.manifoldcf.crawler.connectors.BaseR
         {
           rescanTimeSet = ((RSSContextClass)context).process();
         }
-        else if (tagName.equals("RDF"))
+        else if (tagName.toUpperCase().equals("RDF"))
         {
           rescanTimeSet = ((RDFContextClass)context).process();
         }

_______________________________________________
Dipl.-Ing. Joachim Butz
Software Developer
HC SOLUTIONS GesmbH
_______________________________________________
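
P.S. To show the comparison on its own, outside the connector: below is a minimal, standalone sketch of a case-insensitive check for the RDF tag name. The class RdfTagMatch and both helper methods are hypothetical names chosen for illustration and are not part of ManifoldCF. The first variant uses String.equalsIgnoreCase, the second is the toUpperCase() approach from the patch with an explicit Locale.ROOT so the result does not depend on the JVM's default locale. Whether a case-insensitive match is the right fix for the connector is for the maintainers to judge.

```java
import java.util.Locale;

// Standalone illustration only; RdfTagMatch is a hypothetical class,
// not part of the ManifoldCF code base.
final class RdfTagMatch {

    // Case-insensitive comparison without creating an upper-cased copy.
    static boolean isRdfTag(String localName) {
        return localName != null && localName.equalsIgnoreCase("RDF");
    }

    // Same idea as the toUpperCase() change in the patch, but with an
    // explicit locale so the conversion is independent of the default locale.
    static boolean isRdfTagUpper(String localName) {
        return localName != null && "RDF".equals(localName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Print both results for a few sample tag names, including null.
        for (String name : new String[] {"RDF", "rdf", "Rdf", "rss", null}) {
            System.out.println(name + " -> " + isRdfTag(name) + " / " + isRdfTagUpper(name));
        }
    }
}
```

Both helpers accept "RDF", "rdf" and "Rdf" and reject "rss" and null, so either form could stand in for the two equals("RDF") comparisons touched by the diff.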
