Re: Issue Crawling Alternate URLs

2016-10-07 Thread Sebastian Nagel
From: Sebastian Nagel [mailto:wastl.na...@googlemail.com] > Sent: Thursday, October 06, 2016 8:26 AM > To: user@nutch.apache.org > Subject: Re: Issue Crawling Alternate URLs > > Hi, > >> http://rssfeeds.azcentral.com/phoenix/asu > > That's already an RSS feed which unluckily …

RE: Issue Crawling Alternate URLs

2016-10-06 Thread Adler, Matthew (US)
-----Original Message----- From: Sebastian Nagel [mailto:wastl.na...@googlemail.com] Sent: Thursday, October 06, 2016 8:26 AM To: user@nutch.apache.org Subject: Re: Issue Crawling Alternate URLs Hi, > http://rssfeeds.azcentral.com/phoenix/asu That's already an RSS feed which unluckily fails to parse: (using …

Re: Issue Crawling Alternate URLs

2016-10-06 Thread Sebastian Nagel
Hi, > http://rssfeeds.azcentral.com/phoenix/asu That's already an RSS feed which unluckily fails to parse: (using plugin "feed") Status: failed(2,200): com.sun.syndication.io.ParsingFeedException: Invalid XML: Error on line 183: XML document structures must start and end within the same entity. …

Issue Crawling Alternate URLs

2016-10-05 Thread Adler, Matthew (US)
Hello Nutch Users: I’m currently having an issue with Nutch 1.4, similar to the one logged here: https://issues.apache.org/jira/browse/NUTCH-2319 Using the example in that JIRA issue, if I am on the following URL: http://rssfeeds.azcentral.com/phoenix/asu I expect that Nutch will be able to …