Hi Markus,

Thanks for the reply. I followed the link, which led me to http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html and from there to another link: http://sourceforge.net/projects/sitemap-parser/
The last link is a page that lets you download the SitemapParser0.9.jar file. Nothing on the page says where to put the file or how to tell Nutch to use it. Have you used this utility, or do you know the answer to either of these questions?

Jackie

-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Friday, December 19, 2014 6:20 AM
To: [email protected]
Subject: RE: Nutch 1.9 error

No, I am wrong. Nutch 1.x has a patch for sitemap processing, please see:
https://issues.apache.org/jira/browse/NUTCH-1465

-----Original message-----
> From:Markus Jelsma <[email protected]>
> Sent: Friday 19th December 2014 12:17
> To: [email protected]
> Subject: RE: Nutch 1.9 error
>
> No, unfortunately not.
>
> -----Original message-----
> > From:Richardson, Jacquelyn F. <[email protected]>
> > Sent: Friday 19th December 2014 5:16
> > To: [email protected]
> > Subject: RE: Nutch 1.9 error
> >
> > Is it possible to crawl sitemap.xml file with Nutch 1.x?
> >
> > -----Original Message-----
> > From: Markus Jelsma [mailto:[email protected]]
> > Sent: Thursday, December 18, 2014 3:09 PM
> > To: [email protected]
> > Subject: RE: Nutch 1.9 error
> >
> > Hi - the sitemap command is not part of Nutch 1.x, nor does it have a
> > HostDB. I suspect you are using Nutch 2.x commands.
> >
> > -----Original message-----
> > > From:Richardson, Jacquelyn F. <[email protected]>
> > > Sent: Thursday 18th December 2014 20:30
> > > To: [email protected]
> > > Subject: Nutch 1.9 error
> > >
> > > I am using Nutch 1.9. I am trying to crawl our sitemap.xml file.
> > >
> > > When I submit the following command:
> > > bin/nutch sitemap crawl -hostdb hostdb -threads 2
> > > to nutch I receive the following error:
> > > Error: Could not find or load main class sitemap
> > >
> > > Any help you can give will be greatly appreciated.
> > >
> > > Jackie Richardson

