Crawl Jira using Nutch

Joshua Jiang Tue, 06 Nov 2012 09:01:22 -0800

Hi all,

I am currently using Nutch to crawl all of our Jira issues. Has anyone 
done this before? Nutch crawls most of the issues, but some are still 
missing. These are the two URLs I put in seeds.txt:
1. https://our jira/secure/Dashboard.jspa
2. https://our jira/secure/BrowseProjects.jspa#all
Either these two URLs are not enough, or, I am guessing, 
db.fetch.interval.default in nutch-site.xml is not set appropriately, so Nutch 
didn't crawl all the pages (I currently set it to 86400 s). Does anyone have any ideas?
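
For reference, the setting I mean looks roughly like this in nutch-site.xml. This is only a sketch: the property name and the 86400 value are from my setup, and the comments are added here for illustration.

```xml
<!-- nutch-site.xml (sketch): only the property name and value reflect the setup described above -->
<configuration>
  <property>
    <name>db.fetch.interval.default</name>
    <!-- 86400 seconds = 1 day before a fetched page is due for re-fetch -->
    <value>86400</value>
  </property>
</configuration>
```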

Thanks,
Joshua
