No, Nutch cannot crawl URLs that are generated by JavaScript unless you make some very heavy customizations, such as driving a real browser with Selenium or embedding a JavaScript runtime with a DOM environment. Nutch can, however, crawl AJAX web pages the way Google does, using the AJAX crawling scheme.
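To illustrate what "the way Google does" means: under Google's (now deprecated) AJAX crawling scheme, a site that publishes "pretty" URLs such as `http://x.com/list#!num=2` promises to serve an HTML snapshot at the "ugly" URL `http://x.com/list?_escaped_fragment_=num=2`, and the crawler fetches that instead. The Java sketch below shows only that URL rewrite. The class and method names are hypothetical and this is not Nutch's code; the issue linked below tracks the actual Nutch support. It also assumes the site really serves snapshots for `_escaped_fragment_`. A pager that only calls a JavaScript function such as onPaginationSubmit(), with no `#!` URLs in the page, is not covered by this scheme.

```java
import java.nio.charset.StandardCharsets;

public class EscapedFragment {

    // Hypothetical helper: rewrites a "#!" (hash-bang) URL into the
    // "_escaped_fragment_" form defined by Google's AJAX crawling scheme.
    static String toEscapedFragmentUrl(String url) {
        int idx = url.indexOf("#!");
        if (idx < 0) {
            return url; // no hash-bang, so nothing to rewrite
        }
        String base = url.substring(0, idx);
        String fragment = url.substring(idx + 2);
        // Join onto an existing query string with '&', otherwise start one with '?'.
        String sep = base.contains("?") ? "&" : "?";
        return base + sep + "_escaped_fragment_=" + escape(fragment);
    }

    // Percent-encodes the bytes the scheme requires to be escaped:
    // 0x00-0x20, '#', '%', '&', '+', and 0x7F-0xFF (UTF-8 bytes).
    static String escape(String fragment) {
        StringBuilder out = new StringBuilder();
        for (byte raw : fragment.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (b <= 0x20 || b == '#' || b == '%' || b == '&' || b == '+' || b >= 0x7F) {
                out.append(String.format("%%%02X", b));
            } else {
                out.append((char) b);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints http://x.com/list?_escaped_fragment_=num=2
        System.out.println(toEscapedFragmentUrl("http://x.com/list#!num=2"));
        // prints http://x.com/list?lang=en&_escaped_fragment_=num=2%26sort=asc
        System.out.println(toEscapedFragmentUrl("http://x.com/list?lang=en#!num=2&sort=asc"));
    }
}
```

In a real crawl this kind of rewrite would sit in a URL normalizer, so that outlinks containing `#!` are fetched in their escaped form.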
https://issues.apache.org/jira/browse/NUTCH-1323

-----Original message-----
> From:Deepa Jayaveer <[email protected]>
> Sent: Friday 14th March 2014 10:10
> To: [email protected]
> Subject: reg pagination
>
> Hi
> I am using Nutch 2.1 with MySQL. The requirement is to crawl all the
> paginated web pages.
>
> Say, for example, I had given the seed URL as the first page (page no: 1)
> of some website (http://x.com?num=1)
> and given an appropriate regular expression through the URL filter to make
> Nutch crawl the pages matching the pattern "num".
> Nutch is able to crawl the given URLs
> http://x.com?num=2
> http://x.com?num=3 ...
>
> Nutch crawls successfully if the pagination URL is given in an
> anchor tag (a href).
>
> I was facing an issue when the web pages used a JavaScript function
> for pagination, calling a function like onPaginationSubmit().
>
> Nutch was not able to crawl those pages. Can anyone help with a
> solution for crawling those paginated pages?
>
> Thanks and Regards
> Deepa Devi

