No, Nutch won't be able to crawl JavaScript-generated URLs unless you make
some very heavy customizations, such as driving a browser with Selenium or
embedding a JavaScript runtime with a DOM environment. Nutch can, however,
crawl AJAX web pages the way Google does.

https://issues.apache.org/jira/browse/NUTCH-1323
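"The way Google does" presumably refers to Google's AJAX crawling scheme (which Google has since deprecated). Under that scheme a "pretty" URL containing `#!` is fetched as an "ugly" URL carrying the fragment in an `_escaped_fragment_` query parameter, and the site is expected to answer with a static HTML snapshot. The Java below is a minimal sketch of that URL mapping only, assuming the scheme's escaping rules (bytes 0x00-0x20, `#`, `%`, `&`, `+` and 0x7F-0xFF are percent-encoded). The class and method names are hypothetical and are not Nutch's API. The sketch does not run JavaScript, so it cannot follow a call like `onPaginationSubmit()`; it only helps if the site itself serves `#!` links and snapshots.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: maps "#!" URLs to the
// _escaped_fragment_ form described by Google's AJAX crawling scheme.
public class EscapedFragment {

    // Percent-encode the bytes the scheme requires escaping:
    // 0x00-0x20, '#', '%', '&', '+', and 0x7F-0xFF (of the UTF-8 encoding).
    static String escapeFragment(String fragment) {
        StringBuilder sb = new StringBuilder();
        for (byte b : fragment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c <= 0x20 || c == '#' || c == '%' || c == '&' || c == '+' || c >= 0x7F) {
                sb.append('%').append(String.format("%02X", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Rewrite "base#!fragment" to "base?_escaped_fragment_=..." (or "&..."
    // when the base already has a query string); other URLs are unchanged.
    static String toEscapedFragmentUrl(String url) {
        int idx = url.indexOf("#!");
        if (idx < 0) {
            return url;
        }
        String base = url.substring(0, idx);
        String fragment = url.substring(idx + 2);
        String sep = base.contains("?") ? "&" : "?";
        return base + sep + "_escaped_fragment_=" + escapeFragment(fragment);
    }

    public static void main(String[] args) {
        // -> http://x.com/list?_escaped_fragment_=num=2
        System.out.println(toEscapedFragmentUrl("http://x.com/list#!num=2"));
        // -> http://x.com/list?lang=en&_escaped_fragment_=num=3
        System.out.println(toEscapedFragmentUrl("http://x.com/list?lang=en#!num=3"));
        // Plain pagination URLs pass through untouched.
        System.out.println(toEscapedFragmentUrl("http://x.com?num=2"));
    }
}
```

In a real crawl this kind of rewrite would sit in a URL normalizer, so that discovered `#!` links are fetched in their snapshot form.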
 
-----Original message-----
> From:Deepa Jayaveer <[email protected]>
> Sent: Friday 14th March 2014 10:10
> To: [email protected]
> Subject: reg pagination 
> 
> Hi,
> I am using Nutch 2.1 with MySQL. The requirement is to crawl all the
> paginated web pages.
> 
> Say, for example, I give the first page (page no. 1) of some website
> (http://x.com?num=1) as the seed URL and supply an appropriate regular
> expression through the URL filter so that Nutch crawls pages matching
> the pattern "num". Nutch is then able to crawl the URLs
> http://x.com?num=2
> http://x.com?num=3 ...
> 
> Nutch crawls successfully if the pagination URL is given in an
> anchor tag (a href).
> 
> I am facing an issue when the web pages use a JavaScript function for
> pagination, calling a function like onPaginationSubmit().
>
> Nutch was not able to crawl those pages. Can anyone suggest a
> solution for crawling those paginated pages?
>
> Thanks and Regards
> Deepa Devi 
