Hi,

I am using Nutch 2.1 with MySQL. The requirement is to crawl all the
paginated pages of a website.

For example, suppose I give the first page of a website (page no. 1,
http://x.com?num=1) as the seed URL and add an appropriate regular
expression to the URL filter so that Nutch crawls URLs matching the
"num" pattern. Nutch is then able to crawl the following URLs:
http://x.com?num=2
http://x.com?num=3 ...
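
For illustration, a rule of the following kind in conf/regex-urlfilter.txt
would accept such URLs. This is only a sketch: the pattern is an assumption
written for the placeholder site x.com, not the exact rule in use. Rules are
evaluated top to bottom and the first match decides, so the accept rule has
to come before any rule that rejects URLs containing "?" or "=".

    # accept the paginated URLs of the placeholder site (assumed pattern)
    +^http://x\.com/?\?num=\d+$
    # reject everything else
    -.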

Nutch crawls the pages successfully when the pagination URL is given in
an anchor tag (a href).

I am facing an issue when the web pages use a JavaScript function, such
as onPaginationSubmit(), to perform the pagination instead of a plain
link.

Nutch is not able to crawl those pages. Can anyone suggest how to crawl
such paginated pages?
Thanks and Regards
Deepa Devi 