Hi,

Is there any way to create a URL that gets past that page without clicking a button? That is, can you type something like

http://form.example.com/check.do?forward=target_page&org.apache.struts.taglib.html.CANCEL=Continue

into a web browser and view the page that is produced by hitting the button?
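A URL like the one above can be derived mechanically from the form's markup. The form's action is resolved against the page URL, and the hidden field plus the name/value pair of the submit button become query parameters. Below is a minimal Python sketch of that mapping. It assumes the form lives on http://form.example.com/ (the host from the example URL), and `form_to_get_url` is a hypothetical helper, not part of Nutch. Note that the form declares method="post". Sending the same fields as a GET query only works if the server handles both methods alike. Struts 1 actions typically do, but that is an assumption about this particular site.

```python
from urllib.parse import urlencode, urljoin


def form_to_get_url(page_url, action, fields):
    """Resolve the form's (possibly relative) action against the page it
    came from, then append the form fields as a URL-encoded query string.
    Hypothetical helper for illustration only."""
    return urljoin(page_url, action) + "?" + urlencode(fields)


# Fields taken from the quoted Struts form: the hidden "forward" input
# plus the name/value of the submit button that would have been clicked.
fields = {
    "forward": "target_page",
    "org.apache.struts.taglib.html.CANCEL": "Continue",
}

# Assumed page URL; the original form's page URL is not given.
url = form_to_get_url("http://form.example.com/", "/check.do", fields)
print(url)
```

If the site insists on POST, the same `urlencode(fields)` string would instead be sent as the request body with the `application/x-www-form-urlencoded` content type.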
I'm no Nutch expert, but if this button requires cookies to display the next page, then you may need to use the http-client plugin instead of the http plugin. The catch with the http-client plugin is that all of your original URLs need to be escaped. That is, in your URL list you need:

http%3A//www.google.com

instead of

http://www.google.com

Patrick

-----Original Message-----
From: karthik085
Sent: Monday, July 14, 2008 5:49 PM
To: [email protected]
Subject: Bypass Validation

Hi. I am trying to crawl a page using Nutch. That page sits behind a (Struts) validator; that is, to get to the page, a button needs to be clicked. Is there any way to bypass this so the web crawler can reach the page without clicking the button?

Code:

<form name="loginForm" method="post" action="/check.do">
  <input type="hidden" name="forward" value="target_page">
  <input type="submit" name="org.apache.struts.taglib.html.CANCEL" value="Continue" onclick="bCancel=true;">
</form>

Any help is appreciated. Thanks.

--
Sent from the Nutch - User mailing list archive at Nabble.com.
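Patrick's reply says seed URLs must be escaped for the http-client plugin. Here is a minimal Python sketch that reproduces his single example, turning http://www.google.com into http%3A//www.google.com by percent-encoding everything except "/". The helper name `escape_seed_url` is hypothetical. The message does not say whether characters such as "?", "=" or "&" in longer URLs should also be escaped. Treat the choice of `safe="/"` as an assumption fitted to that one example, not a documented plugin requirement.

```python
from urllib.parse import quote


def escape_seed_url(url):
    """Percent-encode a seed URL, leaving only '/' unescaped, so that
    'http://www.google.com' becomes 'http%3A//www.google.com'.
    Hypothetical helper; safe='/' is an assumption based on one example."""
    return quote(url, safe="/")


# One seed per line, as in a Nutch URL list file.
seeds = ["http://www.google.com"]
for seed in seeds:
    print(escape_seed_url(seed))
```

Letters, digits and `_.-~` are never escaped by `quote`, which is why the host name passes through unchanged.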
