After looking over the code for _get_clickable in form.py, I found that the
clickables list is populated by xpath('.//input[@type="submit"]').
However, on the web, many submit buttons are inputs of type="image". Is it a
good idea to extend the logic for constructing the clickables list, for
example like this? Is there a better way to handle this?
def _get_clickable(clickdata, form):
    """
    Returns the clickable element specified in clickdata,
    if the latter is given. If not, it returns the first
    clickable element found
    """
    clickables = [el for el in form.xpath('.//input[@type="submit"]')]
    if not clickables:
        clickables = [el for el in form.xpath('.//input[@type="image"]')]

    # If we don't have clickdata, we just use the first clickable element
    if clickdata is None:
        el = clickables[0]
        return (el.name, el.value)
Similarly, some submit buttons are not <input> elements at all but <button>
elements. How can we handle this?
In my view, handling these cases is important and would increase the
coverage of the loginform utility as well.
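
One possible way to cover both cases in a single pass, as a rough sketch
rather than a patch against form.py: test each element's tag and type
instead of chaining XPath queries. The helper names is_clickable and
find_clickables and the sample form are made up for illustration. The demo
parses with the standard library's ElementTree so it runs without lxml; the
same tag/get/iter calls exist on lxml elements, but I have not tried this
inside form.py itself.

```python
import xml.etree.ElementTree as ET


def is_clickable(el):
    """Return True for elements that submit a form when clicked (hypothetical helper)."""
    # Comments and processing instructions do not have a string tag.
    tag = el.tag.lower() if isinstance(el.tag, str) else ""
    type_ = (el.get("type") or "").lower()
    if tag == "input":
        return type_ in ("submit", "image")
    if tag == "button":
        # In HTML, a <button> without a type attribute defaults to "submit".
        return type_ in ("", "submit")
    return False


def find_clickables(form):
    """Collect clickable elements of a form in document order (hypothetical helper)."""
    return [el for el in form.iter() if is_clickable(el)]


# Made-up form mixing the button styles discussed in this thread.
form = ET.fromstring(
    '<form>'
    '<input type="text" name="user"/>'
    '<input type="image" alt="Sign In" src="signin.png"/>'
    '<button id="send" type="submit">Login</button>'
    '<button type="button">Help</button>'
    '</form>'
)
clickables = find_clickables(form)
print([(el.tag, el.get("type")) for el in clickables])
```

Walking the tree once keeps the elements in document order, so "the first
clickable element" still means the first one on the page regardless of
which kind of button it is.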
Any comments?
On Wednesday, 18 February 2015 15:43:14 UTC+8, pratik dand wrote:
>
> I am writing a crawler using scrapy that logs into a PHP app and crawls it.
> With the help of the loginform <https://github.com/scrapy/loginform> utility,
> I was able to log into 13 apps, but not a few others such as Zencart, OpenEMR,
> and Magento. I describe below the problems I am facing with these
> apps. Any suggestions or insight on any of them would be very helpful.
>
> 1) OpenEMR.
> The login form is simple and was populated by the loginform utility. However,
> on making a POST request using FormRequest, the HTML response received
> indicates that "OpenEMR needs Javascript to perform user authentication".
> Does this mean that filling forms using FormRequest will never let me log
> in to the site?
>
> 2) Zencart
> The login page has 2 forms (login and create account). The loginform
> utility finds the correct form and populates it. However, the submit button
> has no name, so it cannot be passed in the formdata (as done by loginform). To
> side-step this, I set dont_click to True. But since there is another
> form on the page with compulsory fields, an error is returned ("error while
> filling form"). The next thing I tried was the clickdata argument of
> the form request. The submit button has the code <input alt="Sign In"
> title="Sign In" src="...." type="image">. On setting
> clickdata={"type":"image","alt":"Sign In"}, I get an error saying "no
> clickable element matching {"type":"image","alt":"Sign In"} found". Is this
> submit button not clickable then? If so, I don't know what to do next.
>
> 3) Magento
> The submit button is not an <input> but a <button>. I gave its id
> in the clickdata, but it doesn't work.
>
> Any ideas will be helpful.
>