Re: Autentication first

cosimo anglano Thu, 02 Jun 2016 10:32:41 -0700

Hi all,

I have solved the problem, so I am reporting the solution in case it is 
useful to someone else.
The problem was not due to Scrapy, but to an oversight we made when 
writing our rules.
We did not consider that the phpBB board we were scraping provides, on 
each of its pages, links that cause an immediate logout: one that deletes 
all the cookies set by the board, and one that logs the user out.
Our "general" rule, that is


Rule(LinkExtractor(), callback='parse_standard', follow=True)


caused these logout links to be followed, so the spider was logged out 
right after logging in.

Replacing the above rule with one that avoids these pages solved our problem:

Rule(LinkExtractor(restrict_xpaths='//a[not(contains(@href,"logout")) and '
                                   'not(contains(@href,"delete_cookies"))]'),
     callback='parse_standard', follow=True)
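
For anyone who would rather filter on the URL than on the XPath, here is a 
minimal sketch of the same exclusion written as a plain function. The names 
SESSION_ENDING and drop_session_ending_links, the forum.example URLs, and 
the namedtuple stand-in for Scrapy's Link objects are assumptions made for 
illustration; only the two substrings come from the rule above.

```python
import re
from collections import namedtuple

# Substrings taken from the XPath rule above; the regex itself is an assumption.
SESSION_ENDING = re.compile(r"logout|delete_cookies")


def drop_session_ending_links(links):
    """Keep only the links whose URL does not match SESSION_ENDING."""
    return [link for link in links if not SESSION_ENDING.search(link.url)]


# Stand-in for Scrapy's Link objects, which also expose a .url attribute.
Link = namedtuple("Link", "url")

links = [
    Link("http://forum.example/viewtopic.php?t=1"),
    Link("http://forum.example/ucp.php?mode=logout"),
    Link("http://forum.example/ucp.php?mode=delete_cookies"),
]

kept = drop_session_ending_links(links)
print([link.url for link in kept])
```

A function of this shape could be passed to the rule through its 
process_links argument, which Scrapy calls with each list of extracted 
links. Passing the same patterns as LinkExtractor(deny=...) should have a 
similar effect, although we have not tried either variant on the board 
described here.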

So, sorry for the false alarm, and thanks for the replies we got.

Cosimo
