Hi all,

I have solved the problem, so I am reporting the solution in case it is useful to someone else. The problem was not due to Scrapy, but to an oversight on our part when writing the rules. We had not considered that the phpBB board we were scraping includes, on every page, links that cause an immediate logout: one that deletes all the cookies set by the board, and one that logs the user out. Our "general" rule, that is

    Rule(LinkExtractor(), callback='parse_standard', follow=True)

caused these logout links to be followed, so the spider was logged out right after logging in. Replacing that rule with one that skips these links solved our problem:

    Rule(LinkExtractor(restrict_xpaths='//a[not(contains(@href,"logout")) and not(contains(@href,"delete_cookies"))]'), callback='parse_standard', follow=True)

So, sorry for the false alarm, and thanks for the replies we got.

Cosimo
