Of course: I fought with this issue for a month, posted a request for help
on the Internet, and found the cause two days later...

I managed to reproduce the issue in Firefox by clicking more, and faster.
It seems that the _flowExecutionKey parameter has a server-side limit on how
many times it can be used (or how fast; I don't know yet). I will have to
work around it somehow.
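If the limit is on how fast a key is reused, one cheap experiment (an assumption, not a confirmed fix) is to make the spider click like a slow human. Below is a minimal sketch of per-spider settings. The setting names are standard Scrapy settings. The values are guesses, not measured limits of portal.openlife.pl.

```python
# Hypothetical throttling settings for the openlife spider. The setting
# names are standard Scrapy settings; the values are guesses, not measured
# limits of portal.openlife.pl.
custom_settings = {
    # Send at most one request at a time to the portal, so a
    # _flowExecutionKey is never used by two requests in parallel.
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,
    # Wait about a second between requests; by default Scrapy randomizes
    # the actual wait between 0.5x and 1.5x of this value.
    "DOWNLOAD_DELAY": 1.0,
    # Let AutoThrottle back off further when responses slow down; it does
    # not go below DOWNLOAD_DELAY.
    "AUTOTHROTTLE_ENABLED": True,
}
```

The dict would go on the spider class as its custom_settings attribute. If the limit turns out to be on the number of uses rather than on speed, throttling alone will not help. The spider would then presumably need a fresh _flowExecutionKey before the old one runs out.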

On Monday, 10 October 2016 11:46:08 UTC+2, user Mateusz Lewicki
wrote:
>
> I have a spider for the site openlife.pl. It is my pension fund's site, where I
> log in and can view the history of payments, the value of funds, etc. I wanted
> to scrape the history of my payments and the fees taken, and succeeded: about
> 400 operations in total. Operations are listed in a <table> with all the basic
> info and a URL to a details page (unused for now). The table lists only 25
> entries; the rest are on subsequent "next page" pages. I read the table, find
> the URL of the next page, and go there with the same handler. This works 100%.
> Later I wanted to scrape data from the details URLs. It seemed straightforward
> and actually was: I managed to write this too. Except it didn't work 100%.
> When downloading details is enabled, I get only 50-100 operations (of 400),
> and fewer than half of them have details.
> It turned out that no URL on the site leads directly to the displayed page;
> every request is redirected first. Example of a proper redirect:
>
>    1. https://portal.openlife.pl/frontend/secure/accountHistory.html?_flowExecutionKey=_c647455B5-0E6D-EE90-60CF-E03446BA9D96_kA1AF3928-4351-6195-1E48-BC839F1B971B&_eventId=details&idHistory=16934891&historyType=charge
>    2. http://portal.openlife.pl/frontend/secure/accountHistory.html?_flowExecutionKey=_c647455B5-0E6D-EE90-60CF-E03446BA9D96_k7B26980D-F146-E179-AD85-AAB6782F62FA
>
> However, many URLs are redirected like this:
>
>    1. https://portal.openlife.pl/frontend/secure/accountHistory.html?_flowExecutionKey=_c647455B5-0E6D-EE90-60CF-E03446BA9D96_kA1AF3928-4351-6195-1E48-BC839F1B971B&_eventId=details&idHistory=18711185&historyType=charge
>    2. http://portal.openlife.pl/frontend/secure/accountHistory.html?_flowId=account_history-flow
>    3. https://portal.openlife.pl/frontend/secure/accountHistory.html?_flowId=account_history-flow
>    4. http://portal.openlife.pl/frontend/secure/accountHistory.html?_flowExecutionKey=_cDD567B60-D3D2-ACED-7DEC-72080FC1906C_k51E2C0DB-264A-4DFF-47D2-8E861A1711FF
>
> Steps 2 and 3 carry none of the detail parameters (no _flowExecutionKey,
> _eventId or idHistory, only _flowId), so step 4 returns the main page instead
> of the details page. Scraping then fails because the page lacks the data the
> handler expects. Steps 2 and 3 are common to all failed items. I didn't notice
> such redirects when browsing manually.
>
>
> What can cause such redirects? How to avoid them?
>
>
> The source code can be viewed in full here: 
> https://github.com/mateuszzz88/scrapy_funds/blob/opeartion_details/crawler/scrapy_openlife/spiders/openlife.py
>  
> The methods of interest are on_account_history and
> on_history_details.
>
> The code also contains my attempts to solve the issue, including two custom
> downloader middlewares that don't help.
>
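All the failing chains quoted above pass through a URL that carries only _flowId=account_history-flow and no _flowExecutionKey. Below is a minimal sketch of a check for that pattern, written in plain Python so it can be tried outside Scrapy. The helper name flow_was_restarted is hypothetical. The rule it applies is an assumption drawn only from the two example chains. In Scrapy, the intermediate URLs of a redirected request are available in response.meta["redirect_urls"], which the built-in RedirectMiddleware sets.

```python
from urllib.parse import parse_qs, urlparse


def flow_was_restarted(redirect_urls):
    """Return True if any hop in a redirect chain restarted the web flow.

    Assumption (based only on the chains observed on portal.openlife.pl):
    a hop whose query has _flowId but no _flowExecutionKey means the server
    discarded our key and restarted account_history-flow, so the final page
    is the history list, not the details page.
    """
    for url in redirect_urls:
        params = parse_qs(urlparse(url).query)
        if "_flowId" in params and "_flowExecutionKey" not in params:
            return True
    return False


# Intermediate hops of the two chains from the message (the final URL of a
# chain is the response URL itself, not part of redirect_urls).
good_chain = [
    "https://portal.openlife.pl/frontend/secure/accountHistory.html"
    "?_flowExecutionKey=_c647455B5-0E6D-EE90-60CF-E03446BA9D96"
    "_kA1AF3928-4351-6195-1E48-BC839F1B971B"
    "&_eventId=details&idHistory=16934891&historyType=charge",
]
bad_chain = [
    "https://portal.openlife.pl/frontend/secure/accountHistory.html"
    "?_flowExecutionKey=_c647455B5-0E6D-EE90-60CF-E03446BA9D96"
    "_kA1AF3928-4351-6195-1E48-BC839F1B971B"
    "&_eventId=details&idHistory=18711185&historyType=charge",
    "http://portal.openlife.pl/frontend/secure/accountHistory.html"
    "?_flowId=account_history-flow",
    "https://portal.openlife.pl/frontend/secure/accountHistory.html"
    "?_flowId=account_history-flow",
]

print(flow_was_restarted(good_chain))  # False
print(flow_was_restarted(bad_chain))   # True
```

Inside on_history_details one could call flow_was_restarted(response.meta.get("redirect_urls", [])) and log or skip the item instead of failing on missing data. Simply re-queuing the same URL probably won't help, since it still carries the same, apparently spent, _flowExecutionKey.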

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
