Hi Travis,

Thanks for the advice, it worked. Now I am able to scrape the page.

I have posted questions on this forum before without getting any helpful
replies, so I assumed the forum was inactive and didn't expect an answer
this time either. Thanks to you, my problem is resolved.

Thanks a lot.

Gaurang Shah
Blog: qtp-help.blogspot.com
Mobile: +91 738756556

On Thu, Mar 5, 2015 at 9:33 PM, Travis Leleu <[email protected]> wrote:

> Sounds like the site is detecting that you're scraping and trying to
> prevent it. I'd suggest looking into user-agent middlewares to mimic a
> browser UA string.
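A minimal sketch of such a middleware, assuming a current Scrapy release. It relies only on Scrapy's downloader-middleware hook `process_request(request, spider)`; the class name `BrowserUserAgentMiddleware`, the example UA strings and the `myproject.middlewares` module path are hypothetical:

```python
import random


class BrowserUserAgentMiddleware:
    """Hypothetical Scrapy downloader middleware that replaces the default
    "Scrapy/x.y" User-Agent with a browser-like string."""

    # Example browser UA strings; illustrative only, not required values.
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    ]

    def process_request(self, request, spider):
        # Overwrite any User-Agent already set on the outgoing request.
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        # Returning None lets Scrapy continue processing the request normally.
        return None


# Enable it in settings.py ("myproject.middlewares" is an assumed path):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.BrowserUserAgentMiddleware": 400,
#     "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
# }
```

If a single fixed UA is enough, setting `USER_AGENT` in settings.py achieves the same thing without a custom middleware.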
> On Mar 5, 2015 1:41 AM, "Gaurang shah" <[email protected]> wrote:
>
>> Hi Guys,
>>
>> I am trying to scrape a website, but whenever I request the page I need
>> to scrape data from, I get redirected to a different page. If I visit that
>> page manually in the browser, it is not redirected at all; I checked the
>> response code as well, and it is 200.
>>
>> With Scrapy, however, the request is redirected and I see status code
>> 302.
>>
>> This is the website I am trying to scrape:
>> http://www.lonmark.org/membership/directory/partners
>>
>> In the Scrapy logs I see the following entries:
>> 2015-03-05 15:08:36+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/membership/directory/partners>
>> 2015-03-05 15:08:37+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
>> 2015-03-05 15:08:37+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
>> 2015-03-05 15:08:41+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
>>
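>> Following is the code.
>>
>> import scrapy
>> from scrapy import Request
>>
>>
>> class Spider(scrapy.Spider):
>>     name = "lonamrk"
>>     allowed_domains = ["lonmark.org"]
>>     # Request.meta = {'dont_redirect': True,
>>     #                 'handle_httpstatus_list': [302]}
>>
>>     start_urls = ["http://www.lonmark.org/membership/directory/partners"]
>>
>>     def parse(self, response):
>>         print(response.url)
>>         # Browsers insert <tbody> when rendering tables; if the raw HTML
>>         # lacks it, this XPath matches nothing.
>>         company_links = response.xpath(
>>             "//*[@id='page_content']/table/tbody/tr[1]/td[1]/a/@href").getall()
>>         for link in company_links:
>>             # Resolve the (possibly relative) href against the page URL.
>>             yield Request(response.urljoin(link),
>>                           callback=self.parse_company_info)
>>
>>     def parse_company_info(self, response):
>>         # Placeholder: this callback is not shown in the original message.
>>         pass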
>>
>>
>>
>> If I uncomment that code to stop the redirection, the response body is
>> empty.
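With `dont_redirect` in effect, Scrapy passes the raw 302 response to the callback, and such a response usually carries little or no body; the useful part is its `Location` header. To check outside Scrapy whether the redirect depends on the User-Agent, a small probe built on Python's standard `http.client` (which never follows redirects) can compare two UA strings. The `probe` helper below is a hypothetical sketch, not part of Scrapy:

```python
import http.client


def probe(host, path, user_agent):
    """Send one GET without following redirects; return (status, Location)."""
    # http.client does not follow redirects, so a 302 is returned as-is.
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()  # drain the (often empty) body
        # getheader returns None when no Location header is present.
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, call `probe("www.lonmark.org", "/membership/directory/partners", ua)` once with a Scrapy-style UA such as `Scrapy/0.24 (+http://scrapy.org)` and once with a browser string. If only the first comes back as a 302 pointing at `/sitemap`, the site is most likely filtering on User-Agent (the site's behaviour may of course have changed since).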
>>
>> Could someone please tell me what to do?
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
