1. The site might be banning you.
2. We deprecated the select() method when we added CSS support; you now have xpath() and css() methods available (http://doc.scrapy.org/en/latest/topics/selectors.html).
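For reference, a minimal sketch of the migration, using one of the XPaths from your spider (the Selector import and the CSS variant are illustrative, not taken from your original code):

from scrapy.selector import Selector

def parse_listing_page(self, response):
    sel = Selector(response)  # replaces HtmlXPathSelector(response)
    # .xpath() is the drop-in replacement for the deprecated .select()
    title = sel.xpath("//h1[@id='share_jobtitle']/text()").extract()
    # the same data via the new .css() method (illustrative CSS form)
    title_css = sel.css("h1#share_jobtitle::text").extract()

As for the varying job counts: if the site is throttling or banning you, slowing the crawl down (for example with DOWNLOAD_DELAY in settings.py) may help.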
On Monday, September 1, 2014 at 13:33:44 UTC-3, james josh wrote:
>
> Question no. 1:
> I am having trouble crawling multiple pages using the next-page link;
> each run crawls a different number of jobs (for example 20 jobs, 45
> jobs, 200 jobs).
>
> Question no. 2:
> Why does this warning appear while debugging, and how do I solve it?
>
> scrapy_demo\spiders\test.py:43: ScrapyDeprecationWarning: Call to
> deprecated function select. Use .xpath() instead.
>
> Please check this.
>
> Thanks
> james
>
> My scrapy code follows:
> -------------------------

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import urlparse
from scrapy.http.request import Request
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.item import Item, Field


class ScrapyDemoSpiderItem(Item):
    link = Field()
    title = Field()
    city = Field()
    salary = Field()
    content = Field()


class ScrapyDemoSpider(BaseSpider):
    name = 'eujobs77'
    allowed_domains = ['eujobs77.com']
    start_urls = ['http://www.eujobs77.com/jobs']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        listings = hxs.select('//div[@class="jobSearchBrowse jobSearchBrowsev1"]')
        links = []
        # scrape the listings page to get the listing links
        for listing in listings:
            link = listing.select('//h2[@class="jobtitle"]/a[@class="blue"]/@href').extract()
            links.extend(link)
        # request each listing url to get the content of the listing page
        for link in links:
            item = ScrapyDemoSpiderItem()
            item['link'] = link
            yield Request(urlparse.urljoin(response.url, link),
                          meta={'item': item}, callback=self.parse_listing_page)

        # get the next-button link
        next_page = None
        if hxs.select('//div[@class="paggingNext"]/a[@class="blue"]/@href').extract():
            next_page = hxs.select('//div[@class="paggingNext"]/a[@class="blue"]/@href').extract()[0]
        if next_page:
            yield Request(urlparse.urljoin(response.url, next_page), self.parse)

    # scrape the listing page to get its content
    def parse_listing_page(self, response):
        hxs = HtmlXPathSelector(response)
        item = response.request.meta['item']
        item['link'] = response.url
        item['title'] = hxs.select("//h1[@id='share_jobtitle']/text()").extract()
        item['city'] = hxs.select("//html/body/div[3]/div[3]/div[2]/div[1]/div[3]/ul/li[1]/div[2]/text()").extract()
        item['salary'] = hxs.select("//html/body/div[3]/div[3]/div[2]/div[1]/div[3]/ul/li[3]/div[2]/text()").extract()
        item['content'] = hxs.select("//div[@class='detailTxt deneL']/text()").extract()
        yield item