Re: Why my scrappy doesn't scrape

Paul Tremberth Wed, 15 Oct 2014 04:25:07 -0700

Hi,

Is that the code you're running?
You've commented out the rules attribute, probably after trying out a CrawlSpider.


You're now using Spider, but the parse_items() callback, although
defined, will never be called unless something references it.
It was valid when your rules referenced it within a CrawlSpider context.
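
To illustrate why, here is a toy model in plain Python. It is not Scrapy's
actual code: ToySpider and handle() are made-up names that only mimic the
documented behaviour that a response with no explicit callback goes to
parse().

```python
class ToySpider:
    """Simplified stand-in for a spider base class (hypothetical, not Scrapy)."""

    def parse(self, response):
        # Default callback: the base class has nothing useful to do here.
        raise NotImplementedError("parse() is not defined")

    def handle(self, response, callback=None):
        # Use the referenced callback if there is one, else fall back to parse().
        return (callback or self.parse)(response)


class Unreferenced(ToySpider):
    def parse_items(self, response):
        # Defined, but nothing points at it, so handle() never reaches it.
        return ["item from " + response]


class Renamed(ToySpider):
    def parse(self, response):
        # Same body under the default name, so handle() picks it up.
        return ["item from " + response]
```

Renamed().handle("page") returns an item, while Unreferenced().handle("page")
raises NotImplementedError unless parse_items is passed in explicitly as the
callback.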

With Spider, a simple way to get your callback called is to rename
parse_items() to parse(), which is the default callback.

Hope that helps.
Paul.


On Wed, Oct 15, 2014 at 12:49 PM, Cabloofka <szymon.roziew...@gmail.com> wrote:
> Hello there,
>
> I am new to Scrapy and trying to use it.
>
> I tried to debug with scrapy shell and inspection but it didn't help me out.
>
> My script simply does not do anything I expect.
>
> Here is the script
>
> from scrapy.selector import Selector, HtmlXPathSelector
> from uksw.items import DataItem
> from scrapy.spider import Spider
>
> from scrapy.shell import inspect_response
>
> from scrapy.utils.response import open_in_browser
>
> class MySpider(Spider):
>     name = "ecolex"
>     allowed_domains = ["www.ecolex.org"]
>     start_urls = [
>
> "http://www.ecolex.org/ecolex/ledge/view/SearchResults?screen=Common&listingField=&allFields=&allFields_allWords=allWords&titleOfText=&titleOfText_allWords=allWords&subject=&subject_allWords=allWords&country=&country_allWords=allWords&region=&region_allWords=allWords&basin=&basin_allWords=allWords&keyword=&keyword_allWords=allWords&languageOfDocument=&languageOfDocument_allWords=allWords&searchDate_start=1960&searchDate_end=2014&sortField=searchDate"
>     ]
>
>
>
>     # rules = (
>     #          Rule(SgmlLinkExtractor(allow=("http://www.ecolex.org/",)),
>     #               callback='parse_items')
>     #         )
>
>     def parse_items(self, response):
>            hxs = HtmlXPathSelector(response)
>
>            inspect_response(response, self)
>
>            items = []
>            item = DataItem()
>
>            item["name"] = response.xpath('//title/text()').extract()
> #hxs.select("//div/text()").extract()
>            items.append(item)
>
>            return items
>
> I am trying to get e.g. the title or some text in a div. Neither
> works.
>
> I am running the script with option -o name.json, the result is only one
> character '['.
>
> Any suggestions?
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scrapy-users+unsubscr...@googlegroups.com.
> To post to this group, send email to scrapy-users@googlegroups.com.
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
