Saving scraped content

Portia Burton Tue, 04 Feb 2014 00:50:18 -0800

I'm building a Scrapy application that looks for film grants, and I'm having 
two problems. First, I'm not sure which logic I should use when 
looking for grants on the page. I'm *only* looking for available grants. 
http://www.filmindependent.org/labs-and-programs/grants-and-awards/


Second, I've used Firebug 
(http://doc.scrapy.org/en/latest/topics/firebug.html) to help me 
determine which DOM elements I should retrieve, but when I try 
to save the scraped data to a JSON file, all I get are empty dictionaries 
in items.json.  I have spent many days poring over the documentation 
(http://doc.scrapy.org/en/latest/topics/selectors.html), but I'm still not sure 
what I'm doing wrong.
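
One possible explanation for the empty fields, offered as a hedged sketch rather than a confirmed diagnosis: in the attached spider, the loop queries `sel` (the whole response) instead of `site` (the node matched by the loop), so relative paths such as `h3/text()` are evaluated from the document root and match nothing. The example below illustrates that difference. It uses the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's `Selector`, and the markup is hypothetical, not taken from the real page.

```python
# Sketch only: xml.etree.ElementTree stands in for Scrapy's Selector here.
# The markup below is hypothetical and is not the structure of the real page.
import xml.etree.ElementTree as ET

HTML = """
<div>
  <a href="/grants/one"><h3>Grant One</h3></a>
  <a href="/grants/two"><h3>Grant Two</h3></a>
</div>
"""

root = ET.fromstring(HTML)
links = root.findall(".//a")

# Querying from the document root inside the loop: "h3" is looked up among
# the root's direct children, so nothing matches for any link.
from_root = [[h3.text for h3 in root.findall("h3")] for _ in links]

# Querying from each matched node: "h3" is looked up inside that <a> element.
from_node = [
    {"title": [h3.text for h3 in link.findall("h3")], "url": link.get("href")}
    for link in links
]

print(from_root)
print(from_node)
```

In the Scrapy version, the analogous change would be calling `site.xpath(...)` rather than `sel.xpath(...)` inside the loop; whether that alone fills items.json depends on the page's actual markup. Note also that the spider defines `parse` more than once, and in a Python class body a later definition replaces an earlier one with the same name.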


from scrapy.spider import Spider
from scrapy.selector import Selector

from grants.items import GrantsItem

class FilmsSpider(Spider):
    name = "films"
    allowed_domains = ["filmindependent.org"]
    start_urls = ["http://www.filmindependent.org/labs-and-programs/grants-and-awards/"]

'''
    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//ul/li')
        for site in sites:
            title = site.xpath('a/text()').extract()
            desc = site.xpath('a/@href').extract()
            link = site.xpath('text()').extract()
            print title, link, desc
'''
    # Export to a csv file
    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//a')
        items = []
        for site in sites:
            item = GrantsItem()
            item['title'] = sel.xpath('h3/text()').extract()
            item['url'] = sel.xpath('@href').extract()
            items.append(item)
        return items


    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)

"""
Grant Scraper Project

Scrapes the web for grant data.

Grant data is stored in a Postgres database.
"""
from scrapy.item import Item, Field

class GrantsItem(Item):
    title = Field()
    url = Field()

items.json
Description: Binary data
