I'm building a Scrapy application that looks for film grants, and I'm having two problems. First, I'm not sure what logic I should use to find grants on the page. I'm *only* looking for grants that are currently available. The page is http://www.filmindependent.org/labs-and-programs/grants-and-awards/
Second, I've used Firebug <http://doc.scrapy.org/en/latest/topics/firebug.html> to help me determine which DOM elements I should retrieve, but when I try to save the scraped items to a JSON file, all I get are empty dictionaries in items.json. I have spent many days poring over the documentation (http://doc.scrapy.org/en/latest/topics/selectors.html), but I'm still not sure what I'm doing wrong.

from scrapy.spider import Spider
from scrapy.selector import Selector

from grants.items import GrantsItem


class FilmsSpider(Spider):
    name = "films"
    allowed_domains = ["filmindependent.org"]
    start_urls = ["http://www.filmindependent.org/labs-and-programs/grants-and-awards/"]

    '''
    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//ul/li')
        for site in sites:
            title = site.xpath('a/text()').extract()
            desc = site.xpath('a/@href').extract()
            link = site.xpath('text()').extract()
            print title, link, desc
    '''

    # Export to a csv file
    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//a')
        items = []
        for site in sites:
            item = GrantsItem()
            item['title'] = sel.xpath('h3/text()').extract()
            item['url'] = sel.xpath('@href').extract()
            items.append(item)
        return items

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)

"""
Grant Scraper Project
Scrapes the web for grant data.
Grant data is stored in postgres database
"""
from scrapy.item import Item, Field


class GrantsItem(Item):
    title = Field()
    url = Field()
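
One thing that may be worth checking in the second parse method above: inside the loop, the title and url are read from sel (the whole response) rather than from site (the current link), so every pass runs the same relative query against the document root. The sketch below illustrates that difference using the standard library's xml.etree.ElementTree as a stand-in for Scrapy's Selector. The SAMPLE markup, the function names, and the assumption that each grant link wraps an h3 heading are all hypothetical and chosen only for illustration; the real page may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration; the real page differs.
SAMPLE = """
<div>
  <a href="/grants/one"><h3>Grant One</h3></a>
  <a href="/grants/two"><h3>Grant Two</h3></a>
  <a href="/about">About</a>
</div>
"""


def extract_from_root(root):
    """Mimic a loop that queries the document root on every pass."""
    items = []
    for _link in root.iterfind(".//a"):
        # 'h3' and 'href' are looked up on the root <div>, which has
        # neither a direct <h3> child nor an href attribute.
        items.append({"title": root.findtext("h3"), "url": root.get("href")})
    return items


def extract_per_link(root):
    """Query each <a> element itself, so paths are relative to that link."""
    items = []
    for link in root.iterfind(".//a"):
        title = link.findtext("h3")
        if title is None:
            continue  # skip links that do not wrap an <h3> heading
        items.append({"title": title, "url": link.get("href")})
    return items


root = ET.fromstring(SAMPLE)
print(extract_from_root(root))  # one entry per link, but nothing was found
print(extract_per_link(root))   # only the two links that contain an <h3>
```

In Scrapy terms, the analogous change would presumably be to call site.xpath('h3/text()') and site.xpath('@href') inside the loop. Whether the grant titles on the real page actually sit in an h3 inside each link still has to be confirmed in Firebug.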
