The answer that keeps on giving - thank you 3 years later!
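For anyone who lands on this thread later, the root cause is plain Python rather than Scrapy: parentheses alone only group an expression, and a one-element tuple needs a trailing comma. A minimal sketch of the difference (nothing here is Scrapy-specific):

    not_a_tuple = ('only item')    # parentheses just group: this is the string 'only item'
    a_tuple = ('only item',)       # the trailing comma makes a one-element tuple

    print(type(not_a_tuple))       # str
    print(type(a_tuple))           # tuple

    for item in a_tuple:           # iterable, which is what CrawlSpider needs
        print(item)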
On Saturday, June 30, 2012 at 11:23:56 PM UTC-4, Steven Almeroth wrote:
>
> Try this:
>
> rules = (Rule(SgmlLinkExtractor(allow=('//*[@id="Form"]'))),)
>
>
> Notice the extra comma near the end: it makes rules a one-element tuple,
> which is what CrawlSpider expects to iterate over.
>
> On Friday, June 29, 2012 5:47:35 PM UTC-5, Scrapy_lover wrote:
>>
>> When trying to crawl a website, I got the following error. Any help,
>> please?
>>
>> *script code*
>>
>> ----------------------------------------------------------------------------------------------------------
>>
>> from scrapy.contrib.spiders import CrawlSpider, Rule
>> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
>> from scrapy.selector import HtmlXPathSelector
>> from scrapy.item import Item
>>
>> class MySpider(CrawlSpider):
>>     name = 'example.com'
>>     allowed_domains = ['http://testaspnet.vulnweb.com/default.aspx']
>>     start_urls = ['http://testaspnet.vulnweb.com/default.aspx']
>>
>>     rules = (
>>         Rule(SgmlLinkExtractor(allow=('//*[@id="Form"]'))))
>>
>>     def parse_item(self, response):
>>         self.log('%s' % response.url)
>>         hxs = HtmlXPathSelector(response)
>>         item = Item()
>>
>>         item['text'] = hxs.select("//input[(@id or @name) and (@type = 'text' or @type = 'password' or @type = 'file')]").extract()
>>
>>         return item
>>
>> --------------------------------------------------------------------------------------------------------------------------------------
>> *But it gave me the following error:*
>>
>> home@home-pc:~/isa$ scrapy crawl example.com
>> 2012-06-30 00:32:11+0200 [scrapy] INFO: Scrapy 0.14.4 started (bot: isa)
>> 2012-06-30 00:32:11+0200 [scrapy] DEBUG: Enabled extensions: LogStats,
>> TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
>> 2012-06-30 00:32:11+0200 [scrapy] DEBUG: Enabled downloader middlewares:
>> HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware,
>> RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware,
>> CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware,
>> DownloaderStats
>> 2012-06-30 00:32:11+0200 [scrapy] DEBUG: Enabled spider middlewares:
>> HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
>> UrlLengthMiddleware, DepthMiddleware
>> 2012-06-30 00:32:11+0200 [scrapy] DEBUG: Enabled item pipelines:
>> Traceback (most recent call last):
>>   File "/usr/local/bin/scrapy", line 4, in <module>
>>     execute()
>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 132, in execute
>>     _run_print_help(parser, _run_command, cmd, args, opts)
>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 97, in _run_print_help
>>     func(*a, **kw)
>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 139, in _run_command
>>     cmd.run(args, opts)
>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 43, in run
>>     spider = self.crawler.spiders.create(spname, **opts.spargs)
>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermanager.py", line 44, in create
>>     return spcls(**spider_kwargs)
>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spiders/crawl.py", line 37, in __init__
>>     self._compile_rules()
>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spiders/crawl.py", line 83, in _compile_rules
>>     self._rules = [copy.copy(r) for r in self.rules]
>> TypeError: 'Rule' object is not iterable
>>
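Three years on, a note on the traceback itself for anyone debugging the same error: the last frame shows _compile_rules running self._rules = [copy.copy(r) for r in self.rules], i.e. Scrapy iterates over the rules attribute. Without the trailing comma, rules is a bare Rule object instead of a tuple, and iterating it raises the TypeError. A rough reproduction with a stand-in Rule class (illustration only, not Scrapy's real class):

    import copy

    class Rule(object):       # stand-in for scrapy's Rule, for illustration
        pass

    rules = (Rule())          # parentheses only group: this is one Rule object
    try:
        _rules = [copy.copy(r) for r in rules]   # same loop as _compile_rules
    except TypeError as e:
        print(e)              # 'Rule' object is not iterable

    rules = (Rule(),)         # trailing comma: a real one-element tuple
    _rules = [copy.copy(r) for r in rules]       # works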