Hi Daniel,
Thank you very much for your kind help.
After scheduling the spider run, output is now produced:
Opening file /var/lib/scrapyd/items/sole24ore/sole/89d644f8a13a11e4a2afc04a00090e80.jl
Read output!
This is my output:
{"url": ["http://m.bbc.co.uk", "http://www.bbc.com/news/", " .....
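The scrapyd items file above has a .jl extension, which presumably means JSON lines: one JSON object per line. FEED_FORMAT = 'json' instead writes a single JSON array. If a plain JSON array is what you need, one workaround is to convert the scrapyd file after the job has finished. Below is a minimal Python 3 sketch. The function name jsonlines_to_json_array is hypothetical and is not part of Scrapy or scrapyd.

```python
import json


def jsonlines_to_json_array(jl_path, json_path):
    """Read a JSON-lines file (one item per line) and write all items as one JSON array.

    Returns the number of items written.
    """
    items = []
    with open(jl_path, encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                items.append(json.loads(line))
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(items, dst, ensure_ascii=False, indent=2)
    return len(items)
```

For example: jsonlines_to_json_array('/var/lib/scrapyd/items/sole24ore/sole/<jobid>.jl', '/home/marco/crawlscrape/sole24ore/output.json'). Here <jobid> is a placeholder for the id scrapyd assigned to the run.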
However, after changing the feed settings in settings.py to:
BOT_NAME = 'sole24ore'
SPIDER_MODULES = ['sole24ore.spiders']
NEWSPIDER_MODULE = 'sole24ore.spiders'
FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
no output.json is produced in /home/marco/crawlscrape/sole24ore.
Am I missing some other step?
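One possible reason nothing appears in /home/marco is an assumption, not something confirmed in this thread. In file://home/marco/..., the part right after the two slashes is parsed as a host name. So home becomes the host, and the path loses its /home prefix. If Scrapy's local file feed storage uses only the path component of the URI, the feed would go to /marco/crawlscrape/sole24ore/output.json instead. The absolute form uses three slashes: file:///home/marco/crawlscrape/sole24ore/output.json. Below is a small Python 3 check using the standard library's urllib.parse.urlparse. split_file_uri is a hypothetical helper.

```python
from urllib.parse import urlparse


def split_file_uri(uri):
    """Return (host, path) for a file:// URI, exactly as urlparse splits it."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError("not a file:// URI: %r" % uri)
    return parsed.netloc, parsed.path


# Two slashes: "home" is taken as the host, so the path starts at /marco.
print(split_file_uri("file://home/marco/crawlscrape/sole24ore/output.json"))
# -> ('home', '/marco/crawlscrape/sole24ore/output.json')

# Three slashes: empty host, full absolute path.
print(split_file_uri("file:///home/marco/crawlscrape/sole24ore/output.json"))
# -> ('', '/home/marco/crawlscrape/sole24ore/output.json')
```

Note also that the modified settings above omit FEED_FORMAT, which Daniel's reply sets to 'json'.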
Marco
2015-01-20 18:45 GMT+01:00 Daniel Fockler <[email protected]>:
> For your first problem, you've deployed the project to scrapyd, but you still
> need to schedule a spider run using the schedule.json endpoint. Something like:
>
> curl http://localhost:6800/schedule.json -d project=sole24ore -d spider=yourspidername
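The same schedule.json call can also be made from Python using only the standard library. Below is a minimal sketch. It assumes scrapyd listens on its default http://localhost:6800, as in the curl command. build_schedule_request and schedule are hypothetical helper names, and yourspidername remains a placeholder for the real spider name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SCRAPYD_URL = "http://localhost:6800"  # assumption: default scrapyd address


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build a POST to schedule.json with form-encoded fields, like curl -d."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    # Supplying data makes urllib use POST instead of GET.
    return Request(base_url + "/schedule.json", data=data)


def schedule(project, spider, base_url=SCRAPYD_URL):
    """Send the request to a running scrapyd and return its decoded JSON reply."""
    with urlopen(build_schedule_request(project, spider, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

On success, scrapyd replies with a JSON object whose status is "ok" and which carries the id of the new job. With the default items_dir, that id also names the .jl items file.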
>
> For your second problem, your settings.py is misconfigured; your feed settings
> should look like this:
>
> FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
> FEED_FORMAT = 'json'
>
> Hope that helps
>
> On Tuesday, January 20, 2015 at 4:23:04 AM UTC-8, Marco Ippolito wrote:
>>
>> Hi,
>> I've got two problems to solve.
>>
>> Everything seems to be OK:
>>
>> (SCREEN)marco@pc:~/crawlscrape/sole24ore$ scrapyd-deploy sole24ore -p sole24ore
>> Packing version 1421755479
>> Deploying to project "sole24ore" in http://localhost:6800/addversion.json
>> Server response (200):
>> {"status": "ok", "project": "sole24ore", "version": "1421755479",
>> "spiders": 1}
>>
>>
>> marco@pc:/var/lib/scrapyd/dbs$ ls -lah
>> totale 12K
>> drwxr-xr-x 2 scrapy nogroup 4,0K gen 20 13:04 .
>> drwxr-xr-x 5 scrapy nogroup 4,0K gen 20 06:55 ..
>> -rw-r--r-- 1 root root 2,0K gen 20 13:04 sole24ore.db
>>
>>
>> marco@pc:/var/lib/scrapyd/eggs/sole24ore$ ls -lah
>> totale 16K
>> drwxr-xr-x 2 scrapy nogroup 4,0K gen 20 13:04 .
>> drwxr-xr-x 3 scrapy nogroup 4,0K gen 20 12:47 ..
>> -rw-r--r-- 1 scrapy nogroup 5,5K gen 20 13:04 1421755479.egg
>>
>>
>> but nothing is executed:
>>
>> marco@pc:/var/lib/scrapyd/items/sole24ore/sole$ ls -a
>> . ..
>>
>> [detached from 2515.pts-4.pc]
>> marco@pc:~/crawlscrape/sole24ore$ curl http://localhost:6800/listjobs.json?project=sole24ore
>> {"status": "ok", "running": [], "finished": [], "pending": []}
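A listjobs.json reply with empty pending, running and finished lists, like the one above, suggests that no job has been scheduled for the project yet. Deploying with scrapyd-deploy only uploads the egg. Below is a minimal Python 3 sketch for checking job counts. It assumes the default scrapyd address, and list_jobs and summarize_jobs are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def list_jobs(project, base_url="http://localhost:6800"):
    """Fetch listjobs.json for a project and return the decoded JSON reply."""
    url = base_url + "/listjobs.json?" + urlencode({"project": project})
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_jobs(payload):
    """Count the jobs in each state of a listjobs.json reply."""
    if payload.get("status") != "ok":
        raise RuntimeError("scrapyd returned an error: %r" % payload)
    return {state: len(payload.get(state, [])) for state in ("pending", "running", "finished")}


# The reply quoted in this thread.
reply = '{"status": "ok", "running": [], "finished": [], "pending": []}'
print(summarize_jobs(json.loads(reply)))
# -> {'pending': 0, 'running': 0, 'finished': 0}
```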
>>
>>
>>
>> The second problem is how to save the output to a JSON file.
>> What is the correct form to put in settings.py?
>>
>> # Scrapy settings for sole24ore project
>> #
>> # For simplicity, this file contains only the most important settings by
>> # default. All the other settings are documented here:
>> #
>> # http://doc.scrapy.org/en/latest/topics/settings.html
>> #
>>
>> BOT_NAME = 'sole24ore'
>>
>> SPIDER_MODULES = ['sole24ore.spiders']
>> NEWSPIDER_MODULE = 'sole24ore.spiders'
>>
>> FEED_URI=file://home/marco/crawlscrape/sole24ore/output.json --set
>> FEED_FORMAT=json
>>
>>
>> (SCREEN)marco@pc:~/crawlscrape/sole24ore$ scrapyd-deploy sole24ore -p sole24ore
>> Packing version 1421756389
>> Deploying to project "sole24ore" in http://localhost:6800/addversion.json
>> Server response (200):
>> {"status": "error", "message": "SyntaxError: invalid syntax"}
>>
>>
>> # Crawl responsibly by identifying yourself (and your website) on the user-agent
>> #USER_AGENT = 'sole24ore (+http://www.yourdomain.com)'
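settings.py is a Python module, so every line in it must be valid Python. FEED_URI=file://... --set looks like command-line syntax, as used with the --set option of scrapy commands, rather than a Python assignment, and its values are not quoted strings. That would explain the "SyntaxError: invalid syntax" returned by the deploy. One way to catch this before deploying is to compile the file locally. Below is a minimal Python 3 sketch using the built-in compile(). check_settings_source is a hypothetical helper, and the exact error message text varies between Python versions.

```python
def check_settings_source(source, filename="settings.py"):
    """Return None if source compiles as Python, else 'line N: message'."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return "line %d: %s" % (exc.lineno, exc.msg)
    return None


# The lines from the quoted settings.py: not valid Python.
broken = (
    "FEED_URI=file://home/marco/crawlscrape/sole24ore/output.json --set\n"
    "FEED_FORMAT=json\n"
)
# Quoted string values (three slashes: empty host, absolute path).
fixed = (
    "FEED_URI = 'file:///home/marco/crawlscrape/sole24ore/output.json'\n"
    "FEED_FORMAT = 'json'\n"
)

print(check_settings_source(broken))
print(check_settings_source(fixed))  # -> None
```

From the shell, python -m py_compile followed by the path to the project's settings.py reports the same kind of error.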
>>
>> Looking forward to your kind help.
>> Kind regards.
>> Marco
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "scrapy-users" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/scrapy-users/0b4xqaHUOSA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.