If you end up with an empty output.json, or a file that contains only a '[' character, that could mean that Scrapy didn't find any items from your spider. If that is not the case, then there is another issue. Scrapyd should write a log for every spider that you run, in a logs directory, so that is the place to look for errors.
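
One more thing worth ruling out, though this is only a guess on my part: the FEED_URI quoted below has two slashes after `file:`, and in a standard file URI an absolute path needs three. With two, `home` is read as the host rather than as the first directory. Here is a minimal sketch using only the Python standard library to show the difference. The helper name `describe_file_uri` is made up for illustration, and I have not checked how Scrapy's feed storage resolves the URI internally, so treat this as a hint rather than a diagnosis.

```python
from urllib.parse import urlparse


def describe_file_uri(uri):
    """Return (host, path) as the standard library parses a file: URI.

    Hypothetical helper for illustration; not part of Scrapy or Scrapyd.
    """
    parts = urlparse(uri)
    return parts.netloc, parts.path


# Two slashes: "home" is parsed as the host, not as part of the path.
two_slashes = describe_file_uri(
    'file://home/marco/crawlscrape/sole24ore/output.json')

# Three slashes: empty host, and the full absolute path is preserved.
three_slashes = describe_file_uri(
    'file:///home/marco/crawlscrape/sole24ore/output.json')

print(two_slashes)
print(three_slashes)
```

If the second form is what you meant, it may be worth trying `FEED_URI = 'file:///home/marco/crawlscrape/sole24ore/output.json'` in settings.py and then checking the Scrapyd log for that run.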

On Wednesday, January 21, 2015 at 11:41:49 AM UTC-8, Marco Ippolito wrote:
>
> Hi Daniel,
> thanks again for helping.
>
> I tried with
> FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
> FEED_FORMAT = 'json'
>
> and with
> FEED_URI = 'output.json'
> FEED_FORMAT = 'json'
> in both cases there is no output and not error message
>
> Any hints?
>
> Marco
>
> 2015-01-21 20:28 GMT+01:00 Daniel Fockler <[email protected]>:
> > You'll want to make sure in your settings.py that feed format is set, like
> >
> > FEED_FORMAT = 'json'
> >
> > If it doesn't work after that then just try changing feed uri to
> >
> > FEED_URI = 'output.json'
> >
> > and scrapy will dump it in your project root
> >
> > On Tuesday, January 20, 2015 at 11:00:50 PM UTC-8, Marco Ippolito wrote:
> >>
> >> Hi Daniel,
> >> thank you very much for your kind help.
> >>
> >> After scheduling the spider run, an output is actually produced:
> >>
> >> Opening file /var/lib/scrapyd/items/sole24ore/sole/89d644f8a13a11e4a2afc04a00090e80.jl
> >> Read output!
> >> This is my output:
> >> {"url": ["http://m.bbc.co.uk", "http://www.bbc.com/news/", " .....
> >>
> >> But modifying the feed settings as:
> >> BOT_NAME = 'sole24ore'
> >>
> >> SPIDER_MODULES = ['sole24ore.spiders']
> >> NEWSPIDER_MODULE = 'sole24ore.spiders'
> >>
> >> FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
> >>
> >> doesn't produce an output.json into /home/marco/crawlscrape/sole24ore
> >>
> >> am I missing some other steps?
> >>
> >> Marco
> >>
> >> 2015-01-20 18:45 GMT+01:00 Daniel Fockler <[email protected]>:
> >> > For your first problem, you've started the scrapyd project but you need to
> >> > schedule a spider run using the schedule.json command. Something like
> >> >
> >> > curl http://localhost:6800/schedule.json -d project=sole24ore -d spider=yourspidername
> >> >
> >> > For your second problem your settings.py is misconfigured your feed settings
> >> > should be like
> >> >
> >> > FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
> >> > FEED_FORMAT = 'json'
> >> >
> >> > Hope that helps
> >> >
> >> > On Tuesday, January 20, 2015 at 4:23:04 AM UTC-8, Marco Ippolito wrote:
> >> >>
> >> >> Hi,
> >> >> I' ve got 2 situations to solve.
> >> >>
> >> >> Seems that everything is ok:
> >> >>
> >> >> (SCREEN)marco@pc:~/crawlscrape/sole24ore$ scrapyd-deploy sole24ore -p sole24ore
> >> >> Packing version 1421755479
> >> >> Deploying to project "sole24ore" in http://localhost:6800/addversion.json
> >> >> Server response (200):
> >> >> {"status": "ok", "project": "sole24ore", "version": "1421755479", "spiders": 1}
> >> >>
> >> >>
> >> >> marco@pc:/var/lib/scrapyd/dbs$ ls -lah
> >> >> totale 12K
> >> >> drwxr-xr-x 2 scrapy nogroup 4,0K gen 20 13:04 .
> >> >> drwxr-xr-x 5 scrapy nogroup 4,0K gen 20 06:55 ..
> >> >> -rw-r--r-- 1 root root 2,0K gen 20 13:04 sole24ore.db
> >> >>
> >> >>
> >> >> marco@pc:/var/lib/scrapyd/eggs/sole24ore$ ls -lah
> >> >> totale 16K
> >> >> drwxr-xr-x 2 scrapy nogroup 4,0K gen 20 13:04 .
> >> >> drwxr-xr-x 3 scrapy nogroup 4,0K gen 20 12:47 ..
> >> >> -rw-r--r-- 1 scrapy nogroup 5,5K gen 20 13:04 1421755479.egg
> >> >>
> >> >>
> >> >> , but nothing is executed
> >> >>
> >> >> marco@pc:/var/lib/scrapyd/items/sole24ore/sole$ ls -a
> >> >> . ..
> >> >>
> >> >> [detached from 2515.pts-4.pc]
> >> >> marco@pc:~/crawlscrape/sole24ore$ curl http://localhost:6800/listjobs.json?project=sole24ore
> >> >> {"status": "ok", "running": [], "finished": [], "pending": []}
> >> >>
> >> >>
> >> >>
> >> >> The second aspect regards how to save the output into a json file.
> >> >> What is the correct form to put into settings.py?
> >> >>
> >> >> File Edit Options Buffers Tools Python Help
> >> >> # Scrapy settings for sole24ore project
> >> >> #
> >> >> # For simplicity, this file contains only the most important settings by
> >> >> # default. All the other settings are documented here:
> >> >> #
> >> >> # http://doc.scrapy.org/en/latest/topics/settings.html
> >> >> #
> >> >>
> >> >> BOT_NAME = 'sole24ore'
> >> >>
> >> >> SPIDER_MODULES = ['sole24ore.spiders']
> >> >> NEWSPIDER_MODULE = 'sole24ore.spiders'
> >> >>
> >> >> FEED_URI=file://home/marco/crawlscrape/sole24ore/output.json --set FEED_FORMAT=json
> >> >>
> >> >>
> >> >> (SCREEN)marco@pc:~/crawlscrape/sole24ore$ scrapyd-deploy sole24ore -p sole24ore
> >> >> Packing version 1421756389
> >> >> Deploying to project "sole24ore" in http://localhost:6800/addversion.json
> >> >> Server response (200):
> >> >> {"status": "error", "message": "SyntaxError: invalid syntax"}
> >> >>
> >> >>
> >> >> # Crawl responsibly by identifying yourself (and your website) on the user-agent
> >> >> #USER_AGENT = 'sole24ore (+http://www.yourdomain.com)'
> >> >>
> >> >> Looking forward to your kind help.
> >> >> Kind regards.
> >> >> Marco
> >> >
> >> > --
> >> > You received this message because you are subscribed to a topic in the
> >> > Google Groups "scrapy-users" group.
> >> > To unsubscribe from this topic, visit
> >> > https://groups.google.com/d/topic/scrapy-users/0b4xqaHUOSA/unsubscribe.
> >> > To unsubscribe from this group and all its topics, send an email to
> >> > [email protected].
> >> > To post to this group, send email to [email protected].
> >> > Visit this group at http://groups.google.com/group/scrapy-users.
> >> > For more options, visit https://groups.google.com/d/optout.
