Hi Daniel,
thanks again for helping.

I tried with

FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
FEED_FORMAT = 'json'

and with

FEED_URI = 'output.json'
FEED_FORMAT = 'json'

In both cases there is no output and no error message. Any hints?

Marco

2015-01-21 20:28 GMT+01:00 Daniel Fockler <[email protected]>:
> You'll want to make sure in your settings.py that the feed format is set, like
>
> FEED_FORMAT = 'json'
>
> If it doesn't work after that, then just try changing the feed URI to
>
> FEED_URI = 'output.json'
>
> and Scrapy will dump it in your project root.
>
> On Tuesday, January 20, 2015 at 11:00:50 PM UTC-8, Marco Ippolito wrote:
>>
>> Hi Daniel,
>> thank you very much for your kind help.
>>
>> After scheduling the spider run, an output is actually produced:
>>
>> Opening file
>> /var/lib/scrapyd/items/sole24ore/sole/89d644f8a13a11e4a2afc04a00090e80.jl
>> Read output!
>> This is my output:
>> {"url": ["http://m.bbc.co.uk", "http://www.bbc.com/news/", " .....
>>
>> But modifying the feed settings as:
>>
>> BOT_NAME = 'sole24ore'
>>
>> SPIDER_MODULES = ['sole24ore.spiders']
>> NEWSPIDER_MODULE = 'sole24ore.spiders'
>>
>> FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
>>
>> doesn't produce an output.json in /home/marco/crawlscrape/sole24ore.
>>
>> Am I missing some other steps?
>>
>> Marco
>>
>> 2015-01-20 18:45 GMT+01:00 Daniel Fockler <[email protected]>:
>> > For your first problem, you've started the scrapyd project, but you need
>> > to schedule a spider run using the schedule.json command. Something like
>> >
>> > curl http://localhost:6800/schedule.json -d project=sole24ore -d spider=yourspidername
>> >
>> > For your second problem, your settings.py is misconfigured; your feed
>> > settings should be like
>> >
>> > FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
>> > FEED_FORMAT = 'json'
>> >
>> > Hope that helps
>> >
>> > On Tuesday, January 20, 2015 at 4:23:04 AM UTC-8, Marco Ippolito wrote:
>> >>
>> >> Hi,
>> >> I've got two situations to solve.
>> >>
>> >> It seems that everything is OK:
>> >>
>> >> (SCREEN)marco@pc:~/crawlscrape/sole24ore$ scrapyd-deploy sole24ore -p sole24ore
>> >> Packing version 1421755479
>> >> Deploying to project "sole24ore" in http://localhost:6800/addversion.json
>> >> Server response (200):
>> >> {"status": "ok", "project": "sole24ore", "version": "1421755479", "spiders": 1}
>> >>
>> >> marco@pc:/var/lib/scrapyd/dbs$ ls -lah
>> >> totale 12K
>> >> drwxr-xr-x 2 scrapy nogroup 4,0K gen 20 13:04 .
>> >> drwxr-xr-x 5 scrapy nogroup 4,0K gen 20 06:55 ..
>> >> -rw-r--r-- 1 root root 2,0K gen 20 13:04 sole24ore.db
>> >>
>> >> marco@pc:/var/lib/scrapyd/eggs/sole24ore$ ls -lah
>> >> totale 16K
>> >> drwxr-xr-x 2 scrapy nogroup 4,0K gen 20 13:04 .
>> >> drwxr-xr-x 3 scrapy nogroup 4,0K gen 20 12:47 ..
>> >> -rw-r--r-- 1 scrapy nogroup 5,5K gen 20 13:04 1421755479.egg
>> >>
>> >> but nothing is executed:
>> >>
>> >> marco@pc:/var/lib/scrapyd/items/sole24ore/sole$ ls -a
>> >> . ..
>> >>
>> >> [detached from 2515.pts-4.pc]
>> >> marco@pc:~/crawlscrape/sole24ore$ curl http://localhost:6800/listjobs.json?project=sole24ore
>> >> {"status": "ok", "running": [], "finished": [], "pending": []}
>> >>
>> >> The second issue is how to save the output into a JSON file.
>> >> What is the correct form to put into settings.py?
>> >>
>> >> # Scrapy settings for sole24ore project
>> >> #
>> >> # For simplicity, this file contains only the most important settings by
>> >> # default. All the other settings are documented here:
>> >> #
>> >> #     http://doc.scrapy.org/en/latest/topics/settings.html
>> >> #
>> >>
>> >> BOT_NAME = 'sole24ore'
>> >>
>> >> SPIDER_MODULES = ['sole24ore.spiders']
>> >> NEWSPIDER_MODULE = 'sole24ore.spiders'
>> >>
>> >> FEED_URI=file://home/marco/crawlscrape/sole24ore/output.json --set FEED_FORMAT=json
>> >>
>> >> (SCREEN)marco@pc:~/crawlscrape/sole24ore$ scrapyd-deploy sole24ore -p sole24ore
>> >> Packing version 1421756389
>> >> Deploying to project "sole24ore" in http://localhost:6800/addversion.json
>> >> Server response (200):
>> >> {"status": "error", "message": "SyntaxError: invalid syntax"}
>> >>
>> >> # Crawl responsibly by identifying yourself (and your website) on the user-agent
>> >> #USER_AGENT = 'sole24ore (+http://www.yourdomain.com)'
>> >>
>> >> Looking forward to your kind help.
>> >> Kind regards.
>> >> Marco
>> >
>> > --
>> > You received this message because you are subscribed to a topic in the
>> > Google Groups "scrapy-users" group.
>> > To unsubscribe from this topic, visit
>> > https://groups.google.com/d/topic/scrapy-users/0b4xqaHUOSA/unsubscribe.
>> > To unsubscribe from this group and all its topics, send an email to
>> > [email protected].
>> > To post to this group, send email to [email protected].
>> > Visit this group at http://groups.google.com/group/scrapy-users.
>> > For more options, visit https://groups.google.com/d/optout.
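On the feed URI used throughout this thread: in a `file:` URI the text right after `//` is the host, so `file://home/marco/crawlscrape/sole24ore/output.json` names a host called `home` and the path `/marco/crawlscrape/sole24ore/output.json`, not the directory `/home/marco/...`. An absolute local path is normally written with three slashes, `file:///home/marco/...`. The sketch below shows the two feed settings in that form and checks the difference with Python 3's standard `urllib.parse.urlparse`. The path is Marco's and is an assumption for any other machine; `describe_file_uri` is a hypothetical helper for illustration, not part of Scrapy.

```python
from urllib.parse import urlparse

# Feed settings as they might appear in settings.py, using the FEED_URI /
# FEED_FORMAT names from this thread. The path is assumed (Marco's project dir).
FEED_URI = 'file:///home/marco/crawlscrape/sole24ore/output.json'
FEED_FORMAT = 'json'


def describe_file_uri(uri):
    """Hypothetical helper: return (host, path) as urlparse splits a file: URI."""
    parts = urlparse(uri)
    return parts.netloc, parts.path


if __name__ == '__main__':
    # Two slashes: "home" is taken as the host and drops out of the path.
    print(describe_file_uri('file://home/marco/crawlscrape/sole24ore/output.json'))
    # Three slashes: empty host, full absolute path.
    print(describe_file_uri(FEED_URI))
```

A further possibility, offered as a guess rather than a diagnosis: the `.jl` file under `/var/lib/scrapyd/items/` suggests scrapyd's `items_dir` option is active, and scrapyd may then hand each job its own feed URI, which could take precedence over FEED_URI from settings.py. Running the spider directly with `scrapy crawl` from the project directory is one way to check whether the settings work outside scrapyd.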
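Daniel's first suggestion, scheduling a run through scrapyd's `schedule.json` endpoint and then checking `listjobs.json`, can also be scripted with only the Python 3 standard library. This is a minimal sketch assuming scrapyd listens on `localhost:6800` as in the thread; the function names are hypothetical, and the spider name must be replaced with the real one (the items path hints it may be `sole`, but that is an assumption). The request builders are kept separate from the network call so they can be inspected without a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SCRAPYD = 'http://localhost:6800'  # assumed scrapyd address, as in the thread


def schedule_request(project, spider, base=SCRAPYD):
    """Build a POST to schedule.json, like `curl ... -d project=... -d spider=...`."""
    data = urlencode({'project': project, 'spider': spider}).encode('ascii')
    return Request(base + '/schedule.json', data=data)  # data set -> POST


def listjobs_request(project, base=SCRAPYD):
    """Build a GET to listjobs.json?project=..., like the curl call in the thread."""
    return Request(base + '/listjobs.json?' + urlencode({'project': project}))


def call(req):
    """Send a request to scrapyd and decode its JSON reply."""
    with urlopen(req) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

With scrapyd running, `call(schedule_request('sole24ore', 'yourspidername'))` should return a status object for the new job, and `call(listjobs_request('sole24ore'))` should then list it under `pending`, `running` or `finished`.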
