I am trying to run the code below from a Windows Scheduled Task (ST). The strange thing is that when I run it by hand in a command prompt, it works fine. However, when the ST runs the code, it only returns the following output and then the program ends. I am struggling with this; could the community please help me?
Thank you in advance.

[The Command Prompt output when run from the ST]

C:\Windows\system32>python "C:\Users\xyz\Google Drive\cineplex\start.py" seatings
2017-05-06 21:47:03 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-05-06 21:47:03 [scrapy.utils.log] INFO: Overridden settings: {}

C:\Windows\system32>pause
Press any key to continue . . .

[The Python script I am running]

from cineplex import utils
from cineplex.spiders import showtimes_spider as st
from cineplex.spiders import seatings_spider as seat

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

import sys
import time

from twisted.internet import reactor, defer

# Parent directory constant. Subfolders will contain all movie times
# and seatings for the day.
PARENT_DIR = r'./data/'


def crawl_all_seatings():
    """Crawls all seatings per cinema."""
    # Create a CrawlerProcess instance to run multiple spiders simultaneously.
    # Read more here: https://doc.scrapy.org/en/latest/topics/practices.html
    process = CrawlerProcess()

    # Check folder for today
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all showtimes files' filepaths
    filepaths = utils.get_all_showtimes_filepaths(directory_for_today)

    # Each filepath points to a file with all the movie session ids
    for filepath in filepaths:
        sessions = utils.get_all_sessions(filepath)

        # Only queue spiders if there are sessions
        if len(sessions) > 0:
            # Add spiders to the crawler process
            for session_id in sessions:
                process.crawl(seat.SeatingsSpider,
                              session_id=session_id,
                              output_dir=directory_for_today)

    # Start crawling
    process.start()


def crawl_all_showtimes():
    """Crawls all cinemas' movies' showtimes."""
    # Create a CrawlerProcess instance to run spiders simultaneously.
    # Read more here: https://doc.scrapy.org/en/latest/topics/practices.html
    process = CrawlerProcess()

    # Check folder for today
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all cinema ids and names first
    cinema_dict = utils.get_all_cinemas()

    # Iterate through all cinemas and add a spider per cinema
    for cinema_id, cinema_name in cinema_dict.iteritems():
        process.crawl(st.ShowTimesSpider,
                      cinema_id=cinema_id,
                      cinema_name=cinema_name,
                      output_dir=directory_for_today)

    # Start crawling
    process.start()


def main(argv):
    """Main program: run the spiders."""
    # Turns on Scrapy logging
    # configure_logging()

    crawl_type = argv[1]
    if crawl_type == 'showtimes':
        # Collect all showtimes
        crawl_all_showtimes()
    elif crawl_type == 'seatings':
        # Collect all seatings
        crawl_all_seatings()
    else:
        print 'usage: showtimes to crawl show timings, or seatings to crawl seat occupancy'


if __name__ == "__main__":
    # main(sys.argv)
    main(['', 'seatings'])

    # Exit the program
    sys.exit()
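One thing I have noticed: in the output above, the Scheduled Task starts the process from C:\Windows\system32, while my script uses the relative path PARENT_DIR = r'./data/'. From a command prompt I launch it from my project folder, but from the ST ./data/ would resolve under C:\Windows\system32 instead. This is only my guess at the cause; here is a minimal diagnostic/workaround sketch (SCRIPT_DIR is a name I made up, and I have not yet verified this from the ST):

import os

# Diagnostic: log where the process actually starts from. From the
# command prompt this is my project folder; from the ST the prompt
# above shows C:\Windows\system32.
print('cwd: ' + os.getcwd())

# Possible workaround (my assumption, untested from the ST): anchor
# PARENT_DIR to the script's own location instead of the current
# working directory.
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))  # hypothetical name
PARENT_DIR = os.path.join(SCRIPT_DIR, 'data')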
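I also notice the log says "Overridden settings: {}". I import get_project_settings but never pass it in, so the bare CrawlerProcess() runs without my project settings. A sketch of what I believe the Scrapy practices page suggests (I have not verified whether this is related to my problem):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Pass the project settings explicitly instead of CrawlerProcess().
# Note: get_project_settings() locates scrapy.cfg starting from the
# current working directory, so this may also depend on where the
# Scheduled Task starts the process (my assumption).
process = CrawlerProcess(get_project_settings())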