I actually wanted INPUT.py and OUTPUT.py to be a single file, i.e. I wanted 
this single file to send the requests to the spider, wait while the 
spider does its computations, and then take the response from the spider. 
The reason I have separated them is that I use the subprocess 
library to fork the process, and it does blocking I/O.

Let me explain this clearly, when we run 

>scrapy crawl streaming -a 
Input=/home/faisal/Dropbox/PROGRAMS/SCRAPY/sandbox/INPUT.py -a 
Output=/home/faisal/Dropbox/PROGRAMS/SCRAPY/sandbox/OUTPUT.py

the __init__ method of the spider class forks out a process that 
executes the INPUT.py file, using the subprocess library, which uses a 
PIPE for I/O. Now the issue is that I can send some data to INPUT.py on 
stdin and take some data back from INPUT.py through stdout, but I only get 
one chance to do this, and the spider has to wait. What I want is the 
opposite: INPUT.py should wait while the spider does the computations.
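To illustrate the limitation I mean: a minimal sketch (using a `python -c` one-liner as a stand-in for INPUT.py) showing that `communicate()` gives exactly one exchange — it writes all input, closes the pipe, and blocks until the child exits, so there is no second round trip.

```python
import subprocess
import sys

# Stand-in for INPUT.py: reads one line from stdin and echoes it back.
child_code = (
    "import sys; "
    "line = sys.stdin.readline(); "
    "sys.stdout.write('echo: ' + line)"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# communicate() sends the input, closes stdin, and blocks until the
# child exits -- one shot, after which the pipes are gone.
out, _ = proc.communicate("hello\n")
print(out, end="")  # echo: hello

# A second proc.communicate(...) here would fail: the child has
# already exited and its pipes are closed.
```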

In short, subprocess does not support non-blocking I/O. Thus, I had to 
separate input and output.
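For what it's worth, one common workaround (not something subprocess gives you directly) is a line-based protocol with a background reader thread, so the child stays alive across multiple request/response cycles. A sketch, again using a `python -c` echo loop as a stand-in for the external process:

```python
import queue
import subprocess
import sys
import threading

# Stand-in for the external process: echoes every line until stdin closes.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write('echo: ' + line)\n"
    "    sys.stdout.flush()\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,  # line-buffered
)

# A daemon thread drains stdout into a queue, so the parent never
# blocks on a read it didn't ask for.
lines = queue.Queue()

def reader():
    for line in proc.stdout:
        lines.put(line)

threading.Thread(target=reader, daemon=True).start()

# Multiple exchanges over the same child process -- the child waits
# between them instead of exiting after the first one.
responses = []
for msg in ("first", "second"):
    proc.stdin.write(msg + "\n")
    proc.stdin.flush()
    responses.append(lines.get(timeout=5))

proc.stdin.close()
proc.wait()
print(responses)
```

The parent still blocks on `lines.get()` while waiting for each reply, but crucially the child process survives between exchanges — which is the part a single `communicate()` call cannot do.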

I hope you have understood what I am trying to convey. Do you have any 
suggestions? I also request other Scrapy members reading this to help me 
with my problem.

Thanks
Faisal
On Monday, February 17, 2014 8:41:04 PM UTC+4, faisal anees wrote:
>
> Hi guys,
>
> I am Mohammed Faisal Anees, a second-year Computer Science undergrad at 
> IIIT Hyderabad, India. I was really happy when I got to know that 
> Scrapy (which has helped me a lot in my projects :) ) is taking part in 
> GSOC 2014. What's better than contributing to an organisation that has 
> helped you a lot?!
>
> I was interested in this idea on the ideas page: "Support for spiders in 
> other languages". I had some questions regarding this:
>
> 1) Do we have to make wrappers, or should the code be written in the 
> other language from scratch?
>
> 2) Quoting from the ideas page "The goal of this project is to allow 
> developers to write spiders simply and easily in any programming language, 
> while permitting Scrapy to manage concurrency, scheduling, item exporting, 
> caching, etc."  Does this mean this project will enable any programming 
> language to use Scrapy ... or will we be adding support for languages 
> separately one by one?
>
> 3) Which language would be better? This depends on who the target 
> audience is: developers or scientists? We can expect developers to be 
> familiar with JavaScript/Ruby/Java/Python/etc., whereas scientists would 
> know C/C++/Python/Java. This is just my view, I might be wrong too!
>
> Thanks
>

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/groups/opt_out.