Hi Paul,

I am looking into the issues that you mentioned and have studied 
how benchmarking works. I am now going through the various issues 
related to memory leaks and have also opened an issue (
https://github.com/scrapy/scrapy/issues/2629).

How do I get in contact with Mikhail?

Parth

On Friday, 3 March 2017 22:16:55 UTC+5:30, Paul Tremberth wrote:
>
> Hello Parth,
>
> Sorry we did not reply to your first message in February.
> It's great that you're interested in participating in GSoC with a Scrapy 
> project!
>
> For "Scrapy benchmarking suite" idea, you may want to get in touch with 
> Daniel and Mikhail who are listed as potential mentors for the project.
>
> A few pointers in the meantime:
> Scrapy currently has a `scrapy bench` command that tries to fetch pages at 
> maximum speed:
> https://docs.scrapy.org/en/latest/topics/benchmarking.html#benchmarking
> You can check how that is implemented and what it does and does not do.
> It's quite naive and may not represent a realistic use-case with large or 
> broken HTML files, or broad crawls with lots of domains visited.
>
> Scrapy commands also have an (undocumented?) --profile option to write 
> cProfile stats.
> You can try it out to see what you can get out of it.
>
> There are (at least) a couple of issues about potential memory leaks:
> - https://github.com/scrapy/scrapy/issues/482
> - https://github.com/scrapy/scrapy/issues/482
>
> Another question: maybe Python 2 and Python 3 show differences in terms of 
> CPU and memory usage?
>
> I would assume a successful project for GSoC would allow investigating such 
> issues and finding the root causes (if not fixing them).
>
> Hope this helps,
> Paul.
>
>
> On Fri, Mar 3, 2017 at 11:51 AM, Parth Verma <vermap...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm interested in the "Scrapy benchmarking suite" idea in the ideas list for 
>> GSoC '17.
>> Could you tell me what the prerequisites are?
>>
>> Thanks.
>>
>>
>> On Saturday, 11 February 2017 21:20:19 UTC+5:30, Parth Verma wrote:
>>>
>>> Hi,
>>>
>>> I am Parth Verma, a second-year undergraduate pursuing an MSc in 
>>> Mathematics and Computing at IIT Kharagpur, India.
>>> I have been doing open-source programming for a year. My GitHub profile 
>>> is https://github.com/Parth-Vader.
>>> My programming knowledge includes Python (intermediate), C 
>>> (intermediate), C++ (intermediate), HTML/CSS (basic) and Bash. I use Ubuntu 
>>> 16.04 as my main operating system and Windows 8 for gaming.
>>> I have been doing Data Analytics, and for that, I need to collect data 
>>> from various online sources and that's why I used Scrapy.
>>>
>>> I am interested in Scrapy benchmarking suite, since I have prior 
>>> knowledge of various algorithms and I want to learn memory management in 
>>> CPUs. What should be my next steps?
>>>
>>> Furthermore, I would like to suggest an idea.
>>>
>>> A new section could be added to the official documentation where people 
>>> could share the configuration files that they used to successfully scrape 
>>> data from a specific website (by successful, I mean not getting banned and 
>>> getting a good speed). This way, I believe, it would be easier for people 
>>> without any prior knowledge of HTML, Python or Shell to use 
>>> Scrapy to get data from those specific sites.
>>> In addition, we could create benchmarks for those sites as well.
>>>
>>> Thanks.
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to scrapy-users...@googlegroups.com.
>> To post to this group, send email to scrapy...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
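
For anyone trying the --profile option mentioned in Paul's message: the file it writes contains cProfile stats, which Python's standard pstats module can read. Below is a minimal sketch of summarizing such a file. The file name bench.prof, the helper summarize_profile and the stand-in workload busy_work are made up for illustration; with Scrapy installed, the file would presumably come from something like `scrapy bench --profile=bench.prof` instead.

```python
import cProfile
import io
import pstats


def summarize_profile(stats_path, limit=10):
    """Return a text report of the entries with the highest cumulative time.

    `stats_path` is any file written in cProfile's dump format.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(stats_path, stream=buffer)
    # Shorten file paths, sort by cumulative time, keep the first `limit` rows.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def busy_work(n=20000):
    # Stand-in workload (hypothetical) so the sketch runs without Scrapy.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work()
    profiler.disable()
    profiler.dump_stats("bench.prof")  # hypothetical file name
    print(summarize_profile("bench.prof"))
```

Sorting by cumulative time is only one choice; it tends to surface the high-level calls that dominate a run, while sorting by "tottime" would point at the individual functions where the time is actually spent.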

