Re: Ideas for GSOC '17

Parth Verma Tue, 07 Mar 2017 03:19:38 -0800

Hi Paul,

I have mailed Daniel about it now. I just wanted to know: is
benchmarking/memory management a major issue for Scrapy as of now?


On 7 March 2017 at 15:53, Paul Tremberth <paul.trembe...@gmail.com> wrote:

> Hi Parth,
>
> Mikhail actually requested to be removed from the mentors list for the
> benchmarking suite idea. (He may be a backup mentor in the future if needed,
> but he does not have enough insight into the idea at the moment.)
> The other mentor to contact would be Daniel Grana.
>
> Best,
> Paul.
>
> On Tue, Mar 7, 2017 at 11:07 AM, Parth Verma <vermapart...@gmail.com>
> wrote:
>
>> Hi Paul,
>>
>> I am currently looking at the issues that you mentioned and have studied
>> how benchmarking works. I am also going through the various issues
>> related to memory leaks and have opened an issue (
>> https://github.com/scrapy/scrapy/issues/2629).
>>
>> How do I get in contact with Mikhail?
>>
>> Parth
>>
>> On Friday, 3 March 2017 22:16:55 UTC+5:30, Paul Tremberth wrote:
>>>
>>> Hello Parth,
>>>
>>> Sorry we did not reply to your first message in February.
>>> It's great that you're interested in participating in GSoC with a Scrapy
>>> project!
>>>
>>> For the "Scrapy benchmarking suite" idea, you may want to get in touch with
>>> Daniel and Mikhail, who are listed as potential mentors for the project.
>>>
>>> A few pointers in the meantime:
>>> Scrapy currently has a `scrapy bench` command that tries to fetch pages
>>> at maximum speed:
>>> https://docs.scrapy.org/en/latest/topics/benchmarking.html#benchmarking
>>> You can check how that is implemented and what it does and does not do.
>>> It's quite naive and may not represent a realistic use case with large
>>> or broken HTML files, or broad crawls with lots of domains visited.
>>>
>>> Scrapy commands also have an (undocumented?) --profile option to write
>>> cProfile stats.
>>> You can try it out to see what you can get out of it.
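
A minimal sketch of how such a stats file could be inspected, using only
the standard library. It assumes the file was written by something like
`scrapy crawl myspider --profile=crawl.prof` (the spider and file names are
hypothetical). To stay self-contained, the sketch profiles a stand-in
workload instead of a real crawl; `parse_numbers`, `profile_to_file` and
`load_stats` are illustrative names, not part of Scrapy.

```python
import cProfile
import os
import pstats
import tempfile


def parse_numbers(count):
    """Stand-in workload; a real run would profile a Scrapy crawl instead."""
    return sum(int(str(i)) for i in range(count))


def profile_to_file(path, count=10000):
    """Run the workload under cProfile and dump the stats to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = parse_numbers(count)
    profiler.disable()
    profiler.dump_stats(path)
    return result


def load_stats(path):
    """Load a cProfile stats file and sort it by cumulative time."""
    stats = pstats.Stats(path)
    stats.sort_stats("cumulative")
    return stats


if __name__ == "__main__":
    stats_path = os.path.join(tempfile.mkdtemp(), "bench.prof")
    profile_to_file(stats_path)
    # Show the ten entries with the highest cumulative time.
    load_stats(stats_path).print_stats(10)
```

For a file produced by a real crawl, only `load_stats` would be needed;
sorting by cumulative time tends to surface the call paths where most of
the run is spent.
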
>>>
>>> There are (at least) a couple of issues about potential memory leaks:
>>> - https://github.com/scrapy/scrapy/issues/482
>>> - https://github.com/scrapy/scrapy/issues/482
>>>
>>> Another question: maybe Python 2 and Python 3 show differences in terms
>>> of CPU and memory usage?
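
One rough way to compare interpreters could be to run the same workload
under each and record CPU time and peak memory. The sketch below assumes a
Unix system (the `resource` module is not available on Windows) and uses a
stand-in workload; `build_index` and `measure` are hypothetical names, and
a real comparison would run an actual crawl in place of `build_index`.

```python
import resource
import sys


def build_index(count):
    """Stand-in workload: build a dict of strings keyed by integer."""
    return {i: "page-%d" % i for i in range(count)}


def measure(workload, *args):
    """Run `workload` and return (result, cpu_seconds, peak_rss).

    peak_rss is the process-wide high-water mark reported by the OS
    (kilobytes on Linux, bytes on macOS), not the workload's cost alone.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = workload(*args)
    after = resource.getrusage(resource.RUSAGE_SELF)
    # User plus system CPU time consumed while the workload ran.
    cpu_seconds = (after.ru_utime + after.ru_stime) - (
        before.ru_utime + before.ru_stime
    )
    return result, cpu_seconds, after.ru_maxrss


if __name__ == "__main__":
    _, cpu, peak = measure(build_index, 200000)
    print("Python %d.%d: cpu=%.3fs peak_rss=%d"
          % (sys.version_info[0], sys.version_info[1], cpu, peak))
```

Running the same script once per interpreter gives one line per version to
compare. Since peak RSS covers the whole process, each measurement is best
taken in a fresh process.
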
>>>
>>> I would assume a successful project for GSoC would allow investigating
>>> such issues and finding their root causes (if not fixing them).
>>>
>>> Hope this helps,
>>> Paul.
>>>
>>>
>>> On Fri, Mar 3, 2017 at 11:51 AM, Parth Verma <vermap...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm interested in the "Scrapy benchmarking suite" idea in the ideas list
>>>> for GSoC '17.
>>>> Could you please tell me what the prerequisites are?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Saturday, 11 February 2017 21:20:19 UTC+5:30, Parth Verma wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am Parth Verma, a second-year undergraduate pursuing an MSc in
>>>>> Mathematics and Computing at IIT Kharagpur, India.
>>>>> I have been doing open-source programming for a year. My GitHub
>>>>> profile is https://github.com/Parth-Vader.
>>>>> My programming knowledge includes Python (intermediate), C
>>>>> (intermediate), C++ (intermediate), HTML/CSS (basic) and Bash. I use
>>>>> Ubuntu 16.04 as my main operating system and Windows 8 for gaming.
>>>>> I have been doing data analytics, which requires collecting data
>>>>> from various online sources, and that is why I used Scrapy.
>>>>>
>>>>> I am interested in the Scrapy benchmarking suite idea, since I have
>>>>> prior knowledge of various algorithms and I want to learn memory
>>>>> management in CPUs. What should be my next steps?
>>>>>
>>>>> Furthermore, I would like to suggest an idea.
>>>>>
>>>>> A new section could be added to the official documentation where
>>>>> people share the configuration files they used to successfully
>>>>> scrape data from a specific website (by successful, I mean not getting
>>>>> banned and getting a good speed). This way, I believe, people without
>>>>> any prior knowledge of HTML, Python or shell could easily use Scrapy
>>>>> to get data from those specific sites.
>>>>> In addition, we could create benchmarks for those sites as well.
>>>>>
>>>>> Thanks.
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "scrapy-users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to scrapy-users...@googlegroups.com.
>>>> To post to this group, send email to scrapy...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/scrapy-users.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>



-- 


*Parth Verma*

*Sophomore*

*Mathematics Department*
*IIT Kharagpur*
