Hi Paul, I have emailed Daniel about it now. I just wanted to know: is benchmarking/memory management a major issue for Scrapy at the moment?
On 7 March 2017 at 15:53, Paul Tremberth <paul.trembe...@gmail.com> wrote:
> Hi Parth,
>
> Mikhail actually requested to be removed from the mentors list for the
> benchmarking suite idea. (He may be a backup mentor in the future if needed,
> but does not have enough insight into the idea at the moment.)
> The other mentor to contact would be Daniel Grana.
>
> Best,
> Paul.
>
> On Tue, Mar 7, 2017 at 11:07 AM, Parth Verma <vermapart...@gmail.com>
> wrote:
>
>> Hi Paul,
>>
>> I am currently looking at the issues that you mentioned and have studied
>> how benchmarking works. I am going through the various issues related to
>> memory leaks and have also opened an issue
>> (https://github.com/scrapy/scrapy/issues/2629).
>>
>> How do I get in touch with Mikhail?
>>
>> Parth
>>
>> On Friday, 3 March 2017 22:16:55 UTC+5:30, Paul Tremberth wrote:
>>>
>>> Hello Parth,
>>>
>>> Sorry we did not reply to your first message in February.
>>> It's great that you're interested in participating in GSoC with a Scrapy
>>> project!
>>>
>>> For the "Scrapy benchmarking suite" idea, you may want to get in touch
>>> with Daniel and Mikhail, who are listed as potential mentors for the
>>> project.
>>>
>>> A few pointers in the meantime:
>>> Scrapy currently has a `scrapy bench` command that tries to fetch pages
>>> at maximum speed:
>>> https://docs.scrapy.org/en/latest/topics/benchmarking.html#benchmarking
>>> You can check how it is implemented and what it does and does not do.
>>> It's quite naive and may not represent a realistic use case with large
>>> or broken HTML files, or broad crawls that visit lots of domains.
>>>
>>> Scrapy commands also have an (undocumented?) `--profile` option to write
>>> cProfile stats.
>>> You can try it out to see what you can get out of it.
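Scrapy's `--profile` option writes a standard cProfile dump, so the file can be inspected with Python's built-in `pstats` module. The sketch below assumes such a file exists, for example one produced by `scrapy bench --profile=bench.prof` (the file name is only illustrative). So that it runs without Scrapy installed, it profiles a hypothetical stand-in function, `busy_work`, instead of a real crawl; `load_top_functions` is likewise a hypothetical helper name, not part of Scrapy.

```python
import cProfile
import io
import pstats


def load_top_functions(profile_path: str, limit: int = 10) -> str:
    """Return a text report of the `limit` entries with the highest cumulative time.

    `profile_path` is assumed to be a cProfile dump, such as the file
    written by a Scrapy command run with --profile.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(profile_path, stream=buffer)
    # Drop long directory prefixes and sort by time spent including sub-calls.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def busy_work(n: int) -> int:
    # Hypothetical stand-in workload so the example runs without Scrapy.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Produce a dump the same way a profiled command does: run code under
    # cProfile.Profile and call dump_stats() on the result.
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(100_000)
    profiler.disable()
    profiler.dump_stats("busy_work.prof")

    print(load_top_functions("busy_work.prof", limit=5))
```

Sorting by `"cumulative"` surfaces the call paths that dominate total run time; sorting by `"tottime"` instead highlights functions that are expensive in their own bodies, which is often more useful when hunting for a single hot spot in parsing or scheduling code.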
>>>
>>> There are (at least) a couple of issues about potential memory leaks:
>>> - https://github.com/scrapy/scrapy/issues/482
>>> - https://github.com/scrapy/scrapy/issues/482
>>>
>>> Another question: maybe Python 2 and Python 3 show differences in terms
>>> of CPU and memory usage?
>>>
>>> I would expect a successful GSoC project to investigate such issues and
>>> find their root causes (if not fix them).
>>>
>>> Hope this helps,
>>> Paul.
>>>
>>>
>>> On Fri, Mar 3, 2017 at 11:51 AM, Parth Verma <vermap...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm interested in the "Scrapy benchmarking suite" idea in the ideas
>>>> list for GSoC '17.
>>>> Could you tell me what the prerequisites are?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Saturday, 11 February 2017 21:20:19 UTC+5:30, Parth Verma wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am Parth Verma, a second-year undergraduate pursuing an MSc in
>>>>> Mathematics and Computing at IIT Kharagpur, India.
>>>>> I have been doing open-source programming for a year. My GitHub
>>>>> profile is https://github.com/Parth-Vader.
>>>>> My programming knowledge includes Python (intermediate), C
>>>>> (intermediate), C++ (intermediate), HTML/CSS (basic) and Bash. I use
>>>>> Ubuntu 16.04 as my main operating system and Windows 8 for gaming.
>>>>> I have been doing data analytics, and for that I need to collect data
>>>>> from various online sources, which is why I used Scrapy.
>>>>>
>>>>> I am interested in the Scrapy benchmarking suite, since I have prior
>>>>> knowledge of various algorithms and I want to learn about memory
>>>>> management in CPUs. What should my next steps be?
>>>>>
>>>>> Furthermore, I would like to suggest an idea.
>>>>>
>>>>> A new section could be added to the official documentation where
>>>>> people could share the configuration files they used to successfully
>>>>> scrape data from a specific website (by successful, I mean not getting
>>>>> banned and getting a good speed).
>>>>> This way, I believe, it would be easier for people without any prior
>>>>> knowledge of HTML, Python or the shell to use Scrapy to get data from
>>>>> those specific sites.
>>>>> In addition, we could create benchmarks for those sites as well.
>>>>>
>>>>> Thanks.
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "scrapy-users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to scrapy-users...@googlegroups.com.
>>>> To post to this group, send email to scrapy...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/scrapy-users.
>>>> For more options, visit https://groups.google.com/d/optout.

--
*Parth Verma*
*Sophomore*
*Mathematics Department*
*IIT Kharagpur*