Hi Paul,

I have been looking at the issues that you mentioned and have studied how benchmarking works. I am now going through the various issues related to memory leaks and have also opened an issue (https://github.com/scrapy/scrapy/issues/2629).
How do I get in contact with Mikhail?

Parth

On Friday, 3 March 2017 22:16:55 UTC+5:30, Paul Tremberth wrote:
>
> Hello Parth,
>
> Sorry we did not reply to your first message in February.
> It's great that you're interested in participating in GSoC with a Scrapy
> project!
>
> For the "Scrapy benchmarking suite" idea, you may want to get in touch with
> Daniel and Mikhail, who are listed as potential mentors for the project.
>
> A few pointers in the meantime:
> Scrapy currently has a `scrapy bench` command that tries to fetch pages at
> maximum speed:
> https://docs.scrapy.org/en/latest/topics/benchmarking.html#benchmarking
> You can check how that is implemented and what it does and does not do.
> It's quite naive and may not represent a realistic use case with large or
> broken HTML files, or broad crawls with lots of domains visited.
>
> Scrapy commands also have an (undocumented?) --profile option to write
> cProfile stats.
> You can try it out to see what you can get out of it.
>
> There are (at least) a couple of issues about potential memory leaks:
> - https://github.com/scrapy/scrapy/issues/482
> - https://github.com/scrapy/scrapy/issues/482
>
> Another question: maybe Python 2 and Python 3 show differences in terms of
> CPU and memory usage?
>
> I would assume a successful project for GSoC would allow investigating such
> issues and finding the root causes (if not fixing them).
>
> Hope this helps,
> Paul.
>
>
> On Fri, Mar 3, 2017 at 11:51 AM, Parth Verma <vermap...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm interested in the "Scrapy benchmarking suite" idea in the ideas list
>> for GSoC '17.
>> Please let me know what the prerequisites for it are.
>>
>> Thanks.
>>
>>
>> On Saturday, 11 February 2017 21:20:19 UTC+5:30, Parth Verma wrote:
>>>
>>> Hi,
>>>
>>> I am Parth Verma, a second year undergraduate pursuing MSc. in
>>> Mathematics and Computing at IIT Kharagpur, India.
>>> I have been doing open-source programming for a year. My GitHub profile
>>> is https://github.com/Parth-Vader.
>>> My programming knowledge includes Python (Intermediate), C
>>> (Intermediate), C++ (Intermediate), HTML/CSS (basic) and Bash. I use
>>> Ubuntu 16.04 as my main operating system and Windows 8 for gaming.
>>> I have been doing Data Analytics, and for that, I need to collect data
>>> from various online sources and that's why I used Scrapy.
>>>
>>> I am interested in the Scrapy benchmarking suite, since I have prior
>>> knowledge of various algorithms and I want to learn memory management in
>>> CPUs. What should be my next steps?
>>>
>>> Furthermore, I would like to suggest an idea.
>>>
>>> A new section in the official documentation could be added where people
>>> could share the configuration files that they used to successfully scrape
>>> data from a specific website (by successful, I mean not getting banned and
>>> getting a good speed). This way, I believe, people without any prior
>>> knowledge of HTML, Python or Shell could easily use Scrapy to get data
>>> from those specific sites.
>>> In addition, we could create benchmarks for those sites as well.
>>>
>>> Thanks.
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to scrapy-users...@googlegroups.com.
>> To post to this group, send email to scrapy...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>
>