Hello Parth,

Sorry we did not reply to your first message in February. It's great that you're interested in participating in GSoC with a Scrapy project!

For the "Scrapy benchmarking suite" idea, you may want to get in touch with Daniel and Mikhail, who are listed as potential mentors for the project.

A few pointers in the meantime:

Scrapy currently has a `scrapy bench` command that tries to fetch pages at maximum speed:
https://docs.scrapy.org/en/latest/topics/benchmarking.html#benchmarking
You can check how it is implemented and what it does and does not do. It's quite naive and may not represent a realistic use case with large or broken HTML files, or broad crawls that visit lots of domains.

Scrapy commands also have an (undocumented?) --profile option to write cProfile stats. You can try it out to see what you can get out of it.

There are (at least) a couple of issues about potential memory leaks:
- https://github.com/scrapy/scrapy/issues/482
- https://github.com/scrapy/scrapy/issues/482

Another question: maybe Python 2 and Python 3 show differences in terms of CPU and memory usage?

I would assume a successful GSoC project would allow investigating such issues and finding the root causes (if not fixing them).

Hope this helps,
Paul.

On Fri, Mar 3, 2017 at 11:51 AM, Parth Verma <vermapart...@gmail.com> wrote:
> Hi,
>
> I'm interested in "Scrapy benchmarking suite" idea in the ideas list for
> GSoC '17.
> Please help with what are the prerequisites for the same.
>
> Thanks.
>
>
> On Saturday, 11 February 2017 21:20:19 UTC+5:30, Parth Verma wrote:
>>
>> Hi,
>>
>> I am Parth Verma, a second year undergraduate pursuing MSc. in
>> Mathematics and Computing at IIT Kharagpur, India.
>> I have been doing open-source programming for a year. My github profile
>> is https://github.com/Parth-Vader.
>> My programming knowledge includes Python (Intermediate), C
>> (Intermediate), C++ (Intermediate), HTML/CSS (basic) and Bash. I use Ubuntu
>> 16.04 as my main operating system and Windows 8 for gaming.
>> I have been doing Data Analytics, and for that, I need to collect data
>> from various online sources and that's why I used Scrapy.
>>
>> I am interested in Scrapy benchmarking suite, since I have prior
>> knowledge of various algorithms and I want to learn memory management in
>> CPUs. What should be my next steps?
>>
>> Furthermore, I would like to suggest an idea.
>>
>> A new section in the official documentation could be added where people
>> could share their configuration files that they used to successfully scrape
>> data from a specific website (by successful, I mean not getting banned and
>> getting a good speed.) This way, I believe, it would be easier for people
>> without any prior knowledge of HTML, Python or Shell, could easily use
>> scrapy to get data from those specific sites.
>> In addition, we could create benchmarking for those sites as well.
>>
>> Thanks.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scrapy-users+unsubscr...@googlegroups.com.
> To post to this group, send email to scrapy-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
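
P.S. On the --profile pointer: the file that option writes is a regular cProfile dump, so the standard-library pstats module should be able to read it. Below is a minimal sketch of how one might look at the most expensive calls. The helper names (top_functions, make_demo_profile) are made up for illustration, and the stand-in workload is only there so the snippet runs without Scrapy; with a real dump you would pass the path you gave to --profile instead.

```python
import cProfile
import io
import os
import pstats
import tempfile


def top_functions(stats_path, limit=10):
    """Return a text report of the `limit` costliest calls by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(stats_path, stream=buffer)
    # strip_dirs() shortens file paths; "cumulative" sorts by time spent in a
    # function including everything it calls.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def _demo_workload():
    # Hypothetical stand-in for a crawl, used only to produce a profile here.
    return sum(i * i for i in range(100000))


def make_demo_profile():
    """Write a cProfile dump of the stand-in workload and return its path."""
    fd, path = tempfile.mkstemp(suffix=".prof")
    os.close(fd)
    profiler = cProfile.Profile()
    profiler.runcall(_demo_workload)
    profiler.dump_stats(path)
    return path


if __name__ == "__main__":
    demo_path = make_demo_profile()
    print(top_functions(demo_path, limit=5))
    os.remove(demo_path)
```

Sorting by cumulative time is just one choice; sorting by "tottime" instead would highlight functions that are slow on their own rather than through what they call.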