Hello Shankar,

It's great to know you're looking forward to participating in GSoC 2017!

For the HTTP/1.1 downloader project, I suggest that you get familiar with how Scrapy uses Twisted in [1].
Scrapy uses Twisted Agent [2] and customizes it to handle proxies with the CONNECT method, TLS connections without verifying peer certificates or using a specific TLS method, etc.

Note that, on re-reading it, some of the description for the project [3] is not up to date. In particular, Scrapy does not ship with Twisted code in scrapy.xlib.tx anymore.

Also, you can check the open issues around HTTP on GitHub. For example, there's an old ticket about handling responses without a reason phrase [4].

Here is a recent pull request to Twisted [5] by one of the contributors to Scrapy, namely Rolango, to be able to customize the HTTP client parser. This could be a prerequisite for the GSoC project.

I would also say that Scrapy's HTTP/1.1 download handler needs more thorough tests covering the various good and bad practices of web servers, especially for HTTP proxies and TLS connections. Just to name a few:

- servers that never respond
- servers that send fewer bytes than advertised [6]
- servers that are very slow or throttle a lot

Some of these tests are already implemented; others are less robust or incomplete (see [7]).

Finally, as bonus points, it would be great to see how far Scrapy is from supporting an HTTP/2 client (see [8]).

Hope this helps,
Paul.

[1] https://github.com/scrapy/scrapy/blob/master/scrapy/core/downloader/handlers/http11.py
[2] https://twistedmatrix.com/documents/current/web/howto/client.html
[3] http://gsoc2017.scrapinghub.com/ideas/#download-handler
[4] https://github.com/scrapy/scrapy/issues/345
[5] https://github.com/twisted/twisted/pull/712
[6] https://github.com/scrapy/scrapy/issues/2586
[7] https://github.com/scrapy/scrapy/issues/2545
[8] https://github.com/scrapy/scrapy/issues/1854

On Sat, Feb 25, 2017 at 7:00 PM, SHANKAR JHA <shankar...@gmail.com> wrote:

> Hi Scrapians,
>
> I am studying computer science and newbie in GSoC. I always want to
> contribute to open source project.
>
> I just found "new HTTP/1.1 downloader handler" in scrapy project ideas
> and I am very excited to work on this.
> I am good in python and HTTP but never worked on twisted.
>
> Can anyone help me and show me the direction, So that I can start working
> on my project.
>
> With Due Regards,
> Shankar Jha
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scrapy-users+unsubscr...@googlegroups.com.
> To post to this group, send email to scrapy-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
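
P.S. To make the "servers that send fewer bytes than advertised" case [6] a bit more concrete, here is a minimal sketch of what such a misbehaving server looks like from a client's point of view. It uses only the Python standard library (http.client and a raw socket), not Scrapy or Twisted, so it does not show how Scrapy's handler behaves. The names serve_truncated_once, fetch and run_demo are made up for illustration, and the 100/10 byte sizes are arbitrary. A real test for the download handler would presumably need a similar fake server, driven through Twisted instead.

```python
import http.client
import socket
import threading


def serve_truncated_once(listener, advertised=100, body=b"0123456789"):
    """Accept one connection and send fewer body bytes than Content-Length says."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)  # read and ignore the (small) request
        head = (
            "HTTP/1.1 200 OK\r\n"
            f"Content-Length: {advertised}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + body)
    # Leaving the "with" block closes the socket, cutting the body short.


def fetch(host, port, timeout=5.0):
    """Return (bytes_received, bytes_missing).

    bytes_missing is None when the whole advertised body arrived.
    """
    client = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        client.request("GET", "/")
        response = client.getresponse()
        try:
            return response.read(), None
        except http.client.IncompleteRead as exc:
            # exc.partial holds what did arrive, exc.expected what is missing.
            return exc.partial, exc.expected
    finally:
        client.close()


def run_demo():
    """Start the fake server on a free local port and fetch from it once."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=serve_truncated_once, args=(listener,), daemon=True
    )
    server.start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        server.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    received, missing = run_demo()
    print(f"received {len(received)} bytes, {missing} bytes missing")
```

The point of the sketch is only that the client has to decide what to do with a partial body: fail the request, or hand over what it received and flag it as incomplete.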