Hi all,

During the week, I started building my own search engine software. After looking through many crawlers, I decided to use Scrapy [1]. There are a few reasons for this: Scrapy is very mature and is maintained by a company with an active developer community, it is written in Python, there is Django integration, it is flexible, and its pipeline architecture is simple.

So I will attach a Scrapy crawler (onionbot) to Django + PostgreSQL, together with the popularity data that ahmia has been gathering.

In this model, the data for a website is:

1) URL
2) Keywords (HTML keywords, title, h1, h2, h3, etc.)
3) All the words from the page (word1: count_of_word1, word2: count_of_word2, ...)
4) Domain
5) Public WWW backlinks to the domain
6) Popularity according to the Tor2web stats
7) Number of clicks in the search results to the domain

Hopefully this will work. I will not know until I run the prototype software.

[1] http://scrapy.org/

Have a nice weekend everyone!

Greetings,
Juha

_______________________________________________
tor-reports mailing list
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-reports
