Thanks Ismael! Great work. You are very productive.
-Juha

On Sat, Aug 13, 2016 at 1:02 AM, Ismael R <[email protected]> wrote:
> Hi everyone,
>
> I'm working on ahmia.fi, the hidden service search engine, and you're
> reading status update #6.
>
> During the last two weeks, I finished porting the Django app to the new
> structure. I'm also working on last-minute things before shipping the new
> site online.
>
> I will continue updating documentation and add some unit tests to the
> project.
>
> The code is not merged yet, but you're welcome to check it out on my
> forks. [1] [2]
>
>
> Since this status report is short, here is a list of the goals from my
> initial project proposal and what work has been done on each.
>
> Review code and infrastructure:
> - Split the project into several repositories
> - Improve documentation
> - Automate testing (Travis CI)
> - Track code quality (Landscape.io)
> - Track requirements (Requires.io)
> - Refactor each subproject
>
> Improve search results:
> - Better use of Elasticsearch (stemmers, shingles, term-centric search)
> - Search results are now pages instead of domains.
>
> Improve UI/UX:
> Not much work has been done toward this goal. The website was in the
> process of porting old pages to a new design; all pages now use the new
> design.
>
> Gather more statistics:
> - PageRank is now used to compute an authority score for each page.
> - I suggested that we could use a self-hosted statistics framework like
>   Piwik [3], but no decision has been made.
>
> Use stats to better rank search results:
> - Results are ranked by authority score.
>
> Make sense of the indexed info to understand a search's meaning:
> - Shingles enable us to differentiate these two queries: "i'm not happy
>   i'm working" and "i'm happy i'm not working".
> - Synonyms could be used by the search algorithm if we provided a synonym
>   dictionary. No work has been done toward making this work.
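To illustrate the shingles point in the quoted update: a shingle is a run of adjacent words (a word n-gram), so indexing shingles makes local word order part of what gets matched. The two example queries contain exactly the same words, but their two-word shingles differ. This is a plain-Python approximation that assumes lowercase whitespace tokenization and shingles of size 2; the `shingles` helper is hypothetical and is not Elasticsearch's shingle token filter, which works on the analyzer's own tokens and by default also emits the single words.

```python
def shingles(text, size=2):
    """Return the set of word n-grams ("shingles") of the given size.

    Hypothetical helper: assumes simple lowercase whitespace tokenization.
    """
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


q1 = "i'm not happy i'm working"
q2 = "i'm happy i'm not working"

# Same bag of words: matching on single words alone cannot tell them apart.
same_words = sorted(q1.split()) == sorted(q2.split())

# Two-word shingles keep local word order, so the sets differ.
s1, s2 = shingles(q1), shingles(q2)

print(same_words)       # -> True
print(sorted(s1 - s2))  # -> ["i'm working", 'not happy']
print(sorted(s2 - s1))  # -> ["i'm happy", 'not working']
```

A document containing "not happy" then scores better for the first query than for the second, even though a word-only index sees both queries as identical.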
>
> Make a Google Trends-like interface to visualize searches over time:
> No work has been done toward this optional goal. Some stats features were
> even dropped from the new site because they were "domain-centric", whereas
> a search engine needs to be "page-centric". We could probably index
> searches in Elasticsearch and use a Date Histogram Aggregation [4] to
> display trends.
>
> Make stats available with the API:
> No work has been done toward this optional goal. Some API endpoints were
> also dropped because they were domain-centric. It would be great to have
> an API with a coherent URL scheme. I think Django REST Framework [5] can
> help design that API while keeping the code simple.
>
>
> That's it for this week.
> Have a nice weekend.
>
> Ismael R.
>
>
> [1] https://github.com/iriahi/ahmia-site
> [2] https://github.com/iriahi/ahmia-crawler
> [3] https://piwik.org/
> [4] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
> [5] http://www.django-rest-framework.org/
> _______________________________________________
> tor-dev mailing list
> [email protected]
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
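The trends idea in the quoted update (index each search in Elasticsearch, then bucket the searches with a Date Histogram Aggregation [4]) could look roughly like the sketch below. It only builds the request body as a Python dict and reads the buckets back out of a response; sending the request is left out. The index layout (one document per logged search, with a `query` text field and a `timestamp` date field) and both function names are hypothetical assumptions, not part of ahmia.

```python
def search_trend_request(term, interval="day"):
    """Build a request body counting searches for `term` per time bucket.

    Assumes a hypothetical index of logged searches with a `query` text
    field and a `timestamp` date field.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"match": {"query": term}},
        "aggs": {
            "searches_over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    # Elasticsearch 7.2+ uses `calendar_interval`;
                    # older releases (e.g. 2.x) used `interval` instead.
                    "calendar_interval": interval,
                }
            }
        },
    }


def trend_points(response):
    """Turn a search response into (bucket date, search count) pairs."""
    buckets = response["aggregations"]["searches_over_time"]["buckets"]
    return [(b["key_as_string"], b["doc_count"]) for b in buckets]
```

Each returned pair is one point on a trend chart; switching `interval` to `"week"` or `"month"` would coarsen the chart without changing the indexed data.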
