Hey Giovanni, nice to meet you. I'm the person who gave the Test Driven Relevancy talk. We've got a product, Quepid (http://quepid.com), that lets you gather good/bad results for queries and do a sort of test-driven development against search relevancy. It sounds similar to your existing scripted approach. Have you considered keeping a static catalog for testing purposes? We had a project with a lot of updates and date-dependent relevancy, and a frozen snapshot of the data let us create test scenarios that didn't break as the content changed. The downside is that you can't recreate production problems exactly in your test setup; you have to find a similar issue in the static data that reflects what you're seeing.
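To make the static-catalog idea concrete, here is a rough sketch (Python, untested; the core name, field names, and edismax weights are placeholders I invented, not anything from your setup or from Quepid) of how one of your JSON test cases might be replayed against a frozen test core and scored as pass/fail:

    import requests  # assumes the 'requests' HTTP library is installed

    # Hypothetical URL of a Solr core built from a frozen catalog snapshot.
    SOLR_URL = "http://localhost:8983/solr/static_catalog/select"

    # One test case in roughly the JSON shape you describe: a query plus an
    # assertion about what must show up in the results.
    test_case = {
        "query": "nike sport watch",
        "min_matches": 4,
        "required_title_terms": ["nike", "sportwatch"],
    }

    def passes(case):
        """Return True if the static index satisfies the test case."""
        resp = requests.get(SOLR_URL, params={
            "q": case["query"],
            "defType": "edismax",
            "qf": "title^2 description",  # the weights under test (example values)
            "rows": 20,
            "wt": "json",
        })
        docs = resp.json()["response"]["docs"]
        # Assumes 'title' is a single-valued string field in the test schema.
        matching = [d for d in docs
                    if all(t in d.get("title", "").lower()
                           for t in case["required_title_terms"])]
        return len(matching) >= case["min_matches"]

    print("PASS" if passes(test_case) else "FAIL")

Because the snapshot core never gets reindexed, a test like this only fails when the relevancy configuration changes, not because a product quietly left the catalog; that goes a long way toward keeping a suite of 800 tests from rotting under you.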
Cheers,
-Doug

On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi
<giovanni.bricc...@banzai.it> wrote:
> Thank you for the links.
>
> The book is really useful; I will definitely have to spend some time
> reformatting the logs to get access to the number of results found, the
> session id, and much more.
>
> I'm also quite happy that my test cases produce results similar to the
> precision reports shown at the beginning of the book.
>
> Giovanni
>
>
> 2014-04-09 12:59 GMT+02:00 Ahmet Arslan <iori...@yahoo.com>:
>
> > Hi Giovanni,
> >
> > Here are some relevant pointers:
> >
> > http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy
> >
> > http://rosenfeldmedia.com/books/search-analytics/
> >
> > http://www.sematext.com/search-analytics/index.html
> >
> > Ahmet
> >
> >
> > On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi
> > <giovanni.bricc...@banzai.it> wrote:
> > I have been working on an e-commerce site for about a year, and
> > unfortunately I have no "information retrieval" background, so I am
> > probably missing some important practices about relevance tuning and
> > search engines. During this period I have had to fix many "bugs" about
> > bad search results, which I have sometimes solved by tuning edismax
> > weights, sometimes by creating ad hoc query filters or query boosting;
> > but I am still not able to figure out what the correct process for
> > improving search result relevance should be.
> >
> > These are the practices I am following. I would really appreciate any
> > comments about them, and any hints about the practices you follow in
> > your projects:
> >
> > - In order to have a measure of search quality, I have written many
> > test cases such as "if the user searches for <<nike sport watch>>, the
> > search results should display at least four <<tom tom>> products with
> > the words <<nike>> and <<sportwatch>> in the title". I have written a
> > tool that reads such tests from JSON files, applies them to my
> > application, and then counts the number of results that do not match
> > the criteria stated in the test cases. (For those interested, this tool
> > is available at https://github.com/gibri/kelvin, but it is still quite
> > a prototype.)
> >
> > - I use this count as a quality index. I have tried various times to
> > change the edismax weights to lower the overall number of errors, or to
> > add new filters/boostings to the application to try to decrease the
> > error count.
> >
> > - The pro of this is that at least you have a number to look at, and a
> > quick way of checking the impact of a modification.
> >
> > - The bad side is that you have to maintain the test cases: I now have
> > about 800 tests and my product catalogue changes often, which means
> > that some products exit the catalog and some test cases can't pass
> > anymore.
> >
> > - I am populating the test cases using errors reported by users, and I
> > feel that this is driving the test cases too much toward pathological
> > cases. Moreover, I don't have many tests for cases that are working
> > well now.
> >
> > I would like to use search logs as drivers to generate tests, but I
> > feel I haven't picked the right path. Using top queries, manually
> > reviewing results, and then writing tests is a slow process; moreover,
> > many top queries are ambiguous or are driven by site ads.
> >
> > Many, many queries are unique per user. How do you deal with these
> > cases?
> >
> > How are you using your logs to find test cases to fix? Are you looking
> > for queries where the user does not "open" any of the returned results?
> > Which KPI have you chosen to find queries that are not providing good
> > results? And what are you using as a KPI for search as a whole, besides
> > the conversion rate?
> >
> > Can you suggest any other practices you are using in your projects?
> >
> > Thank you very much in advance
> >
> > Giovanni
> >

--
Doug Turnbull
Search & Big Data Architect
OpenSource Connections <http://o19s.com>