I have been working on an e-commerce site for about a year and, unfortunately, I have no "information retrieval" background, so I am probably missing some important practices around relevance tuning and search engines. During this period I have had to fix many "bugs" caused by bad search results, sometimes by tuning edismax weights and sometimes by creating ad hoc query filters or query boosts; but I still cannot figure out what the correct process for improving search result relevance should be.
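(To give an idea of what "tuning edismax weights" means in my case: adjusting Solr parameters such as qf, pf, and bq. A minimal sketch in Python follows; the Solr URL, field names, and weights are placeholders, not my real configuration.)

    import requests

    # Query a Solr core with the edismax parser. The URL, field names,
    # and weights are placeholders, not a real configuration.
    SOLR_URL = "http://localhost:8983/solr/products/select"

    params = {
        "defType": "edismax",
        "q": "nike sport watch",
        "qf": "title^3.0 brand^2.0 description^0.5",  # per-field term weights
        "pf": "title^5.0",          # extra boost when the whole phrase matches the title
        "bq": "in_stock:true^1.5",  # boost query: prefer in-stock products
        "rows": 10,
        "wt": "json",
    }

    for doc in requests.get(SOLR_URL, params=params).json()["response"]["docs"]:
        print(doc.get("title"))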
These are the practices I am following. I would really appreciate any comments on them, and any hints about the practices you follow in your projects:

- To have a measure of search quality, I have written many test cases such as "if the user searches for <<nike sport watch>>, the results should include at least four <<tom tom>> products with the words <<nike>> and <<sportwatch>> in the title". I have written a tool that reads such tests from JSON files, runs them against my application, and counts the number of results that do not match the criteria stated in the test cases. (For those interested, this tool is available at https://github.com/gibri/kelvin, but it is still quite a prototype; a simplified sketch of the idea is in the P.S. at the end of this message.)
- I use this count as a quality index: I have tried several times to change the edismax weights to lower the overall number of errors, or to add new filters/boosts to the application to decrease the error count.
- The upside is that you at least have a number to look at, and a quick way of checking the impact of a modification.
- The downside is that you have to maintain the test cases: I now have about 800 tests and my product catalogue changes often, which means some products leave the catalogue and some test cases can no longer pass.
- I am populating the test cases from errors reported by users, and I feel this is skewing the suite too much toward pathological cases. Moreover, I have few tests for the cases that are working well today.

I would like to use search logs to drive test generation, but I feel I have not found the right approach. Taking the top queries, manually reviewing the results, and then writing tests is a slow process; moreover, many top queries are ambiguous or driven by site ads, and many, many queries are unique to a single user. How do you deal with these cases? How are you using your logs to find test cases worth fixing? Are you looking for queries where the user does not "open" any of the returned results (the kind of analysis I sketch in the second P.S. below)? Which KPI have you chosen to identify queries that are not returning good results? And what do you use as a KPI for search as a whole, besides the conversion rate?

Can you suggest any other practices you are using in your projects?

Thank you very much in advance,
Giovanni
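P.S. Here is a simplified sketch of the test idea; the JSON field names are illustrative, not the actual schema used by kelvin:

    import requests

    # One test case, in the spirit of the JSON files the tool reads.
    # All field names are illustrative, not kelvin's real schema.
    test_case = {
        "query": "nike sport watch",
        "min_matches": 4,
        "required_title_words": ["nike", "sportwatch"],
    }

    def failures(test, solr_url="http://localhost:8983/solr/products/select"):
        """Return 1 if the test case fails against the live index, else 0."""
        params = {"defType": "edismax", "q": test["query"], "rows": 10, "wt": "json"}
        docs = requests.get(solr_url, params=params).json()["response"]["docs"]
        # Simplified containment check: every required word must appear in the title.
        matching = [
            d for d in docs
            if all(w in d.get("title", "").lower() for w in test["required_title_words"])
        ]
        return 0 if len(matching) >= test["min_matches"] else 1

    # The quality index is the total number of failing test cases.
    error_count = sum(failures(t) for t in [test_case])
    print(f"{error_count} failing test case(s)")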
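P.P.S. And this is the kind of zero-click analysis I have in mind for mining the logs; the log format here is invented (one JSON record per search, with the positions of the results the user opened):

    import json
    from collections import Counter

    # Hypothetical search log, one JSON object per line, e.g.
    #   {"query": "nike sport watch", "clicks": [1, 3]}
    # An empty "clicks" list means the user opened no result.
    searches, zero_clicks = Counter(), Counter()

    with open("search_log.jsonl") as log:
        for line in log:
            record = json.loads(line)
            query = record["query"].strip().lower()
            searches[query] += 1
            if not record["clicks"]:
                zero_clicks[query] += 1

    # Rank queries by zero-click rate, ignoring rare queries: a single
    # abandoned search tells us almost nothing.
    candidates = [
        (q, zero_clicks[q] / searches[q], searches[q])
        for q in searches if searches[q] >= 20 and zero_clicks[q]
    ]
    for query, rate, n in sorted(candidates, key=lambda c: -c[1])[:25]:
        print(f"{rate:.0%} zero-click over {n} searches: {query}")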