I have been working on an e-commerce site for about a year, and
unfortunately I have no information retrieval background, so I am
probably missing some important practices around relevance tuning and
search engines.
During this period I have had to fix many "bugs" about bad search
results, which I solved sometimes by tuning edismax weights, sometimes
by adding ad hoc query filters or query boosts; but I am still unable
to figure out what the correct process for improving search result
relevance should be.
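
To give an idea of the kind of changes I mean, here is a sketch of the
edismax parameters I typically end up touching (the field names and
weights here are made up, not my real configuration):

    defType=edismax
    qf=title^10 brand^4 description^1
    bq=category:watches^2
    fq=in_stock:true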

These are the practices I am following; I would really appreciate any
comments on them, and any hints about the practices you follow in your
own projects:

- In order to have a measure of search quality, I have written many test
cases such as "if the user searches for <<nike sport watch>>, the search
results should include at least four <<tom tom>> products with the words
<<nike>> and <<sportwatch>> in the title". I have written a tool that
reads such tests from JSON files, applies them to my application, and
counts the number of results that do not match the criteria stated in
the test cases; a sketch of what such a test case might look like
follows this list. (For those interested, the tool is available at
https://github.com/gibri/kelvin but it is still quite a prototype.)

- I use this count as a quality index: I have tried several times to
change the edismax weights to lower the overall number of errors, or to
add new filters/boosts to the application to decrease the error count.

- The upside of this approach is that at least you have a number to look
at, and a quick way of checking the impact of a modification.

- The downside is that you have to maintain the test cases: I now have
about 800 tests and my product catalogue changes often, which means that
some products leave the catalogue and some test cases can't pass
anymore.

- I am populating the test cases from errors reported by users, and I
feel this is skewing them too much toward pathological cases. Moreover,
I don't have many tests for cases that are working well right now.
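
To make the first practice concrete, a test case in the spirit of the
example above might look like this in JSON (the field names are only
illustrative, not the tool's exact schema):

    {
      "query": "nike sport watch",
      "assertions": [
        {
          "type": "min_results_matching",
          "count": 4,
          "brand": "tom tom",
          "title_contains": ["nike", "sportwatch"]
        }
      ]
    }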

I would like to use the search logs to drive test generation, but I
feel I haven't found the right approach yet. Taking the top queries,
manually reviewing the results, and then writing tests is a slow
process; moreover, many top queries are ambiguous or are driven by ads
on the site.

Many, many queries are unique to a single user. How do you deal with
these cases?

How are you using your logs to find test cases to fix? Are you looking
for queries where the user does not "open" any of the returned results?
Which KPI have you chosen to identify queries that are not providing
good results? And what do you use as a KPI for search as a whole,
besides the conversion rate?
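
To make the log question concrete, here is the kind of analysis I am
imagining, as a minimal sketch: it assumes a CSV search log with one
row per search, a "query" column, and a "clicked" column that is 1 if
the user opened any result and 0 otherwise (this log format is made
up):

    import csv
    from collections import defaultdict

    # Count searches and clicked searches per normalized query.
    searches = defaultdict(int)
    clicked = defaultdict(int)

    with open("search_log.csv", newline="") as f:
        for row in csv.DictReader(f):
            q = row["query"].strip().lower()
            searches[q] += 1
            clicked[q] += int(row["clicked"])

    # Rank queries by zero-click rate, ignoring rare queries so the
    # ranking is not dominated by one-off searches.
    report = [
        (q, n, 1 - clicked[q] / n)
        for q, n in searches.items()
        if n >= 20  # arbitrary frequency threshold
    ]
    report.sort(key=lambda r: (-r[2], -r[1]))

    for q, n, zero_click_rate in report[:50]:
        print(f"{zero_click_rate:5.0%}  {n:6d}  {q}")

Is this the right kind of report to start from, or do you slice the
logs differently?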

Can you suggest any other practices you are using in your projects?

Thank you very much in advance

Giovanni
