Hi!

> Cparle wants to make sure that people searching for "clarinet" also get
> shown images of "piccolo clarinet" etc.
> 
> To make this possible, where an image has been tagged "basset horn" he
> is therefore looking to add "clarinet" as an additional keyword, so that
> if somebody types "clarinet" into the search box, one of the images
> retrieved by ElasticSearch will be the basset horn one.

Generally if the image is tagged with "basset horn" and the user query
is "clarinet", we can do one of the following:

1. Index all upstream-hierarchy for "basset horn" (presumably we would
have to cut off when it gets too deep or too abstract) and then match
directly when searching.

2. Expand hierarchy down-stream from "clarinet" and then match against
search index.

3. Have some manual or automatic process that ensures that both
"clarinet" and "basset horn" are indexed (not necessarily at once) and
rely on it to discover the matches.

The problem with (1) is that if hierarchy changes, we will have to do
huge number of updates which might overwhelm the system, and most of
these updates would be not even for things people search for, but we
have no way to know that.

The problem with (2) is that downstream hierarchies explode very fast,
and if you search for "clarinet" and there are 10000 descendants in
these hierarchies, we can't search for all of them, so you may never get
a chance to find the basset horn. Also, of course, querying big
downstream hierarchies takes time too, which means performance hit.

-- 
Stas Malyshev
smalys...@wikimedia.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

Reply via email to