[Wikitech-l] Re: Word embeddings / vector search

Thiemo Kreuz Tue, 09 May 2023 00:28:36 -0700

I'm curious what the actual question is. The basic concepts have been
studied for about 60 years and have been in use for about 20 to 30 years.
One particular detail the industry apparently needs to re-learn every
time is how easily such vector spaces encode and reproduce any
existing bias, racism, phobia, and so on, and how hard it is to raise
awareness, let alone do something about it.


That said, Elasticsearch, which we currently run on Wikimedia
infrastructure in version 7.10.x, is already responding to the current
machine learning hype cycle: version 8.0 introduced approximate
nearest neighbor search.

https://www.elastic.co/de/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0
https://en.wikipedia.org/wiki/Special:Version

We certainly need to upgrade some day, but I think nobody is actively
working on this at the moment. However, the topic appears in the
annual plan currently under discussion. The responsible Search Platform
team is also quite active and monitors a good selection of communication
channels, including a separate mailing list.

https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/Product_%26_Technology#Objectives
https://wikitech.wikimedia.org/wiki/Search_Platform/Contact#Office_Hours

Kind regards
Thiemo
_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
