Smalyshev added a comment.

> You could make a list, but that sounds like a maintenance burden.

Precisely.

> I would just index all ExternalId's

I also think this is the best way.

> make the index assume the entries are distinct and unique, so when it encounters a second Wikidata item with the same external ID, it just overwrites it

The index can't do that, unfortunately. I don't think there is even such a thing as a unique field in Elasticsearch; the only field that is unique is the document ID.

So if you have two items with the same external ID, the search will find them both. If you build a service on top of it (such as a special page), that service can interpret the search results and resolve the collision. But I see no way to avoid duplicates in the search index if they are in the data.
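
To illustrate the kind of post-processing such a service could do, here is a minimal sketch in Python. It assumes the search results have already been fetched and reduced to plain dictionaries; the field names `item_id` and `external_id`, the helper names, and the sample values are all hypothetical and do not reflect any actual Wikibase or Elasticsearch response format.

```python
from collections import defaultdict


def group_by_external_id(hits):
    """Map each external ID value to the list of item IDs that carry it."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["external_id"]].append(hit["item_id"])
    return dict(grouped)


def split_collisions(grouped):
    """Separate unambiguous IDs from IDs shared by more than one item."""
    unique = {k: v[0] for k, v in grouped.items() if len(v) == 1}
    collisions = {k: v for k, v in grouped.items() if len(v) > 1}
    return unique, collisions


# Hypothetical hits: the index returns every matching item, duplicates included.
hits = [
    {"item_id": "Q100", "external_id": "A-1"},
    {"item_id": "Q200", "external_id": "A-1"},
    {"item_id": "Q300", "external_id": "B-2"},
]

unique, collisions = split_collisions(group_by_external_id(hits))
print(unique)      # external IDs that resolve to exactly one item
print(collisions)  # external IDs that need manual or rule-based resolution
```

The index itself still holds both colliding items; only the layer on top decides how to present or resolve them.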


TASK DETAIL
https://phabricator.wikimedia.org/T99899
