Multichill added a comment.

make the index assume the entries are distinct and unique, so when it encounters a second Wikidata item with the same external ID, it just overwrites it

The index can't do that, unfortunately. I don't even think there's such a thing as a unique field in Elasticsearch; the only field that is unique is the document ID.

So if you have two items with the same external ID, the search will find them both. If you build some service on top of it (like a special page), it can interpret the search results and resolve the collision. But I see no way to avoid duplicates in the search index if they are in the data.
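To illustrate the point above, here is a toy model in Python. It is not the Elasticsearch API, just a dict keyed by document ID, and the item IDs, property ID, and value are made up. It only shows that uniqueness holds for the document ID, while a lookup by field value returns every matching document.

```python
# Toy model only: a dict keyed by document ID, NOT the Elasticsearch API.
# All IDs and values below are hypothetical.
index = {}


def index_document(doc_id, external_ids):
    """Store a document; reusing a doc_id overwrites the earlier entry."""
    index[doc_id] = {"external_ids": set(external_ids)}


def search_external_id(value):
    """Return the IDs of all documents whose external_ids contain the value."""
    return sorted(
        doc_id for doc_id, doc in index.items() if value in doc["external_ids"]
    )


# Two different items carry the same external ID, so both end up in the index.
index_document("Q1001", ["P9999=abc123"])
index_document("Q1002", ["P9999=abc123"])
hits = search_external_id("P9999=abc123")
```

Any deduplication therefore has to happen in whatever consumes `hits`, not in the index itself.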

I would do a really simple and stupid resolving approach: take the top result, and maybe sort the results by something like popularity_score.
Might be good to avoid making yet another special page and just use the API ( […] ). That way you probably only need a bit of JavaScript to glue everything together. Right now the JavaScript is just hitting […]; it could be changed to hit the search if someone starts with P<some integer>:something. Or you could expand wbsearchentities to do the dirty work for you, but I think you'll get a bit of code mix-up.
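A minimal sketch of that glue logic, written in Python for brevity rather than as the actual JavaScript. The function names, the shape of a search hit, and the exact query syntax are assumptions for illustration; only the P<some integer>:something convention and the popularity_score idea come from the comment above.

```python
import re

# Assumed query syntax: "P<integer>:<value>", e.g. "P9999:abc123" (hypothetical).
QUERY_PATTERN = re.compile(r"^(P[1-9]\d*):(.+)$")


def parse_external_id_query(text):
    """Return (property_id, value) if the input looks like P<int>:value, else None."""
    match = QUERY_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def pick_top_result(hits):
    """Resolve a collision by taking the hit with the highest popularity_score.

    `hits` is assumed to be a list of dicts such as
    {"id": "Q1001", "popularity_score": 0.2}; a missing score counts as 0.
    On a tie, the hit that appears first in the list wins.
    """
    if not hits:
        return None
    return max(hits, key=lambda hit: hit.get("popularity_score", 0.0))
```

Input that does not match the pattern returns None from `parse_external_id_query`, so the caller can fall back to the normal entity search.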

Probably best to split this task into two parts:

  1. Get the ExternalIds indexed
  2. Figure out a way for the user to access them

The first part is probably clear now (just index all ExternalIds); the second part probably needs a bit more thought.


