[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

Smalyshev Fri, 02 Mar 2018 13:11:24 -0800

Smalyshev added a comment.

So if I look into PrefetchingTermLookup, it's the sum of TermBuffer and TermLookup, which gives:

public function prefetchTerms( array $entityIds, array $termTypes = null, array $languageCodes = null );
public function getPrefetchedTerm( EntityId $entityId, $termType, $languageCode );
public function getLabel( EntityId $entityId, $languageCode );
public function getLabels( EntityId $entityId, array $languageCodes );
public function getDescription( EntityId $entityId, $languageCode );
public function getDescriptions( EntityId $entityId, array $languageCodes );

I don't see where term types are defined or how they relate to labels and descriptions. I guess 'label' is the type for labels and 'description' is the type for descriptions, but could there be more? Is the Elastic implementation supposed to support other types, or should it throw if it encounters them? There are constants in TermIndexEntry, but I assume those are implementation details of the SQL part and I should stay away from them? EntityTermLookupBase seems to just use plain strings, not constants. Is it OK to keep doing this?

Also, there's an overlap in coverage: you can use getLabel, or you can use getPrefetchedTerm(... 'label' ...). When is each one supposed to be used? Should getLabel use prefetched data if available?
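
To make the overlap question concrete, here is a rough sketch of one possible answer: getLabel checks the prefetch buffer first and only goes to the backend on a miss. This is not Wikibase code. The class name SketchPrefetchingLookup, the $fetcher callable and its signature, and the $fetchCount counter are all hypothetical, and entity IDs are plain strings rather than EntityId objects so that the example stays self-contained. The convention that getPrefetchedTerm returns false for "known to be absent" and null for "never requested" is an assumption here, not something confirmed above.

```php
<?php
// Hypothetical sketch, not Wikibase code. Entity IDs are plain strings here.
class SketchPrefetchingLookup {
	/** @var array [entityId][termType][languageCode] => string|false */
	private $buffer = [];

	/** @var callable Assumed backend fetcher: fn( $ids, $types, $langs ) => nested array */
	private $fetcher;

	/** @var int Number of backend round trips, kept only to make the behaviour visible */
	public $fetchCount = 0;

	public function __construct( callable $fetcher ) {
		$this->fetcher = $fetcher;
	}

	public function prefetchTerms( array $entityIds, array $termTypes, array $languageCodes ) {
		$this->fetchCount++;
		$found = ( $this->fetcher )( $entityIds, $termTypes, $languageCodes );
		foreach ( $entityIds as $id ) {
			foreach ( $termTypes as $type ) {
				foreach ( $languageCodes as $lang ) {
					// Store false for terms the backend did not return, so a
					// later lookup can tell "absent" apart from "not requested".
					$this->buffer[$id][$type][$lang] = $found[$id][$type][$lang] ?? false;
				}
			}
		}
	}

	/** @return string|false|null false = known absent, null = never requested */
	public function getPrefetchedTerm( $entityId, $termType, $languageCode ) {
		return $this->buffer[$entityId][$termType][$languageCode] ?? null;
	}

	/** @return string|null */
	public function getLabel( $entityId, $languageCode ) {
		$term = $this->getPrefetchedTerm( $entityId, 'label', $languageCode );
		if ( $term === null ) {
			// Buffer miss: fetch this single term, which also fills the buffer.
			$this->prefetchTerms( [ $entityId ], [ 'label' ], [ $languageCode ] );
			$term = $this->getPrefetchedTerm( $entityId, 'label', $languageCode );
		}
		return $term === false ? null : $term;
	}
}
```

With this shape, a caller that knows it needs many labels would call prefetchTerms once and then use getLabel freely, while a caller that skips prefetching still gets a correct answer at the cost of one backend request per miss.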

I see there's a bunch of code dealing with prefetching, buffering, resolving different repos, and so on. Should all of this code be reimplemented for the ElasticSearch implementation of prefetchTerms, or can it somehow be reused? All of that code currently relies on TermBuffer/TermIndex.

As I understand it, TermLookup methods are now supposed to deal with entities from federated repos too, and also to work on the WikibaseClient side, where there's no repo config. However, an ElasticSearch implementation needs access to the configuration of the target repo, and more specifically to its CirrusSearch settings (at least basic ones like index names and cluster names; we probably don't need search profiles just for fetching labels). I am not sure those are easily available. So do we need this capability for the ElasticSearch implementation? If yes, we'd need to figure out how to get the right configs. Alternatively, we could duplicate the config in the client configs. That would probably work for index names, which are also very conventional and derived from wiki names, so we could probably generate them as well. But connection settings still need to come from somewhere, unless we assume everybody uses the same endpoint. I think that is the case in production, but I'm not sure how safe it is to rely on it.

TASK DETAIL

https://phabricator.wikimedia.org/T143706


To: Smalyshev
Cc: WMDE-leszek, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g
