dcausse added a comment.
Deboosting can happen in the rescore stage. Since we use a weighted sum, we can either apply a negative penalty when relationship:P31:Q4167410 matches, or a positive value when relationship:P31:Q4167410 does NOT match.
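To illustrate the two options, here is a minimal sketch in plain Python, not the actual rescore query. The function names, weights, and sample documents are hypothetical; it only shows that a negative weight on matching documents and a positive weight on non-matching documents produce the same ordering in a weighted sum.

```python
# Hypothetical illustration of the two deboosting variants in a weighted sum.
DEBOOST_TERM = "P31:Q4167410"

def score_with_penalty(base_score, relationships, weight=-2.0):
    """Add a negative term when the document matches the deboosted statement."""
    matches = 1.0 if DEBOOST_TERM in relationships else 0.0
    return base_score + weight * matches

def score_with_bonus(base_score, relationships, weight=2.0):
    """Add a positive term when the document does NOT match the statement."""
    not_matches = 0.0 if DEBOOST_TERM in relationships else 1.0
    return base_score + weight * not_matches

# Made-up documents: name -> (score from the first pass, relationship values)
docs = {
    "disambiguation page": (5.0, ["P31:Q4167410"]),
    "regular item": (4.0, ["P31:Q5"]),
}

def rank(score_fn):
    """Return document names ordered by descending rescored value."""
    return sorted(docs, key=lambda name: score_fn(*docs[name]), reverse=True)

print(rank(score_with_penalty))
print(rank(score_with_bonus))
```

Both variants shift the two groups apart by the same amount, so the choice mostly comes down to whether negative weights are convenient in the rescore function.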
Will we add all properties or just a set of selected properties?
The cardinality of this new field is hard to judge, but I'm in favor of not over-indexing. In this case I'd go for a simple mapping like (the analyzer name is pseudocode for "keep only the part before the colon"):
"relationship": { "type": "keyword", "fields": { "type": { "type": "text", "analyzer": "split(':')[0]", "search_analyzer": "keyword" } } }
In other words, for P31:Q4167410 I'd keep only P31:Q4167410 and P31 as indexed terms. IMO it does not make sense to index Q4167410 separately.
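As a rough sketch of what ends up indexed under that proposal, the following Python mimics the two fields: the full value as a keyword, plus the property prefix from the sub-field. The function name is hypothetical and this is not analyzer code, only a model of the resulting terms.

```python
def indexed_terms(statement):
    """Return the terms kept for one relationship value such as 'P31:Q4167410'.

    The full statement models the keyword field; the part before the first
    colon models the sub-field that keeps only the property id.
    """
    terms = [statement]
    prop = statement.split(":", 1)[0]
    if prop != statement:  # skip the duplicate when there is no colon
        terms.append(prop)
    return terms

print(indexed_terms("P31:Q4167410"))  # ['P31:Q4167410', 'P31']
```

The item id (Q4167410) never appears as a term on its own, which is what keeps the number of distinct terms down.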
One possibility to avoid reindexing from MySQL every time we want to add a new property would be to create a custom analyzer to which we provide a whitelist of properties to index.
All properties would be present in the source doc, but only a few selected ones would be indexed. Adding a new property would just require updating the analysis chain and performing an in-place reindex.
We would then need to carefully monitor disk usage and in-memory terms usage when whitelisting new props. Having all relationships in the source can also make experimenting with relforge a bit easier: you would just have to prepare the analysis chain on relforge and send a remote reindex API call.
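The whitelist idea can be sketched in Python as follows. This only models the behaviour (everything stays in the source document, only whitelisted properties produce indexed terms); the function name, the sample values, and the whitelist contents are assumptions, and the real filtering would live in the analysis chain.

```python
def terms_to_index(relationships, whitelist):
    """Keep only statements whose property id is in the whitelist."""
    terms = []
    for statement in relationships:
        prop = statement.split(":", 1)[0]
        if prop in whitelist:
            terms.append(statement)
    return terms

# Hypothetical source document: every relationship is stored in the source.
source_doc = {"relationship": ["P31:Q4167410", "P279:Q35120", "P17:Q142"]}

# Initially only P31 is whitelisted.
before = terms_to_index(source_doc["relationship"], {"P31"})

# Adding P279 changes only the whitelist; the source document is untouched,
# so the new terms can be produced by reindexing from the existing source.
after = terms_to_index(source_doc["relationship"], {"P31", "P279"})

print(before)
print(after)
```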
Cc: EBernhardson, dcausse, daniel, Aklapper, Smalyshev, GoranSMilovanovic, QZanden, EBjune, Izno, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs