dcausse added a comment.

Deboosting can happen in the rescore stage. Since we use a weighted sum, we can either apply a negative penalty when a document matches relationship:P31:Q4167410, or a positive value when it does NOT match relationship:P31:Q4167410.
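With a weighted sum, the two options should rank documents identically. "Bonus when not matching" equals "penalty when matching" plus the same constant for every document, so only the absolute scores differ. The sketch below checks this locally. The document ids, first-pass scores and the `PENALTY` weight are made-up illustration values, not CirrusSearch settings.

```python
# Sketch: penalty-on-match vs bonus-on-non-match under a weighted sum.
# All numbers below are hypothetical illustration values.

PENALTY = 2.0  # hypothetical rescore weight


def rescore_penalty(base: float, is_disambig: bool) -> float:
    """Weighted sum with a negative term when the doc matches P31:Q4167410."""
    return base - PENALTY * (1.0 if is_disambig else 0.0)


def rescore_bonus(base: float, is_disambig: bool) -> float:
    """Weighted sum with a positive term when the doc does NOT match."""
    return base + PENALTY * (0.0 if is_disambig else 1.0)


def rank(docs, scorer):
    """Return doc ids sorted by descending rescored value."""
    ordered = sorted(docs, key=lambda t: scorer(t[1], t[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ordered]


# (doc id, first-pass score, has relationship:P31:Q4167410)
docs = [
    ("Q1", 5.0, True),
    ("Q2", 4.0, False),
    ("Q3", 3.5, False),
    ("Q4", 6.5, True),
]

print(rank(docs, rescore_penalty))  # → ['Q4', 'Q2', 'Q3', 'Q1']
print(rank(docs, rescore_bonus))    # → ['Q4', 'Q2', 'Q3', 'Q1']
```

So the choice between the two is mostly about which form is easier to express as a rescore query, not about the resulting order.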
Will we add all properties or just a set of selected properties?
Concerning the cardinality of this new field, it's hard to judge, but I'm in favor of not over-indexing. In this case I'd go for a simple mapping like:

"relationship": {
    "type": "keyword",
    "fields": {
        "type": {
            "type": "text",
            "analyzer": "split(':')[0]",
            "search_analyzer": "keyword"
        }
    }
}

In other words, for P31:Q4167410 I'd keep only P31:Q4167410 and P31 as indexed terms; imo it does not make sense to index Q4167410 separately.
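The `split(':')[0]` analyzer in the mapping is shorthand rather than a real analyzer name. One possible way to express it in Elasticsearch is a custom analyzer built from the `keyword` tokenizer plus a `pattern_replace` char filter that drops everything from the first `:`. This is a sketch under that assumption; the names `strip_value` and `property_only` are hypothetical. The `indexed_terms` helper is also hypothetical and only simulates locally which terms each field would hold.

```python
import re

# Drops everything from the first ":" so "P31:Q4167410" becomes "P31".
PROPERTY_ONLY_PATTERN = r":.*$"

# Index settings sketch (hypothetical char filter / analyzer names).
settings = {
    "analysis": {
        "char_filter": {
            "strip_value": {
                "type": "pattern_replace",
                "pattern": PROPERTY_ONLY_PATTERN,
                "replacement": "",
            }
        },
        "analyzer": {
            "property_only": {
                "type": "custom",
                "tokenizer": "keyword",  # whole input as a single token
                "char_filter": ["strip_value"],
            }
        },
    }
}

# Mapping sketch: full "Pxx:Qyy" value as keyword, property-only subfield.
mapping = {
    "properties": {
        "relationship": {
            "type": "keyword",
            "fields": {
                "type": {
                    "type": "text",
                    "analyzer": "property_only",
                    "search_analyzer": "keyword",
                }
            },
        }
    }
}


def indexed_terms(value: str) -> dict:
    """Local simulation of the terms each field would hold for one value."""
    return {
        "relationship": [value],  # keyword field: the value as-is
        "relationship.type": [re.sub(PROPERTY_ONLY_PATTERN, "", value)],
    }


print(indexed_terms("P31:Q4167410"))
# → {'relationship': ['P31:Q4167410'], 'relationship.type': ['P31']}
```

Q4167410 on its own never becomes a term, which matches the intent of not over-indexing.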

One possibility to avoid reindexing from MySQL every time we want to add a new property would be to create a custom analyzer where we provide a whitelist of properties to index.
All properties would be present in the source doc, but only a few selected ones would be indexed. Adding a new property would then just require updating the analysis chain and performing an in-place reindex.
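The whitelist idea can be simulated locally: the source document always carries every relationship, and only the analysis step decides which ones become terms. In Elasticsearch this might be assembled from a custom analyzer (for instance the `keep` token filter with `keep_words` on the property term), but that is an assumption. The `whitelisted_terms` helper below is hypothetical and does not reproduce any real analyzer. The relationship values in `source_doc` are illustrative only.

```python
from typing import Iterable, List, Set


def whitelisted_terms(relationships: Iterable[str], whitelist: Set[str]) -> List[str]:
    """Hypothetical analysis step: emit "Pxx:Qyy" and "Pxx" only for whitelisted Pxx.

    The returned list mixes the terms of both fields (full value and property).
    """
    terms: List[str] = []
    for value in relationships:
        prop = value.split(":", 1)[0]
        if prop in whitelist:
            terms.append(value)
            terms.append(prop)
    return terms


# Every relationship stays in the source; only indexing is filtered.
source_doc = {"relationship": ["P31:Q4167410", "P279:Q5", "P17:Q142"]}

print(whitelisted_terms(source_doc["relationship"], {"P31"}))
# → ['P31:Q4167410', 'P31']

# Whitelisting P279 only changes the analysis config; the same source doc
# is re-analyzed by the in-place reindex, no MySQL round trip needed.
print(whitelisted_terms(source_doc["relationship"], {"P31", "P279"}))
# → ['P31:Q4167410', 'P31', 'P279:Q5', 'P279']
```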
We would then need to carefully monitor disk usage and the memory used by terms when whitelisting new props. Having all relationships in the source can also make experimenting with relforge a bit easier: you'd just have to prepare the analysis chain on relforge and send a remote reindex API call.
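A reindex-from-remote call uses the Elasticsearch `_reindex` API with a `source.remote.host` entry, and the destination cluster must list that host in its `reindex.remote.whitelist` setting. The sketch below only builds the request with the standard library and does not send it. Both hosts and both index names are hypothetical placeholders, and the destination index is assumed to already exist on relforge with the experimental analysis chain.

```python
import json
import urllib.request

# Hypothetical endpoints and index names; not the real cluster addresses.
SOURCE_HOST = "https://search.example.org:9243"
RELFORGE_URL = "http://relforge.example.org:9200"

body = {
    "source": {
        "remote": {"host": SOURCE_HOST},
        "index": "wikidatawiki_content",  # hypothetical source index
    },
    # Hypothetical destination index, created beforehand on relforge with
    # the analysis chain under test (e.g. a new property whitelist).
    "dest": {"index": "wikidatawiki_content_relforge_test"},
}

request = urllib.request.Request(
    RELFORGE_URL + "/_reindex",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(request) would actually send it; omitted here.
print(request.get_method(), request.full_url)
# → POST http://relforge.example.org:9200/_reindex
```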


TASK DETAIL
https://phabricator.wikimedia.org/T175199

To: dcausse
Cc: EBernhardson, dcausse, daniel, Aklapper, Smalyshev, GoranSMilovanovic, QZanden, EBjune, Izno, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs