Smalyshev added a comment.
OK, looking at current usage, there are only 21 string properties with more than 100K values. Looking at them in particular, the interesting ones are:
HomoloGene ID (P593) - should probably be an external ID. There are more like this, with less usage.
Over a million usages:
- page(s) (P304) - 15332300 items
- volume (P478) - 15288265 items
- issue (P433) - 13757879 items
These are mostly used for scientific articles and are IMO useless for search. We may want to exclude them (I'm not sure about volume/issue, but if we want to do bibliographical searches we probably need a more robust model anyway).
- taxon name (P225) - 2480324
- Commons category (P373) - 2122490
These might actually be useful for searches.
The rest have much lower usage, and even though some of them may also be useless for searches, adding those won't be that big of a deal.
Also, I am a bit concerned about properties like Wikidata SPARQL query equivalent (P3921) - should we have a size limit on property values? I don't want to have 2K of text in the index there, not because it would hurt the index (it probably would not) but because it's useless - nobody is going to search for such a value.
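As a rough illustration of the two ideas above (excluding the page(s)/volume/issue properties and capping value size), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `EXCLUDED_PROPERTIES`, `MAX_VALUE_LENGTH`, `should_index` and `indexable_statements` are hypothetical, the 200-character cutoff is an arbitrary placeholder, and none of this reflects the actual indexing code.

```python
# Hypothetical sketch; names and the length cutoff are illustrative assumptions.
EXCLUDED_PROPERTIES = {"P304", "P478", "P433"}  # page(s), volume, issue
MAX_VALUE_LENGTH = 200  # assumed cutoff in characters, would need tuning


def should_index(property_id: str, value: str) -> bool:
    """Return True if a string statement value looks worth indexing."""
    if property_id in EXCLUDED_PROPERTIES:
        return False
    # Skip very long values (e.g. full SPARQL queries in P3921).
    return len(value) <= MAX_VALUE_LENGTH


def indexable_statements(statements):
    """Keep only the (property_id, value) pairs that pass should_index."""
    return [(p, v) for p, v in statements if should_index(p, v)]


if __name__ == "__main__":
    sample = [
        ("P225", "Homo sapiens"),       # taxon name: kept
        ("P304", "123-130"),            # page(s): excluded by property
        ("P3921", "SELECT ?x " * 200),  # long query text: excluded by size
    ]
    print(indexable_statements(sample))
```

A length cap applied to all properties keeps the rule simple; a per-property limit would be an alternative if some long values turn out to be worth searching.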
Cc: Liuxinyu970226, Smalyshev, debt, aude, Lydia_Pintscher, Aklapper, Multichill, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Avner, Gehel, FloNight, Wikidata-bugs, jayvdb, Mbch331, jeremyb
