Multichill added a comment.
@Smalyshev / @debt: I think this is one of those tasks where we have a bit of a misunderstanding about scope (see https://lists.wikimedia.org/pipermail/wikidata/2018-August/012282.html ). Close this one as resolved and make clearly scoped follow-up tasks to untangle
Esc3300 added a comment.
I don't see clear disadvantages of doing the indexing Multichill suggests.
I don't see any mentioned here either, besides not indexing some specific ones (e.g. page number).
Compared to PubMed article titles, it seems at least as useful.
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-05-23T16:31:01Z] starting wikidata full reindex for T163642
TASK DETAIL: https://phabricator.wikimedia.org/T163642
Multichill added a comment.
And https://www.wikidata.org/w/index.php?search=haswbstatement%3AP217%3DSK-C-5 works :-). https://www.wikidata.org/w/index.php?search="SK-C-5" doesn't work (yet?). Is that the next step?
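For reference, a minimal sketch of how a search URL like the working one above could be built. The helper name `haswbstatement_url` and the constant `SEARCH_ENDPOINT` are illustrative assumptions, not part of Wikibase or CirrusSearch; only the `haswbstatement:PROPERTY=VALUE` keyword and the `index.php?search=` URL shape come from this thread.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the links quoted in this thread.
SEARCH_ENDPOINT = "https://www.wikidata.org/w/index.php"


def haswbstatement_url(property_id: str, value: str) -> str:
    """Build a search URL for items having the statement property_id=value.

    Hypothetical helper for illustration only.
    """
    query = f"haswbstatement:{property_id}={value}"
    # urlencode percent-encodes ':' and '=' inside the query value.
    return f"{SEARCH_ENDPOINT}?{urlencode({'search': query})}"


print(haswbstatement_url("P217", "SK-C-5"))
```

The same helper also percent-encodes non-ASCII values such as ГЭ-3836 as UTF-8, so inventory numbers in other scripts can be passed unchanged.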
Smalyshev added a comment.
@Multichill check out https://www.wikidata.org/w/index.php?title=Q219831 - it has the data now.
Smalyshev added a comment.
Yes, edit should show it.
Lea_Lacroix_WMDE added a comment.
Great!
Smalyshev added a comment.
I'll note here when the reindex is done, and then I guess you can announce :) In the meantime I can check that everything works smoothly with edited entries.
Lea_Lacroix_WMDE added a comment.
I have no deadline in mind, I was just wondering when to announce it, and whether you or I should do it :)
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-05-08T23:22:45Z] Synchronized wmf-config: SWAT: [[gerrit:431994|Add string and external-id types to Wikibase indexing]] T163642 T99899 (duration: 01m 26s)
Smalyshev added a comment.
@Lea_Lacroix_WMDE Also, for newly edited items it should be working as soon as wmf.3 is deployed. But older items will need a reindex.
gerritbot added a comment.
Change 431994 merged by jenkins-bot:
[operations/mediawiki-config@master] Add string and external-id types to Wikibase indexing
https://gerrit.wikimedia.org/r/431994
gerritbot added a comment.
Change 431994 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/mediawiki-config@master] Add string and external-id types to indexing
https://gerrit.wikimedia.org/r/431994
Smalyshev added a comment.
Also, right now we can only locate by haswbstatement:P123=SK-C-5. If we want to index data without attached property IDs, we need to add a different field and analyzer to do that. Should we do it?
Smalyshev added a comment.
@Lea_Lacroix_WMDE we need to make the configs that enable indexing (will be done next) and then we need to actually reindex. Reindexing takes several days, so I planned to do it immediately after the Hackathon, unless you need it sooner.
Lea_Lacroix_WMDE added a comment.
Hey @Smalyshev, when is this going to be live?
gerritbot added a comment.
Change 430277 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add capability to exclude properties from by-type index
https://gerrit.wikimedia.org/r/430277
gerritbot added a comment.
Change 430277 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Add capability to exclude properties from by-type index
https://gerrit.wikimedia.org/r/430277
Lydia_Pintscher added a comment.
Yeah let's leave them out for now.
Smalyshev added a comment.
There is a size limit on string values already. I don't remember the exact limit right now. Or are you looking for something else?
I was thinking about a shorter limit - not sure it makes sense to look up something by a whole SPARQL query... but maybe we should just exclude
Lydia_Pintscher added a comment.
In T163642#4155642, @Smalyshev wrote:
OK, looking at current usage, there are only 21 string properties with more than 100K values. Looking at them in particular, the interesting ones are:
HomoloGene ID (P593) - probably should be external ID. There are more like
Smalyshev added a comment.
OK, looking at current usage, there are only 21 string properties with more than 100K values. Looking at them in particular, the interesting ones are:
HomoloGene ID (P593) - probably should be external ID. There are more like this, with less usage.
Over a million
Lydia_Pintscher added a comment.
In T163642#411, @Smalyshev wrote:
Also just had a thought - this does not cover qualifiers and references of course. Do we want anything there or that is already WDQS domain?
I'd say for now let's leave them out.
Smalyshev added a comment.
Hmm 228 is not that bad... Let me see if I can get some usage stats.
Also just had a thought - this does not cover qualifiers and references of course. Do we want anything there or that is already WDQS domain?
Lydia_Pintscher added a comment.
Don't have a good answer but https://www.wikidata.org/w/index.php?title=Special:ListProperties/string=500=0 has a list of all of the current string properties.
Smalyshev added a comment.
OK, so outside of external IDs covered by T99899: [Story] Looking up entities by external identifiers, which string properties do we want to add to the index? I am still concerned all of them might be too much, but ready to hear other opinions.
Multichill added a comment.
The VIAF part is probably covered by T99899
Multichill added a comment.
Both "P217:ГЭ-3836" or just "ГЭ-3836" would be great.
Smalyshev added a comment.
@Multichill just to be sure, if you could search for P217:ГЭ-3836, with this syntax, it would be fine? We may need to do some infrastructure work before this works properly, but it seems not too hard to implement.
Multichill added a comment.
@Smalyshev coming back to the strings. It's just like Commons. I don't use the local search. I use Google. I noticed https://www.wikidata.org/w/index.php?title=Q45962939 and I'm pretty sure it's a duplicate.
The item has an image with the link to the source and the
Smalyshev added a comment.
That would require indexing the external identifiers with the property
I think it should be possible. The main question that remains is - do we want to search per-property (e.g. P214:1234 for VIAF ID 1234 specifically) or just something like externalid:1234 which would
Multichill added a comment.
In T163642#3218412, @debt wrote:
This looks to be more wikidata than discovery search task at this time.
"The Discovery Department of Wikimedia Engineering has the mission to make the wealth of knowledge and content in the Wikimedia projects easily discoverable. "