[Wikidata-bugs] [Maniphest] [Commented On] T197447: Default Blazegraph configuration confuses strings with and without RTL mark
Stashbot added a comment.

Mentioned in SAL (#wikimedia-operations) [2018-06-27T20:02:06Z] applied fix for T197447 to eqiad wdqs cluster, which involved restart of the services

TASK DETAIL
https://phabricator.wikimedia.org/T197447

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev, Stashbot
Cc: Stashbot, MisterSynergy, Gehel, Aklapper, Smalyshev, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Avner, Jonas, FloNight, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331

___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T197447: Default Blazegraph configuration confuses strings with and without RTL mark
Smalyshev added a comment. Applied the temp fix to wdqs2001 and wdqs2002. Seems to be working. I'll let them run with it for a bit; if I don't see anything weird, I'll apply it to the rest of the servers.
[Wikidata-bugs] [Maniphest] [Commented On] T197447: Default Blazegraph configuration confuses strings with and without RTL mark
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2018-06-26T05:28:57Z] testing fix for T197447 on wdqs1009
[Wikidata-bugs] [Maniphest] [Commented On] T197447: Default Blazegraph configuration confuses strings with and without RTL mark
Smalyshev added a comment. Values affected: 4698 056X 3227 156X 5154 1895 5328 9611 7896 3086 0003 6772 0443 5661 6438 8043 5485 0003 7884 5356 0003 9447 4903
[Wikidata-bugs] [Maniphest] [Commented On] T197447: Default Blazegraph configuration confuses strings with and without RTL mark
Smalyshev added a comment. Looks like setting the option -Dcom.bigdata.btree.keys.KeyBuilder.collator.strength=Identical fixes the issue, but this requires a full reindex and almost doubles the size of the keys for strings, which may have an impact on the space consumed. I'll see if there's a way to fix the immediate problem in a more direct way.
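To make the trade-off above concrete, here is a rough sketch of why an identical-strength key is about twice as long. It is a toy model, not Blazegraph or ICU code: the function toy_key and its byte layout are made up for illustration, and real ICU sort keys are compressed collation weights. The one thing it borrows from ICU is the general idea that identical strength appends the normalized string itself as a final tiebreaker level.

```python
import unicodedata


def toy_key(s: str, identical: bool = False) -> bytes:
    """Hypothetical sort key; NOT the real ICU or Blazegraph key format."""
    # Model the lower collation levels: format characters (category Cf),
    # such as U+200F, are treated as ignorable and contribute nothing.
    base = "".join(
        ch for ch in s if unicodedata.category(ch) != "Cf"
    ).encode("utf-8")
    if not identical:
        return base
    # Model identical strength: append the NFD form of the full string
    # as a tiebreaker, so invisible characters become significant.
    tiebreak = unicodedata.normalize("NFD", s).encode("utf-8")
    return base + b"\x01" + tiebreak


with_mark = " 4698 056X\u200f"
without_mark = " 4698 056X"

# Default strength: both strings collapse to the same key.
print(toy_key(with_mark) == toy_key(without_mark))

# Identical strength: keys differ, but each key is roughly twice as long.
print(toy_key(with_mark, identical=True) == toy_key(without_mark, identical=True))
print(len(toy_key(without_mark)), len(toy_key(without_mark, identical=True)))
```

In this model every string key carries a second copy of the string, which is the kind of growth that would force a full reindex and increase index size.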
[Wikidata-bugs] [Maniphest] [Commented On] T197447: Default Blazegraph configuration confuses strings with and without RTL mark
Smalyshev added a comment.

Test case: insert data

  INSERT { " 4698 056X\u200F" . " 4698 056X" . } WHERE {}

Then query:

  SELECT * WHERE { ?x ?y " 4698 056X" }

It should only produce one result, but it produces two now.
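Since the two literals in the test case render identically, it can help to dump their code points. This is a small standalone sketch using only Python's standard library; the helper name describe is made up for illustration and is not part of any WDQS tooling.

```python
import unicodedata


def describe(s: str) -> list[str]:
    """List each code point of s with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]


with_mark = " 4698 056X\u200f"
without_mark = " 4698 056X"

# The strings differ only by one trailing, invisible character.
print(len(with_mark), len(without_mark))
print(describe(with_mark)[-1])
print(describe(without_mark)[-1])
```

The last entry for the first literal is the right-to-left mark, which is why the two values are distinct strings even though they look the same on screen.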
[Wikidata-bugs] [Maniphest] [Commented On] T197447: Default Blazegraph configuration confuses strings with and without RTL mark
Smalyshev added a comment. The reason seems to be that Blazegraph uses ICU collation keys, and the ICU collator seems to ignore U+200F by default. We may need a patch to change that. The relevant code is in: https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata/src/java/com/bigdata/btree/keys/ICUSortKeyGenerator.java
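The effect described above can be imitated without ICU. The sketch below is a toy model, assuming only that the collator treats format characters such as U+200F as ignorable; toy_sort_key and the dictionary standing in for an index are hypothetical and do not reflect the actual ICUSortKeyGenerator code.

```python
import unicodedata
from collections import defaultdict

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, Unicode general category Cf


def toy_sort_key(s: str) -> str:
    """Hypothetical collation key that drops format (Cf) characters."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")


# Hypothetical index: values are stored under their sort key, so two
# strings with the same key become indistinguishable on lookup.
index: dict[str, list[str]] = defaultdict(list)
for value in (" 4698 056X" + RLM, " 4698 056X"):
    index[toy_sort_key(value)].append(value)

# Looking up the plain string returns both stored values.
matches = index[toy_sort_key(" 4698 056X")]
print(len(matches))
```

With keys built this way, a lookup for the string without the mark also finds the one with it, which matches the two-result behaviour reported in this task.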