[Wikidata-bugs] [Maniphest] T299460: Evaluate Apache Jena

2022-02-06 Thread Thadguidry
Thadguidry added a comment. Hi @AndySeaborne What are the latest benchmarks for loading Wikidata all and truthy with the Jena 4.4.0 release and the new TDB2 xloader with the "--threads" argument? I noticed the release notes said this: > == Improved bulk loader > > This r

[Wikidata-bugs] [Maniphest] T289760: Evaluate Oxigraph as alternative to Blazegraph

2021-11-19 Thread Thadguidry
Thadguidry added a comment. @BenAtOlive I think for bikeshedding or hand-waving discussions, you can just start a new discussion thread in Oxigraph's GitHub Discussions (not Issues). Here: https://github.com/oxigraph/oxigraph/discussions TASK DETAIL https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] T289760: Evaluate Oxigraph as alternative to Blazegraph

2021-11-19 Thread Thadguidry
Thadguidry added a comment. As someone who has "been there, done that" (even with Apache Geode)... I can tell you that **data locality** is very important when you want to maximize performance. But if the data is maintained as distributed, then the only way to squeeze ou

[Wikidata-bugs] [Maniphest] T220823: Use ElasticSearch for bulk Wikidata entity term lookup

2021-09-15 Thread Thadguidry
Thadguidry added a comment. @Addshore That's what I figured. :-) This issue did feel old and sort of in a dustbin. Agree it should be closed. TASK DETAIL https://phabricator.wikimedia.org/T220823 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T289760: Evaluate Oxigraph as alternative to Blazegraph

2021-08-31 Thread Thadguidry
Thadguidry added a comment. @Tpt Looks great! The ROADMAP file was a suggested alternative to the Milestones, sorry didn't make that clear. I much prefer grouping or tagging issues against Milestones as you have done! You have the right idea regarding a single source of truth and ex

[Wikidata-bugs] [Maniphest] T289760: Evaluate Oxigraph as alternative to Blazegraph

2021-08-26 Thread Thadguidry
Thadguidry added a comment. Hi @Tpt Can you elaborate more in your Milestones and create more Milestones as necessary for your future vision? Like what you mean by "no storage format stability for now", and what that really means to users and what you are thinking about in the

[Wikidata-bugs] [Maniphest] T289428: U+002C comma is not being excluded by default in simple search input box for CirrusSearch

2021-08-22 Thread Thadguidry
Thadguidry updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T289428 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Thadguidry Cc: Aklapper, Thadguidry, Invadibot, MPhamWMF, maantietaja, Wilmanbeno, CBogen, Akuckartz

[Wikidata-bugs] [Maniphest] T289428: U+002C comma is not being excluded by default in simple search input box for CirrusSearch

2021-08-22 Thread Thadguidry
Thadguidry updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T289428 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Thadguidry Cc: Aklapper, Thadguidry, Invadibot, MPhamWMF, maantietaja, Wilmanbeno, CBogen, Akuckartz

[Wikidata-bugs] [Maniphest] T289428: U+002C comma is not being excluded by default in simple search input box for CirrusSearch

2021-08-22 Thread Thadguidry
Thadguidry created this task. Thadguidry added projects: Wikidata, CirrusSearch, Elasticsearch. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Discovery-Search. TASK DESCRIPTION (Lydia asked that I write this up, just in case) I thought that

[Wikidata-bugs] [Maniphest] T220823: Use ElasticSearch for bulk Wikidata entity term lookup

2021-08-22 Thread Thadguidry
Thadguidry added a comment. I'd suggest adding **replica shards** (copies of primary shards) that help to both ensure redundancy to protect against failure, but they also vastly increase the capacity for read requests such as searching, like Adam's entity term lookup use case

[Wikidata-bugs] [Maniphest] T206560: [Epic] Evaluate alternatives to Blazegraph

2021-08-18 Thread Thadguidry
Thadguidry added subscribers: Tpt, Thadguidry. Thadguidry added a comment. +1 for Oxigraph. @TPT has been putting in a ton of good effort, research, features, and stability. Sponsoring him now in GitHub as well for his effort. As it's being developed in Rust, it automatically

[Wikidata-bugs] [Maniphest] T210961: Add a rank for outdated but correct data

2021-08-16 Thread Thadguidry
Thadguidry added a comment. We'll also want to improve the Help:Ranking page <https://www.wikidata.org/wiki/Help:Ranking#Deprecated_rank> once this proposal task is implemented. TASK DETAIL https://phabricator.wikimedia.org/T210961 EMAIL PREFERENCES https://phabricator.wi

[Wikidata-bugs] [Maniphest] T210961: Add a rank for outdated but correct data

2021-08-16 Thread Thadguidry
Thadguidry added a comment. Agree generally on this proposal's assertions. It makes sense from a data quality perspective, and since we are actively adding new tools to improve our data quality, then having a new "outdated" rank to represent a "once upon a time thi

[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-08-10 Thread Thadguidry
Thadguidry added a comment. Hi @aidhog Aidan, in my opinion I would say "NO, not a good test-case for this need". And the only reason is this... it's ASCII only (chars <128) and doesn't let us ensure proper load handling for all data in all languages, multilingual dat

[Wikidata-bugs] [Maniphest] T285795: Limit languages on EntityStub rdf builders

2021-08-02 Thread Thadguidry
Thadguidry added a comment. Someone needs to add a Documentation task to this. I assume all the new options available and perhaps a reference link to this ticket would go somewhere in here? https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format TASK DETAIL https

[Wikidata-bugs] [Maniphest] T219037: Display constraint clarifications in violation messages

2021-07-10 Thread Thadguidry
Thadguidry added a comment. I'd like to see this made a bit higher priority? It seems it would be fairly trivial to implement with a good impact. TASK DETAIL https://phabricator.wikimedia.org/T219037 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailprefer

[Wikidata-bugs] [Maniphest] T236493: Adding a new lexeme should constraint languages form to languages

2021-01-09 Thread Thadguidry
Thadguidry added a comment. To Reproduce: 1. Create a new Lexeme 2. **Lemma:** type `chevrette` 3. **Language of Lemma:** type `cajun` and look at dropdown listing 4. Notice that `Louisiana French` Q3083213 is at the bottom of dropdown list instead of top of list. TASK DETAIL

[Wikidata-bugs] [Maniphest] T266212: improve Wikidata autocomplete service

2020-10-22 Thread Thadguidry
Thadguidry added a comment. In Freebase, we offered word, phrase, and full (exact match). I think the wbsearchentities API could offer something similar, although with a slight cost of indexing. Besides `name` we also supported `alias{full}`. Using alias: matched both name and aliases

[Wikidata-bugs] [Maniphest] T238362: Blazegraph write performance tuning

2020-10-11 Thread Thadguidry
Thadguidry added a comment. @Gehel Hi Guillaume, isn't the streaming updater work done now by @dcausse ? Is it time for your tuning engineers to revisit some of this or not really? TASK DETAIL https://phabricator.wikimedia.org/T238362 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] T258687: The streaming updater should read its events from multiple DC streams

2020-09-02 Thread Thadguidry
Thadguidry added a comment. @dcausse Dunno if this might help, but could a simple window help, or using a KeyedProcessFunction <https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html> on a KeyedStream? If the stream is unkeyed (or initia

[Wikidata-bugs] [Maniphest] T244590: EPIC: Rework the WDQS updater as an event driven application

2020-09-02 Thread Thadguidry
Thadguidry added a comment. > - the output of this is a simple event without any data saying: do a diff between rev X and Y, fully delete entity QXYZ, ... Is that supposed to be "data saving" ? > rdf diff generation: materialize the command and fetch the data from w

[Wikidata-bugs] [Maniphest] T244590: EPIC: Rework the WDQS updater as an event driven application

2020-09-02 Thread Thadguidry
Thadguidry updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Thadguidry Cc: tfmorris, revi, Mholloway, Ladsgroup, Multichill, darthmon_wmde, Iamamz3, Smalyshev

[Wikidata-bugs] [Maniphest] T203643: Sometimes Special:MergeLexemes gives summary on target lexeme, and sometimes not

2020-08-22 Thread Thadguidry
Thadguidry added a parent task: T261049: Propagate the error to UX for merge failure when Lemma's do not exactly match. TASK DETAIL https://phabricator.wikimedia.org/T203643 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Lea_Lacroix

[Wikidata-bugs] [Maniphest] T261049: Propagate the error to UX for merge failure when Lemma's do not exactly match.

2020-08-22 Thread Thadguidry
Thadguidry added a subtask: T203643: Sometimes Special:MergeLexemes gives summary on target lexeme, and sometimes not. TASK DETAIL https://phabricator.wikimedia.org/T261049 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Thadguidry Cc: VIGNERON

[Wikidata-bugs] [Maniphest] T261049: Propagate the error to UX for merge failure when Lemma's do not exactly match.

2020-08-22 Thread Thadguidry
Thadguidry created this task. Thadguidry added a project: Wikidata Lexicographical data. Restricted Application added a project: Wikidata. TASK DESCRIPTION **BUG:** Merge dialog shows continuing... but no error is given to the user when trying to merge lemmas that do not match exactly. Try

[Wikidata-bugs] [Maniphest] [Commented On] T249868: take into account additional Properties for mapping to schema.org

2020-05-07 Thread Thadguidry
Thadguidry added a comment. If it helps or is needed, the query that you can use is here: SELECT ?wd ?wdLabel ?corrName ?schema { values (?corr ?corrName) {(wdt:P2235 "superProp") (wdt:P2236 "subProp") (wdt:P1628 "equivProp") (wdt:

[Wikidata-bugs] [Maniphest] [Updated] T249868: take into account additional Properties for mapping to schema.org

2020-04-09 Thread Thadguidry
Thadguidry added a comment. @Lydia_Pintscher Oops! You forgot to include the main one also !!! Equivalent Property P1628 <https://phabricator.wikimedia.org/P1628> :-) TASK DETAIL https://phabricator.wikimedia.org/T249868 EMAIL PREFERENCES https://phabricator.wikimed

[Wikidata-bugs] [Maniphest] [Commented On] T214884: linking Schemas in statements

2020-02-10 Thread Thadguidry
Thadguidry added a comment. Is there anything inherently wrong or technically infeasible or undesirable, if an id used 2 letters? ES45 versus E45 <https://phabricator.wikimedia.org/E45> ? TASK DETAIL https://phabricator.wikimedia.org/T214884 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] [Edited] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.

2019-11-12 Thread Thadguidry
Thadguidry updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T237645 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Thadguidry Cc: dcausse, Aklapper, Thadguidry, darthmon_wmde, DannyS712, Nandana, Jony, Prisshahlla, Lahi

[Wikidata-bugs] [Maniphest] [Commented On] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.

2019-11-12 Thread Thadguidry
Thadguidry added a comment. Thanks, updated ticket. TASK DETAIL https://phabricator.wikimedia.org/T237645 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Thadguidry Cc: dcausse, Aklapper, Thadguidry, darthmon_wmde, DannyS712, Nandana, Jony

[Wikidata-bugs] [Maniphest] [Edited] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.

2019-11-07 Thread Thadguidry
Thadguidry updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T237645 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Thadguidry Cc: dcausse, Aklapper, Thadguidry, darthmon_wmde, DannyS712, Nandana, Jony, Prisshahlla, Lahi

[Wikidata-bugs] [Maniphest] [Commented On] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.

2019-11-07 Thread Thadguidry
Thadguidry added a comment. @dcausse Yes, I mean running a full text search. Fulltext searches are cheap when you index terms in multiple ways. Why would you not want to index terms in multiple ways? Freebase was able to leverage this quite easily with Lucene/Solr indexes and provided

[Wikidata-bugs] [Maniphest] [Commented On] T214884: linking Schemas in statements

2019-10-29 Thread Thadguidry
Thadguidry added a comment. TODO: Just wanted to highlight that once decisions are made... please ensure to update the Glossary item <https://www.wikidata.org/wiki/Wikidata:Glossary> ! Currently it reads: > EntitySchema is a special type of Wikidata page containing a document

[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata

2019-07-08 Thread Thadguidry
Thadguidry added a comment. @dbarratt in the Wikibase ontology I could not find those properties in the OWL document returned. Sorry, I'm getting caught up with your schema layouts as fast as I can :-) I expected my parser to retrieve information about their description, range, domai

[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata

2019-07-08 Thread Thadguidry
Thadguidry added a comment. Something is amiss with these...not found. "wikibase": "http://wikiba.se/ontology#", "statements": { "@id": "wikibase:statements" }, "ident