[Wikidata-bugs] [Maniphest] T229655: bad interaction of lang() with wikibase:label

2022-02-07 Thread Igorkim78
Igorkim78 removed Igorkim78 as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T229655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Smalyshev, Igorkim78, Aklapper, VladimirAlexiev, Invadibot, MPhamWMF

[Wikidata-bugs] [Maniphest] T233204: Mixup of unicode characters in Query Service

2020-11-03 Thread Igorkim78
Igorkim78 added a comment. If you consider changing the collator configuration, note that the collator type should NOT be changed from the default value ICU: com.bigdata.btree.keys.KeyBuilder.collator=ICU The collator type options JDK and ASCII also exist, but neither would be usable, as JDK

[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph

2020-01-31 Thread Igorkim78
Igorkim78 added a comment. @Aklapper , Thank you! Fixed the commit message. TASK DETAIL https://phabricator.wikimedia.org/T236663 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Addshore, Aklapper, Igorkim78, Gehel, Un1tY, Hook696

[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph

2020-01-30 Thread Igorkim78
Igorkim78 added a comment. Changeset is https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/532373/ TASK DETAIL https://phabricator.wikimedia.org/T236663 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Addshore, Aklapper

[Wikidata-bugs] [Maniphest] [Commented On] T229655: bad interaction of lang() with wikibase:label

2020-01-30 Thread Igorkim78
Igorkim78 added a comment. The issue is caused by a combination of the Service node producing the variable ?coDescription, which is not explicitly defined in the main query, and the optimizers assuming that this variable is not bound and therefore not enforcing the proper evaluation order of the lang function. Fixing might

[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph

2020-01-16 Thread Igorkim78
Igorkim78 added a comment. Performance measured on dump from 20191202: https://dumps.wikimedia.org/wikidatawiki/entities/20191202/ Baseline time to load: 4264m29.914s, 714218864640 bytes Improvements proposed: 1. One-path loading (when data is loaded into SPO index only and POS
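
For scale, the baseline figures quoted above can be converted to hours and GiB. This is only a small arithmetic sketch: the helper name `parse_shell_time` is hypothetical, and treating the byte count as a size in binary gigabytes is an assumption made for illustration.

```python
import re

def parse_shell_time(text: str) -> float:
    """Convert a `time`-style duration such as '4264m29.914s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Baseline figures as quoted in the comment above.
seconds = parse_shell_time("4264m29.914s")
reported_bytes = 714218864640

print(f"{seconds / 3600:.1f} hours, {reported_bytes / 2**30:.1f} GiB")
# prints "71.1 hours, 665.2 GiB"
```

In other words, the baseline load took roughly three days.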

[Wikidata-bugs] [Maniphest] [Updated] T237089: Create CQS puppet configs by applying query_service module

2019-12-23 Thread Igorkim78
Igorkim78 added a comment. The configuration changes for SDC data are as follows (note that the namespace 'sdc' is used to store RDF data in the Blazegraph journal and may be changed as needed): - Blazegraph journal config (RWStore.properties) replace the similar configuration for WDQS

[Wikidata-bugs] [Maniphest] [Commented On] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs

2019-12-03 Thread Igorkim78
Igorkim78 added a comment. We need statistics on how many triples use a bnode as an object: {code} select ?p (count(*) as ?cnt) { ?s ?p ?o . filter (isBlank(?o)) } group by ?p {code} and as a subject (if any) {code} select ?p (count(*) as ?cnt) { ?s ?p ?o

[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service

2019-11-19 Thread Igorkim78
Igorkim78 added a comment. output of iostat -x 1 and sudo iotop ? TASK DETAIL https://phabricator.wikimedia.org/T231411 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper

[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service

2019-11-19 Thread Igorkim78
Igorkim78 added a comment. Are there thread dumps from Blazegraph available? What about the new logger UPDATED_ENTITY_IDS: does it track updated entity IDs? How many per minute/hour? TASK DETAIL https://phabricator.wikimedia.org/T231411 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Updated] T231411: Test new Updater service

2019-11-18 Thread Igorkim78
Igorkim78 added a subtask: T238555: Create endpoint to extract low level data for a list of entity IDs.. TASK DETAIL https://phabricator.wikimedia.org/T231411 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Lea_Lacroix_WMDE, Gehel

[Wikidata-bugs] [Maniphest] [Updated] T238555: Create endpoint to extract low level data for a list of entity IDs.

2019-11-18 Thread Igorkim78
Igorkim78 added a parent task: T231411: Test new Updater service. TASK DETAIL https://phabricator.wikimedia.org/T238555 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Gehel, dcausse, Igorkim78, Aklapper, darthmon_wmde, DannyS712

[Wikidata-bugs] [Maniphest] [Updated] T231411: Test new Updater service

2019-11-18 Thread Igorkim78
Igorkim78 added a subtask: T238557: Allow for logging recently updated entities. TASK DETAIL https://phabricator.wikimedia.org/T231411 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper

[Wikidata-bugs] [Maniphest] [Updated] T238557: Allow for logging recently updated entities

2019-11-18 Thread Igorkim78
Igorkim78 added a parent task: T231411: Test new Updater service. TASK DETAIL https://phabricator.wikimedia.org/T238557 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Mathew.onipe, Gehel, dcausse, Igorkim78, Aklapper, darthmon_wmde

[Wikidata-bugs] [Maniphest] [Updated] T238557: Allow for logging recently updated entities

2019-11-18 Thread Igorkim78
Igorkim78 added a project: Wikidata-Query-Service. Igorkim78 added a comment. Restricted Application added a project: Wikidata. Thanks! Yes it is Wikidata-Query-Service TASK DETAIL https://phabricator.wikimedia.org/T238557 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Updated] T238555: Create endpoint to extract low level data for a list of entity IDs.

2019-11-18 Thread Igorkim78
Igorkim78 added a project: Wikidata-Query-Service. Igorkim78 added a comment. Restricted Application added a project: Wikidata. Thanks, yes it is Wikidata-Query-Service TASK DETAIL https://phabricator.wikimedia.org/T238555 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T238232: blazegraph journal on wdqs1005 is oversized

2019-11-13 Thread Igorkim78
Igorkim78 added a comment. Wdqs1006 reports that 574.6GiB are reserved for the journal and 544.3GiB are actually used (~5% of space unused), while Wdqs1005 reports that 1037.7GiB are reserved and only 543.5GiB are actually used (~47% of space unused). Most of the %FileWaste or reserved for 8K
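
The unused-space percentages above follow directly from the reserved and used figures. A minimal sketch of that arithmetic, where the function name `unused_fraction` is hypothetical and the numbers are the ones quoted in the comment:

```python
def unused_fraction(reserved_gib: float, used_gib: float) -> float:
    """Share of the reserved journal space that holds no data."""
    return (reserved_gib - used_gib) / reserved_gib

# (GiB reserved, GiB used) per host, as quoted in the comment above.
hosts = {"wdqs1006": (574.6, 544.3), "wdqs1005": (1037.7, 543.5)}

for host, (reserved, used) in hosts.items():
    print(f"{host}: {unused_fraction(reserved, used):.1%} unused")
# prints "wdqs1006: 5.3% unused" and "wdqs1005: 47.6% unused"
```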

[Wikidata-bugs] [Maniphest] [Edited] T234968: Measure performance impact of code optimization and/or blazegraph settings on real traffic data

2019-10-23 Thread Igorkim78
Igorkim78 updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T234968 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: JAllemandou, Mathew.onipe, dcausse, Igorkim78, Aklapper, darthmon_wmde, DannyS712, Nandana

[Wikidata-bugs] [Maniphest] [Updated] T101013: Log Wikidata Query Service queries to the event gate infrastructure

2019-10-23 Thread Igorkim78
Igorkim78 added a comment. Added link to the task T236251 <https://phabricator.wikimedia.org/T236251>: Add header returning time millis to first solution similar to TTFB measured in Blazegraph. The corresponding header X-FIRST-SOLUTION-MILLIS might be very useful while analyzin

[Wikidata-bugs] [Maniphest] [Commented On] T235540: SPARQL query causes StackOverflowError and fails to execute

2019-10-16 Thread Igorkim78
Igorkim78 added a comment. The LabelService optimizer was fixed (so it will not throw NPEs) this August, by reusing Blazegraph core utility com.bigdata.rdf.sparql.ast.StaticAnalysis.getVarsFromArguments(BOp) to run an introspection on variables used in filters and other clauses, so

[Wikidata-bugs] [Maniphest] [Claimed] T231411: Test new Updater service

2019-10-09 Thread Igorkim78
Igorkim78 claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T231411 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen, EgonWillighagen, Abbe98, Smalyshev, darthmon_wmde

[Wikidata-bugs] [Maniphest] [Commented On] T227365: WDQS/Blazegraph data loading has timeout

2019-10-07 Thread Igorkim78
Igorkim78 added a comment. There is a context param queryTimeout set to 10 minutes in web.xml, which applies to all Blazegraph servlets. Stas prepared a patch extending it 10x: https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/520948/ You might apply it locally (or just

[Wikidata-bugs] [Maniphest] [Commented On] T233204: Mixup of unicode characters in Query Service

2019-09-30 Thread Igorkim78
Igorkim78 added a comment. These characters are indeed mapped to the same term in the DB. SELECT ( ConstantNode(TermId(1415304733L)[⓬]) AS VarNode(negativeCircled) ) ( ConstantNode(TermId(1415304733L)[⑫]) AS VarNode(circled) ) Blazegraph uses ICU collation as default key builder

[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service

2019-08-29 Thread Igorkim78
Igorkim78 added a comment. Differences in bnodes might be tolerated with additional replacement. The cleanup stage could be merged with initial sed+sort zcat wikidata.jnl.1.data.gz | sed 's/ : .*$//;s/_:t[^,>]*/bnode/g' | grep -v http://wikiba.se/ontology#timestamp | sort | g
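
The visible stages of the pipeline above (the two sed substitutions, the `grep -v` filter, and the sort) could be mirrored in Python roughly as follows. This is a sketch only: the function name `normalize` and the sample lines are hypothetical, the stage after `sort` is cut off in the message and is not reproduced, and Python's `sorted` uses code-point order, which may differ from a locale-aware `sort`.

```python
import re

TIMESTAMP = "http://wikiba.se/ontology#timestamp"

def normalize(lines):
    """Mirror the visible sed/grep/sort stages on an iterable of dump lines."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        line = re.sub(r" : .*$", "", line, count=1)  # sed 's/ : .*$//'
        line = re.sub(r"_:t[^,>]*", "bnode", line)   # sed 's/_:t[^,>]*/bnode/g'
        if TIMESTAMP not in line:                     # grep -v <timestamp URI>
            kept.append(line)
    return sorted(kept)  # code-point order; `sort` may differ by locale

# Hypothetical sample lines, not taken from a real journal dump.
sample = [
    "<s2>, <p>, _:t123 : meta",
    '<s1>, <http://wikiba.se/ontology#timestamp>, "x" : meta',
    "<s1>, <p>, _:t9 : other",
]
for row in normalize(sample):
    print(row)
```

With blank-node labels replaced by a fixed token, two dumps that differ only in bnode identifiers should normalize to the same lines.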

[Wikidata-bugs] [Maniphest] [Commented On] T229655: bad interaction of lang() with wikibase:label

2019-08-02 Thread Igorkim78
Igorkim78 added a comment. Looking at the query execution plans, the ProjectionOp for the query with lang() for coDescription was placed before the materialization of coDescription, so it (along with its lang) never reached the projection. The reason for such behavior needs some more

[Wikidata-bugs] [Maniphest] [Commented On] T175840: Using label service twice in one query results in obscure error message

2019-07-01 Thread Igorkim78
Igorkim78 added a comment. Fixed optional support and added a test case for that code path. Service projectedVars actually include both inbound and outbound variables (those which are params for the service and those which are produced by the labels lookup). But for the check if service node could

[Wikidata-bugs] [Maniphest] [Commented On] T175840: Using label service twice in one query results in obscure error message

2019-06-25 Thread Igorkim78
Igorkim78 added a comment. The idea for the change is to replace the runLast hint with more elaborate logic. There are 3 steps: - first, a 'most probable optimal' placement to allow EmptyLabelServiceOptimizer to see the variables to process. - then EmptyLabelServiceOptimizer adds

[Wikidata-bugs] [Maniphest] [Updated] T153353: Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected

2019-05-07 Thread Igorkim78
Igorkim78 added a comment. The EmptyLabelServiceOptimizer running optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode) currently takes the projection from StaticAnalysis.getQueryRoot() as parent of JoinGroupNode wrapping statement pattern of the LabelService

[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs

2019-05-06 Thread Igorkim78
Igorkim78 added a comment. Additionally tested the configuration option with only Raw records disabled, compared to the original baseline: - takes 1.7% more time, produces a journal 9.2% smaller in bytes, with 77% fewer allocations whose overall size is 38.9% smaller, though there are 2.9% more blobs

[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs

2019-05-06 Thread Igorkim78
Igorkim78 added a comment. Configuration options are assigned in RWStore.properties. Particular options are: - Inlined Value and Reference URIs: > com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory=org.wikidata.query.rdf.blazegraph.WikibaseInlineUriFactory$V

[Wikidata-bugs] [Maniphest] [Commented On] T153353: Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected

2019-05-06 Thread Igorkim78
Igorkim78 added a comment. This seems to be an optimizer ordering problem. CompareBOp executes several times to check whether "Ada"@en equals ?langLabel, but ?langLabel is not bound on all occasions: while running **//ASTDeferredIVResolution//** whi

[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs

2019-04-29 Thread Igorkim78
Igorkim78 added a comment. Complete test logs attached F28854747: logs.zip <https://phabricator.wikimedia.org/F28854747> TASK DETAIL https://phabricator.wikimedia.org/T213375 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorki

[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs

2019-04-29 Thread Igorkim78
Igorkim78 added a comment. Load performance for the tested configurations on an isolated environment (i7-7700HQ, 8 cores 2.8GHz, 32GB RAM, SSD Samsung 960 PRO) F28854691: Load performance.png <https://phabricator.wikimedia.org/F28854691> Query performance on simple queries (select

[Wikidata-bugs] [Maniphest] [Updated] T213375: Inline value and reference URIs

2019-04-29 Thread Igorkim78
Igorkim78 added a comment. Attached results of loading 100 ttl.gz files with different configurations F28854613: Results.xls <https://phabricator.wikimedia.org/F28854613> - original baseline (commit blazegraph 895a4f3bd003ddb4b1f31257f642ca3616bca79b <https://phabricator.wiki

[Wikidata-bugs] [Maniphest] [Claimed] T213375: Inline value and reference URIs

2019-04-22 Thread Igorkim78
Igorkim78 claimed this task. Igorkim78 added a comment. Changeset created to support reference URIs inlining: https://gerrit.wikimedia.org/r/#/c/wikidata/query/blazegraph/+/505642 Baseline collected for performance test: Data files loaded: 100 ttl gz files into an empty journal