Igorkim78 removed Igorkim78 as the assignee of this task.
TASK DETAIL
https://phabricator.wikimedia.org/T229655
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Igorkim78
Cc: Smalyshev, Igorkim78, Aklapper, VladimirAlexiev, Invadibot, MPhamWMF
Igorkim78 added a comment.
If you consider changing the collator configuration, note that the collator
type should NOT be changed from the default value, ICU:
com.bigdata.btree.keys.KeyBuilder.collator=ICU
The collator type options JDK and ASCII also exist, but neither would be usable,
as JDK
Igorkim78 added a comment.
@Aklapper , Thank you! Fixed the commit message.
TASK DETAIL
https://phabricator.wikimedia.org/T236663
To: Igorkim78
Cc: Addshore, Aklapper, Igorkim78, Gehel, Un1tY, Hook696
Igorkim78 added a comment.
Changeset is https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/532373/
TASK DETAIL
https://phabricator.wikimedia.org/T236663
To: Igorkim78
Cc: Addshore, Aklapper
Igorkim78 added a comment.
The issue is caused by the Service node producing the variable
?coDescription, which is not explicitly defined in the main query, so the
optimizers assume this variable is not bound and do not bother with the proper
order of the lang function evaluation. Fixing might
Igorkim78 added a comment.
Performance measured on dump from 20191202:
https://dumps.wikimedia.org/wikidatawiki/entities/20191202/
Baseline time to load: 4264m29.914s, 714218864640 bytes
Improvements proposed:
1. One-path loading (when data is loaded into SPO index only and POS
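For readers comparing later measurements against this baseline, the raw figures can be converted into more readable units. The following is only a minimal sketch: the helper name parse_time_output is hypothetical, and it assumes the duration is in the "<minutes>m<seconds>s" form printed by the shell `time` builtin and that the size is reported in plain bytes.

```python
import re


def parse_time_output(text):
    """Convert a `time`-style duration such as '4264m29.914s' to seconds.

    Hypothetical helper; assumes the '<minutes>m<seconds>s' form only.
    """
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Baseline figures quoted in the comment above.
baseline_seconds = parse_time_output("4264m29.914s")
baseline_bytes = 714218864640

baseline_hours = baseline_seconds / 3600   # wall-clock hours
baseline_gib = baseline_bytes / 2**30      # journal size in GiB

print(f"{baseline_hours:.1f} hours, {baseline_gib:.1f} GiB")
```

With these inputs the baseline comes out at roughly 71 hours (about three days) and roughly 665 GiB.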
Igorkim78 added a comment.
The configuration changes for SDC data are as follows (note that the namespace
'sdc' is used to store RDF data in the Blazegraph journal; it may be changed as
needed):
- Blazegraph journal config (RWStore.properties)
replace the similar configuration for WDQS
Igorkim78 added a comment.
We need statistics on how many triples use a bnode as an object:
{code}
select ?p (count(*) as ?cnt) {
  ?s ?p ?o .
  filter (isBlank(?o))
}
group by ?p
{code}
and as a subject (if any)
{code}
select ?p (count(*) as ?cnt) {
?s ?p ?o
Igorkim78 added a comment.
output of
iostat -x 1
and
sudo iotop
?
TASK DETAIL
https://phabricator.wikimedia.org/T231411
To: Igorkim78
Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper
Igorkim78 added a comment.
Are there thread dumps from Blazegraph available?
What about the new logger UPDATED_ENTITY_IDS: does it track updated entity IDs?
How many per minute/hour?
TASK DETAIL
https://phabricator.wikimedia.org/T231411
Igorkim78 added a subtask: T238555: Create endpoint to extract low level data
for a list of entity IDs.
TASK DETAIL
https://phabricator.wikimedia.org/T231411
To: Igorkim78
Cc: Lea_Lacroix_WMDE, Gehel
Igorkim78 added a parent task: T231411: Test new Updater service.
TASK DETAIL
https://phabricator.wikimedia.org/T238555
To: Igorkim78
Cc: Gehel, dcausse, Igorkim78, Aklapper, darthmon_wmde, DannyS712
Igorkim78 added a subtask: T238557: Allow for logging recently updated entities.
TASK DETAIL
https://phabricator.wikimedia.org/T231411
To: Igorkim78
Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper
Igorkim78 added a parent task: T231411: Test new Updater service.
TASK DETAIL
https://phabricator.wikimedia.org/T238557
To: Igorkim78
Cc: Mathew.onipe, Gehel, dcausse, Igorkim78, Aklapper, darthmon_wmde
Igorkim78 added a project: Wikidata-Query-Service.
Igorkim78 added a comment.
Restricted Application added a project: Wikidata.
Thanks! Yes it is Wikidata-Query-Service
TASK DETAIL
https://phabricator.wikimedia.org/T238557
Igorkim78 added a project: Wikidata-Query-Service.
Igorkim78 added a comment.
Restricted Application added a project: Wikidata.
Thanks, yes it is Wikidata-Query-Service
TASK DETAIL
https://phabricator.wikimedia.org/T238555
Igorkim78 added a comment.
Wdqs1006 reports that 574.6GiB are reserved for the journal and 544.3GiB are
actually used (~5% of space unused),
while Wdqs1005 reports that 1037.7GiB are reserved and only 543.5GiB are
actually used (~47% of space unused).
Most of the %FileWaste or reserved for 8K
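The unused-space percentages quoted for the two hosts follow directly from the reserved and used figures. As a rough check, the arithmetic can be sketched as below; the function name unused_fraction is a hypothetical helper, and the only inputs are the four GiB values from the comment.

```python
def unused_fraction(reserved_gib, used_gib):
    """Share of the reserved journal space that is not actually used."""
    return (reserved_gib - used_gib) / reserved_gib


# (reserved GiB, used GiB) as quoted in the comment above.
hosts = {
    "wdqs1006": (574.6, 544.3),
    "wdqs1005": (1037.7, 543.5),
}

for host, (reserved, used) in hosts.items():
    print(f"{host}: {unused_fraction(reserved, used):.1%} unused")
```

This gives about 5.3% for wdqs1006 and about 47.6% for wdqs1005, consistent with the rounded ~5% and ~47% in the comment.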
Igorkim78 updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T234968
To: Igorkim78
Cc: JAllemandou, Mathew.onipe, dcausse, Igorkim78, Aklapper, darthmon_wmde,
DannyS712, Nandana
Igorkim78 added a comment.
Added link to the task T236251 <https://phabricator.wikimedia.org/T236251>:
Add header returning time millis to first solution similar to TTFB measured in
Blazegraph.
The corresponding header X-FIRST-SOLUTION-MILLIS might be very useful while
analyzin
Igorkim78 added a comment.
The LabelService optimizer was fixed (so it will not throw NPEs) this August,
by reusing Blazegraph core utility
com.bigdata.rdf.sparql.ast.StaticAnalysis.getVarsFromArguments(BOp) to run an
introspection on variables used in filters and other clauses, so
Igorkim78 claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T231411
To: Igorkim78
Cc: Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen, EgonWillighagen,
Abbe98, Smalyshev, darthmon_wmde
Igorkim78 added a comment.
There is a context param queryTimeout set to 10 minutes in web.xml, which is
applied to all Blazegraph servlets. Stas prepared a patch extending it 10x,
https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/520948/ which you
might apply locally (or just
Igorkim78 added a comment.
These characters are indeed mapped to the same term in the DB.
SELECT ( ConstantNode(TermId(1415304733L)[⓬]) AS VarNode(negativeCircled) )
( ConstantNode(TermId(1415304733L)[⑫]) AS VarNode(circled) )
Blazegraph uses ICU collation as default key builder
Igorkim78 added a comment.
Differences in bnodes might be tolerated with additional replacement. The
cleanup stage could be merged with initial sed+sort
zcat wikidata.jnl.1.data.gz | sed 's/ : .*$//;s/_:t[^,>]*/bnode/g' | grep
-v http://wikiba.se/ontology#timestamp | sort | g
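The visible part of the pipeline above (the two sed substitutions, the grep -v filter and the sort) can be approximated in Python for readers who want to try the normalization on a small sample. This is only a sketch under assumptions: the function name normalize_dump_lines is hypothetical, the sample line layout used to exercise it is invented for illustration, and Python's sorted() orders by code point, which may differ from a locale-aware sort.

```python
import re

# Predicate whose lines are dropped, as in the grep -v step above.
TIMESTAMP_PREDICATE = "http://wikiba.se/ontology#timestamp"


def normalize_dump_lines(lines):
    """Mirror the visible sed | grep -v | sort steps on an iterable of lines."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        # sed 's/ : .*$//' : drop everything from the first " : " onward.
        line = re.sub(r" : .*$", "", line)
        # sed 's/_:t[^,>]*/bnode/g' : replace generated bnode labels.
        line = re.sub(r"_:t[^,>]*", "bnode", line)
        # grep -v : skip lines that mention the timestamp predicate.
        if TIMESTAMP_PREDICATE in line:
            continue
        cleaned.append(line)
    # sort : code point order here, not locale collation.
    return sorted(cleaned)


# Hypothetical sample input; the real dump layout may differ.
sample = [
    "<s2, <p>, _:t42> : payload\n",
    "<s1, <http://wikiba.se/ontology#timestamp>, x> : y\n",
    "<s1, <p>, _:t7> : other\n",
]
for row in normalize_dump_lines(sample):
    print(row)
```

Because every bnode label collapses to the same token, two dumps that differ only in generated bnode identifiers produce identical normalized output and can then be compared line by line.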
Igorkim78 added a comment.
Looking at the query execution plans, ProjectionOp for the query with lang() for
coDescription got arranged prior to materialization of coDescription, so it
(along with its lang) did not make its way to the projection. The reason for
such behavior needs some more
Igorkim78 added a comment.
Fixed optional support and added testcase for that code path.
Service projectedVars actually include both inbound and outbound variables
(those which are params for the service and those which are produced by the
labels lookup). But for the check if service node could
Igorkim78 added a comment.
The idea for the change is to replace runLast hint with more complicated
logic. So there are 3 steps:
- first 'most probable optimal' placement to allow for
EmptyLabelServiceOptimizer to see the variables to process.
- then EmptyLabelServiceOptimizer adds
Igorkim78 added a comment.
The EmptyLabelServiceOptimizer running optimizeJoinGroup(AST2BOpContext,
StaticAnalysis, IBindingSet[], JoinGroupNode) currently takes the projection
from StaticAnalysis.getQueryRoot() as parent of JoinGroupNode wrapping
statement pattern of the LabelService
Igorkim78 added a comment.
Additionally tested the configuration option with only Raw records disabled,
compared to the original baseline:
- takes 1.7% more time, produces a journal of 9.2% fewer bytes, 77% fewer
allocations with their overall size 38.9% less, though there are 2.9% more blobs
Igorkim78 added a comment.
Configuration options are assigned in RWStore.properties. Particular options
are:
- Inlined Value and Reference URIs:
>
com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory=org.wikidata.query.rdf.blazegraph.WikibaseInlineUriFactory$V
Igorkim78 added a comment.
This seems to be an optimizer ordering problem.
CompareBOp executes several times to check if "Ada"@en equals ?langLabel,
but ?langLabel is not bound on all occasions:
while running **//ASTDeferredIVResolution//**
whi
Igorkim78 added a comment.
Complete test logs attached F28854747: logs.zip
<https://phabricator.wikimedia.org/F28854747>
TASK DETAIL
https://phabricator.wikimedia.org/T213375
To: Igorki
Igorkim78 added a comment.
Load performance for the tested configurations on isolated environment
(i7-7700HQ, 8 cores 2.8GHz, 32GB RAM, SSD Samsung 960 PRO)
F28854691: Load performance.png <https://phabricator.wikimedia.org/F28854691>
Query performance on simple queries (select
Igorkim78 added a comment.
Attached are the results of loading 100 ttl.gz files with different configurations
F28854613: Results.xls <https://phabricator.wikimedia.org/F28854613>
- original baseline (commit blazegraph
895a4f3bd003ddb4b1f31257f642ca3616bca79b
<https://phabricator.wiki
Igorkim78 claimed this task.
Igorkim78 added a comment.
Changeset created to support reference URIs inlining:
https://gerrit.wikimedia.org/r/#/c/wikidata/query/blazegraph/+/505642
Baseline collected for performance test:
Data files loaded: 100 ttl gz files into an empty journal