Smalyshev created this task.
Smalyshev added subscribers: Smalyshev, Manybubbles, Haasepeter, Beebs.systap, 
Thompsonbry.systap, Thompsonbry.
Smalyshev added projects: Search-Team, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  To update the data in the graph, we use a query that deletes the old data.
  The query that deletes reference values no longer used by any other statement
  looks like this (shown here as a SELECT; the real update uses DELETE instead):
  
  ```
  SELECT ?s ?p ?o
  WHERE {
    <http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
    FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
    ?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
    # Since references are shared, we can only clear the values on them when
    # they are no longer used anywhere else.
    FILTER NOT EXISTS {
      ?otherStatement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
      ?otherEntity ?otherStatementPred ?otherStatement .
      FILTER ( ?otherEntity != <http://www.wikidata.org/entity/Q23>  ) .
    }
    ?ref ?expandedValuePred ?s .
    # Without this filter we'd try to delete stuff from entities. For example,
    # the pattern above matches
    #   ref:_ v:P143 entity:Q328
    # so we'd try to clear everything from Q328 (enwiki). So we filter to cases
    # where ?s is in the value prefix.
    FILTER( STRSTARTS(STR(?s), "http://www.wikidata.org/entity/value/") ) .
    ?s ?p ?o .
  }
  ```
  
  This query is very slow (I had to kill it after more than a minute). However,
  the same query without FILTER NOT EXISTS runs in under a second:
  
  ```
  SELECT ?s ?p ?o
  WHERE {
    <http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
    FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
    ?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
    ?ref ?expandedValuePred ?s .
    # Without this filter we'd try to delete stuff from entities. For example,
    # the pattern above matches
    #   ref:_ v:P143 entity:Q328
    # so we'd try to clear everything from Q328 (enwiki). So we filter to cases
    # where ?s is in the value prefix.
    FILTER( STRSTARTS(STR(?s), "http://www.wikidata.org/entity/value/") ) .
    ?s ?p ?o .
  }
  ```
  This query produces 8 triples, belonging to 2 separate subjects, so the cause
  of the slowdown is the FILTER NOT EXISTS clause. Interestingly enough, this query:
  
  ```
  SELECT ?s ?p ?o
  WHERE {
    <http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
    FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
    ?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
    # Since references are shared, we can only clear the values on them when
    # they are no longer used anywhere else.
    FILTER NOT EXISTS {
      ?otherStatement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
      ?otherEntity ?otherStatementPred ?otherStatement .
    }
    ?ref ?expandedValuePred ?s .
    # Without this filter we'd try to delete stuff from entities. For example,
    # the pattern above matches
    #   ref:_ v:P143 entity:Q328
    # so we'd try to clear everything from Q328 (enwiki). So we filter to cases
    # where ?s is in the value prefix.
    FILTER( STRSTARTS(STR(?s), "http://www.wikidata.org/entity/value/") ) .
    ?s ?p ?o .
  }
  ```
  
  (note that the inner != filter has been removed) is also slow, though in
  theory it should fail very fast: without that filter, FILTER NOT EXISTS
  clearly contradicts the preceding conditions, because at least one set of
  bindings satisfying the inner pattern always exists, namely the one already
  matched by the first three lines of the query.
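  
  The contradiction can be checked directly. The following is only a sketch and
  was not part of the runs above: it joins the outer patterns with the inner
  NOT EXISTS patterns in a plain ASK (the `prov:` prefix and the omission of the
  statement-prefix filter are my shorthand). Since ?otherStatement can bind to
  the same node as ?statement, and ?otherEntity to Q30 itself, this should be
  true whenever Q30 has at least one statement with a reference, which means
  the NOT EXISTS variant without the != filter can never yield a row:
  
  ```
  PREFIX prov: <http://www.w3.org/ns/prov#>
  ASK {
    # Outer part of the query: a statement of Q30 and its reference.
    <http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
    ?statement prov:wasDerivedFrom ?ref .
    # Inner part, taken from the FILTER NOT EXISTS block without the != filter.
    # The bindings from the two lines above already satisfy it.
    ?otherStatement prov:wasDerivedFrom ?ref .
    ?otherEntity ?otherStatementPred ?otherStatement .
  }
  ```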
  
  So it looks like there is some issue in how FILTER NOT EXISTS is processed
  here; somehow it is not being optimized.
  The data can be seen on the db01 labs machine, in namespace `wdq`.
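  
  For reference, the DELETE form of the first query would look roughly like the
  sketch below. It assumes the standard SPARQL 1.1 Update
  `DELETE { ... } WHERE { ... }` shape with the WHERE clause unchanged; the
  exact statement the updater sends may differ. Entity IRIs are copied as-is
  from the first query.
  
  ```
  DELETE { ?s ?p ?o }
  WHERE {
    <http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
    FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
    ?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
    # Only touch references that no other entity's statements still use.
    FILTER NOT EXISTS {
      ?otherStatement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
      ?otherEntity ?otherStatementPred ?otherStatement .
      FILTER ( ?otherEntity != <http://www.wikidata.org/entity/Q23>  ) .
    }
    ?ref ?expandedValuePred ?s .
    # Restrict ?s to value nodes so that nothing is deleted from entities.
    FILTER( STRSTARTS(STR(?s), "http://www.wikidata.org/entity/value/") ) .
    ?s ?p ?o .
  }
  ```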

TASK DETAIL
  https://phabricator.wikimedia.org/T96094

To: Smalyshev
Cc: Thompsonbry, Thompsonbry.systap, Beebs.systap, Haasepeter, Manybubbles, 
Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, GWicke, daniel, 
JanZerebecki


