dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  As a WDQS user I want triples shared by multiple entities to be treated 
separately in the streaming updater, so that they are not deleted when one 
entity stops referencing them.
  
  Some shared statements are still present in the RDF stream. They are 
currently only identified at consumption time, but they should be handled and 
categorized when they are produced.
  
    java.lang.IllegalArgumentException: Cannot add/delete the same triple for a different entity (should probably be considered as a shared statement)
            at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.lambda$findInvalidStatements$6(PatchAccumulator.java:74)
            at java.util.HashMap.forEach(HashMap.java:1289)
            at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.findInvalidStatements(PatchAccumulator.java:71)
            at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.accumulate(PatchAccumulator.java:54)
            at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.accumulate(PatchAccumulator.java:108)
            at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
            at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
            at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
            at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
            at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
            at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
            at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
            at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
            at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
            at org.wikidata.query.rdf.updater.consumer.KafkaStreamConsumer.poll(KafkaStreamConsumer.java:131)
            at org.wikidata.query.rdf.updater.consumer.StreamingUpdaterConsumer.lambda$run$0(StreamingUpdaterConsumer.java:46)
            at org.wikidata.query.rdf.common.TimerCounter.time(TimerCounter.java:51)
            at org.wikidata.query.rdf.updater.consumer.StreamingUpdaterConsumer.run(StreamingUpdaterConsumer.java:46)
            at org.wikidata.query.rdf.updater.consumer.StreamingUpdate.main(StreamingUpdate.java:49)
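  The failure mode above can be sketched as follows: the accumulator remembers 
which entity first touched a triple and rejects the same triple arriving from a 
different entity. This is a hypothetical, simplified illustration (the class 
and method names here are invented), not the actual PatchAccumulator code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the ownership check behind the exception above.
// A triple touched by two different entities is flagged as a (likely)
// shared statement, with an error message that names the triple and
// both entities, as the AC below asks for.
class TripleOwnership {
    private final Map<String, String> ownerByTriple = new HashMap<>();

    /** Records that {@code entity} adds or deletes {@code triple}. */
    void record(String triple, String entity) {
        // putIfAbsent returns the previous owner, or null if none.
        String owner = ownerByTriple.putIfAbsent(triple, entity);
        if (owner != null && !owner.equals(entity)) {
            throw new IllegalArgumentException(
                "Cannot add/delete the same triple for a different entity "
                + "(should probably be considered as a shared statement): "
                + triple + " touched by " + owner + " and " + entity);
        }
    }
}
```

  A message of this shape would also satisfy the second AC item, since it 
carries the offending triple and the entities involved.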
  
  AC:
  
  - the producer should identify all shared triples properly
  - the consumer should continue to fail when such triples are detected, but 
the log message should be clearer and include the triple and the entities it 
belongs to
  - bonus: the consumer should have a way to "fixup" these triples by 
"re-categorizing" them on the fly so that the RDF stream does not have to be 
re-generated
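  The bonus "fixup" item could be approached by reference-counting triples 
seen from multiple entities, so a delete is only emitted once no entity 
references the triple anymore. A minimal sketch of that idea, assuming 
invented names (this is not the actual updater code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of on-the-fly re-categorization: a triple referenced by several
// entities is treated as shared, and only becomes deletable when its
// last referencing entity drops it.
class SharedTripleFixup {
    private final Map<String, Integer> refCount = new HashMap<>();

    /** An entity starts referencing the triple. */
    void add(String triple) {
        refCount.merge(triple, 1, Integer::sum);
    }

    /**
     * An entity stops referencing the triple; returns true only when
     * the triple should actually be deleted from the store.
     */
    boolean delete(String triple) {
        // Decrement the count, removing the entry when it reaches zero.
        Integer remaining = refCount.computeIfPresent(
            triple, (t, c) -> c > 1 ? c - 1 : null);
        return remaining == null;
    }
}
```

  With this approach, an entity dropping a shared triple no longer removes it 
for the other entities, and the RDF stream would not need to be re-generated.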

TASK DETAIL
  https://phabricator.wikimedia.org/T266751


To: dcausse
Cc: dcausse, Aklapper, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
