dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.
TASK DESCRIPTION

As a WDQS user I want triples shared by multiple entities to be treated separately in the streaming updater, so that they are not deleted when one entity stops referencing them.

Some shared statements are still present in the RDF stream. They are currently only identified at consumption time, but they should be detected and categorized when they are produced. When the consumer encounters one, it fails with:

    java.lang.IllegalArgumentException: Cannot add/delete the same triple for a different entity (should probably be considered as a shared statement)
        at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.lambda$findInvalidStatements$6(PatchAccumulator.java:74)
        at java.util.HashMap.forEach(HashMap.java:1289)
        at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.findInvalidStatements(PatchAccumulator.java:71)
        at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.accumulate(PatchAccumulator.java:54)
        at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.accumulate(PatchAccumulator.java:108)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
        at org.wikidata.query.rdf.updater.consumer.KafkaStreamConsumer.poll(KafkaStreamConsumer.java:131)
        at org.wikidata.query.rdf.updater.consumer.StreamingUpdaterConsumer.lambda$run$0(StreamingUpdaterConsumer.java:46)
        at org.wikidata.query.rdf.common.TimerCounter.time(TimerCounter.java:51)
        at org.wikidata.query.rdf.updater.consumer.StreamingUpdaterConsumer.run(StreamingUpdaterConsumer.java:46)
        at org.wikidata.query.rdf.updater.consumer.StreamingUpdate.main(StreamingUpdate.java:49)

AC:
- the producer should identify all shared triples properly
- the consumer should continue to fail when such triples are detected, but the log message should be clearer and should include the triple and the entities it belongs to
- bonus: the consumer should have a way to "fix up" these triples by "re-categorizing" them on the fly, so that the RDF stream does not have to be re-generated

TASK DETAIL
https://phabricator.wikimedia.org/T266751
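The second AC (fail with a message that names the offending triple and both entities) could look roughly like the sketch below. This is only an illustration of the intended behavior, assuming a map from each triple to the entity that first contributed it; the class and method names are hypothetical, not the actual PatchAccumulator API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: track which entity "owns" each accumulated triple and
// fail with an explicit message when a second entity adds/deletes the same
// triple. Names (SharedTripleCheck, accumulate) are hypothetical.
class SharedTripleCheck {
    // triple (serialized as a string key) -> entity that first contributed it
    private final Map<String, String> tripleOwner = new HashMap<>();

    /**
     * Records that {@code entity} adds or deletes {@code triple}.
     * Throws if a different entity already contributed the same triple,
     * naming both the triple and the two entities involved.
     */
    void accumulate(String entity, String triple) {
        String owner = tripleOwner.putIfAbsent(triple, entity);
        if (owner != null && !owner.equals(entity)) {
            throw new IllegalArgumentException(
                "Triple [" + triple + "] is used by entities [" + owner
                + "] and [" + entity + "]; it should probably have been "
                + "categorized as a shared statement by the producer");
        }
    }
}
```

The bonus AC would then replace the throw with a re-categorization step (moving the triple into the shared-statement bucket on the fly) instead of failing.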
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs