dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.
TASK DESCRIPTION
As a WDQS user, I want triples shared by multiple entities to be treated
separately in the streaming updater so that they are not deleted when an entity
stops referencing them.
Some shared statements are still present in the RDF stream. They are currently
identified only at consumption time, but they should be handled and categorized
when the stream is produced.
java.lang.IllegalArgumentException: Cannot add/delete the same triple for a different entity (should probably be considered as a shared statement)
    at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.lambda$findInvalidStatements$6(PatchAccumulator.java:74)
    at java.util.HashMap.forEach(HashMap.java:1289)
    at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.findInvalidStatements(PatchAccumulator.java:71)
    at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.accumulate(PatchAccumulator.java:54)
    at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.accumulate(PatchAccumulator.java:108)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.wikidata.query.rdf.updater.consumer.KafkaStreamConsumer.poll(KafkaStreamConsumer.java:131)
    at org.wikidata.query.rdf.updater.consumer.StreamingUpdaterConsumer.lambda$run$0(StreamingUpdaterConsumer.java:46)
    at org.wikidata.query.rdf.common.TimerCounter.time(TimerCounter.java:51)
    at org.wikidata.query.rdf.updater.consumer.StreamingUpdaterConsumer.run(StreamingUpdaterConsumer.java:46)
    at org.wikidata.query.rdf.updater.consumer.StreamingUpdate.main(StreamingUpdate.java:49)
AC:
- the producer should identify all shared triples properly
- the consumer should continue to fail when such triples are detected, but the
log message should be clearer and include the triple and the entities it
belongs to
- bonus: the consumer should have a way to "fixup" these triples by
"re-categorizing" them on the fly so that the rdf stream does not have to be
re-generated
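
To make the second and third criteria concrete, here is a minimal sketch of
what the check and the on-the-fly fixup could look like. It is not the actual
PatchAccumulator code: the class SharedTripleSketch, its Triple stand-in, the
method names, and the map-of-lists input shape are all hypothetical and chosen
only for illustration. It assumes the consumer can see, for one batch, which
triples each entity adds or deletes.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch; names and data shapes are assumptions, not the updater's API. */
final class SharedTripleSketch {

    /** Minimal stand-in for an RDF statement (the real updater uses its own types). */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Triple)) {
                return false;
            }
            Triple t = (Triple) o;
            return subject.equals(t.subject) && predicate.equals(t.predicate) && object.equals(t.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }

        @Override
        public String toString() {
            return "<" + subject + "> <" + predicate + "> <" + object + ">";
        }
    }

    /** Maps each triple to every entity that adds or deletes it in this batch. */
    static Map<Triple, Set<String>> ownersByTriple(Map<String, List<Triple>> triplesByEntity) {
        Map<Triple, Set<String>> owners = new LinkedHashMap<>();
        triplesByEntity.forEach((entity, triples) ->
                triples.forEach(t -> owners.computeIfAbsent(t, k -> new LinkedHashSet<>()).add(entity)));
        return owners;
    }

    /** Still fails, but the message names the triple and all entities referencing it. */
    static void failOnSharedTriples(Map<String, List<Triple>> triplesByEntity) {
        ownersByTriple(triplesByEntity).forEach((triple, entities) -> {
            if (entities.size() > 1) {
                throw new IllegalArgumentException("Triple " + triple
                        + " is added/deleted for multiple entities " + entities
                        + " (should probably be considered as a shared statement)");
            }
        });
    }

    /** Bonus fixup: removes shared triples from the per-entity lists and returns them separately. */
    static Set<Triple> recategorize(Map<String, List<Triple>> triplesByEntity) {
        Set<Triple> shared = new LinkedHashSet<>();
        ownersByTriple(triplesByEntity).forEach((triple, entities) -> {
            if (entities.size() > 1) {
                shared.add(triple);
            }
        });
        // The per-entity lists must be mutable for the fixup to work in place.
        triplesByEntity.values().forEach(list -> list.removeIf(shared::contains));
        return shared;
    }
}
```

In this sketch failOnSharedTriples keeps the fail-fast behaviour, while
recategorize is the optional path that would let an already produced stream be
consumed without regenerating it. How the re-categorized triples are then
applied to the store is left open here.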
TASK DETAIL
https://phabricator.wikimedia.org/T266751
To: dcausse
Cc: dcausse, Aklapper, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86,
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst,
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll,
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331