Harej created this task.
Harej added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Internet-Archive.
TASK DESCRIPTION

**List of steps to reproduce** (step by step, including full links if applicable):
- Create a namespace `wcd` (or any name not already in use) using the create namespace script
- Attempt to munge a TTL dump from WBStack that uses federated properties from Wikidata (in this instance, wikipediacitations.wiki.opencura.com) using this command: `./munge.sh -c 50000 -f data/wikipediacitations-dump-2021-11-27.ttl.gz -d data/wikipediaCitationsMungeOut -- --conceptUri http://www.wikidata.org --commonsUri http://wikipediacitations.wiki.opencura.com`

**What happens?**:
A large stream of errors that look like this:

16:47:12.839 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] WARN o.w.q.r.t.r.EntityMungingRdfHandler - Error munging Q6673
org.wikidata.query.rdf.tool.rdf.Munger$BadSubjectException: Unrecognized subjects: [http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-95C82B26-B1DF-4D45-A956-2843C93FF507, http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-66DB22B2-2F9C-4398-A059-6815DF8348A4, http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-62597022-39EF-4825-9F03-F4D710B43BB0, http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-C70B917E-8F15-4CA5-96B0-00D07760CE52, http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-6C9A0D03-B8FD-46C4-AE31-3BF7D8C98040]. Expected only sitelinks and subjects starting with http://wikipediacitations.wiki.opencura.com/wiki/Special:EntityData/ and [http://www.wikidata.org/entity/, http://wikipediacitations.wiki.opencura.com/entity/]
    at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.finishCommon(Munger.java:800)
    at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.munge(Munger.java:413)
    at org.wikidata.query.rdf.tool.rdf.Munger.munge(Munger.java:144)
    at org.wikidata.query.rdf.tool.rdf.EntityMungingRdfHandler.munge(EntityMungingRdfHandler.java:163)
    at org.wikidata.query.rdf.tool.rdf.EntityMungingRdfHandler.handleStatement(EntityMungingRdfHandler.java:115)
    at org.wikidata.query.rdf.tool.rdf.DelegatingRdfHandler.handleStatement(DelegatingRdfHandler.java:43)
    at org.wikidata.query.rdf.tool.rdf.NormalizingRdfHandler.handleStatement(NormalizingRdfHandler.java:62)
    at org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler.lambda$flushStatementBuffer$3(AsyncRDFHandler.java:112)
    at org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer.run(AsyncRDFHandler.java:151)

**What should have happened instead?**:
The munge script should have recognized the respective URIs and created a munged dataset. On investigation by WBStack and Internet Archive, we believe this error is caused by references to Wikimedia Commons being hardcoded into the script. If this were more abstract, it should be possible for other Wikibases with federated properties to use it as well.

**Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc**:
Although I am using the 0.3.40 version of the Docker image, I am using `wikidata-query-tools-0.3.92-SNAPSHOT-jar-with-dependencies.jar` provided to me by the Search Platform team.
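Because the errors arrive as a large stream, it may help to summarize which URI namespaces the munger is rejecting before digging into individual entities. The following is a minimal diagnostic sketch, not part of the Wikidata Query Service tooling: the function name `unrecognized_subject_prefixes` is hypothetical, and the regular expression assumes the log lines look exactly like the excerpt above (a bracketed, comma-separated URI list after `Unrecognized subjects:`).

```python
import re
from collections import Counter

# Assumption: the rejected URIs appear as a bracketed, comma-separated list
# right after "Unrecognized subjects:", as in the excerpt from this report.
SUBJECTS_RE = re.compile(r"Unrecognized subjects: \[([^\]]*)\]")


def unrecognized_subject_prefixes(log_text):
    """Count rejected subject URIs by everything up to their last '/'."""
    counts = Counter()
    for match in SUBJECTS_RE.finditer(log_text):
        for uri in match.group(1).split(","):
            uri = uri.strip()
            if uri:
                # Drop the final path segment (the statement or entity ID).
                prefix = uri.rsplit("/", 1)[0] + "/"
                counts[prefix] += 1
    return counts


# Two of the subjects from the reported error, used as sample input.
sample = (
    "Munger$BadSubjectException: Unrecognized subjects: "
    "[http://wikipediacitations.wiki.opencura.com/entity/statement/"
    "Q6673-95C82B26-B1DF-4D45-A956-2843C93FF507, "
    "http://wikipediacitations.wiki.opencura.com/entity/statement/"
    "Q6673-66DB22B2-2F9C-4398-A059-6815DF8348A4]."
)

for prefix, count in unrecognized_subject_prefixes(sample).items():
    print(count, prefix)
```

Run against the full munge log, this should show whether every rejected subject falls under `http://wikipediacitations.wiki.opencura.com/entity/statement/`, as the five in the excerpt do, or whether other namespaces are affected as well.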
TASK DETAIL
https://phabricator.wikimedia.org/T296656

To: Harej
Cc: Aklapper, Harej, MPhamWMF, CBogen, Biazzzzoo, Philoserf, Ironie, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, Hydriz, aude, Tobias1984, Manybubbles, Jay8g