Harej created this task.
Harej added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Internet-Archive.

TASK DESCRIPTION
  **List of steps to reproduce** (step by step, including full links if applicable):
  
  - Create a namespace `wcd` (or any name not already in use) using the create namespace script.
  - Attempt to munge a TTL dump from WBStack that uses federated properties from Wikidata (in this instance, wikipediacitations.wiki.opencura.com) using this command:
    `./munge.sh -c 50000 -f data/wikipediacitations-dump-2021-11-27.ttl.gz -d data/wikipediaCitationsMungeOut -- --conceptUri http://www.wikidata.org --commonsUri http://wikipediacitations.wiki.opencura.com`
  
  **What happens?**:
  
  A large stream of errors that look like this:
  
    16:47:12.839 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] WARN  o.w.q.r.t.r.EntityMungingRdfHandler - Error munging Q6673
    org.wikidata.query.rdf.tool.rdf.Munger$BadSubjectException: Unrecognized subjects:  [http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-95C82B26-B1DF-4D45-A956-2843C93FF507, http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-66DB22B2-2F9C-4398-A059-6815DF8348A4, http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-62597022-39EF-4825-9F03-F4D710B43BB0, http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-C70B917E-8F15-4CA5-96B0-00D07760CE52, http://wikipediacitations.wiki.opencura.com/entity/statement/Q6673-6C9A0D03-B8FD-46C4-AE31-3BF7D8C98040]. Expected only sitelinks and subjects starting with http://wikipediacitations.wiki.opencura.com/wiki/Special:EntityData/ and [http://www.wikidata.org/entity/, http://wikipediacitations.wiki.opencura.com/entity/]
            at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.finishCommon(Munger.java:800)
            at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.munge(Munger.java:413)
            at org.wikidata.query.rdf.tool.rdf.Munger.munge(Munger.java:144)
            at org.wikidata.query.rdf.tool.rdf.EntityMungingRdfHandler.munge(EntityMungingRdfHandler.java:163)
            at org.wikidata.query.rdf.tool.rdf.EntityMungingRdfHandler.handleStatement(EntityMungingRdfHandler.java:115)
            at org.wikidata.query.rdf.tool.rdf.DelegatingRdfHandler.handleStatement(DelegatingRdfHandler.java:43)
            at org.wikidata.query.rdf.tool.rdf.NormalizingRdfHandler.handleStatement(NormalizingRdfHandler.java:62)
            at org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler.lambda$flushStatementBuffer$3(AsyncRDFHandler.java:112)
            at org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer.run(AsyncRDFHandler.java:151)
  
  **What should have happened instead?**:
  
  The munge script should have recognized these URIs and produced a munged dataset.
  
  Following an investigation by WBStack and the Internet Archive, we believe this error is caused by references to Wikimedia Commons being hardcoded into the script. If these references were configurable rather than hardcoded, other Wikibases with federated properties should be able to use the script as well.
  
  **Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc**: I am using version 0.3.40 of the Docker image, but with `wikidata-query-tools-0.3.92-SNAPSHOT-jar-with-dependencies.jar`, which was provided to me by the Search Platform team.

TASK DETAIL
  https://phabricator.wikimedia.org/T296656

To: Harej
Cc: Aklapper, Harej, MPhamWMF, CBogen, Biazzzzoo, Philoserf, Ironie, 
Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, Hydriz, aude, Tobias1984, Manybubbles, Jay8g
_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org