Nikerabbit created this task.
Nikerabbit added projects: Wikibase-Containers, User-Nikerabbit.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  Symptoms:
  
  - The script (munge.sh) uses all available CPU for hours without producing any output.
  
  Steps to reproduce:
  
  1. Install wdqs using wikibase-docker (version 0.3.10)
  2. docker-compose exec wdqs mkdir -p data/split
  3. time docker-compose exec wdqs curl -L https://nimiarkisto.fi/dumps/nimiarkisto.fi-CC-BY-4.0_2020-09-09.rdf.bz2 -o data/dump.rdf.bz2
  4. time docker-compose exec wdqs ./munge.sh -c 50000 -f data/dump.rdf.bz2 -d data/split -l en,fi,sv -s
  
  I have checked that this is not merely slow: with the Wikidata Lexemes dump, the script does write to the log and to the split files. With the Nimiarkisto dump I only get:
  
    root@nimiarkisto-qs:~/nimiarkisto-qs# time docker-compose exec wdqs ./munge.sh -c 5000 -f data/dump.rdf -d data/split -l en,fi,sv -s
    #logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
    10:03:13.441 [main] INFO  org.wikidata.query.rdf.tool.Munge - Switching to data/split/wikidump-000000001.ttl.gz
    ^C
  
  And the file `data/split/wikidump-000000001.ttl.gz` contains no output.
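
  One way to confirm the empty-output observation above is to decompress the split file and check whether even a single byte comes out. This is only a minimal sketch: the helper name `gz_has_content` is hypothetical and not part of wdqs, and it assumes `gzip`, `head`, `wc`, and `tr` are available wherever it is run.

```shell
# Hypothetical helper (not part of wdqs): succeeds only if the given
# gzip file decompresses to at least one byte of data.
gz_has_content() {
  # Fail early if the file is missing or zero bytes long.
  [ -s "$1" ] || return 1
  # Decompress and keep at most one byte. Errors from a truncated
  # stream are ignored, so anything already flushed still counts.
  _n=$(gzip -dc "$1" 2>/dev/null | head -c 1 | wc -c | tr -d ' ')
  [ "$_n" -gt 0 ]
}
```

  Run against `data/split/wikidump-000000001.ttl.gz`, a failing exit status would match what is described here: the file exists but holds no decompressed data.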
  
  Is it possible to enable more verbose logging to debug this further?

TASK DETAIL
  https://phabricator.wikimedia.org/T263427

To: Nikerabbit
Cc: Aklapper, Nikerabbit, Samantha_Alipio_WMDE, Akuckartz, darthmon_wmde, 
Jelabra, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Asahiko, despens, Wikidata-bugs, aude, Nemo_bis, 
Addshore, Mbch331