On 27.08.2016 07:18, Sumit Asthana wrote:
Hi,

I'm trying to use an offline Wikidata dump
<https://dumps.wikimedia.org/wikidatawiki/entities/20160822/> but when I
run an example from Wikidata Toolkit - EntityStatisticsProcessor
<https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/examples/EntityStatisticsProcessor.java>,
I hit the following error - https://dpaste.de/TNpd.

Apparently it is unable to parse the dump but I can't seem to figure it
out. Help would be appreciated :)

This happens when the dump download is incomplete. Recently, the download seems to get interrupted sometimes and has to be resumed to get the whole file. Our implementation does not resume an interrupted download, so it ends up with an incomplete dump.
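One way to check whether a downloaded dump is complete is to decompress it to the end and see whether the gzip stream terminates cleanly. The following is a minimal sketch in Python, not part of WDTK; the function name is_complete_gzip is made up for illustration, and it only detects truncation or corruption of the gzip container, not whether the JSON inside is what you expect.

```python
import gzip
import os
import zlib


def is_complete_gzip(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` can be read to its end.

    Hypothetical helper (not a WDTK function). A truncated download
    typically fails with EOFError before the end-of-stream marker.
    """
    # An empty file is not a usable dump, even though it reads without error.
    if os.path.getsize(path) == 0:
        return False
    try:
        with gzip.open(path, "rb") as f:
            # Read and discard the data in chunks to keep memory use flat.
            while f.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended early; OSError/zlib.error: not gzip or corrupted.
        return False
    return True
```

For a full dump this reads the entire file, so expect it to take a while. If it returns False, resuming the download (for example with wget -c) and checking again is probably the simplest fix.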

You can download the dump in any way you like, including using a browser with "Save as". I prefer to use wget. You just need to put the file into the directory where WDTK itself puts dumps. When you start WDTK, it reports which file it is going to download and where it will store it, so this is one way to find the right location.

The dump files are the ones found at https://dumps.wikimedia.org/other/wikidata/ (keep the file names used there). Each file goes into a directory named like ./dumpfiles/wikidatawiki/json-20160801 (for the dump https://dumps.wikimedia.org/other/wikidata/20160801.json.gz). The dumpfiles directory is located under the directory from which you run your program.
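The directory layout described above can be sketched as a small Python helper that maps a dump URL to the local path where the file would go. This is only an illustration of the naming scheme given in this message; the function name expected_dump_path is made up, and the location WDTK reports on startup remains the authoritative one.

```python
import os
import re


def expected_dump_path(dump_url, base_dir="."):
    """Map a dump URL such as .../other/wikidata/20160801.json.gz to
    <base_dir>/dumpfiles/wikidatawiki/json-20160801/20160801.json.gz.

    Hypothetical helper; the layout is assumed from the description above.
    `base_dir` is the directory from which the program is run.
    """
    # Keep the file name exactly as it is used on the dump server.
    file_name = dump_url.rstrip("/").rsplit("/", 1)[-1]
    match = re.fullmatch(r"(\d{8})\.json\.gz", file_name)
    if match is None:
        raise ValueError("unexpected dump file name: " + file_name)
    date = match.group(1)
    return os.path.join(base_dir, "dumpfiles", "wikidatawiki", "json-" + date, file_name)


if __name__ == "__main__":
    print(expected_dump_path("https://dumps.wikimedia.org/other/wikidata/20160801.json.gz"))
```

After downloading with wget, you would create that directory if needed and move the file there before starting your program.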

Best,

Markus



-Thanks,
Sumit


On Sat, Aug 27, 2016 at 1:18 AM, Stas Malyshev <smalys...@wikimedia.org
<mailto:smalys...@wikimedia.org>> wrote:

    Hi!

    > For example "I want to know the number of statements on an average with
    > dead external reference links".

    Since there are over a million links in references, you will probably
    want to use a dump, either JSON or RDF, and look for references there.
    It would be relatively easy to find those in reference statements.
    However, checking a million links might require some careful planning :)
    --
    Stas Malyshev
    smalys...@wikimedia.org <mailto:smalys...@wikimedia.org>

    _______________________________________________
    Wikidata mailing list
    Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
    https://lists.wikimedia.org/mailman/listinfo/wikidata
    <https://lists.wikimedia.org/mailman/listinfo/wikidata>




_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


