OK, thanks!

BTW, I can confirm that the NPE is solved by adding that json-YYYYMMDD/ subdir...

Another question: is it possible to cancel the parsing of a data dump
file programmatically? I saw the timeout, but I am integrating this in
a GUI where the user may push a cancel button, and it would be nice if
I could propagate that and stop the actual processing...
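
Something like this minimal sketch is what I have in mind (purely
hypothetical; the flag and the loop are mine, not WDTK API):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelSketch {
    // Flag that a GUI cancel button would set (hypothetical; WDTK does
    // not expose this -- it is just the shape I have in mind)
    static final AtomicBoolean cancelled = new AtomicBoolean(false);
    static int processed = 0;

    // Stand-in for the dump-parsing loop, checking the flag per record
    static void processDump(int totalRecords) {
        for (int i = 0; i < totalRecords; i++) {
            if (cancelled.get()) {
                return; // abort the remaining records
            }
            processed++;
        }
    }

    public static void main(String[] args) {
        processDump(10);               // runs to completion
        cancelled.set(true);           // simulate pressing cancel
        processDump(10);               // returns immediately
        System.out.println(processed); // prints "10"
    }
}
```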

Egon

On Sun, Jan 18, 2015 at 3:23 PM, Markus Krötzsch
<[email protected]> wrote:
> The issue was fixed in master now. I also added some more INFO-type messages
> that will report about the dump files found online and locally.
>
> Cheers,
>
> Markus
>
>
> On 18.01.2015 14:26, Markus Krötzsch wrote:
>>
>> On 18.01.2015 10:58, Egon Willighagen wrote:
>>>
>>> On Sat, Jan 17, 2015 at 11:04 PM, Markus Krötzsch
>>> <[email protected]> wrote:
>>>>
>>>> It is easy to fix this (though I will not fix it tonight, but
>>>> tomorrow) by
>>>> just adjusting the HTML strings we parse for.
>>>
>>>
>>> Sure! I have subscribed to the bug report.
>>>
>>> As an intermediate workaround for me, what file name pattern is used
>>> in the local cache?
>>>
>>> I had manually downloaded a file (and made it available as torrent
>>> because it was only at about 1 MB/s, [0]) and put this in the folder,
>>> but it was not recognized... the file on the server is:
>>> http://dumps.wikimedia.org/other/wikidata/20150112.json.gz
>>>
>>> But as 20150112.json.gz it was not detected... I noted the json-*
>>> pattern in the code, but json-20150112.json.gz didn't work either...
>>
>>
>> The dump files are put into subdirectories of the current directory
>> ("."), for example:
>>
>> ./dumpfiles/wikidatawiki/json-20150105/20150105.json.gz
>> (JSON dump)
>>
>>
>> ./dumpfiles/wikidatawiki/current-20141009/wikidatawiki-20141009-pages-meta-current.xml.bz2
>>
>> (current revision XML dump)
>>
>> If you create a directory of this form and put a file in there with the
>> file name as found online, then the tool will find it.
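>>
>> For example (a hypothetical sketch in plain java.nio; the
>> json-20150112 directory matches the dump file you downloaded, and the
>> createFile call just stands in for moving the real download there):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PlaceLocalDump {
    public static void main(String[] args) throws IOException {
        // Directory layout the tool expects for a local JSON dump,
        // as described above: ./dumpfiles/wikidatawiki/json-<date>/
        Path dir = Paths.get("dumpfiles", "wikidatawiki", "json-20150112");
        Files.createDirectories(dir);

        // Stand-in for the manually downloaded file; in practice you
        // would move the real 20150112.json.gz into this directory
        Path dump = dir.resolve("20150112.json.gz");
        if (!Files.exists(dump)) {
            Files.createFile(dump);
        }

        System.out.println(Files.exists(dump)); // prints "true"
    }
}
```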
>>
>>>
>>> BTW, a second question, is there a way to list all local (JSON) dumps
>>> using the WDTK api?
>>
>>
>> Yes, though it's not very convenient right now. To restrict to local
>> files, you can use the DumpProcessingController in offline mode (then it
>> only looks at local files):
>>
>>
>> DumpProcessingController dumpProcessingController =
>>      new DumpProcessingController("wikidatawiki");
>> dumpProcessingController.setOfflineMode(true);
>>
>> List<MwDumpFile> localJsonDumps = dumpProcessingController
>>      .getWmfDumpFileManager()
>>      .findAllDumps(DumpContentType.JSON);
>>
>> This gives you a list of MwDumpFile objects that you can access to get
>> their date (getDateStamp()) and also to access the file contents.
>>
>> I think we should log some additional messages about the files that are
>> found and used.
>>
>> Cheers,
>>
>> Markus
>>
>>>
>>>> We should also improve our error reporting for this case, obviously.
>>>
>>>
>>> Yeah, that's an art that no software I have ever worked with has
>>> mastered... it's hard! But it's important... I was looking in
>>> completely the wrong place... mind you, monitoring log messages can
>>> be hard too when WDTK is used in other environments, such as
>>> Bioclipse, and you cannot rely on those messages to show up :(
>>>
>>> Thanks for immediately looking into it and looking forward to pointers
>>> for my two questions,
>>>
>>> greetings,
>>>
>>> Egon
>>>
>>
>
>
> _______________________________________________
> Wikidata-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l



-- 
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen

