Hi Alan,
The SitelinksExample shows how to get the basic language-links data. In
Wikidata, sites are encoded by IDs such as "enwiki" or "frwikivoyage".
To find out what they mean in terms of URLs, you need to get the
interlanguage information first. The example shows you how to do this.
The site link information for a particular item can be found in the
ItemDocument for that item. There are two ways of getting an ItemDocument:
(1) You process the dump file, which goes through all items one by one
(in the order in which they appear in the dump). This is best if you want
to look at very many items, or if you must work completely offline.
(2) You fetch individual items from the Web API (random access). This is
best if you need the links for only a few selected items (fetching
hundreds from the API is quick, fetching millions is infeasible).
WDTK comes with many examples that work along the lines of (1). For (2),
see the example FetchOnlineDataExample (so far this is only part of the
development version of v0.5.0, which you can find on GitHub).
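To give an idea of what option (2) means at the HTTP level, here is a rough sketch of the requests involved. It does not use WDTK at all; it only builds request URLs for the wbgetentities action of the Wikidata Web API, restricted to sitelinks. The class and method names are made up for illustration, and the batch size of 50 IDs per request is an assumption based on the usual API limit for ordinary clients.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone sketch (hypothetical names, no WDTK classes involved). */
final class SitelinkRequestSketch {

    static final String API = "https://www.wikidata.org/w/api.php";

    // Assumption: at most 50 IDs per wbgetentities request for ordinary clients.
    static final int BATCH_SIZE = 50;

    /** Builds one request URL per batch of item IDs such as "Q42". */
    static List<String> requestUrls(List<String> itemIds) {
        List<String> urls = new ArrayList<>();
        for (int start = 0; start < itemIds.size(); start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, itemIds.size());
            // "%7C" is the percent-encoded "|" that separates the IDs.
            String ids = String.join("%7C", itemIds.subList(start, end));
            urls.add(API + "?action=wbgetentities&format=json&props=sitelinks&ids=" + ids);
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : requestUrls(Arrays.asList("Q42", "Q64"))) {
            System.out.println(url);
        }
    }
}
```

One request per 50 items is why a few hundred items are quick and millions are not: a million items would need about 20,000 requests.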
In either case, you can directly read out any sitelink from the
ItemDocument object. It will give you the article title, the site ID
("enwiki" etc.), and the list of badges (if any). To turn this into a
URL, you would use code as in the SitelinksExample.
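In case it helps to see the idea without the toolkit around it, here is a minimal stand-alone sketch of that last step. It is not the code from SitelinksExample and uses no WDTK classes: the small table of URL patterns is hand-written for illustration (in practice it comes from the sites information mentioned above), and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;

/** Stand-alone sketch (hypothetical names, no WDTK classes involved). */
final class SitelinkUrlSketch {

    // Hand-written excerpt for illustration only; "$1" marks where the
    // page title goes. The real table is loaded from the sites information.
    private static final Map<String, String> PAGE_URL_PATTERNS = new HashMap<>();
    static {
        PAGE_URL_PATTERNS.put("enwiki", "https://en.wikipedia.org/wiki/$1");
        PAGE_URL_PATTERNS.put("frwikivoyage", "https://fr.wikivoyage.org/wiki/$1");
    }

    /** Returns the page URL, or null if the site ID is not in the table. */
    static String pageUrl(String siteId, String pageTitle) {
        String pattern = PAGE_URL_PATTERNS.get(siteId);
        if (pattern == null) {
            return null;
        }
        try {
            // MediaWiki page URLs use underscores in place of spaces;
            // everything else that is unsafe in a URL gets percent-encoded.
            String encoded = URLEncoder.encode(pageTitle.replace(' ', '_'), "UTF-8");
            return pattern.replace("$1", encoded);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pageUrl("enwiki", "Douglas Adams"));
        System.out.println(pageUrl("frwikivoyage", "Paris"));
    }
}
```

The site ID and the title are exactly the two values you read from the sitelink; the badges play no role in building the URL.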
Cheers,
Markus
On 17.04.2015 15:18, Alan Said wrote:
Hi all,
I am trying to use the Wikidata Toolkit to extract interlanguage links
for certain pages from Wikipedia.
So far, I've tried different attempts based on the code provided in
SiteLinksExample
(https://github.com/Wikidata/Wikidata-Toolkit/blob/master/wdtk-examples/src/main/java/org/wikidata/wdtk/examples/SitelinksExample.java)
without any success. I've realized that this is likely not the correct
approach.
Optimally I'd like to do this while processing a local file, I've
downloaded a pages-meta-current.xml.bz2 file, but I can't really get my
head around how to go ahead with this.
Any pointers are appreciated.
Best,
Alan
--
Alan Said
Recorded Future
e: [email protected] <mailto:[email protected]>
t: @alansaid
w: www.alansaid.com <http://www.alansaid.com>
_______________________________________________
Wikidata-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l