Re: [Wikidata] Wikidata HDT dump

2017-12-15 Thread Wouter Beek
Hi Wikidata community,

Somebody pointed me to the following issue:
https://phabricator.wikimedia.org/T179681  Unfortunately I'm not able
to log in to Phabricator, so I cannot edit the issue directly.  I'm
sending this email instead.

The issue seems to be stalled because it is not possible to create HDT
files that contain more than 2B triples.  However, this is possible in
a specific 64-bit branch, which is how I created the downloadable
version I sent a few days ago.  As indicated, I can create these
files for the community if there is a use case.

---
Cheers,
Wouter.

Email: wou...@triply.cc
WWW: http://triply.cc
Tel: +31647674624


On Tue, Dec 12, 2017 at 11:24 AM, Wouter Beek  wrote:
> Hi list,
>
> I'm sorry, I was under the impression that I had already shared this
> resource with you earlier, but I haven't...
>
> On 7 Nov I created an HDT file based on the then current download link
> from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz
>
> You can download this HDT file and its index from the following locations:
>   - http://lod-a-lot.lod.labs.vu.nl/data/wikidata.hdt (~45GB)
>   - http://lod-a-lot.lod.labs.vu.nl/data/wikidata.hdt.index.v1-1 (~28GB)
>
> You may need to compile with 64-bit support, because there are more
> than 2B triples (https://github.com/rdfhdt/hdt-cpp/tree/develop-64).
> (To be exact, there are 4,579,973,187 triples in this file.)
>
> PS: If this resource turns out to be useful to the community we can
> offer an updated HDT file at a to-be-determined interval.
>
> ---
> Cheers,
> Wouter Beek.
>
> Email: wou...@triply.cc
> WWW: http://triply.cc
> Tel: +31647674624
>
> On Tue, Nov 7, 2017 at 6:31 PM, Laura Morales  wrote:
>>> drops `a wikibase:Item` and `a wikibase:Statement` types
>>
>> off topic but... why drop `a wikibase:Item`? Without this it seems 
>> impossible to retrieve a list of items.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2017-12-12 Thread Wouter Beek
Hi list,

I'm sorry, I was under the impression that I had already shared this
resource with you earlier, but I haven't...

On 7 Nov I created an HDT file based on the then current download link
from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz

You can download this HDT file and its index from the following locations:
  - http://lod-a-lot.lod.labs.vu.nl/data/wikidata.hdt (~45GB)
  - http://lod-a-lot.lod.labs.vu.nl/data/wikidata.hdt.index.v1-1 (~28GB)

You may need to compile with 64-bit support, because there are more
than 2B triples (https://github.com/rdfhdt/hdt-cpp/tree/develop-64).
(To be exact, there are 4,579,973,187 triples in this file.)
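
As a quick sanity check of why 64-bit support matters here, the sketch below compares the triple count quoted above with the largest values that 32-bit and 64-bit integers can hold. That the "2B" limit stems from a signed 32-bit counter is an assumption that this thread does not state explicitly; the names used are illustrative only.

```python
# Triple count reported for the Wikidata HDT file in this thread.
TRIPLE_COUNT = 4_579_973_187

# Largest values representable in common integer widths.
INT32_MAX = 2**31 - 1   # 2,147,483,647: roughly "2B" (assumed source of the limit)
UINT32_MAX = 2**32 - 1  # 4,294,967,295
INT64_MAX = 2**63 - 1


def fits(count: int, limit: int) -> bool:
    """Return True if `count` can be stored in an integer whose maximum is `limit`."""
    return 0 <= count <= limit


for name, limit in [("int32", INT32_MAX), ("uint32", UINT32_MAX), ("int64", INT64_MAX)]:
    print(name, fits(TRIPLE_COUNT, limit))
```

The count exceeds both 32-bit limits, so under this assumption only a 64-bit counter can address every triple in the file.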

PS: If this resource turns out to be useful to the community we can
offer an updated HDT file at a to-be-determined interval.

---
Cheers,
Wouter Beek.

Email: wou...@triply.cc
WWW: http://triply.cc
Tel: +31647674624

On Tue, Nov 7, 2017 at 6:31 PM, Laura Morales  wrote:
>> drops `a wikibase:Item` and `a wikibase:Statement` types
>
> off topic but... why drop `a wikibase:Item`? Without this it seems impossible 
> to retrieve a list of items.


Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Wouter Beek
Hi Ghislain,

> @Wouter: See here https://dumps.wikimedia.org/wikidatawiki/entities/ ?

Thanks for the pointer!  I'm downloading from
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz now.

The Content-Type header for that URI seems incorrect to me: it says
`application/octet-stream`, but the file actually contains `text/turtle`.
(For specifying the compression mechanism the Content-Encoding header
should be used.)
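
The header expectation described above can be sketched as a small check. This is a minimal illustration, assuming the desired response is `Content-Type: text/turtle` together with `Content-Encoding: gzip`; the function name and the sample header values are hypothetical, and the thread only reports the Content-Type that was actually observed.

```python
def check_turtle_gzip_headers(headers: dict) -> list:
    """Return a list of problems found in the HTTP response headers
    of a gzip-compressed Turtle file (empty list means no problems)."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    problems = []

    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = normalized.get("content-type", "").split(";")[0].strip().lower()
    if media_type != "text/turtle":
        problems.append(f"Content-Type is {media_type!r}, expected 'text/turtle'")

    encoding = normalized.get("content-encoding", "").strip().lower()
    if encoding != "gzip":
        problems.append(f"Content-Encoding is {encoding!r}, expected 'gzip'")

    return problems


# Hypothetical header set modelled on the Content-Type reported above.
for problem in check_turtle_gzip_headers({"Content-Type": "application/octet-stream"}):
    print(problem)
```

The function takes a plain mapping so it can be fed from any HTTP client; for instance, the headers of a `urllib.request` response can be passed as `dict(response.headers)`.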

The first part of the Turtle data stream seems to contain syntax errors for
some of the XSD decimal literals.  The first one appears on line 13,291:

```text/turtle
 <http://wikiba.se/ontology-beta#geoPrecision> "1.0E-6"^^<http://www.w3.org/2001/XMLSchema#decimal> .
```

Notice that scientific notation is not allowed in the lexical form of
decimals according to XML Schema Part 2: Datatypes.  (It is allowed in
floats and doubles.)  Is this a known issue or should I report this
somewhere?

---
Cheers!,
Wouter.

Email: wou...@triply.cc
WWW: http://triply.cc
Tel: +31647674624


Re: [Wikidata] Wikidata HDT dump

2017-10-27 Thread Wouter Beek
Dear Laura, others,

If somebody points me to the RDF data dump of Wikidata I can deliver an
HDT version for it, no problem.  (Given the current cost of memory I
do not believe that the memory consumption for HDT creation is a
blocker.)

---
Cheers,
Wouter Beek.

Email: wou...@triply.cc
WWW: http://triply.cc
Tel: +31647674624


On Fri, Oct 27, 2017 at 5:08 PM, Laura Morales  wrote:
> Hello everyone,
>
> I'd like to ask if Wikidata could please offer an HDT [1] dump along with the
> already available Turtle dump [2]. HDT is a binary format to store RDF data,
> which is pretty useful because it can be queried from the command line, it can
> be used as a Jena/Fuseki source, and it also uses orders of magnitude less
> space to store the same data. The problem is that it's very impractical to
> generate an HDT file, because the current implementation requires a lot of RAM
> to convert a file. For Wikidata it will probably require a machine with
> 100-200GB of RAM. This is unfeasible for me because I don't have such a
> machine, but if you guys have one to share, I can help set up the rdf2hdt
> software required to convert Wikidata Turtle to HDT.
>
> Thank you.
>
> [1] http://www.rdfhdt.org/
> [2] https://dumps.wikimedia.org/wikidatawiki/entities/


Re: [Wikidata] Improve Wikidata links to Dutch municipalities

2017-05-17 Thread Wouter Beek
Hi Andy,

Thank you for the useful feedback.  I'm trying to understand what's
the best way for us to link our LOD activities to Wikidata's.  Your
comments (and also Egon's BTW) are thus much appreciated.

On Wed, May 17, 2017 at 5:55 PM, Andy Mabbett  wrote:
> Others have answered this, but I would also urge you to import
> Wikidata IDs, and to show them on your pages, like [2].

That's a good idea.  That way a LOD agent can traverse from Kadaster
to Wikidata (in addition to the other way round).  What's the best
practice for doing this on our end?  Should we create a linkset with
statements of the form [1] or are there other/better approaches?

[1]  owl:sameAs 
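
One way such a linkset could be produced is sketched below as N-Triples output. This is only an illustration: the Kadaster-side IRI is a placeholder on example.org, pairing it with the Wikidata entity for Apeldoorn (Q3018561, mentioned elsewhere in this thread) is purely hypothetical, and the function name is made up for the example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(subject_iri: str, object_iri: str) -> str:
    """Serialize one owl:sameAs statement as a single N-Triples line."""
    for iri in (subject_iri, object_iri):
        # Angle brackets and whitespace may not appear inside an N-Triples IRI.
        if any(ch in iri for ch in "<> \t\n"):
            raise ValueError(f"not a usable IRI: {iri!r}")
    return f"<{subject_iri}> <{OWL_SAME_AS}> <{object_iri}> ."


# Hypothetical pairs: (placeholder Kadaster IRI, Wikidata entity IRI).
pairs = [
    (
        "http://example.org/kadaster/id/municipality/placeholder",
        "http://www.wikidata.org/entity/Q3018561",
    ),
]

for kadaster_iri, wikidata_iri in pairs:
    print(same_as_triple(kadaster_iri, wikidata_iri))
```

Each printed line is one statement of the linkset, so the output can be written to a file and published alongside the dataset.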

---
Cheers,
Wouter.



Re: [Wikidata] Improve Wikidata links to Dutch municipalities

2017-05-17 Thread Wouter Beek
On Wed, May 17, 2017 at 6:14 PM, Andy Mabbett  wrote:
> Purge the page, per:
>
> https://en.wikipedia.org/wiki/Wikipedia:Purge#Purge_request_to_server
>
> (that's advice for Wikipedia, but the principles are the same)
>
> Also, please give specific URLs for examples, not "one of the...".

Let me indeed be a bit more specific...

For instance the Dutch municipality of Apeldoorn
(https://www.wikidata.org/wiki/Q3018561) should refer to the Kadaster
resource IRI and not to a document IRI (as is currently the case).

Specifically, the value of property "BAG-code for Dutch locations"
(P981) should be changed for each Dutch municipality.  I've edited
P981's property "RDF URI template" (P1921) to the correct IRI
template, but this template is not yet used for the Dutch municipality
instances (also not after purging the cache).

There is also a "formatter URL" property (P1630).  Maybe that
conflicts with the IRI template that I have edited?  Or maybe it takes
precedence over the RDF URI?
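
For reference, the way an identifier value is combined with such a template can be sketched as follows. This assumes the template marks the identifier position with the `$1` placeholder; the template IRI and the identifier value below are hypothetical and are not the actual values stored for P981.

```python
def expand_template(template: str, value: str) -> str:
    """Substitute an identifier value for the "$1" placeholder in a template."""
    if "$1" not in template:
        raise ValueError("template does not contain the $1 placeholder")
    return template.replace("$1", value)


# Hypothetical RDF URI template and identifier value, for illustration only.
RDF_URI_TEMPLATE = "http://example.org/id/municipality/$1"
print(expand_template(RDF_URI_TEMPLATE, "0200"))
```

Applying the same value to a formatter URL and to an RDF URI template gives two different IRIs, which is why it matters which of the two properties a consumer reads.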

---
Cheers,
Wouter.



Re: [Wikidata] Improve Wikidata links to Dutch municipalities

2017-05-17 Thread Wouter Beek
Hi,

Thanks to Egon I now understand that I can change the RDF URI
template.  The old template was pointing to the Linked Data browser
(i.e., HTML documents).  I've changed this to the URI template of the
identifiers for Dutch municipalities.

When I visit one of the Dutch municipalities, it still has the old URI
value for the property (P981).  Will the new template that I have
added now be used in future scrapes?  In other words: how do I update
the instances?

---
Cheers!,
Wouter.

On Wed, May 17, 2017 at 5:52 PM, Andy Mabbett  wrote:
> On 17 May 2017 at 16:28, Andra Waagmeester  wrote:
>
>> Just to add to Egon's comment, IRIs can also be added to Wikidata using the
>> Property P2888 - Exact Match.
>
> I wouldn't advise doing this when we already have the ID in a specific 
> property.
>
> --
> Andy Mabbett
> @pigsonthewing
> http://pigsonthewing.org.uk


[Wikidata] Improve Wikidata links to Dutch municipalities

2017-05-17 Thread Wouter Beek
Hi Wikidata community,

Kadaster, the Dutch cadastre agency, is publishing its base registries as
Linked Open Data.  This means that we have minted unique identifiers for
all buildings, roads, etc. in the Netherlands.

Wikidata currently includes links to the Kadaster, but these do not point
to the proper IRIs.  For example, Dutch municipalities in Wikidata
currently link to an HTML viewer of Kadaster [1], but they should point to
our dereferenceable IRI [2].

Low-hanging fruit seems to be to update the Kadaster links for all Dutch
municipalities.  What's the best way to go about this?  Can we make these
corrections ourselves / what is the procedure for editing?

---
Cheers!,
Wouter.

[1] https://bagviewer.kadaster.nl/lvbag/bag-viewer/index.html#?objectId=3560&detailsObjectId=3560

[2] http://brt.basisregistraties.overheid.nl/top10nl/id/plaats/129612200