Re: [Wikidata] Do you use the Wikidata entity dump dcatap.rdf?

2017-09-27 Thread Stas Malyshev
Hi!

> is anyone using the Wikidata entity dump dcatap.rdf at
> https://dumps.wikimedia.org/wikidatawiki/entities/dcatap.rdf?
> 
> It is very rarely used and is thus causing us a (probably) undue
> maintenance burden, because of which we plan to remove it.

What's the issue with it? I don't use it, but it seems to be part of the
standard for dataset descriptions, so I wonder if the issues can be
fixed. I don't know much about it, but from the description it seems
to be very automatable.

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Do you use the Wikidata entity dump dcatap.rdf?

2017-09-27 Thread Jan Ainali
If it is used (although rarely) and you are not sure whether it is causing
you any undue burden, why remove a metadata description for linked data
that is recommended by the EU?

Kind regards
Jan Ainali
http://ainali.com

2017-09-27 12:04 GMT+02:00 Marius Hoch:

> Hi folks,
>
> is anyone using the Wikidata entity dump dcatap.rdf at
> https://dumps.wikimedia.org/wikidatawiki/entities/dcatap.rdf?
>
> It is very rarely used and is thus causing us a (probably) undue
> maintenance burden, because of which we plan to remove it.
>
> If anyone is making use of it, please speak up so that we can keep it or
> find a viable alternative.
>
> Cheers,
> Marius


Re: [Wikidata] Encoders/feature extractors for neural nets

2017-09-27 Thread John Erling Blad
The most important thing for my problem would be to encode quantity and
geopos. The test case is lake sizes to encode proper localized descriptions.

Unless someone already has a working solution, I would encode this as
sparse logarithmic vectors, probably also with the log of pairwise differences.
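
One possible reading of "sparse logarithmic vectors" is a one-hot vector over logarithmically spaced buckets. The sketch below is only an illustration under that assumption; the function name log_bucket_one_hot, the exponent range and the bucket density are all hypothetical choices, not something specified in this thread, and the pairwise-difference part is not covered.

```python
import math


def log_bucket_one_hot(value, min_exp=-3, max_exp=9, buckets_per_decade=2):
    """Return a one-hot list over log10-spaced buckets for a positive quantity.

    The range [10**min_exp, 10**max_exp) and the bucket density are
    assumptions chosen for illustration only.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive quantities")
    n_buckets = (max_exp - min_exp) * buckets_per_decade
    index = math.floor((math.log10(value) - min_exp) * buckets_per_decade)
    # Values outside the range are clamped to the first or last bucket.
    index = max(0, min(n_buckets - 1, index))
    vector = [0] * n_buckets
    vector[index] = 1
    return vector


# A hypothetical lake area of 150 (in whatever unit the quantity uses),
# with one bucket per decade between 10**0 and 10**4.
print(log_bucket_one_hot(150.0, min_exp=0, max_exp=4, buckets_per_decade=1))
```

Only one cell is set per quantity, so the vector stays sparse however wide the covered range is.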

Encoding of qualifiers is interesting, but would require encoding of a
topic map, and that adds an additional layer of complexity.

How to encode the values is not so much the problem; the point is to avoid
reimplementing this yet another time… ;)

On Wed, Sep 27, 2017 at 1:23 PM, Thomas Pellissier Tanon <
tho...@pellissier-tanon.fr> wrote:

> Just an idea of a very sparse but hopefully not so bad encoding (I have
> not actually tested it).
>
> NB: I am going to make heavy use of the terms defined in the glossary [1].
>
> A value could be encoded by a vector:
> - for entity ids: a vector V whose dimension is the number of existing
> entities, such that V[q] = 1 if, and only if, the value is the entity q,
> and V[q] = 0 otherwise.
> - for time: a vector with year, month, day, hours, minutes, seconds,
> is_precision_year, is_precision_month, ..., is_gregorian, is_julian (or
> something similar)
> - for geo coordinates: latitude, longitude, is_earth, is_moon...
> - for strings/language strings: an encoding depending on your use case
> ...
> Example: to encode "Q2" you would have the vector {0,1,0}.
> To encode the year 2000 you would have {2000,0,...,is_precision_decade=0,
> is_precision_year=1,is_precision_month=0,...,is_gregorian=true,...}
>
> To encode a snak you build a big vector by concatenating the vector of the
> value if the property is P1, the vector if it is P2, and so on (you use the
> property datatype to pick a good vector shape), and you add two cells per
> property to encode is_novalue and is_somevalue. To encode "P31: Q5" you
> would have a vector V = {0,...,0,0,0,0,1,0,...} with a 1 only at
> V[P31_offset + Q5_offset].
>
> To encode a claim you could concatenate the main snak vector with the
> qualifiers vector, which is the merge of the snak vectors of all qualifiers
> (i.e. you build the vector for each snak and sum them), so that the
> qualifiers vector encodes all qualifiers at the same time. This allows
> checking that a qualifier is set just by picking the right cell in the
> vector. It will do bad things if there are two qualifiers with the same
> property and a datatype like time or geocoordinates, but I don't think
> that is really a problem.
> Example: to encode the claim with "P31: Q5" main snak and qualifiers "P42:
> Q42, P42: Q44" we would have a vector V such that V[P31_offset + Q5_offset]
> = 1, V[qualifiers_offset + P42_offset + Q42_offset] = 1 and
> V[qualifiers_offset + P42_offset + Q44_offset] = 1, and 0 elsewhere.
>
> I am not sure how to encode statement references (merging all of them and
> encoding the result just like the qualifiers vector is maybe a first step,
> but it is bad if we have multiple references). For the rank you just need
> 3 booleans: is_preferred, is_normal and is_deprecated.
>
> Cheers,
>
> Thomas
>
> [1] https://www.wikidata.org/wiki/Wikidata:Glossary
>
>
> > On 27 Sep 2017, at 12:41, John Erling Blad wrote:
> >
> > Is there anyone that has done any work on how to encode statements as
> > features for neural nets? I'm mostly interested in sparse encoders for
> > online training of live networks.


[Wikidata] Do you use the Wikidata entity dump dcatap.rdf?

2017-09-27 Thread Marius Hoch

Hi folks,

is anyone using the Wikidata entity dump dcatap.rdf at 
https://dumps.wikimedia.org/wikidatawiki/entities/dcatap.rdf?


It is very rarely used and is thus causing us a (probably) undue 
maintenance burden, because of which we plan to remove it.


If anyone is making use of it, please speak up so that we can keep it or 
find a viable alternative.


Cheers,
Marius



Re: [Wikidata] Encoders/feature extractors for neural nets

2017-09-27 Thread Thomas Pellissier Tanon
Just an idea of a very sparse but hopefully not so bad encoding (I have not 
actually tested it).

NB: I am going to make heavy use of the terms defined in the glossary [1].

A value could be encoded by a vector:
- for entity ids: a vector V whose dimension is the number of existing
entities, such that V[q] = 1 if, and only if, the value is the entity q, and
V[q] = 0 otherwise.
- for time: a vector with year, month, day, hours, minutes, seconds,
is_precision_year, is_precision_month, ..., is_gregorian, is_julian (or
something similar)
- for geo coordinates: latitude, longitude, is_earth, is_moon...
- for strings/language strings: an encoding depending on your use case
...
Example: to encode "Q2" you would have the vector {0,1,0}.
To encode the year 2000 you would have {2000,0,...,is_precision_decade=0,
is_precision_year=1,is_precision_month=0,...,is_gregorian=true,...}
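
The two examples above can be sketched in Python roughly as follows. This is an untested-in-practice illustration, not a reference implementation: the helper names encode_entity and encode_time, the entity_index mapping and the exact list of precision and calendar flags are assumptions made for the example.

```python
# Hypothetical flag lists; a real encoder would cover every precision
# and calendar model it needs.
TIME_PRECISIONS = ["decade", "year", "month", "day"]
CALENDARS = ["gregorian", "julian"]


def encode_entity(entity_id, entity_index):
    """One-hot vector with one cell per known entity."""
    vector = [0] * len(entity_index)
    vector[entity_index[entity_id]] = 1
    return vector


def encode_time(year, month=0, day=0, hour=0, minute=0, second=0,
                precision="day", calendar="gregorian"):
    """Numeric date fields followed by precision and calendar flags."""
    vector = [year, month, day, hour, minute, second]
    vector += [1 if precision == p else 0 for p in TIME_PRECISIONS]
    vector += [1 if calendar == c else 0 for c in CALENDARS]
    return vector


# Toy universe of three entities, so "Q2" lands in the middle cell.
entity_index = {"Q1": 0, "Q2": 1, "Q3": 2}
print(encode_entity("Q2", entity_index))
print(encode_time(2000, precision="year"))
```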

To encode a snak you build a big vector by concatenating the vector of the
value if the property is P1, the vector if it is P2, and so on (you use the
property datatype to pick a good vector shape), and you add two cells per
property to encode is_novalue and is_somevalue. To encode "P31: Q5" you would
have a vector V = {0,...,0,0,0,0,1,0,...} with a 1 only at
V[P31_offset + Q5_offset].
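
A minimal sketch of this snak layout, assuming for simplicity that every property takes an entity id as its value (a real encoder would pick the block shape per datatype). The names build_layout and encode_snak and the toy index mappings are hypothetical.

```python
def build_layout(properties, num_entities):
    """Give each property a block: one cell per entity, then the
    is_novalue cell, then the is_somevalue cell."""
    block_size = num_entities + 2
    offsets = {prop: i * block_size for i, prop in enumerate(properties)}
    return offsets, block_size * len(properties)


def encode_snak(prop, value, offsets, total_size, entity_index):
    """Set exactly one cell inside the block of the given property."""
    vector = [0] * total_size
    base = offsets[prop]
    num_entities = len(entity_index)
    if value == "novalue":
        vector[base + num_entities] = 1
    elif value == "somevalue":
        vector[base + num_entities + 1] = 1
    else:
        vector[base + entity_index[value]] = 1
    return vector


entity_index = {"Q5": 0, "Q42": 1, "Q44": 2}
offsets, total_size = build_layout(["P31", "P42"], len(entity_index))
# "P31: Q5" sets the cell at offsets["P31"] + entity_index["Q5"].
print(encode_snak("P31", "Q5", offsets, total_size, entity_index))
```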

To encode a claim you could concatenate the main snak vector with the
qualifiers vector, which is the merge of the snak vectors of all qualifiers
(i.e. you build the vector for each snak and sum them), so that the qualifiers
vector encodes all qualifiers at the same time. This allows checking that a
qualifier is set just by picking the right cell in the vector. It will do bad
things if there are two qualifiers with the same property and a datatype like
time or geocoordinates, but I don't think that is really a problem.
Example: to encode the claim with "P31: Q5" main snak and qualifiers "P42: Q42,
P42: Q44" we would have a vector V such that V[P31_offset + Q5_offset] = 1,
V[qualifiers_offset + P42_offset + Q42_offset] = 1 and V[qualifiers_offset +
P42_offset + Q44_offset] = 1, and 0 elsewhere.
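
The claim example can be sketched as below. To keep it short, the sketch assumes entity-valued properties only and leaves out the is_novalue/is_somevalue cells; one_hot_snak, encode_claim and the index mappings are hypothetical names introduced for illustration.

```python
def one_hot_snak(prop, entity, prop_index, entity_index):
    """Snak vector with one block of entity cells per property."""
    num_entities = len(entity_index)
    vector = [0] * (len(prop_index) * num_entities)
    vector[prop_index[prop] * num_entities + entity_index[entity]] = 1
    return vector


def encode_claim(main_snak, qualifiers, prop_index, entity_index):
    """Main snak vector followed by the sum of all qualifier snak vectors."""
    main_vector = one_hot_snak(*main_snak, prop_index, entity_index)
    qualifiers_vector = [0] * len(main_vector)
    for snak in qualifiers:
        snak_vector = one_hot_snak(*snak, prop_index, entity_index)
        qualifiers_vector = [a + b for a, b in zip(qualifiers_vector, snak_vector)]
    return main_vector + qualifiers_vector


prop_index = {"P31": 0, "P42": 1}
entity_index = {"Q5": 0, "Q42": 1, "Q44": 2}
claim = encode_claim(("P31", "Q5"), [("P42", "Q42"), ("P42", "Q44")],
                     prop_index, entity_index)
# The qualifiers part starts at index 6, the length of one snak vector.
print(claim)
```

Because the qualifier vectors are summed, two qualifiers with the same property and the same value would produce a 2 in one cell rather than a 1.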

I am not sure how to encode statement references (merging all of them and
encoding the result just like the qualifiers vector is maybe a first step, but
it is bad if we have multiple references). For the rank you just need 3
booleans: is_preferred, is_normal and is_deprecated.
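
The three rank booleans amount to a one-hot vector over the ranks. A minimal sketch, where encode_rank and the cell order are assumptions:

```python
RANKS = ("preferred", "normal", "deprecated")


def encode_rank(rank):
    """Return [is_preferred, is_normal, is_deprecated] for a statement rank."""
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank!r}")
    return [1 if rank == r else 0 for r in RANKS]


print(encode_rank("normal"))
```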

Cheers,

Thomas

[1] https://www.wikidata.org/wiki/Wikidata:Glossary


> On 27 Sep 2017, at 12:41, John Erling Blad wrote:
> 
> Is there anyone that has done any work on how to encode statements as 
> features for neural nets? I'm mostly interested in sparse encoders for online 
> training of live networks.


[Wikidata] WikiPedia to WikiData URIs

2017-09-27 Thread Timothy Holborn
Any chance of converting
https://en.wikipedia.org/wiki/Linked_data#Principles to, say,
https://en.wikidata.org/wiki/Linked_data#Principles via automated tools?

Or perhaps some sort of agent method that adds literals from Wikipedia for
an RDF agent?

Or some other similar method / idea?
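
One way such a conversion could be sketched is through the MediaWiki Action API, which exposes the linked Wikidata item of a page as the wikibase_item page property. The helper names and the User-Agent string below are hypothetical, the network call is untested here, and the section fragment (#Principles) is dropped because items correspond to whole pages rather than sections.

```python
import json
import urllib.parse
import urllib.request


def wikibase_item_query_url(wikipedia_url):
    """Build the Action API URL that asks for the page's wikibase_item."""
    parts = urllib.parse.urlsplit(wikipedia_url)
    if not parts.path.startswith("/wiki/"):
        raise ValueError("expected a URL of the form https://<site>/wiki/<title>")
    title = urllib.parse.unquote(parts.path[len("/wiki/"):])
    query = urllib.parse.urlencode({
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "titles": title,
        "format": "json",
    })
    return f"{parts.scheme}://{parts.netloc}/w/api.php?{query}"


def wikidata_uri(wikipedia_url):
    """Return the entity URI of the linked item, or None if there is none."""
    request = urllib.request.Request(
        wikibase_item_query_url(wikipedia_url),
        headers={"User-Agent": "example-sketch/0.1"},  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    for page in data["query"]["pages"].values():
        item = page.get("pageprops", {}).get("wikibase_item")
        if item:
            return "http://www.wikidata.org/entity/" + item
    return None


print(wikibase_item_query_url("https://en.wikipedia.org/wiki/Linked_data#Principles"))
```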

Tim.H.


[Wikidata] Encoders/feature extractors for neural nets

2017-09-27 Thread John Erling Blad
Is there anyone that has done any work on how to encode statements as
features for neural nets? I'm mostly interested in sparse encoders for
online training of live networks.