Re: [Wikidata] Multiple properties/identifiers for the same resource

Jerven Tjalling Bolleman Tue, 03 May 2016 02:30:08 -0700

Or in other words, I would like a formatter for
Property:P1019, and most instances of wiki:Q19847637
should be instances of Property:P1019 instead.


Regards,
Jerven

On 03/05/16 11:21, Jerven Tjalling Bolleman wrote:

Hi Egon, All,

If something is identifiable by a String but there is no
official RDF serialization, then I would make the identifier property in
question a subproperty of

  https://www.wikidata.org/wiki/Property:P973

The value should then be the "most" official URL of a webpage
having the thing identified by the string as its topic.

Let's take the example of Property:P351 (Entrez Gene ID).
There is no official RDF for Entrez Gene. So I would have made it
a

p:P351 instanceof p:P973 .

Then the formatter URL would need a capturing group, because it is the
inverse of the current logic.

p:P351 p:P1630
"https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=default&list_uids=(.+)"


Then the example

wd:Q14911732 p:P351 "1017"

Would instead be recorded in the backend as

wd:Q14911732 p:P351
<https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=default&list_uids=1017>
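A minimal sketch of that inverse logic in Python, in case it helps: the stored value is the URL, and the capturing group recovers the string to display. The function names are hypothetical and only for illustration. The URL prefix is the one from the P351 example above, escaped here so that the dots and the question mark match literally, which the pattern as written above does not do.

```python
import re

# URL prefix from the P351 example; everything after it is the local ID.
ENTREZ_PREFIX = (
    "https://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
    "?db=gene&cmd=retrieve&dopt=default&list_uids="
)

# Inverse formatter: escape the literal prefix, then capture the ID.
ENTREZ_PATTERN = re.compile(re.escape(ENTREZ_PREFIX) + r"(.+)")


def id_from_url(url):
    """Return the local ID captured from a stored URL, or None if it does not match."""
    match = ENTREZ_PATTERN.fullmatch(url)
    return match.group(1) if match else None


def url_from_id(local_id):
    """Build the URL to store from a local ID (today's formatter direction)."""
    return ENTREZ_PREFIX + local_id


stored = url_from_id("1017")
print(stored)
print(id_from_url(stored))
```

The UI could keep showing the short string while the backend holds the URL.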


I do believe the current UI view is correct; I just think URLs/URIs
should be the preferred solution for identifiers in Wikidata.

Regards,
Jerven



On 02/05/16 20:00, Egon Willighagen wrote:

Hi Jerven, all

On Fri, Apr 29, 2016 at 3:29 PM, Jerven Tjalling Bolleman
<[email protected]> wrote:

Could I be so bold as to suggest that in Wikidata we should strive
to use external URIs for identifiers, not Strings?

For example, in Wikidata there are a lot of UniProt accessions,
e.g. behind the property https://www.wikidata.org/wiki/P352,
and there is a formatter for a URL.

I think this is the wrong way round: there should be a URL/URI there
and a formatter to generate a local string for display purposes.

And of course for chembl the URL/URI to use would be

   <http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL101690>

There are 2 advantages to this. The first is that it allows easier
federated queries from the source databases into Wikidata (no URI
conversions etc.). The second is that these URIs are clearly not ambiguous.


What would you suggest for identifiers that do not have an official
RDF serialization?

Egon

Regards,
Jerven


On 28/04/16 23:49, Julie McMurry wrote:


"One should also point out to the authorities maintaining these IDs
that they should spend some effort on producing a workable solution for
this. It seems they should be the first to provide a resolver service
(or maybe it would be an "ID search engine" if it is so complicated).

With the qualifiers in place, Wikidata can also be used to achieve
this, of course, but it seems we are just manually reverse engineering
something that should be done at the site of whoever is controlling the
ID registration."

Well said, Markus. A most hearty agreement here on my side, and it is a
point colleagues and I have been trying to raise awareness of for a long
time now (http://bit.ly/id-guidance). One of the challenges is that
databases are already being asked to do more with less. They can see the
utility of such a service to others, but when I've asked DBs before (not
naming names), traction has been limp (I've yet to ask ChEMBL). Sometimes
it works out though. For instance, KEGG used to have 12 different
type-specific URLs, corresponding to:

kegg.compound
kegg.disease
kegg.drug
kegg.environ
kegg.genes
kegg.genome
kegg.glycan
kegg.metagenome
kegg.module
kegg.orthology
kegg.pathway
kegg.reaction

Thankfully, they've collapsed those to a single URL pattern.

The databases that find it the toughest are not those that simply don't
embed typing, but rather those that don't embed typing AND ALSO have
local identifiers that would otherwise collide. For instance, a
prominent bio database is in this boat (not naming names) and would
like to make things better, but it is hard and messy due to the collisions.

FYI 345 of the 560+ records in the identifiers.org
<http://identifiers.org> corpus are type-specific at the level of
identifiers.org's namespace; these roll up to ~300 providers.

The question, though, is what Wikidata is trying to accomplish. Say you
encounter the ChEMBL ID CHEMBL308052
<http://linkedchemistry.info/chembl/chemblid/CHEMBL308052>: do you need
to retrieve the type of the entity for reasons other than determining
what URL to use?

How are you representing entity labels / IDs to users?

Best,
Julie









_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata


--
-------------------------------------------------------------------
Jerven Bolleman                        [email protected]
SIB Swiss Institute of Bioinformatics  Tel: +41 (0)22 379 58 85
CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
1211 Geneve 4,
Switzerland     www.sib.swiss - www.uniprot.org
Follow us at https://twitter.com/#!/uniprot
-------------------------------------------------------------------

