On 11-03-13 6:28 PM, "Peter Ansell" <[email protected]> wrote:
>I didn't realise that the pathway_id (and other similar primary keys)
>would be published identifiers. If there are permanent identifiers
>provided by the dataset, such as "hsa05200" it is relatively easy to
>construct a URI using that, without resorting to adding an identifier
>like 238947 that may not be permanent.
The number you are seeing was introduced because of a technical
difficulty during KEGG-data import, but I think you have a very good point
that the pathway identifier should be exposed in the URI instead. I will
look into it, because we should not really expose the internally used
number to the outside world anyway.
>On the other hand though, BioMart may be in a good position to provide
>a discussion table for this subject though.[...] Ideally providers should
>be able to define a single
>authoritative URL for each item and publish using it, but they haven't
>been able to do this easily so far.
If you know other people that you think should join this discussion,
please direct them to this thread on the mailing list. We have a
great opportunity here to provide a bespoke interface that suits the
needs of the user community, rather than risk catering to parts of the
standards that are not practically relevant and ignoring those features
people would actually like to see (because we cannot implement everything
that the W3C defines).
>They may be able to easily define the URL format for each record in a
>single place for each table, for example;
>
>biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/${pathway_id}
>
>where ${pathway_id} was replaced by that field for each record.
This looks good and I was preparing for something along those lines
already. You will be able to make RDF-relevant decisions on a
per-attribute basis in BioMart, which is more fine-grained. An example is
in the making.
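To make the template idea concrete in the meantime, here is a minimal
sketch in Python. It is not BioMart code: the function name record_uri and
the percent-encoding step are my own assumptions, and only the URL format
itself is taken from the suggestion quoted above.

```python
from string import Template
from urllib.parse import quote

# URL format as suggested in the thread; treating it as a per-table
# template object is an assumption of this sketch.
PATHWAY_TEMPLATE = Template(
    "biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/${pathway_id}"
)


def record_uri(template, record):
    """Fill the template with one record's fields (hypothetical helper).

    Each value is percent-encoded so it stays a single URI path segment.
    A field missing from the record raises KeyError instead of silently
    producing a broken URI.
    """
    safe = {key: quote(str(value), safe="") for key, value in record.items()}
    return template.substitute(safe)


uri = record_uri(PATHWAY_TEMPLATE, {"pathway_id": "hsa05200"})
print(uri)
```

Keeping the template in one place per table means that changing the URL
format later only touches a single definition.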
>In my opinion it isn't useful to create URIs based on strings that
>aren't designed as permanent unambiguous identifiers. For example, the
>gene known as TP53 may change meaning, but hsa05200 is likely to
>either stay as the same meaning or be completely deprecated rather
>than have its meaning gradually change. This shouldn't need to be a
>large part of the discussion as in relational databases we always have
>primary key sets to fall back on and unique URIs can be created solely
>based on them.
Sure. So, I will expose data such as the associated gene name as
literals, whereas unique identifiers will be URIs. That indeed makes much
more sense.
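Roughly, the distinction would look like the following sketch, which writes
one statement in N-Triples syntax. The predicate URI under example.org and
the helper names are placeholders I made up for illustration, and pairing
"TP53" with "hsa05200" simply reuses the two values mentioned in this
thread; none of this is the final BioMart output format.

```python
# Base URI taken from the template discussed in this thread.
BASE = "biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/"
# Placeholder predicate; a real vocabulary term would replace it.
GENE_NAME = "http://example.org/vocab#geneName"


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslash, quote and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def gene_name_triple(pathway_id, gene_name):
    """Stable identifier becomes a URI subject; the gene name stays a literal."""
    return f"{iri(BASE + pathway_id)} {iri(GENE_NAME)} {literal(gene_name)} ."


print(gene_name_triple("hsa05200", "TP53"))
```

If the gene name later changes meaning, only the literal value is affected,
while the URI built from the permanent identifier stays the same.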
Thanks,
Joachim
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users