Re: [BioMart Users] [mart-dev] Adding URIs to columns and items

Peter Ansell Mon, 14 Mar 2011 14:22:08 -0700

On 15 March 2011 00:31, Joachim Baran <[email protected]> wrote:
>
> On 11-03-13 6:28 PM, "Peter Ansell" <[email protected]> wrote:
>>I didn't realise that the pathway_id (and other similar primary keys)
>>would be published identifiers. If there are permanent identifiers
>>provided by the dataset, such as "hsa05200" it is relatively easy to
>>construct a URI using that, without resorting to adding an identifier
>>like 238947 that may not be permanent.
>  The number you are seeing was introduced because of a technical
> difficulty during KEGG-data import, but I think you got a very good point
> that the pathway identifier should be exposed in the URI instead. I will
> look into it, because we should not really expose the internally used
> number to the outside world anyway.


This was the reason that I suggested defining a Primary URI for each
table. The Primary URI is used as the Subject when you convert a
<FieldName,FieldValue> pair into an RDF <Subject,Predicate,Object>
triple. The FieldName maps to the Predicate URI. The FieldValue is
generally used directly as a Literal, except where it is a foreign
key that has been mapped to a URI.
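
To make the mapping concrete, here is a minimal Python sketch of the
idea. It is only an illustration, not BioMart or Jena code: the
function name, the example.org URIs, and the "organism" foreign-key
field are all hypothetical, and objects are tagged as "uri" or
"literal" rather than built with a real RDF library.

```python
def row_to_triples(primary_uri, row, predicate_uris, foreign_key_uris=None):
    """Turn one table row into (subject, predicate, object) triples.

    primary_uri      -- Subject URI for the record (the table's Primary URI)
    row              -- dict of FieldName -> FieldValue
    predicate_uris   -- dict of FieldName -> Predicate URI
    foreign_key_uris -- dict of FieldName -> URI pattern containing "{value}",
                        for fields that are foreign keys mapped to URIs
    """
    foreign_key_uris = foreign_key_uris or {}
    triples = []
    for field_name, field_value in row.items():
        predicate = predicate_uris[field_name]
        if field_name in foreign_key_uris:
            # Foreign key that has a URI mapping: the object is a URI.
            obj = ("uri", foreign_key_uris[field_name].format(value=field_value))
        else:
            # Everything else is used directly as a literal.
            obj = ("literal", str(field_value))
        triples.append((primary_uri, predicate, obj))
    return triples


# Hypothetical example data; none of these URIs exist in BioMart.
example_triples = row_to_triples(
    "http://example.org/kegg_pathway/hsa05200",
    {"pathway_id": "hsa05200", "organism": "hsa"},
    {
        "pathway_id": "http://example.org/vocab/pathway_id",
        "organism": "http://example.org/vocab/organism",
    },
    {"organism": "http://example.org/organism/{value}"},
)
```

Every triple for a record shares the same Subject, which is why a
stable published identifier matters more than an internal row number.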

>>On the other hand though, BioMart may be in a good position to provide
>>a discussion table for this subject though.[...] Ideally providers should
>>be able to define a single
>>authoritative URL for each item and publish using it, but they haven't
>>been able to do this easily so far.
>  If you know other people that you think should join this discussion,
> please direct them to this thread in the mailing list. We are having a
> great opportunity here to provide a bespoken interface that suits the
> needs of the user community, rather than risking to cater for parts of the
> standards that are not practically relevant and ignoring those features
> people would actually like to see (because we cannot implement everything
> that the W3C defines).

It is important to distinguish between SPARQL Results bindings, which
result from SPARQL SELECT and ASK queries, and RDF triples, which
result from SPARQL CONSTRUCT and DESCRIBE queries. With both types of
queries you don't have to re-implement the entire standard; that is
what the RDF library (currently Jena) is for. I have more experience
with the Sesame library, but Jena is just as functional and well
maintained.

For CONSTRUCT and DESCRIBE queries the RDF library will parse the
user's SPARQL query into an abstract query map for you. When you have
processed that map (basically a bunch of BasicGraphPatterns) you
perform the BioMart query and load up the in-memory RDF triple set or
SPARQL results bindings.

Jena will provide all of the query and serialisation aspects (to
SPARQL Results XML, SPARQL Results JSON, RDF/XML, Turtle, NTriples,
RDF/JSON), as well as reasoning when you are ready. You only have to
define the mappings between the abstract SPARQL model that Jena parses
to, and the mapping to either RDF triples or SPARQL results bindings,
to support all 4 types of queries (although ASK isn't typically used
from what I have seen, it isn't hard to implement).

You could even choose to only support output to RDF/SPARQL at first,
and then implement the input and processing of SPARQL queries later. I
would really like RDF output before SPARQL input or even SPARQL
results bindings output.

>>They may be able to easily define the URL format for each record in a
>>single place for each table, for example;
>>
>>biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/${pathway_id}
>>
>>where ${pathway_id} was replaced by that field for each record.
>  This looks good and I was preparing for something along those lines
> already. You will be able to make RDF relevant decisions on a
> per-attribute basis in BioMart, which is more fine grained. An example is
> in the making..
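
As a rough illustration of the per-table URL format quoted above, the
${pathway_id} placeholder happens to match the syntax of Python's
string.Template, so the substitution can be sketched as follows. The
function name is hypothetical, and percent-encoding the field values
is my own assumption about what would be needed to keep the resulting
URI valid.

```python
from string import Template
from urllib.parse import quote


def expand_record_uri(uri_template, record):
    """Fill ${field} placeholders in a per-table URI template from one record."""
    # Percent-encode each value so that characters such as spaces or
    # slashes in a field cannot change the structure of the URI.
    safe_values = {name: quote(str(value), safe="") for name, value in record.items()}
    # substitute() raises KeyError if the template names a missing field.
    return Template(uri_template).substitute(safe_values)


KEGG_PATHWAY_TEMPLATE = (
    "biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/${pathway_id}"
)
pathway_uri = expand_record_uri(KEGG_PATHWAY_TEMPLATE, {"pathway_id": "hsa05200"})
```

Here pathway_uri ends in ".../kegg_pathway/hsa05200", using the
published identifier rather than the internal number.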

If you are outputting to RDF triples, you will need to create Subject
URIs (from my investigation of the way BioMart works). If you are
outputting to SPARQL Results bindings you don't need subject URIs as
you just have <BindingName,BindingValue> pairs to output, and
BindingName isn't even a URI, it is just a literal string.
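
For comparison, here is a small sketch of the bindings case in Python,
serialised in the shape of the SPARQL Query Results JSON format. In
practice the RDF library would do this serialisation for you; the
function name and sample data are hypothetical, and every value is
treated as a plain literal for simplicity.

```python
import json


def rows_to_sparql_json(variables, rows):
    """Serialise result rows as SPARQL Results JSON bindings."""
    bindings = []
    for row in rows:
        binding = {}
        for name in variables:
            value = row.get(name)
            if value is not None:
                # BindingName is a plain string key, not a URI.
                binding[name] = {"type": "literal", "value": str(value)}
            # Unbound variables are simply left out of the binding.
        bindings.append(binding)
    document = {"head": {"vars": list(variables)}, "results": {"bindings": bindings}}
    return json.dumps(document)


results_json = rows_to_sparql_json(
    ["pathway_id", "name"],
    [{"pathway_id": "hsa05200", "name": None}],
)
```

Note that no Subject URI appears anywhere in this output, which is
the difference from the triple case.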

Cheers,

Peter
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
