[Helen, James: Marco asks about representing ArrayExpress expression
data for inclusion in the framework of our Semantic Web demo.]
Hi Marco,
This is definitely something we are interested in doing. There are a
number of aspects to this problem. Which are you interested in? Some
examples are:
1) Representing the information about the samples, experiment,
protocols leading to the hybridization, technical aspects of the
hybridization, etc.
2) Representing the computed intensity of the spots on an array,
as well as how those were computed (e.g. MAS5, RMA, dChip, etc.)
3) Representing which genes are thought to be relatively highly
expressed by interpreting the intensity of the spots as amount of
expression of certain genes.
The first listed is one of the motivations for OBI
(http://obi.sourceforge.net/). Helen Parkinson and James Malone work on
microarray informatics and curation at the EBI. I'm ccing them on
this email.
I've split apart the second and third based on experience that says
that the relationship between spot and gene is not always
straightforward and is best represented as separate parts in the
representation. For 2, aspects of the data treatment fall in the
scope of OBI, and I recommend chatting with Helen/James about them.
For the other part, there is some work on either representing or
incorporating information about the probes (do I remember correctly
that Affymetrix does have some RDF?), and then there is a choice of
how to represent the numbers associated with each spot. At least
Jonathan Rees and I could talk about that, although there are others
here as well.
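
As a rough sketch of why 2 and 3 are worth keeping apart, the measurement for a spot and the belief that a probe reports on a gene can be held as separate records. All class, field, and identifier names below are made up for illustration; none of them come from OBI, ArrayExpress, or Affymetrix.

```python
from dataclasses import dataclass

# All names here are hypothetical, chosen only to illustrate the split.


@dataclass(frozen=True)
class SpotMeasurement:
    """Aspect 2: a computed intensity for one spot, plus how it was computed."""
    hybridization_id: str
    probe_id: str
    intensity: float
    method: str  # e.g. "MAS5" or "RMA"


@dataclass(frozen=True)
class ProbeGeneLink:
    """Aspect 3: a separate assertion that a probe is believed to report on a gene."""
    probe_id: str
    gene_id: str


measurements = [
    SpotMeasurement("hyb1", "probe_a", 812.4, "MAS5"),
    SpotMeasurement("hyb1", "probe_b", 95.0, "MAS5"),
]
links = [
    ProbeGeneLink("probe_a", "gene_x"),
    ProbeGeneLink("probe_a", "gene_y"),  # one probe may be linked to several genes
]


def genes_for(probe_id, links):
    """Return the genes a probe is believed to report on (possibly none)."""
    return sorted(link.gene_id for link in links if link.probe_id == probe_id)
```

With this layout the intensities stay valid even if the probe-to-gene links are later revised, since neither record embeds the other.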
Finally, for 3 there is the matter of recording which probes are
believed to be associated with which genes, and then representing the
mapping. Our mapping of orthology might be a guide there. You could
choose to represent all genes that have probes on the chip, or choose
an approach more along the line that the Allen Institute for Brain
Science uses - choosing some number of the most expressed genes, as a
summary. AIBS has XML for such summaries, and one project that is on
the queue is to represent those. See, e.g.,
http://www.brain-map.org/mouse/gene/browserXml.html for the top level;
http://www.brain-map.org/mouse/Hypothalamus/GeneExpression/1.xml lists
(along with a bunch of auxiliary information) the top genes expressed
in the Hypothalamus.
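
To make the "top genes as a summary" option concrete, here is a minimal sketch that ranks genes given per-probe intensities and a probe-to-gene mapping. Scoring a gene by the highest intensity among its probes is an assumption made here for simplicity; it is not a description of how AIBS computes its summaries, and the function and identifier names are hypothetical.

```python
def top_expressed_genes(intensities, probe_to_genes, n):
    """Summarize a hybridization as its n highest-scoring genes.

    intensities:    dict of probe id -> computed intensity
    probe_to_genes: dict of probe id -> list of gene ids
    Each gene is scored by the highest intensity among its probes
    (an illustrative choice, one of several possible).
    """
    best = {}
    for probe, value in intensities.items():
        # Probes with no known gene are skipped rather than guessed at.
        for gene in probe_to_genes.get(probe, []):
            if gene not in best or value > best[gene]:
                best[gene] = value
    # Sort by descending score, breaking ties by gene id for stable output.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


example_intensities = {"probe_a": 812.4, "probe_b": 95.0, "probe_c": 640.0}
example_mapping = {
    "probe_a": ["gene_x"],
    "probe_b": ["gene_x", "gene_y"],
    "probe_c": ["gene_z"],
}
summary = top_expressed_genes(example_intensities, example_mapping, 2)
```

The alternative of representing every gene that has a probe on the chip corresponds to dropping the cutoff n.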
A key part of the exercise is getting clear on exactly what you want
to be saying with the RDF, and what sort of questions you want to be
answering. The first part is up to you. I'd recommend thinking about
the data and constructing an English sentence that expresses what you
think the content of the data is. We can work from there on the
mechanics of translating it to RDF/OWL. To get some experience with
questions and how they are formulated in the demo, have a look at
http://esw.w3.org/topic/HCLS/HCLSIG_Demo_QueryScratch.
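
As an illustration of going from a sentence to triples, take "In hybridization hyb1, probe probe_a had a MAS5-computed intensity of 812.4." The sketch below writes that out as N-Triples. The http://example.org/demo/ namespace and the property names are placeholders invented for this example, not terms from OBI or from the demo; only the XSD datatype IRI is a real one.

```python
BASE = "http://example.org/demo/"  # placeholder namespace, not a real vocabulary
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def iri(local_name):
    """Wrap a local name in the placeholder namespace as an N-Triples IRI."""
    return "<" + BASE + local_name + ">"


def measurement_triples(hyb, probe, method, intensity):
    """Return N-Triples lines stating one measurement sentence.

    The measurement gets its own node so that the hybridization, probe,
    method, and number can each be attached to it separately.
    """
    node = iri("measurement/" + hyb + "/" + probe + "/" + method)
    return [
        f"{node} {iri('inHybridization')} {iri('hyb/' + hyb)} .",
        f"{node} {iri('ofProbe')} {iri('probe/' + probe)} .",
        f"{node} {iri('computedBy')} {iri('method/' + method)} .",
        f'{node} {iri("intensity")} "{intensity}"^^<{XSD_DOUBLE}> .',
    ]


for line in measurement_triples("hyb1", "probe_a", "MAS5", 812.4):
    print(line)
```

Note that nothing here mentions a gene: the sentence only claims what was measured, which keeps it separate from the probe-to-gene interpretation.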
One thing to think about is that whereas most standalone databases
and web sites integrate a lot of external information, in order to
supply their users with adequate information to interpret what they
are getting, in our scenario we are integrating the primary
resources, and so don't want to integrate other people's integrated
data. So thought should be given to what information a particular
source, for example ArrayExpress, uniquely provides, and focus
attention on representing that information, with the expectation that
it can be reunited with the other data somewhere on the
Semantic Web.
Well, hope that helps get you started. This forum is a perfect place
to ask followup questions or bounce ideas off of people, so don't
hesitate to use it.
Regards,
Alan
On Aug 13, 2007, at 1:31 PM, Marco Brandizi wrote:
Eric Neumann wrote:
I'll also add that there were many (young) researchers wanting to
get involved in Semantic Web activities. I strongly encouraged
them to participate with HCLSIG and pointed them to our pages and
mailing list.
Hi all,
I'd also like to offer my congratulations to Eric on his
presentation. I am one of those who expressed interest in
collaboration.
During his presentation, Eric briefly mentioned that it should be
relatively easy to "cook" some data one may have available in
non-RDF format, so that it can be integrated into the demo. My idea is
to experiment with exporting gene expression data available in
public repositories (mainly ArrayExpress). At the moment I am
trying to review ISMB materials and I wonder if there are pointers,
on the wiki or somewhere else, about this point. Something like a
brief tutorial that could guide me from choosing suitable
ontologies already used by the demo, to using the
technology the demo uses, to getting some simple results.
Thanks in advance for any help.
--
======================================================================
Marco Brandizi <[EMAIL PROTECTED]>
NET Project - Software Engineer
http://www.ebi.ac.uk/net-project
European Bioinformatics Institute
Hinxton, CB10 1SD, United Kingdom
Tel.: +44 (0)1223 49 2613
Fax: +44 (0)1223 49 4468
http://www.ebi.ac.uk/~brandizi