Re: HCLS Demo at ISMB/ECCB, How to contribute to the demo?

Alan Ruttenberg Mon, 13 Aug 2007 19:17:58 -0700

[Helen, James: Marco asks about representing ArrayExpress expression data for inclusion in the framework of our Semantic Web demo.]


Hi Marco,

This is definitely something we are interested in doing. There are a number of aspects to this problem. Which are you interested in? Some examples are:

1) Representing the information about the samples, experiment, protocols leading to the hybridization, technical aspects of the hybridization, etc.
2) Representing the computed intensity of the spots on an array, as well as how those were computed (e.g. MAS5, RMA, dChip, etc.)
3) Representing which genes are thought to be relatively highly expressed, by interpreting the intensity of the spots as the amount of expression of certain genes.

The first listed is one of the motivations for OBI (http://obi.sourceforge.net/). Helen Parkinson and James Malone work on microarray informatics and curation at the EBI. I'm ccing them on this email.

I've split apart the second and third based on experience that says that the relationship between spot and gene is not always straightforward and is best represented as separate parts in the representation. For 2, aspects of the data treatment fall in the scope of OBI, and I recommend chatting with Helen/James about them. For the other part there is some work on either representing or incorporating information about the probes (do I remember correctly that Affymetrix does have some RDF?), and then there is a choice of how to represent the numbers associated with each spot. At least Jonathan Rees and I could talk about that, although there are others here as well.

Finally, for 3 there is the matter of recording which probes are believed to be associated with which genes, and then representing the mapping. Our mapping of orthology might be a guide there. You could choose to represent all genes that have probes on the chip, or choose an approach more along the lines of what the Allen Institute for Brain Science uses - choosing some number of the most expressed genes, as a summary. AIBS has XML for such summaries, and one project that is on the queue is to represent those. See, e.g., http://www.brain-map.org/mouse/gene/browserXml.html for the top level. http://www.brain-map.org/mouse/Hypothalamus/GeneExpression/1.xml lists (along with a bunch of auxiliary information) the top genes expressed in the Hypothalamus.
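
To make the "summary" option concrete, here is a rough Python sketch of picking the top few genes from per-probe intensities. Everything in it is an assumption for illustration: the probe and gene names are made up, the numbers are invented, and averaging the probes that map to a gene is just one simple choice, not how AIBS or ArrayExpress compute their summaries.

```python
from collections import defaultdict


def top_expressed_genes(probe_intensity, probe_to_genes, n=3):
    """Return the n genes with the highest mean probe intensity.

    probe_intensity: dict of probe id -> computed intensity
    probe_to_genes:  dict of probe id -> list of gene ids (may be empty)
    """
    per_gene = defaultdict(list)
    for probe, intensity in probe_intensity.items():
        # The probe-to-gene relationship is not one-to-one: a probe may
        # map to zero, one, or several genes.
        for gene in probe_to_genes.get(probe, []):
            per_gene[gene].append(intensity)
    # Assumption: summarize a gene by the mean of its probes' intensities.
    means = {gene: sum(vals) / len(vals) for gene, vals in per_gene.items()}
    # Sort by descending mean, breaking ties by gene id for stable output.
    return sorted(means, key=lambda g: (-means[g], g))[:n]


# Hypothetical data, for illustration only.
probe_intensity = {
    "probe_1": 820.0,
    "probe_2": 95.5,
    "probe_3": 410.0,
    "probe_4": 400.0,
}
probe_to_genes = {
    "probe_1": ["geneA"],
    "probe_2": ["geneB"],
    "probe_3": ["geneC", "geneA"],  # one probe, two candidate genes
    "probe_4": ["geneC"],
}

summary = top_expressed_genes(probe_intensity, probe_to_genes, n=2)
```

Keeping the probe-to-gene mapping as its own input, separate from the intensities, mirrors the split between 2 and 3 above.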

A key part of the exercise is getting clear on exactly what you want to be saying with the RDF, and what sort of questions you want to be answering. The first part is up to you. I'd recommend thinking about the data and constructing an English sentence that expresses what you think the content of the data is. We can work from there on the mechanics of translating it to RDF/OWL. To get some experience with questions and how they are formulated in the demo, have a look at http://esw.w3.org/topic/HCLS/HCLSIG_Demo_QueryScratch.
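
As a rough illustration of going from an English sentence to RDF, take the sentence "Spot S on array A carries probe P and has a computed intensity of 820.0, calculated with MAS5." The Python sketch below writes that out as N-Triples using only string formatting. The namespace and every property name (partOfArray, carriesProbe, and so on) are hypothetical placeholders, not terms from OBI or from the demo; choosing the real terms is exactly the part to work out together.

```python
# Hypothetical namespace; real data would use terms from agreed ontologies.
EX = "http://example.org/hcls-sketch/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def uri(local_name):
    """Wrap a local name in the placeholder namespace as an N-Triples IRI."""
    return "<" + EX + local_name + ">"


def spot_triples(spot, array, probe, intensity, method):
    """Return N-Triples lines, one per clause of the English sentence."""
    s = uri(spot)
    value = '"%s"^^<%s>' % (repr(float(intensity)), XSD_DOUBLE)
    return [
        # "Spot S on array A ..."
        "%s %s %s ." % (s, uri("partOfArray"), uri(array)),
        # "... carries probe P ..."
        "%s %s %s ." % (s, uri("carriesProbe"), uri(probe)),
        # "... has a computed intensity of 820.0 ..."
        "%s %s %s ." % (s, uri("computedIntensity"), value),
        # "... calculated with MAS5."
        "%s %s %s ." % (s, uri("intensityComputedBy"), uri(method)),
    ]


triples = spot_triples("spot_S", "array_A", "probe_P", 820.0, "MAS5")
for line in triples:
    print(line)
```

Note that the spot points at a probe, not directly at a gene; which genes the probe is believed to detect would be stated separately.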

One thing to think about is that whereas most standalone databases and web sites integrate a lot of external information, in order to supply their users with adequate information to interpret what they are getting, in our scenario we are integrating the primary resources, and so don't want to integrate other people's integrated data. So thought should be given to what information a particular source, for example ArrayExpress, uniquely provides, and attention focused on representing that information, with the expectation that it can be reunited with the other data somewhere on the Semantic Web.

Well, hope that helps get you started. This forum is a perfect place to ask followup questions or bounce ideas off of people, so don't hesitate to use it.


Regards,
Alan


On Aug 13, 2007, at 1:31 PM, Marco Brandizi wrote:

Eric Neumann wrote:
I'll also add that there were many (young) researchers wanting to get involved in Semantic Web activities. I strongly encouraged them to participate with HCLSIG and pointed them to our pages and mailing list.
Hi all,
I'd also like to offer my congratulations to Eric for his presentation. I am one of those who expressed interest in collaboration.
Eric, during his presentation, briefly mentioned that it should be relatively easy to "cook" some data one may have available in non-RDF format, so that they may be integrated in the demo. My idea is to experiment with exporting gene expression data available in public repositories (mainly ArrayExpress). At the moment I am trying to review ISMB materials and I wonder if there are pointers, on the wiki or somewhere else, about this point. Something like a brief tutorial that could guide me from choosing proper ontologies which are already used by the demo, to using the technology the demo is using too, to getting some simple result.
Thanks in advance for any help.


--
===============================================================================
Marco Brandizi <[EMAIL PROTECTED]>

NET Project - Software Engineer
http://www.ebi.ac.uk/net-project

European Bioinformatics Institute
Hinxton, CB10 1SD, United Kingdom
Tel.: +44 (0)1223 49 2613
Fax: +44 (0)1223 49 4468

http://www.ebi.ac.uk/~brandizi
