On Wed, Jan 6, 2010 at 11:08 AM, Helena Deus <[email protected]> wrote:
> Hi Jim,
>
> This is great! I noticed you already add the links both to the raw data
> files and the processed data files. Am I right in assuming this data comes
> from the SDRF?

Yes, these are comments embedded in the SDRF, and the nodes for those files
are explicitly mentioned in the SDRF too.

> I see you integrated the MGED ontology with the data nicely. Have you
> attempted a few SPARQL queries, for example, retrieving all raw data files
> from "mged:arabidopsis_thaliana"?

I haven't yet tried any SPARQL queries like that, but that was the goal of
handling the Terms and Term Sources the way I did.

> Also, I noticed that in your ontology you don't separate each sample
> hybridization raw file, probably because they are all distributed on the
> FTP site as a compressed folder. For example, I see that inside the raw
> data file archive "E-MEXP-986.raw.1.zip" there are 4 text files:
> 1d1S15.txt.txt, 2d1S15.txt.txt, 2d1S22.txt.txt and 4d1S22.txt.txt. Since
> it's possible to add a link from a Sample to each of these .txt files, do
> you think it would be useful to add this information to the raw RDF file?

Other SDRF files may link directly to a file (the ones that I've written do),
so in my mind it's a matter of GIGO. I don't currently go beyond what is in
the IDF and SDRF (in other words, what's being parsed by Limpopo), and I'm
trying to keep second-guessing to a minimum. One thing I hope this tool
exposes is the effect of certain kinds of curation on the available data
structures, and maybe some best practices can come out of it.

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
[email protected] | (203) 785-6330
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
[email protected]
http://tw.rpi.edu
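
For illustration only, here is a minimal sketch of the per-file linking raised in the quoted question. It is not part of the tool discussed in this thread, which stays within what the IDF and SDRF provide. It lists the members of a raw-data zip archive and pairs each with a URI that a Sample could point to. The base URI and the function names are hypothetical, and the demo archive is built in memory with placeholder contents, using only the four file names mentioned above.

```python
import io
import zipfile

# Hypothetical base URI for per-file links; not taken from the thread.
BASE_URI = "http://example.org/E-MEXP-986/raw/"


def list_archive_members(archive_bytes):
    """Return the names of the regular files inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


def member_uris(archive_bytes, base_uri=BASE_URI):
    """Map each archive member name to a URI a Sample node could link to."""
    return {name: base_uri + name for name in list_archive_members(archive_bytes)}


def build_demo_archive():
    """Build an in-memory stand-in archive holding the four file names
    mentioned in the thread, with placeholder contents."""
    names = ["1d1S15.txt.txt", "2d1S15.txt.txt", "2d1S22.txt.txt", "4d1S22.txt.txt"]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "placeholder\n")
    return buffer.getvalue()


if __name__ == "__main__":
    for name, uri in member_uris(build_demo_archive()).items():
        print(name, "->", uri)
```

Which file belongs to which Sample would still have to come from the SDRF or from curation; the archive listing alone does not say.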
