[base] summary of how the new core batch importers can be used for importing whole experiments

Bob MacCallum Mon, 05 Jan 2009 04:59:37 -0800

Just to say that I found this post from Micha a few months back very
useful/inspiring.


I'm able to import two-colour experiments quite quickly from three text files
(it might work with a single file but I didn't try it):

1. biosources to labeled extracts (all one-to-one information)
2. labeled extracts to hybs (two-to-one information)
3. hybs, scans, raw bioassays, raw data files (all one-to-one information)
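
The two-to-one file is the easiest one to get wrong, so a quick check before importing can help. A minimal sketch, assuming a tab-delimited file with columns named Hybridization and LabeledExtract (the column names and example values here are assumptions, not taken from a real import config):

```python
import csv
import io
from collections import defaultdict


def extracts_per_hyb(handle, hyb_col="Hybridization", extract_col="LabeledExtract"):
    """Map each hybridization name to the list of labeled extracts on it."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[hyb_col]].append(row[extract_col])
    return dict(groups)


def hybs_without_two_extracts(groups):
    """Return hyb names that do not have exactly two distinct labeled extracts."""
    return sorted(h for h, ex in groups.items() if len(set(ex)) != 2)


# Made-up example data; a real file would be opened with open(path, newline="").
example = io.StringIO(
    "LabeledExtract\tHybridization\n"
    "le1_cy3\thyb1\n"
    "le2_cy5\thyb1\n"
    "le3_cy3\thyb2\n"
)
groups = extracts_per_hyb(example)
print(hybs_without_two_extracts(groups))
```

Any hyb name that is printed needs another look in the text file before the hybridization importer is run.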

Then the only manual steps are to inherit the relevant biosource's annotation
to the rawbioassay and to run the data import plugin for each raw bioassay.

Now that a project's default protocols are added by the batch importers
(2.9.x), I don't bother with these in the text files, which simplifies things
greatly.

Thanks!

Bob.


Micha Bayer writes:
 > Hi,
 > 
 > I thought it might be useful for others to have a step-by-step account
 > of how to import a whole experiment with the new core batch importers,
 > so here it goes:
 > 
 > My aim was to import the hybridization-related information for a whole
 > experiment in a way that would provide MIAME-compliance for this aspect
 > of the data. I also wanted to see if I could do this working from a single
 > spreadsheet (rather than having separate ones for each importer), and
 > one can, which is great.
 > 
 > The column headers of the spreadsheet I used looked like this in the
 > end:
 > 
 > RawBioAssay  FileName  ArrayName  ArrayBatch  ArraySlide  Platform
 > RawDataType  Scan  Hybridization  LabeledExtract  Dye  Extract
 > Sample  BioSource  Protocol[image_analysis]  Protocol[scanning]
 > Protocol[hybridization]  Protocol[labeling]  Protocol[extraction]
 > Protocol[treatment]  StrainOrLine  Time
 > 
 > In this case the last two columns contained annotation (experimental
 > factors) specific to my experiment. 
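
A sheet with these headers can also be written by a script rather than by hand. A minimal sketch, assuming the column names listed above; the example row and all of its values are hypothetical:

```python
import csv
import io

COLUMNS = [
    "RawBioAssay", "FileName", "ArrayName", "ArrayBatch", "ArraySlide",
    "Platform", "RawDataType", "Scan", "Hybridization", "LabeledExtract",
    "Dye", "Extract", "Sample", "BioSource",
    "Protocol[image_analysis]", "Protocol[scanning]",
    "Protocol[hybridization]", "Protocol[labeling]",
    "Protocol[extraction]", "Protocol[treatment]",
    "StrainOrLine", "Time",
]


def write_sheet(rows, handle):
    """Write rows (dicts keyed by column name) as tab-delimited text."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical row; columns that are left out are written as empty cells.
rows = [{"RawBioAssay": "rba1", "Hybridization": "hyb1",
         "LabeledExtract": "le1", "Dye": "cy3", "BioSource": "bs1",
         "StrainOrLine": "lineA", "Time": "24h"}]
buf = io.StringIO()
write_sheet(rows, buf)
header = buf.getvalue().splitlines()[0].split("\t")
print(len(header), header[0], header[-1])
```

To write a real file, pass a handle from open(path, "w", newline="") instead of the StringIO buffer.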
 > 
 > Using this spreadsheet (and suitable import configs), I ran each of the
 > batch importers for BioSource, Sample, Extract, Labelled Extract,
 > Hybridization, Scan and Raw Bioassay, in this order. I had to first
 > manually create a project, protocols and an array design, but that's
 > fine since it is infrequent stuff, compared to the other entities. 
 > 
 > The last thing I did was to run the annotation batch importer on my new
 > raw bioassays, which works but is not ideal because of the lack of
 > inheritance from the appropriate entities upstream (this will be fixed
 > in BASE 2.9 though, see this thread:
 > http://www.mail-archive.com/basedb-users@lists.sourceforge.net/msg01596.html).
 > 
 > All in all the import of the hybs using the batch importers only takes
 > about 10 minutes -- that's getting very acceptable. Nice work, guys!! 
 > 
 > There is still some manual repetition involved but one could get round
 > this by writing a fairly simple plugin that just calls all the other
 > batch importers in turn. I'll add that to my TODO list but it might take
 > me some time to get round to this as I am snowed under with lots of
 > other stuff. 
 > 
 > Attached below is a more detailed point-by-point walk-through of what I
 > did. Hope this is of use. 
 > 
 > cheers 
 > Micha
 > 
 > 
 > 1.   Create all required protocols manually or check suitable
 > protocols already exist.
 > 2.   Create new array design manually or check suitable design
 > already exists.
 > 3.   Create a new project with default settings for platform, raw
 > data type, array design and protocols. These will be associated with
 > all entities created from here on.
 > 4.   Set the new project active - this will make it the current
 > project.
 > 5.   Format your hybridization data as per example above and save as
 > tab delimited text. Make sure that the names of existing entities you
 > refer to in the spreadsheet match those in the database, if you are
 > planning on matching by name. Upload this file to BASE.
 > 6.   Upload raw data files to BASE and unzip in suitable directory
 > (if you want to have the files associated). (N.B. This example here does
 > not include storing raw data in the database) 
 > 7.   Create a suitable import configuration for each of the batch
 > importers - this only needs to be done once if the same spreadsheet
 > format is used for later imports. 
 > 8.   Batch-import all required entities by selecting the list view of
 > each of them in turn and running their respective batch import plugin
 > with the spreadsheet as input - import configs should be detected
 > automatically. It's best to stick to the following order:
 > 
 > a.   BioSource
 > b.   Sample
 > c.   Extract
 > d.   Labelled Extract
 > e.   Hybridization
 > f.   Scan
 > g.   Raw Bioassay
 > 
 > 9.   Select all newly created bioassays and click "New
 > Experiment...". This will associate all selected bioassays with the new
 > experiment.
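
The order in step 8 goes parent before child, so each importer finds the items it refers to. A minimal sketch of a driver that walks that order; run_importer is a hypothetical callable standing in for however an importer gets started (it is not a BASE API), so this only illustrates the sequencing:

```python
IMPORT_ORDER = [
    "BioSource", "Sample", "Extract", "Labelled Extract",
    "Hybridization", "Scan", "Raw Bioassay",
]


def run_all(sheet_path, run_importer, order=IMPORT_ORDER):
    """Call run_importer(item_type, sheet_path) for each item type in order.

    Stops at the first failure, since later items depend on earlier ones.
    Returns the list of item types that were imported successfully.
    """
    done = []
    for item_type in order:
        if not run_importer(item_type, sheet_path):
            break
        done.append(item_type)
    return done


# Stand-in importer that only records the calls; a real one would start the job.
calls = []


def fake_importer(item_type, sheet_path):
    calls.append(item_type)
    return item_type != "Scan"  # pretend the Scan import fails


print(run_all("experiment.txt", fake_importer))
```

Stopping at the first failure is a design choice for this sketch: a Raw Bioassay import makes little sense if the Scan import before it did not go through.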
 > 
 > ANNOTATION
 > This is a (fairly dirty) temporary workaround which does not use
 > inheritance. From BASE 2.9 on it should be possible to use inheritance
 > with the mass annotation importer. 
 > 
 > 10.  Check that suitable annotation types (= experimental factors)
 > exist (Administrate -> Types -> Annotation Types) or create new ones
 > with names that match the entries in the spreadsheet. 
 > 11.  Batch-annotate all raw bioassays in the experiment. In the list
 > view of the Raw Bioassays, select "Import..." and then select the
 > Annotation Importer from the list of plugins available. A suitable
 > import config should be detected automatically. This will annotate each
 > RawBioassay with the appropriate factor value combination. 
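
Before running the annotation importer it can be worth listing which factor value combinations each raw bioassay will get. A minimal sketch, assuming the sheet layout above with StrainOrLine and Time as the annotation columns (the example values are made up):

```python
import csv
import io
from collections import defaultdict


def factor_values(handle, key_col="RawBioAssay",
                  factor_cols=("StrainOrLine", "Time")):
    """Map each raw bioassay name to the set of factor value tuples on its rows."""
    combos = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        combos[row[key_col]].add(tuple(row[c] for c in factor_cols))
    return dict(combos)


# Made-up example data with only the columns this check needs.
sheet = io.StringIO(
    "RawBioAssay\tStrainOrLine\tTime\n"
    "rba1\tlineA\t0h\n"
    "rba2\tlineA\t24h\n"
    "rba2\tlineB\t24h\n"
)
combos = factor_values(sheet)
for name in sorted(combos):
    print(name, sorted(combos[name]))
```

A raw bioassay that shows more than one combination (rba2 in the example) has rows that disagree, which is worth checking against the experimental design before annotating.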
 > 
 > 
 > ==================================
 > Dr Micha M Bayer
 > Bioinformatics Specialist
 > Genetics Programme
 > The Scottish Crop Research Institute
 > Invergowrie
 > Dundee
 > DD2 5DA
 > Scotland, UK
 > Telephone +44(0)1382 562731 ext. 2309
 > Fax +44(0)1382 562426
 > http://www.scri.ac.uk/staff/michabayer
 > ==================================
 >  
 > 
 > 
 > _______________________________________________
 > The BASE general discussion mailing list
 > basedb-users@lists.sourceforge.net
 > unsubscribe: send a mail with subject "unsubscribe" to
 > basedb-users-requ...@lists.sourceforge.net

-- 
Bob MacCallum | VectorBase Developer | Kafatos/Christophides Groups |
Division of Cell and Molecular Biology | Imperial College London |
Phone +442075941945 | Email r.maccal...@imperial.ac.uk
