Re: [base] high-throughput transcriptome sequencing data

2008-10-22 Thread Bob MacCallum

Hi again,

I saved you the trouble of cutting and pasting to make a new ticket:

Handling short read transcript sequence data
http://base.thep.lu.se/ticket/1153

Is there any way for me to subscribe to this ticket (i.e. get email updates
when it changes)?

cheers,
Bob.

-- 
Bob MacCallum | VectorBase Developer | Kafatos/Christophides Groups |
Division of Cell and Molecular Biology | Imperial College London |
Phone +442075941945 | Email [EMAIL PROTECTED]

The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject unsubscribe to
[EMAIL PROTECTED]


Re: [base] high-throughput transcriptome sequencing data

2008-10-21 Thread Jari Häkkinen
Hi Bob,

Sorry for not commenting on your thoughts about storing sequencing data in 
BASE. We have discussed it here and will be looking into it later. This 
will become an issue for us too, since we have sequencing equipment 
generating huge quantities of data. So far we are only getting friendly 
with the new machine.

I think we should add your thoughts to a ticket for further discussions. 
I hope we can be more active on this issue later this autumn. How urgent 
is this issue for you?

Regarding SNP data and other arrays with huge quantities of information: 
we have decided to avoid storing this sort of data in the database 
tables, since it would probably cripple BASE. Over the last few releases we 
have prepared BASE for storing raw data in files instead of the raw tables, 
and also for keeping data in files during analysis, i.e. in the analysis 
tree. So far there are no plug-ins that take advantage of these new 
features, but they will appear.

We are not currently working with Affymetrix SNP data wrt BASE. However, 
we now have the Affymetrix platform available at our department and may 
soon face the challenge of getting that data into BASE (it has not yet 
been decided whether to store that data in BASE). Maybe the Uppsala people 
have something on this, http://madr.lcb.uu.se/ ?

On our side we are interested in getting Illumina SNP data into BASE. 
Progress towards realising it is slow, but we expect it to appear 
during the winter. We are supposed to write a specification of how we 
want to see this in BASE, but very little has been written so far.

If you are confused when I talk about experimental equipment at our 
department, that is natural. I am at the Dept. of Oncology now, but I am 
still using my old mail address on this list.


Cheers,

Jari

Bob MacCallum wrote:
 Hi again.
 
 Any thoughts on this (see below) at all?  Please reply off-list if you are
 feeling shy.
 
 I'd also like to raise some general questions about scalability.
 
 1. Is anyone working on an Affymetrix SNP plugin?
 
 2. Is anyone doing anything with tiling arrays?
 
 I realise that archiving the .CEL files is no problem.  Using BASE to run
 analysis programs on those files is possible through plugins.  But storing
 per-feature data in the analysis tables is going to break, when you have
 millions of features, right?
 
 cheers,
 Bob.
 
 Bob MacCallum writes:
   
   I'm just thinking out loud about how to incorporate high throughput
   transcriptome sequencing data into BASE.  It's some way off, but I'm
   assuming that it will be cheap and quantitative enough to replace arrays
   at some point during the renewal period of our project (2009-2014).
   
   1. Create an array design with all genes of interest (ideally this would
      be the largest set possible, e.g. known genes + predicted genes of all
      qualities, perhaps even predicted genes from the new sequence data).
      The layout would be fictitious, of course (what's the minimum one can
      get away with?).
   
   2. Create a rawbioassay to correspond to each sequencing run.
   
   Then *one* of 3a/b/c for each sequencing run/rawbioassay:
   
   3a. Outside BASE, align the new sequences to genome or transcript sequences
       and calculate intensities for each gene on the array design and dump
       into a tab delimited raw data file.  Attach that file to the
       rawbioassay and import numeric data as usual.
   
   3b. Upload the text file of sequences to the raw bioassay's data file.
       Create a BASE plugin to do the alignment and quantification as in 3a,
       and load the numeric data into the database.
   
   3c. Similar to 3b, but calculate the intensities at the create root
       bioassay step, similar to the Affymetrix RMA plugin.
   
   4. Continue with analysis as normal.  Biosources, samples etc can be
      linked to the bioassay too, of course.
   
   I guess a new raw data type (for Generic platform) would have to be
   created for 3a (and 3b?) but that's not difficult.
   
   Is it possible to go with 3a, but also attach the sequence file to the raw
   bioassay (or scan?) - something like keeping tiff files for scans?  Just
   for documentation purposes.
   
   Any thoughts from the community or developers?
   
   cheers,
   Bob.
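
Option 3a in the proposal above (quantify outside BASE, then import a
tab-delimited file) could be sketched roughly as follows. This is only an
illustration under stated assumptions: the alignment itself is assumed to
have been done already by some external aligner and reduced to
(read_id, gene_id) pairs, a raw read count per gene stands in for the
"intensity", and the function names and the column names gene_id and
read_count are hypothetical, not anything BASE defines.

```python
import csv
import io
from collections import Counter


def count_reads_per_gene(alignments, design_genes):
    """Return {gene_id: read_count} for every gene in the array design.

    alignments:   iterable of (read_id, gene_id) pairs, assumed to come
                  from an external aligner (not shown here).
    design_genes: gene IDs in the (fictitious) array design.
    Reads that hit genes outside the design are ignored; genes with no
    reads get a count of 0 so every design feature has a value.
    """
    wanted = set(design_genes)
    hits = Counter(gene for _read, gene in alignments if gene in wanted)
    return {gene: hits.get(gene, 0) for gene in design_genes}


def write_raw_data(counts, handle):
    """Write the counts as tab-delimited text with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", "read_count"])  # hypothetical column names
    for gene, count in counts.items():
        writer.writerow([gene, count])


def demo():
    """Tiny made-up example: four reads, three genes in the design."""
    alignments = [("r1", "geneA"), ("r2", "geneA"),
                  ("r3", "geneB"), ("r4", "geneX")]
    design = ["geneA", "geneB", "geneC"]
    buf = io.StringIO()
    write_raw_data(count_reads_per_gene(alignments, design), buf)
    return buf.getvalue()
```

Writing one row per gene in the design, including zero counts, keeps the
file aligned with the array design. A real pipeline would probably also
normalise the counts (for example by gene length or sequencing depth), which
is left out here.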
   