Re: [base] high-throughput transcriptome sequencing data
There is a cc field for tickets. Adding an address there generates email to it whenever the ticket changes, but the drawback is that the listed addresses are visible to everyone.

Jari

Nicklas Nordborg wrote:
> Bob MacCallum wrote:
>> Is there any way for me to subscribe to this ticket (get email updates when
>> changed)?
>
> If you have an RSS-enabled mail client/browser/other program, you can use
> the "RSS feed" link at the bottom of that page.
>
> /Nicklas

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
[EMAIL PROTECTED]
Re: [base] high-throughput transcriptome sequencing data
Bob MacCallum wrote:
> Is there any way for me to subscribe to this ticket (get email updates when
> changed)?

If you have an RSS-enabled mail client, browser, or other program, you can use the "RSS feed" link at the bottom of that page.

/Nicklas
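For anyone without an RSS-capable client, the feed can also be read with a short script. The following is only a minimal sketch using the Python standard library. It assumes the ticket feed is ordinary RSS 2.0 (items carrying a title and a link); the sample XML, the example.org URLs, and the function names are made up for illustration and are not taken from the BASE ticket site.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be downloaded from the
# ticket page's "RSS feed" link.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Ticket feed (sample)</title>
    <item><title>comment added</title><link>http://example.org/ticket#1</link></item>
    <item><title>status changed</title><link>http://example.org/ticket#2</link></item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


def unseen(items, seen_links):
    """Keep only the items whose link has not been reported before."""
    return [(title, link) for title, link in items if link not in seen_links]


if __name__ == "__main__":
    for title, link in unseen(parse_items(SAMPLE_FEED), seen_links=set()):
        print(f"{title}: {link}")
```

Run periodically with the set of already-seen links kept between runs, something like this would approximate email notification without putting an address in the ticket's cc field.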
Re: [base] high-throughput transcriptome sequencing data
Hi again,

I saved you the trouble of cutting and pasting to make a new ticket:

Handling short read transcript sequence data
http://base.thep.lu.se/ticket/1153

Is there any way for me to subscribe to this ticket (get email updates when changed)?

cheers,
Bob.

--
Bob MacCallum | VectorBase Developer | Kafatos/Christophides Groups |
Division of Cell and Molecular Biology | Imperial College London |
Phone +442075941945 | Email [EMAIL PROTECTED]
Re: [base] high-throughput transcriptome sequencing data
Hi Jari,

Thanks for updating me/the list on this.

Jari Häkkinen writes:
> Hi Bob,
>
> Sorry for not commenting on your thoughts on storing sequencing data in
> BASE. We have discussed it here and will be looking into it later. This
> will become an issue for us too, since we have sequencing equipment
> generating huge quantities of data. So far we are only getting friendly
> with the new machine.
>
> I think we should add your thoughts to a ticket for further discussion.
> I hope we can be more active on this issue later this autumn. How urgent
> is this issue for you?

Not really so urgent. We are writing a contract renewal proposal (August 2009-July 2014), due just after Christmas, and so need to know roughly what's technically possible. I'd be happy to be involved in your discussions and offer some development time (most likely during the new contract).

To start with there will probably be high-throughput transcript sequencing for improving gene models, which doesn't really need to touch BASE. I am then just assuming that sequence data might usurp arrays for expression studies.

> Regarding SNP data and other arrays with huge quantities of information:
> we have decided to avoid storing this sort of data in the database
> tables, since it would probably cripple BASE. Over the last releases we
> have prepared BASE for storing raw data in files instead of the raw
> tables, and also for storing data in files during analysis, i.e. in
> the analysis tree. There are no plug-ins yet that take advantage of
> these new features, but they will appear.

Ah, thanks, I didn't realise the analysis tree could handle files too.

> We are not currently working with Affymetrix SNP data wrt BASE. However,
> we now have the Affymetrix platform available at our department and may
> soon face the challenge of getting that data into BASE (it has not yet
> been decided whether to store that data in BASE). Maybe the Uppsala
> people have something on this, http://madr.lcb.uu.se/ ?
>
> On our side we are interested in getting Illumina SNP data into BASE and
> are making slow progress towards realising it, but we expect that to
> appear during the winter. We are supposed to write a specification of
> how we want to see this in BASE, but there is very little written so far.

I'll see what we can do if/when we have some Affy SNP data in the coming months. I imagine file-based analysis plugins are not too difficult to implement.

Many thanks again.

cheers,
Bob.
Re: [base] high-throughput transcriptome sequencing data
Hi Bob,

Sorry for not commenting on your thoughts on storing sequencing data in BASE. We have discussed it here and will be looking into it later. This will become an issue for us too, since we have sequencing equipment generating huge quantities of data. So far we are only getting friendly with the new machine.

I think we should add your thoughts to a ticket for further discussion. I hope we can be more active on this issue later this autumn. How urgent is this issue for you?

Regarding SNP data and other arrays with huge quantities of information: we have decided to avoid storing this sort of data in the database tables, since it would probably cripple BASE. Over the last releases we have prepared BASE for storing raw data in files instead of the raw tables, and also for storing data in files during analysis, i.e. in the analysis tree. There are no plug-ins yet that take advantage of these new features, but they will appear.

We are not currently working with Affymetrix SNP data wrt BASE. However, we now have the Affymetrix platform available at our department and may soon face the challenge of getting that data into BASE (it has not yet been decided whether to store that data in BASE). Maybe the Uppsala people have something on this, http://madr.lcb.uu.se/ ?

On our side we are interested in getting Illumina SNP data into BASE and are making slow progress towards realising it, but we expect that to appear during the winter. We are supposed to write a specification of how we want to see this in BASE, but there is very little written so far.

If you are getting confused when I talk about experimental equipment at our department, that is natural. I am at the Dept. of Oncology now, but I am still using my old mailing address on this list.

Cheers,
Jari
[base] high-throughput transcriptome sequencing data
Hi again.

Any thoughts on this (see below) at all? Please reply off-list if you are feeling shy.

I'd also like to raise some general questions about scalability.

1. Is anyone working on an Affymetrix SNP plugin?

2. Is anyone doing anything with tiling arrays?

I realise that archiving the .CEL files is no problem. Using BASE to run analysis programs on those files is possible through plugins. But storing per-feature data in the analysis tables is going to break when you have millions of features, right?

cheers,
Bob.
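To give a rough sense of scale for the question above, here is a back-of-envelope sketch. It assumes one stored row per feature per bioassay per analysis step, which is a simplification for illustration and not a description of BASE's actual schema. All the sizes are hypothetical.

```python
def stored_rows(features, bioassays, analysis_steps):
    """Rows needed if every feature of every bioassay is stored at every step."""
    return features * bioassays * analysis_steps


# Hypothetical experiment sizes, chosen only for illustration.
expression_array = stored_rows(features=50_000, bioassays=100, analysis_steps=5)
tiling_array = stored_rows(features=5_000_000, bioassays=100, analysis_steps=5)

print(f"expression array: {expression_array:,} rows")
print(f"tiling/SNP array: {tiling_array:,} rows")
print(f"ratio: {tiling_array // expression_array}x")
```

Under these assumptions the row count grows linearly with the number of features, so moving from tens of thousands to millions of features multiplies the table size by the same factor for an otherwise identical experiment.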
[base] high-throughput transcriptome sequencing data
I'm just thinking out loud about how to incorporate high-throughput transcriptome sequencing data into BASE. It's some way off, but I'm assuming that it will be cheap and quantitative enough to replace arrays at some point during the renewal period of our project (2009-2014).

1. Create an "array design" with all genes of interest (ideally this would be
   the largest set possible, e.g. known genes + predicted genes of all
   qualities, perhaps even predicted genes from the new sequence data). The
   layout would be fictitious, of course (what's the minimum one can get away
   with?).

2. Create a rawbioassay to correspond to each sequencing run.

Then *one* of 3a/b/c for each sequencing run/rawbioassay:

3a. Outside BASE, align the new sequences to genome or transcript sequences,
    calculate "intensities" for each gene on the "array design", and dump
    them into a tab-delimited raw data file. Attach that file to the
    rawbioassay and import numeric data as usual.

3b. Upload the text file of sequences to the raw bioassay's "data file".
    Create a BASE plugin to do the alignment and quantification as in 3a,
    and load the numeric data into the database.

3c. Similar to 3b, but calculate the intensities at the "create root bioassay"
    step, similar to the Affymetrix RMA plugin.

4. Continue with analysis as normal. Biosources, samples etc. can be linked to
   the bioassay too, of course.

I guess a new raw data type (for "Generic" platform) would have to be created for 3a (and 3b?) but that's not difficult.

Is it possible to go with 3a, but also attach the sequence file to the raw bioassay (or scan?), something like keeping tiff files for scans? Just for documentation purposes.

Any thoughts from the community or developers?

cheers,
Bob.
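Option 3a could be sketched roughly as follows. This assumes the alignment has already been done outside BASE and reduced to one gene identifier per aligned read. The column names "Reporter ID" and "Count", the function names, and the toy data are invented for the example; a real raw data type would define its own columns.

```python
import csv
import io
from collections import Counter


def count_reads_per_gene(read_hits, genes_on_design):
    """Count aligned reads per gene, reporting 0 for genes with no reads.

    read_hits: iterable of gene IDs, one per aligned read.
    genes_on_design: list of every gene ID on the (fictitious) array design.
    Reads hitting genes that are not on the design are ignored.
    """
    known = set(genes_on_design)
    counts = Counter(hit for hit in read_hits if hit in known)
    return {gene: counts[gene] for gene in genes_on_design}


def write_raw_data(counts, handle):
    """Write the per-gene counts as a tab-delimited file with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Reporter ID", "Count"])  # hypothetical column names
    for gene, count in counts.items():
        writer.writerow([gene, count])


# Hypothetical toy input: three genes on the design, five aligned reads,
# one of which hits a gene that is not on the design.
design = ["geneA", "geneB", "geneC"]
hits = ["geneA", "geneC", "geneA", "geneX", "geneA"]

buffer = io.StringIO()
write_raw_data(count_reads_per_gene(hits, design), buffer)
print(buffer.getvalue())
```

Writing a row for every gene on the design, including those with zero reads, keeps each file aligned with the same "array design", which is what the normal raw data import in step 3a would presumably expect.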