Re: [base] Dealing with duplicate spots
On 2011-11-22 09.43, Nicklas Nordborg wrote:
> On 2011-11-21 21:32, Nantel Andre wrote:
>
>> We don't have any "regular" users yet since we are still figuring out
>> how it works. I am doing everything from my administrator account.
>
> Have you created a file format configuration for the files you are
> using? This is done in Administrate -> Plug-ins & Extensions -> Plug-in
> configurations. Create a new configuration for the 'Raw data importer'
> plug-in and use the 'Test with file' function to get regular
> expressions, etc. correct.
>
>> Anyway that's all the time I can give to this problem until later
>> this week. This whole process has been much much more difficult than
>> we expected. Please don't take this the wrong way but it might be a
>> problem for your team as well if you ever hope to expand on your user
>> base. It's not like we're new at this, we've been using microarrays
>> since 1999 but I was getting tired of sending thousands of dollars to
>> Agilent every year.
>
> Setting up a BASE server is not trivial. We have recently set up a new
> BASE installation for a project here and I guess we have spent several
> months just setting up lab procedures, data import procedures, and also
> made some customizations to BASE in order to further streamline the data
> entry.
>
> We don't have an explicit goal to get more users. We are primarily using
> BASE at our own site to solve our own "problems". The Illumina extension
> package is an example of that. Unfortunately, we don't have the
> resources to develop things targeted for other platforms than what we
> are using ourselves. This doesn't mean that BASE can't be used with
> other platforms, but in order to get the most out of BASE one needs to
> invest time and maybe also resources for customization. In my
> experience at least a week would be needed for initial testing and
> prototyping and then maybe a couple of months for setting up a
> production-ready server, formalizing lab and data handling procedures,
> developing custom plug-ins and extensions, etc.

Two internal large-scale uses are outlined in the BASE 3 document
http://base.thep.lu.se/chrome/site/latest/html/why_base.html

To get the most out of BASE you need to dedicate resources to
customizing BASE to fit your needs and to learning what BASE can do for
you. Using BASE is a process: you start small and then extend your use
of BASE as appropriate.

In our lab (well, some projects, admittedly not all) we record all
important information gathered during lab work in BASE. BASE is
therefore an integral part of setting up lab processes: we make minor
changes to the lab work where needed and create customizations in BASE
that fit our needs. The Illumina extension was mentioned by Nicklas, but
the Reggie extension
(http://baseplugins.thep.lu.se/wiki/net.sf.basedb.reggie) is the tool
most appreciated by lab personnel. We track a lot of information about
our samples and use the biomaterial LIMS extensively; Reggie streamlines
data input, and it also cross-checks and catches errors in data entry.
Currently, we are working on defining our sequencing procedure in the
lab and mirroring it in BASE.

Our larger projects span long periods (years) and handle large sample
sets. These projects use BASE to get organized, to reduce errors, to
collect information that normally ends up in lab books, to share data,
and as an analysis platform. And we have noticed improvements in lab
work too :)

Of course, our extensive BASE engagement is made easier because our PIs
and funding agencies understand the benefits of organizing and sharing
data and information.

However, I want to emphasise that BASE is also usable for smaller groups
and projects. One can ignore the biomaterial and array LIMS and directly
create raw bioassays, collect these into an experiment, and head on to
analysis. The README for the Affymetrix plug-in
(http://baseplugins.thep.lu.se/wiki/se.lu.thep.affymetrix) outlines what
needs to be done to get started with a basically empty BASE. The steps
are general and can be adapted for other platforms.

Cheers,

Jari

--
All the data continuously generated in your IT infrastructure
contains a definitive record of customers, application performance,
security threats, fraudulent activity, and more. Splunk takes this
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net
Re: [base] Dealing with duplicate spots
On 2011-11-21 21:32, Nantel Andre wrote:
>
> We don't have any "regular" users yet since we are still figuring out
> how it works. I am doing everything from my administrator account.

Have you created a file format configuration for the files you are
using? This is done in Administrate -> Plug-ins & Extensions -> Plug-in
configurations. Create a new configuration for the 'Raw data importer'
plug-in and use the 'Test with file' function to get regular
expressions, etc. correct.

> Anyway that's all the time I can give to this problem until later
> this week. This whole process has been much much more difficult than
> we expected. Please don't take this the wrong way but it might be a
> problem for your team as well if you ever hope to expand on your user
> base. It's not like we're new at this, we've been using microarrays
> since 1999 but I was getting tired of sending thousands of dollars to
> Agilent every year.

Setting up a BASE server is not trivial. We have recently set up a new
BASE installation for a project here, and I guess we have spent several
months just setting up lab procedures and data import procedures, and
also making some customizations to BASE in order to further streamline
the data entry.

We don't have an explicit goal to get more users. We are primarily using
BASE at our own site to solve our own "problems". The Illumina extension
package is an example of that. Unfortunately, we don't have the
resources to develop things targeted at platforms other than the ones we
are using ourselves. This doesn't mean that BASE can't be used with
other platforms, but in order to get the most out of BASE one needs to
invest time and maybe also resources for customization. In my
experience, at least a week would be needed for initial testing and
prototyping, and then maybe a couple of months for setting up a
production-ready server, formalizing lab and data handling procedures,
developing custom plug-ins and extensions, etc.

/Nicklas
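The regular expressions mentioned above can be tried out before entering them in a plug-in configuration. The following is only a sketch in Python, assuming the ImaGene export is tab-separated and that its data section starts with the column header line posted elsewhere in this thread. The pattern names, the patterns themselves, and the "End Raw Data" footer line are illustrative assumptions, not values taken from an actual BASE configuration. BASE is a Java application and Java's regular expression syntax is not identical to Python's, so any pattern should still be checked with 'Test with file'.

```python
import re

# Hypothetical patterns for an ImaGene-style export (assumed tab-separated).
# None of these come from a real BASE plug-in configuration.
DATA_HEADER = re.compile(r"^Field\tMR\tMC\tRow\tCol\.\tGeneID\t.*$")
DATA_SPLITTER = re.compile(r"\t")
DATA_FOOTER = re.compile(r"^End Raw Data$")  # assumed; not shown in the posted sample


def classify(line):
    """Label a line as the data header, the data footer, or anything else."""
    if DATA_HEADER.match(line):
        return "data header"
    if DATA_FOOTER.match(line):
        return "data footer"
    return "other"


# Column header and first data row, shortened to the first two intensity columns.
header = ("Field\tMR\tMC\tRow\tCol.\tGeneID\tAnnotation 1\tFlag"
          "\tN. Signal Mean, ch1\tN. Signal Mean, ch2")
row = "A\t1\t1\t1\t1\torf19.5337\t\t0\t10.0631\t10.2463"

print(classify(header))
print(classify(row))
print(DATA_SPLITTER.split(row))
```

Splitting on a single tab keeps the empty "Annotation 1" field in place, so the remaining columns still line up with the header.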
Re: [base] Dealing with duplicate spots
On 2011-11-21, at 2:51 PM, Nicklas Nordborg wrote:
> On 2011-11-21 20:31, Nantel Andre wrote:
>> On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:
>>
>>> The demo data set is a GenePix data set and for each spot the file
>>> contains the coordinates, reporter id and a lot of measured intensities
>>> and other values. So if your data is similar to this, then I think you
>>> don't need any special new importer. Use the generic "Raw data flat file
>>> importer" and make sure to map spot coordinates, reporter id and
>>> intensity columns that are in your data files.
>>>
>>> Maybe you can post a few lines from the files you are working with so I
>>> don't have to guess...
>>
>> Here is our data format. We never got the generic importer working even
>> after updating the raw-data-types.xml file to our version of the Imagene
>> files. When we try to use that, we get a "no plug-in" error.
>
> This is usually due to lack of permission for the logged in user. Make
> sure that the file format configuration has been shared properly.
>
> The file seems to be equivalent to and contain information that is very
> similar to GenePix files. I don't see any reason why the generic raw
> data importer shouldn't work in this case.

We don't have any "regular" users yet since we are still figuring out
how it works. I am doing everything from my administrator account.

Anyway, that's all the time I can give to this problem until later this
week. This whole process has been much, much more difficult than we
expected. Please don't take this the wrong way, but it might be a
problem for your team as well if you ever hope to expand your user base.
It's not like we're new at this: we've been using microarrays since
1999, but I was getting tired of sending thousands of dollars to Agilent
every year.

Thanks,
Re: [base] Dealing with duplicate spots
On 2011-11-21 20:31, Nantel Andre wrote:
> On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:
>
>> The demo data set is a GenePix data set and for each spot the file
>> contains the coordinates, reporter id and a lot of measured intensities
>> and other values. So if your data is similar to this, then I think you
>> don't need any special new importer. Use the generic "Raw data flat file
>> importer" and make sure to map spot coordinates, reporter id and
>> intensity columns that are in your data files.
>>
>> Maybe you can post a few lines from the files you are working with so I
>> don't have to guess...
>
> Here is our data format. We never got the generic importer working even
> after updating the raw-data-types.xml file to our version of the Imagene
> files. When we try to use that, we get a "no plug-in" error.

This is usually due to a lack of permission for the logged-in user. Make
sure that the file format configuration has been shared properly.

The file seems to be equivalent to a GenePix file and to contain very
similar information. I don't see any reason why the generic raw data
importer shouldn't work in this case.

/Nicklas
Re: [base] Dealing with duplicate spots
On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:
> The demo data set is a GenePix data set and for each spot the file
> contains the coordinates, reporter id and a lot of measured intensities
> and other values. So if your data is similar to this, then I think you
> don't need any special new importer. Use the generic "Raw data flat file
> importer" and make sure to map spot coordinates, reporter id and
> intensity columns that are in your data files.
>
> Maybe you can post a few lines from the files you are working with so I
> don't have to guess...

Here is our data format. We never got the generic importer working even
after updating the raw-data-types.xml file to our version of the Imagene
files. When we try to use that, we get a "no plug-in" error.

Begin Header
  version 9.0.
  Date        Tue Nov 15 13:49:21 EST 2011
  Image File  C:\Documents and Settings\All Users\Documents\Andre\29 Avr\14261085_cy3.tif
  Page        0
  Page Name
  Inverted    false
  Image File  C:\Documents and Settings\All Users\Documents\Andre\29 Avr\14261085_cy5.tif
  Page        0
  Page Name
  Inverted    false
  Begin Field Dimensions
    Field  Metarows  Metacols  Rows  Cols
    A      12        4         16    18
  End Field Dimensions
  Begin Measurement parameters
    Segmentation Method  auto
    Signal Low           0.0
    Signal High          0.0
    Background Low       0.0
    Background High      0.0
    Background Buffer    2.0
    Background Width     5.0
  End Measurement parameters
End Header
Begin Normalization
  Background measure   Bckgr. Mean
  Correction method    Local
  Sliding window       false
  Take log             true
  Log base             Base 2
  Normalization type   Lowess
  Scope                SubGrid
  Smoothing            0.2
  Using control spots  Use all spots
End Normalization
Begin Raw Data
Field  MR  MC  Row  Col.  GeneID      Annotation 1  Flag  N. Signal Mean, ch1  N. Signal Mean, ch2  N. Signal Median, ch1  N. Signal Median, ch2  N. Signal Mode, ch1  N. Signal Mode, ch2
A      1   1   1    1     orf19.5337                0     10.0631              10.2463              9.9099                 9.9951                 9.9646               9.9756
A      1   1   1    2     orf19.5337                0     9.1418               9.3703               9.1036                 9.1544                 9.2753               9.5736
A      1   1   1    3     orf19.5642                0     13.5228              13.3390              13.4454                13.2718                13.5748              13.3829
A      1   1   1    4     orf19.5642                0     12.3849              12.4358              12.3075                12.4157                12.3422              12.4249
A      1   1   1    5     orf19.3173                0     11.9102              11.9314              11.8002                11.8241                11.8276              11.8270
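For readers who want to inspect such a file outside BASE, the "Begin Raw Data" section shown above can be read with a few lines of Python. This is a rough sketch, not the BASE importer: it assumes the export is tab-separated (the separators were lost in the mail), that the first line after "Begin Raw Data" holds the column names, and that the section ends with an "End Raw Data" line, which does not appear in the posted excerpt. The function name and the shortened sample are made up for illustration.

```python
# Shortened sample: two rows and two intensity columns from the posted file.
# Tab separators and the "End Raw Data" line are assumptions.
SAMPLE = (
    "Begin Raw Data\n"
    "Field\tMR\tMC\tRow\tCol.\tGeneID\tAnnotation 1\tFlag"
    "\tN. Signal Mean, ch1\tN. Signal Mean, ch2\n"
    "A\t1\t1\t1\t1\torf19.5337\t\t0\t10.0631\t10.2463\n"
    "A\t1\t1\t1\t2\torf19.5337\t\t0\t9.1418\t9.3703\n"
    "End Raw Data\n"
)


def read_raw_data(lines):
    """Return the rows of the raw data section as dicts keyed by column name."""
    rows = []
    header = None
    in_section = False
    for line in lines:
        line = line.rstrip("\r\n")
        if not in_section:
            # Skip the header and normalization blocks before the data section.
            in_section = line.strip() == "Begin Raw Data"
            continue
        if line.strip() == "End Raw Data":
            break
        if not line.strip():
            continue
        fields = line.split("\t")
        if header is None:
            header = [name.strip() for name in fields]
            continue
        rows.append(dict(zip(header, fields)))
    return rows


rows = read_raw_data(SAMPLE.splitlines())
for r in rows:
    print(r["MR"], r["MC"], r["Row"], r["Col."], r["GeneID"], r["Flag"])
```

Each row keeps its own coordinates (MR, MC, Row, Col.), so the two orf19.5337 spots stay separate entries that share a GeneID.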
Re: [base] Dealing with duplicate spots
On 2011-11-21 19:23, Nantel Andre wrote:
> That issue goes beyond the ImaGene format. When doing contact
> printing of microarrays it is still fairly common to spot the same
> probe 2 or more times as shown in the example here
> (http://www.digitalapoptosis.com/2005/10/12/dna-microarray/).
>
> It is also very common for commercial arrays (Agilent for example) to
> have multiple copies of their control spots.

I wouldn't call that duplicate spots. They are different spots at
different coordinates that just happen to have the same reporter/gene.
The importer should import those as two separate entries, and it is then
up to downstream analysis whether they should be merged into a single
value and what kind of averaging method to use for merging.

>
> We'll take a look at the Illumina plug-ins but I'm sure that there is
> a more elegant solution.

Does the ImaGene data file contain spot coordinates or not? If not, then
the Illumina case might be useful, but not otherwise.

> What we are trying to do is,
> theoretically, simple since our data is already
> background-subtracted and normalized (ImaGene does a perfectly
> acceptable job in doing that). We have two columns with ch1 and ch2
> normalized intensities in Log2 and a Flag column showing spots that
> have to be filtered out. We were hoping to use BASE to help us
> organize our experiments before sending them off to MeV.
>
> In the Base2 demo server, the demoHyb1 bioassay is similar to our
> situation. When I open that item and click on the "Raw data" tab, it
> appears that the spot coordinates were imported and then used to
> produce columns entitled [Rep] Name and [Rep] ID that clearly come
> from duplicate spots. That's what we are trying to replicate.

The demo data set is a GenePix data set, and for each spot the file
contains the coordinates, reporter id and a lot of measured intensities
and other values. So if your data is similar to this, then I think you
don't need any special new importer. Use the generic "Raw data flat file
importer" and make sure to map the spot coordinates, reporter id and
intensity columns that are in your data files.

Maybe you can post a few lines from the files you are working with so I
don't have to guess...

/Nicklas
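The point about keeping spots separate at import and merging them later can be illustrated outside BASE. The following Python sketch groups spots by reporter and averages each channel; it is not BASE code. The `Spot` type and the `merge_by_reporter` function are hypothetical, the values are the normalized signal means from the ImaGene sample posted in this thread, and treating any non-zero flag as "filter out" is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Spot(NamedTuple):
    """One imported spot: coordinates, reporter id, flag, log2 intensities."""
    meta_row: int
    meta_col: int
    row: int
    col: int
    reporter: str
    flag: int
    ch1: float
    ch2: float


SPOTS = [
    Spot(1, 1, 1, 1, "orf19.5337", 0, 10.0631, 10.2463),
    Spot(1, 1, 1, 2, "orf19.5337", 0, 9.1418, 9.3703),
    Spot(1, 1, 1, 3, "orf19.5642", 0, 13.5228, 13.3390),
    Spot(1, 1, 1, 4, "orf19.5642", 0, 12.3849, 12.4358),
    Spot(1, 1, 1, 5, "orf19.3173", 0, 11.9102, 11.9314),
]


def merge_by_reporter(spots, average=mean):
    """Merge spots that share a reporter into one (ch1, ch2) pair each."""
    groups = defaultdict(list)
    for spot in spots:
        if spot.flag != 0:  # assumption: a non-zero flag marks a spot to drop
            continue
        groups[spot.reporter].append(spot)
    return {
        reporter: (
            average(s.ch1 for s in members),
            average(s.ch2 for s in members),
        )
        for reporter, members in groups.items()
    }


merged = merge_by_reporter(SPOTS)
for reporter, (ch1, ch2) in merged.items():
    print(reporter, round(ch1, 4), round(ch2, 4))
```

Because these intensities are already log2 values, the arithmetic mean used here corresponds to a geometric mean on the linear scale. Passing `average=statistics.median` instead makes the merge less sensitive to a single outlying spot.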
Re: [base] Dealing with duplicate spots
That issue goes beyond the ImaGene format. When doing contact printing
of microarrays it is still fairly common to spot the same probe 2 or
more times, as shown in the example here
(http://www.digitalapoptosis.com/2005/10/12/dna-microarray/).

It is also very common for commercial arrays (Agilent for example) to
have multiple copies of their control spots.

We'll take a look at the Illumina plug-ins, but I'm sure that there is a
more elegant solution. What we are trying to do is, theoretically,
simple since our data is already background-subtracted and normalized
(ImaGene does a perfectly acceptable job of that). We have two columns
with ch1 and ch2 normalized intensities in Log2 and a Flag column
showing spots that have to be filtered out. We were hoping to use BASE
to help us organize our experiments before sending them off to MeV.

In the Base2 demo server, the demoHyb1 bioassay is similar to our
situation. When I open that item and click on the "Raw data" tab, it
appears that the spot coordinates were imported and then used to produce
columns entitled [Rep] Name and [Rep] ID that clearly come from
duplicate spots. That's what we are trying to replicate.

Thanks,

On 2011-11-21, at 12:36 PM, Nicklas Nordborg wrote:
> On 2011-11-21 17:08, Nantel Andre wrote:
>> Greetings,
>>
>> We are currently trying to integrate BASE into our lab and writing the
>> necessary plug-ins to import normalized data from ImaGene. Let's just
>> say that the lack of documentation has been "challenging".
>>
>> Right now we are trying to figure out how to deal with duplicate spots.
>> From the GenePix examples in the Base2 server we see cases of raw
>> bioassays with [raw] columns defined by the unique spot coordinates as
>> well as "rep" columns containing the duplicated GeneID classifiers. Any
>> suggestions on how to do this?
>
> What do you mean with a duplicate spots? Physically a spot is a spot and
> there can of course only be one on a given position. However there is
> nothing in BASE that prevents two or more spots from referencing the
> same gene/reporter. Then there are some platforms (for example Illumina
> BeadArrays) that doesn't have spots in the classical meaning. In this
> case one need to construct a unique "feature id" for each "spot
> equivalent" that one wants to measure. In the Illumina case, the unique
> ID is provided by the "Illumicode" column in the data files. This value
> is then mapped to gene/reporter annotations via a BGX file, and the
> importer is also calculating means, etc for all entries with the same
> "Illumicode" value. See
> http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina for more
> information about how the Illumina platform has been implemented.
>
> I don't know what kind of data files that are generated by the ImaGene
> platform, so it is hard to advice on exactly how to the same thing for
> ImaGene.
>
> /Nicklas
Re: [base] Dealing with duplicate spots
On 2011-11-21 17:08, Nantel Andre wrote:
> Greetings,
>
> We are currently trying to integrate BASE into our lab and writing the
> necessary plug-ins to import normalized data from ImaGene. Let's just
> say that the lack of documentation has been "challenging".
>
> Right now we are trying to figure out how to deal with duplicate spots.
> From the GenePix examples in the Base2 server we see cases of raw
> bioassays with [raw] columns defined by the unique spot coordinates as
> well as "rep" columns containing the duplicated GeneID classifiers. Any
> suggestions on how to do this?

What do you mean by duplicate spots? Physically a spot is a spot, and
there can of course only be one at a given position. However, there is
nothing in BASE that prevents two or more spots from referencing the
same gene/reporter.

Then there are some platforms (for example Illumina BeadArrays) that
don't have spots in the classical sense. In this case one needs to
construct a unique "feature id" for each "spot equivalent" that one
wants to measure. In the Illumina case, the unique ID is provided by the
"Illumicode" column in the data files. This value is then mapped to
gene/reporter annotations via a BGX file, and the importer also
calculates means, etc. for all entries with the same "Illumicode" value.
See http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina for more
information about how the Illumina platform has been implemented.

I don't know what kind of data files are generated by the ImaGene
platform, so it is hard to advise on exactly how to do the same thing
for ImaGene.

/Nicklas
[base] Dealing with duplicate spots
Greetings,

We are currently trying to integrate BASE into our lab and writing the
necessary plug-ins to import normalized data from ImaGene. Let's just
say that the lack of documentation has been "challenging".

Right now we are trying to figure out how to deal with duplicate spots.
From the GenePix examples in the Base2 server we see cases of raw
bioassays with [raw] columns defined by the unique spot coordinates as
well as "rep" columns containing the duplicated GeneID classifiers. Any
suggestions on how to do this?

Thanks,

André Nantel, M.Sc., Ph.D.
Senior Research Officer and Adjunct Professor
Project Manager, Microarray Lab
Biotechnology Research Institute
National Research Council of Canada
6100 Royalmount
Montreal, QC
Canada H4P 2R2
andre.nan...@nrc-cnrc.gc.ca
(514) 496-6370
http://www.nrc-cnrc.gc.ca/eng/services/bri/microarray.html