Re: [base] Dealing with duplicate spots

2011-11-22 Thread Nicklas Nordborg
On 2011-11-21 21:32, Nantel Andre wrote:


 We don't have any regular users yet since we are still figuring out
 how it works. I am doing everything from my administrator account.

Have you created a file format configuration for the files you are 
using? This is done in Administrate -> Plug-ins & Extensions -> Plug-in 
configurations. Create a new configuration for the 'Raw data importer' 
plug-in and use the 'Test with file' function to get the regular 
expressions, etc. correct.
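For illustration, the kind of patterns the 'Test with file' function helps you verify might look like this; both expressions below are made-up examples, and the real ones depend entirely on your file format:

```python
import re

# Hypothetical patterns of the kind a raw data importer configuration
# needs: one to find where the data section starts, and one to match a
# data line. These are illustrative, not BASE's actual defaults.
data_start = re.compile(r"^Begin Raw Data")
data_line = re.compile(r"^[A-Z]\t\d+\t\d+\t\d+\t\d+\t\S+")

# Quick sanity checks against lines resembling an ImaGene-style file.
assert data_start.match("Begin Raw Data")
assert data_line.match("A\t1\t1\t1\t1\torf19.5337\t\t0")
```

Testing the patterns against a few real lines from your files, as the 'Test with file' function does, is the quickest way to catch delimiter and column mismatches.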

 Anyway, that's all the time I can give to this problem until later
 this week. This whole process has been much, much more difficult than
 we expected. Please don't take this the wrong way, but it might be a
 problem for your team as well if you ever hope to expand your user
 base. It's not like we're new at this; we've been using microarrays
 since 1999, but I was getting tired of sending thousands of dollars to
 Agilent every year.

Setting up a BASE server is not trivial. We have recently set up a new 
BASE installation for a project here and I guess we have spent several 
months just setting up lab procedures, data import procedures, and also 
made some customizations to BASE in order to further streamline the data 
entry.

We don't have an explicit goal to get more users. We are primarily using 
BASE at our own site to solve our own problems. The Illumina extension 
package is an example of that. Unfortunately, we don't have the 
resources to develop things targeted for other platforms than what we 
are using ourselves. This doesn't mean that BASE can't be used with 
other platforms, but in order to get the most out of BASE one needs to 
invest time and maybe also resources for customization. In my 
experience, at least a week would be needed for initial testing and 
prototyping, and then maybe a couple of months for setting up a 
production-ready server, formalizing lab and data handling procedures, 
developing custom plug-ins and extensions, etc.

/Nicklas

--
All the data continuously generated in your IT infrastructure 
contains a definitive record of customers, application performance, 
security threats, fraudulent activity, and more. Splunk takes this 
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject unsubscribe to
basedb-users-requ...@lists.sourceforge.net


Re: [base] Dealing with duplicate spots

2011-11-22 Thread Jari Häkkinen
On 2011-11-22 09.43, Nicklas Nordborg wrote:
 On 2011-11-21 21:32, Nantel Andre wrote:


 We don't have any regular users yet since we are still figuring out
 how it works. I am doing everything from my administrator account.

 Have you created a file format configuration for the files you are
 using? This is done in Administrate -> Plug-ins & Extensions -> Plug-in
 configurations. Create a new configuration for the 'Raw data importer'
 plug-in and use the 'Test with file' function to get the regular
 expressions, etc. correct.

 Anyway, that's all the time I can give to this problem until later
 this week. This whole process has been much, much more difficult than
 we expected. Please don't take this the wrong way, but it might be a
 problem for your team as well if you ever hope to expand your user
 base. It's not like we're new at this; we've been using microarrays
 since 1999, but I was getting tired of sending thousands of dollars to
 Agilent every year.

 Setting up a BASE server is not trivial. We have recently set up a new
 BASE installation for a project here and I guess we have spent several
 months just setting up lab procedures, data import procedures, and also
 made some customizations to BASE in order to further streamline the data
 entry.

 We don't have an explicit goal to get more users. We are primarily using
 BASE at our own site to solve our own problems. The Illumina extension
 package is an example of that. Unfortunately, we don't have the
 resources to develop things targeted for other platforms than what we
 are using ourselves. This doesn't mean that BASE can't be used with
 other platforms, but in order to get the most out of BASE one needs to
 invest time and maybe also resources for customization. In my
 experience at least a week would be needed for initial testing and
 prototyping and then maybe a couple of months for setting up a
 production-ready server, formalizing lab and data handling procedures,
 developing custom plug-ins and extensions, etc.


Two internal large-scale uses are outlined in the BASE 3 document: 
http://base.thep.lu.se/chrome/site/latest/html/why_base.html

To get the most out of BASE you need to dedicate resources to customizing 
BASE to fit your needs and to learning what BASE can do for you.

Using BASE is a process: you start small and then extend your use of BASE 
as appropriate. In our lab (well, some projects, not all, admittedly) we 
collect in BASE all important information gathered during lab work. 
BASE is therefore an integral part of setting up lab processes: we 
make minor changes to lab work where needed and create customizations in 
BASE that fit our needs. The Illumina extension was mentioned by 
Nicklas, but the reggie extension, 
http://baseplugins.thep.lu.se/wiki/net.sf.basedb.reggie, is the tool most 
appreciated by lab personnel. We track a lot of information about our 
samples and use the biomaterial LIMS extensively; reggie streamlines 
data input but also makes cross-checks and catches errors in data entry. 
Currently, we are working on defining our sequencing procedure in the 
lab and mirroring it in BASE.

Our larger projects span long periods (years) and handle large sample 
sets. These projects use BASE to get organized, reduce errors, collect 
information that normally ends up in lab books, share data, and serve 
as an analysis platform. And we have noticed improvements in lab work 
too :)

Of course, our extensive BASE engagement is made easier because our PIs 
and funding agencies understand the benefits of organizing and sharing 
data/information.


However, I want to emphasize that BASE is also usable for smaller groups 
and projects. One can ignore the biomaterial and array LIMS, directly 
create raw bioassays, collect these into an experiment, and head on to 
the analysis. The README for the Affymetrix plug-in 
(http://baseplugins.thep.lu.se/wiki/se.lu.thep.affymetrix) outlines what 
needs to be done to get started with a basically empty BASE. The steps 
are general and can be adapted to other platforms.


Cheers,

Jari



Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nicklas Nordborg
On 2011-11-21 17:08, Nantel Andre wrote:
 Greetings,

 We are currently trying to integrate BASE into our lab and writing the
 necessary plug-ins to import normalized data from ImaGene. Let's just
 say that the lack of documentation has been challenging.

 Right now we are trying to figure out how to deal with duplicate spots.
  From the GenePix examples on the Base2 server we see cases of raw
 bioassays with [raw] columns defined by the unique spot coordinates as
 well as 'rep' columns containing the duplicated GeneID classifiers. Any
 suggestions on how to do this?

What do you mean by duplicate spots? Physically, a spot is a spot, and 
there can of course only be one at a given position. However, there is 
nothing in BASE that prevents two or more spots from referencing the 
same gene/reporter. Then there are some platforms (for example, Illumina 
BeadArrays) that don't have spots in the classical meaning. In this 
case one needs to construct a unique feature ID for each spot 
equivalent that one wants to measure. In the Illumina case, the unique 
ID is provided by the Illumicode column in the data files. This value 
is then mapped to gene/reporter annotations via a BGX file, and the 
importer also calculates means, etc. for all entries with the same 
Illumicode value. See 
http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina for more 
information about how the Illumina platform has been implemented.
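As a rough sketch of that idea (the feature IDs and intensities below are invented for illustration), averaging all entries that share a unique feature ID might look like:

```python
from collections import defaultdict
from statistics import mean

# Entries keyed by a unique feature ID (e.g. an Illumicode value);
# the IDs and values here are made up, not real Illumina data.
entries = [("50014", 10.1), ("50014", 10.3), ("61230", 8.7)]

by_id = defaultdict(list)
for feature_id, value in entries:
    by_id[feature_id].append(value)

# One averaged measurement per unique feature ID.
means = {fid: mean(vals) for fid, vals in by_id.items()}
```

The mapping from feature IDs to gene/reporter annotations (via the BGX file in the Illumina case) would then be applied to the keys of `means`.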

I don't know what kind of data files are generated by the ImaGene 
platform, so it is hard to advise on exactly how to do the same thing 
for ImaGene.

/Nicklas



Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nicklas Nordborg
On 2011-11-21 19:23, Nantel Andre wrote:
 That issue goes beyond the ImaGene format. When doing contact
 printing of microarrays it is still fairly common to spot the same
 probe 2 or more times as shown in the example here
 (http://www.digitalapoptosis.com/2005/10/12/dna-microarray/).

 It is also very common for commercial arrays (Agilent for example) to
 have multiple copies of their control spots.

I wouldn't call those duplicate spots. They are different spots at 
different coordinates that just happen to have the same reporter/gene. 
The importer should import them as separate entries, and it is then up 
to downstream analysis whether they should be merged into a single value 
and what kind of averaging method to use for the merge.
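That separation can be sketched like this: spots at different coordinates stay as separate entries at import time, and the merge happens only in the analysis step, with the averaging method as a free choice. The field names and values below are invented for illustration:

```python
from statistics import mean, median

# Two spots, same reporter, different coordinates; values are made up.
spots = [
    {"row": 1, "col": 1, "reporter": "orf19.5337", "ch1": 10.06},
    {"row": 1, "col": 2, "reporter": "orf19.5337", "ch1": 9.14},
]

def merge_by_reporter(spots, average=mean):
    """Merge spot values per reporter with a chosen averaging method."""
    groups = {}
    for s in spots:
        groups.setdefault(s["reporter"], []).append(s["ch1"])
    return {rep: average(vals) for rep, vals in groups.items()}

merged = merge_by_reporter(spots)          # plain mean of the replicates
merged_med = merge_by_reporter(spots, median)  # or a median, if preferred
```

Keeping the spots separate until this point preserves the per-coordinate information for quality control before any values are collapsed.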


 We'll take a look at the Illumina plug-ins but I'm sure that there is
 a more elegant solution.

Do the ImaGene data files contain spot coordinates or not? If not, 
the Illumina case might be useful, but not otherwise.

 What we are trying to do is, theoretically, simple since our data is
 already background-subtracted and normalized (ImaGene does a perfectly
 acceptable job of that). We have two columns with ch1 and ch2
 normalized intensities in log2 and a Flag column showing spots that
 have to be filtered out. We were hoping to use BASE to help us
 organize our experiments before sending them off to MeV.

 In the Base2 demo server, the demoHyb1 bioassay is similar to our
 situation. When I open that item and click on the Raw data tab, it
 appears that the spot coordinates were imported and then used to
 produce columns entitled [Rep] Name and [Rep] ID that clearly come
 from duplicate spots. That's what we are trying to replicate.

The demo data set is a GenePix data set and for each spot the file 
contains the coordinates, reporter id and a lot of measured intensities 
and other values. So if your data is similar to this, then I think you 
don't need any special new importer. Use the generic Raw data flat file 
importer and make sure to map spot coordinates, reporter id and 
intensity columns that are in your data files.

Maybe you can post a few lines from the files you are working with so I 
don't have to guess...

/Nicklas



Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nantel Andre
On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:

 The demo data set is a GenePix data set and for each spot the file 
 contains the coordinates, reporter id and a lot of measured intensities 
 and other values. So if your data is similar to this, then I think you 
 don't need any special new importer. Use the generic Raw data flat file 
 importer and make sure to map spot coordinates, reporter id and 
 intensity columns that are in your data files.
 
 Maybe you can post a few lines from the files you are working with so I 
 don't have to guess...

Here is our data format. We never got the generic importer working, even 
after updating the raw-data-types.xml file for our version of the ImaGene 
files. When we try to use it, we get a 'no plug-in' error.

Begin Header
version              9.0.
Date                 Tue Nov 15 13:49:21 EST 2011
Image File           C:\Documents and Settings\All Users\Documents\Andre\29 Avr\14261085_cy3.tif
Page                 0
Page Name
Inverted             false
Image File           C:\Documents and Settings\All Users\Documents\Andre\29 Avr\14261085_cy5.tif
Page                 0
Page Name
Inverted             false
Begin Field Dimensions
Field   Metarows   Metacols   Rows   Cols
A       12         4          16     18
End Field Dimensions
Begin Measurement parameters
Segmentation Method  auto
Signal Low           0.0
Signal High          0.0
Background Low       0.0
Background High      0.0
Background Buffer    2.0
Background Width     5.0
End Measurement parameters
End Header
Begin Normalization
Background measure   Bckgr. Mean
Correction method    Local
Sliding window       false
Take log             true
Log base             Base 2
Normalization type   Lowess
Scope                SubGrid
Smoothing            0.2
Using control spots  Use all spots
End Normalization
Begin Raw Data
Field  MR  MC  Row  Col.  GeneID      Annotation 1  Flag  N. Signal Mean, ch1  N. Signal Mean, ch2  N. Signal Median, ch1  N. Signal Median, ch2  N. Signal Mode, ch1  N. Signal Mode, ch2
A      1   1   1    1     orf19.5337                0     10.0631              10.2463              9.9099                 9.9951                 9.9646               9.9756
A      1   1   1    2     orf19.5337                0     9.1418               9.3703               9.1036                 9.1544                 9.2753               9.5736
A      1   1   1    3     orf19.5642                0     13.5228              13.3390              13.4454                13.2718                13.5748              13.3829
A      1   1   1    4     orf19.5642                0     12.3849              12.4358              12.3075                12.4157                12.3422              12.4249
A      1   1   1    5     orf19.3173                0     11.9102              11.9314              11.8002                11.8241                11.8276              11.8270
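A file with this layout could, as a rough sketch, be read by skipping to the 'Begin Raw Data' marker and splitting the header and data lines on tabs. The function below is illustrative only and assumes tab-delimited fields, which the pasted sample suggests but does not guarantee:

```python
def read_imagene_raw(lines):
    """Yield one dict per spot from an ImaGene-style raw data section.

    Illustrative sketch: assumes tab-delimited fields and a section
    bracketed by 'Begin Raw Data' / 'End Raw Data'.
    """
    it = iter(lines)
    # Skip everything up to and including the section marker.
    for line in it:
        if line.strip() == "Begin Raw Data":
            break
    # The first line of the section names the columns.
    header = next(it).rstrip("\n").split("\t")
    for line in it:
        if line.strip() == "End Raw Data":
            break
        yield dict(zip(header, line.rstrip("\n").split("\t")))
```

Each yielded dict maps column names such as 'GeneID' and 'N. Signal Mean, ch1' to their string values; spots sharing a GeneID simply appear as separate rows, to be merged (or not) downstream.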




Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nicklas Nordborg
On 2011-11-21 20:31, Nantel Andre wrote:
 On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:

 The demo data set is a GenePix data set and for each spot the file
 contains the coordinates, reporter id and a lot of measured intensities
 and other values. So if your data is similar to this, then I think you
 don't need any special new importer. Use the generic Raw data flat file
 importer and make sure to map spot coordinates, reporter id and
 intensity columns that are in your data files.

 Maybe you can post a few lines from the files you are working with so I
 don't have to guess...

 Here is our data format. We never got the generic importer working, even
 after updating the raw-data-types.xml file for our version of the ImaGene
 files. When we try to use it, we get a 'no plug-in' error.

This is usually due to lack of permission for the logged-in user. Make 
sure that the file format configuration has been shared properly.

The file seems to contain information very similar to what GenePix 
files contain. I don't see any reason why the generic raw data importer 
shouldn't work in this case.

/Nicklas



Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nantel Andre

On 2011-11-21, at 2:51 PM, Nicklas Nordborg wrote:

 On 2011-11-21 20:31, Nantel Andre wrote:
 On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:
 
 The demo data set is a GenePix data set and for each spot the file
 contains the coordinates, reporter id and a lot of measured intensities
 and other values. So if your data is similar to this, then I think you
 don't need any special new importer. Use the generic Raw data flat file
 importer and make sure to map spot coordinates, reporter id and
 intensity columns that are in your data files.
 
 Maybe you can post a few lines from the files you are working with so I
 don't have to guess...
 
 Here is our data format. We never got the generic importer working, even
 after updating the raw-data-types.xml file for our version of the ImaGene
 files. When we try to use it, we get a 'no plug-in' error.
 
 This is usually due to lack of permission for the logged-in user. Make 
 sure that the file format configuration has been shared properly.
 
 The file seems to contain information very similar to what GenePix 
 files contain. I don't see any reason why the generic raw data importer 
 shouldn't work in this case.
 
 

We don't have any regular users yet since we are still figuring out how it 
works. I am doing everything from my administrator account.

Anyway, that's all the time I can give to this problem until later this week. 
This whole process has been much, much more difficult than we expected. Please 
don't take this the wrong way, but it might be a problem for your team as well 
if you ever hope to expand your user base. It's not like we're new at this; 
we've been using microarrays since 1999, but I was getting tired of sending 
thousands of dollars to Agilent every year.

Thanks,