Re: [base] Dealing with duplicate spots

2011-11-22 Thread Jari Häkkinen
On 2011-11-22 09.43, Nicklas Nordborg wrote:
> On 2011-11-21 21:32, Nantel Andre wrote:
>
>>
>> We don't have any "regular" users yet since we are still figuring out
>> how it works. I am doing everything from my administrator account.
>
> Have you created a file format configuration for the files you are
> using? This is done in Administrate -> Plug-ins & Extensions -> Plug-in
> configurations. Create a new configuration for the 'Raw data importer'
> plug-in and use the 'Test with file' function to get regular
> expressions, etc. correct.
>
>> Anyway that's all the time I can give to this problem until later
> this week. This whole process has been much, much more difficult than
>> we expected. Please don't take this the wrong way but it might be a
>> problem for your team as well if you ever hope to expand on your user
>> base. It's not like we're new at this, we've been using microarrays
>> since 1999 but I was getting tired of sending thousands of dollars to
>> Agilent every year.
>
> Setting up a BASE server is not trivial. We have recently set up a new
> BASE installation for a project here and I guess we have spent several
> months just setting up lab procedures, data import procedures, and also
> made some customizations to BASE in order to further streamline the data
> entry.
>
> We don't have an explicit goal to get more users. We are primarily using
> BASE at our own site to solve our own "problems". The Illumina extension
> package is an example of that. Unfortunately, we don't have the
> resources to develop things targeted at platforms other than those we
> use ourselves. This doesn't mean that BASE can't be used with
> other platforms, but in order to get the most out of BASE one needs to
> invest time and maybe also resources for customization. In my
> experience at least a week would be needed for initial testing and
> prototyping and then maybe a couple of months for setting up a
> production-ready server, formalizing lab and data handling procedures,
> developing custom plug-ins and extensions, etc.


Two large-scale internal uses are outlined in the BASE 3 documentation:
http://base.thep.lu.se/chrome/site/latest/html/why_base.html

To get the most out of BASE you need to dedicate resources to customize BASE 
to fit your needs and to learn what BASE can do for you.

Using BASE is a process: you start small and then extend your use of BASE 
as appropriate. In our lab (well, in some projects, admittedly not all) we 
collect all important information gathered during lab work in BASE. 
Therefore, BASE is an integral part of setting up lab processes: we make 
minor changes to the lab work where needed and create customizations in 
BASE that fit our needs. Nicklas mentioned the Illumina extension, but 
the reggie extension, 
http://baseplugins.thep.lu.se/wiki/net.sf.basedb.reggie, is the tool most 
appreciated by lab personnel. We track a lot of information about our 
samples and use the biomaterial LIMS extensively, and reggie not only 
streamlines data input but also performs cross-checks and catches errors 
in data entry. Currently, we are working on defining our sequencing 
procedure in the lab and mirroring it in BASE.

Our larger projects span long periods (years) and handle large sample 
sets. These projects use BASE to get organized, reduce errors, collect 
information that normally ends up in lab books, and share data, and 
they also use it as an analysis platform. And we have noticed 
improvements in lab work too :)

Of course, our extensive BASE engagement is made easier because our PIs 
and funding agencies understand the benefits of organizing and sharing 
data/information.


However, I want to emphasize that BASE is also usable by smaller groups 
and projects. One can ignore the biomaterial and array LIMS and directly 
create raw bioassays, collect these into an experiment, and head on to 
analysis. The README for the affymetrix plug-in 
(http://baseplugins.thep.lu.se/wiki/se.lu.thep.affymetrix) outlines what 
needs to be done to get started with an essentially empty BASE. The steps 
are general and can be adapted for other platforms.


Cheers,

Jari

--
All the data continuously generated in your IT infrastructure 
contains a definitive record of customers, application performance, 
security threats, fraudulent activity, and more. Splunk takes this 
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net


Re: [base] Dealing with duplicate spots

2011-11-22 Thread Nicklas Nordborg
On 2011-11-21 21:32, Nantel Andre wrote:

>
> We don't have any "regular" users yet since we are still figuring out
> how it works. I am doing everything from my administrator account.

Have you created a file format configuration for the files you are 
using? This is done in Administrate -> Plug-ins & Extensions -> Plug-in 
configurations. Create a new configuration for the 'Raw data importer' 
plug-in and use the 'Test with file' function to get regular 
expressions, etc. correct.
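As a rough illustration of what getting the regular expressions right involves, here is a Python sketch that checks a candidate data-header expression and a tab splitter against the column header of the ImaGene excerpt posted in this thread. The patterns, the tab separators, and the exact column names are assumptions (the posted excerpt lost its tabs); this is not BASE's own code, and the real expressions are entered and tested in the plug-in configuration itself.

```python
import re

# Hypothetical patterns: the data header must match the whole column-header
# line, and the splitter cuts each line into columns (tab-separated assumed).
DATA_HEADER = re.compile(r"Field\tMR\tMC\tRow\tCol\.\tGeneID\t.*")
DATA_SPLITTER = re.compile(r"\t")


def find_data_header(lines):
    """Return the index of the first line matching DATA_HEADER, or -1."""
    for i, line in enumerate(lines):
        if DATA_HEADER.fullmatch(line.rstrip("\r\n")):
            return i
    return -1


def split_columns(line):
    """Split one line into columns with DATA_SPLITTER."""
    return DATA_SPLITTER.split(line.rstrip("\r\n"))


# Shortened sample based on the posted excerpt (assumed tab-separated).
sample = [
    "Begin Raw Data",
    "Field\tMR\tMC\tRow\tCol.\tGeneID\tAnnotation 1\tFlag"
    "\tN. Signal Mean, ch1\tN. Signal Mean, ch2",
    "A\t1\t1\t1\t1\torf19.5337\t\t0\t10.0631\t10.2463",
]

header_index = find_data_header(sample)
header = split_columns(sample[header_index])
first_row = split_columns(sample[header_index + 1])
print(header_index, len(header), len(first_row))  # 1 10 10
```

If the header and data rows split into the same number of columns, each column can then be mapped to a raw data property by name.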

> Anyway that's all the time I can give to this problem until later
> this week. This whole process has been much, much more difficult than
> we expected. Please don't take this the wrong way but it might be a
> problem for your team as well if you ever hope to expand on your user
> base. It's not like we're new at this, we've been using microarrays
> since 1999 but I was getting tired of sending thousands of dollars to
> Agilent every year.

Setting up a BASE server is not trivial. We have recently set up a new 
BASE installation for a project here and I guess we have spent several 
months just setting up lab procedures, data import procedures, and also 
made some customizations to BASE in order to further streamline the data 
entry.

We don't have an explicit goal to get more users. We are primarily using 
BASE at our own site to solve our own "problems". The Illumina extension 
package is an example of that. Unfortunately, we don't have the 
resources to develop things targeted at platforms other than those we 
use ourselves. This doesn't mean that BASE can't be used with 
other platforms, but in order to get the most out of BASE one needs to 
invest time and maybe also resources for customization. In my 
experience at least a week would be needed for initial testing and 
prototyping and then maybe a couple of months for setting up a 
production-ready server, formalizing lab and data handling procedures, 
developing custom plug-ins and extensions, etc.

/Nicklas



Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nantel Andre

On 2011-11-21, at 2:51 PM, Nicklas Nordborg wrote:

> On 2011-11-21 20:31, Nantel Andre wrote:
>> On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:
>> 
>>> The demo data set is a GenePix data set and for each spot the file
>>> contains the coordinates, reporter id and a lot of measured intensities
>>> and other values. So if your data is similar to this, then I think you
>>> don't need any special new importer. Use the generic "Raw data flat file
>>> importer" and make sure to map spot coordinates, reporter id and
>>> intensity columns that are in your data files.
>>> 
>>> Maybe you can post a few lines from the files you are working with so I
>>> don't have to guess...
>> 
>> Here is our data format. We never got the generic importer working even
>> after updating the raw-data-types.xml file to match our version of the
>> ImaGene files. When we try to use that, we get a "no plug-in" error.
> 
> This is usually due to lack of permission for the logged in user. Make 
> sure that the file format configuration has been shared properly.
> 
> The file seems to contain information that is equivalent or very 
> similar to that in GenePix files. I don't see any reason why the generic raw 
> data importer shouldn't work in this case.
> 
> 

We don't have any "regular" users yet since we are still figuring out how it 
works. I am doing everything from my administrator account.

Anyway that's all the time I can give to this problem until later this week. 
This whole process has been much, much more difficult than we expected. Please 
don't take this the wrong way but it might be a problem for your team as well 
if you ever hope to expand on your user base. It's not like we're new at this, 
we've been using microarrays since 1999 but I was getting tired of sending 
thousands of dollars to Agilent every year.

Thanks,


Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nicklas Nordborg
On 2011-11-21 20:31, Nantel Andre wrote:
> On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:
>
>> The demo data set is a GenePix data set and for each spot the file
>> contains the coordinates, reporter id and a lot of measured intensities
>> and other values. So if your data is similar to this, then I think you
>> don't need any special new importer. Use the generic "Raw data flat file
>> importer" and make sure to map spot coordinates, reporter id and
>> intensity columns that are in your data files.
>>
>> Maybe you can post a few lines from the files you are working with so I
>> don't have to guess...
>
> Here is our data format. We never got the generic importer working even
> after updating the raw-data-types.xml file to match our version of the
> ImaGene files. When we try to use that, we get a "no plug-in" error.

This is usually due to lack of permission for the logged in user. Make 
sure that the file format configuration has been shared properly.

The file seems to contain information that is equivalent or very 
similar to that in GenePix files. I don't see any reason why the generic raw 
data importer shouldn't work in this case.

/Nicklas



Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nantel Andre
On 2011-11-21, at 2:13 PM, Nicklas Nordborg wrote:

> The demo data set is a GenePix data set and for each spot the file 
> contains the coordinates, reporter id and a lot of measured intensities 
> and other values. So if your data is similar to this, then I think you 
> don't need any special new importer. Use the generic "Raw data flat file 
> importer" and make sure to map spot coordinates, reporter id and 
> intensity columns that are in your data files.
> 
> Maybe you can post a few lines from the files you are working with so I 
> don't have to guess...

Here is our data format. We never got the generic importer working, even after 
updating the raw-data-types.xml file to match our version of the ImaGene files. 
When we try to use that, we get a "no plug-in" error.

Begin Header
version     9.0.
Date        Tue Nov 15 13:49:21 EST 2011
Image File  C:\Documents and Settings\All Users\Documents\Andre\29 Avr\14261085_cy3.tif
Page        0
Page Name
Inverted    false
Image File  C:\Documents and Settings\All Users\Documents\Andre\29 Avr\14261085_cy5.tif
Page        0
Page Name
Inverted    false
Begin Field Dimensions
Field   Metarows    Metacols    Rows    Cols
A       12          4           16      18
End Field Dimensions
Begin Measurement parameters
Segmentation Method auto
Signal Low  0.0
Signal High 0.0
Background Low  0.0
Background High 0.0
Background Buffer   2.0
Background Width    5.0
End Measurement parameters
End Header
Begin Normalization
Background measure  Bckgr. Mean
Correction method   Local
Sliding window  false
Take log    true
Log base    Base 2
Normalization type  Lowess
Scope   SubGrid
Smoothing   0.2
Using control spots Use all spots
End Normalization
Begin Raw Data
Field   MR  MC  Row Col.    GeneID  Annotation 1    Flag    N. Signal Mean, ch1 N. Signal Mean, ch2 N. Signal Median, ch1   N. Signal Median, ch2   N. Signal Mode, ch1 N. Signal Mode, ch2
A   1   1   1   1   orf19.5337  0   10.0631 10.2463 9.9099  9.9951  9.9646  9.9756
A   1   1   1   2   orf19.5337  0   9.1418  9.3703  9.1036  9.1544  9.2753  9.5736
A   1   1   1   3   orf19.5642  0   13.5228 13.3390 13.4454 13.2718 13.5748 13.3829
A   1   1   1   4   orf19.5642  0   12.3849 12.4358 12.3075 12.4157 12.3422 12.4249
A   1   1   1   5   orf19.3173  0   11.9102 11.9314 11.8002 11.8241 11.8276 11.8270
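If the file really is tab-separated (the excerpt above lost its tabs, so this is an assumption), the raw data section can be read with a few lines of Python, e.g. to check which columns are present before setting up the importer. The sketch assumes that the first line after "Begin Raw Data" holds the column names and that the section ends at an "End Raw Data" line, if there is one, or at end of file; the function name is hypothetical.

```python
def parse_imagene_raw_data(text):
    """Return rows of the 'Begin Raw Data' section as dicts keyed by column name.

    Assumes tab-separated columns, a column-header line directly after
    'Begin Raw Data', and an optional 'End Raw Data' terminator.
    """
    header = None
    rows = []
    in_data = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Begin Raw Data":
            in_data = True
            continue
        if not in_data or not stripped:
            continue  # skip header/normalization sections and blank lines
        if stripped == "End Raw Data":
            break
        fields = line.split("\t")
        if header is None:
            header = fields  # first line of the section: column names
        else:
            rows.append(dict(zip(header, fields)))
    return rows


# Shortened example based on the excerpt, with tabs restored (assumed layout).
example = "\n".join([
    "Begin Header",
    "version\t9.0.",
    "End Header",
    "Begin Raw Data",
    "Field\tMR\tMC\tRow\tCol.\tGeneID\tAnnotation 1\tFlag"
    "\tN. Signal Mean, ch1\tN. Signal Mean, ch2",
    "A\t1\t1\t1\t1\torf19.5337\t\t0\t10.0631\t10.2463",
    "A\t1\t1\t1\t2\torf19.5337\t\t0\t9.1418\t9.3703",
])

spots = parse_imagene_raw_data(example)
print(len(spots), spots[1]["Col."], spots[1]["N. Signal Mean, ch2"])  # 2 2 9.3703
```

Each row keeps its own Field/MR/MC/Row/Col. coordinates, which is what a raw data importer needs to tell spots apart.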




Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nicklas Nordborg
On 2011-11-21 19:23, Nantel Andre wrote:
> That issue goes beyond the ImaGene format. When doing contact
> printing of microarrays it is still fairly common to spot the same
> probe 2 or more times as shown in the example here
> (http://www.digitalapoptosis.com/2005/10/12/dna-microarray/).
>
> It is also very common for commercial arrays (Agilent for example) to
> have multiple copies of their control spots.

I wouldn't call those duplicate spots. They are different spots at 
different coordinates that just happen to have the same reporter/gene. 
The importer should import them as separate entries, and it is then up 
to downstream analysis to decide whether they should be merged into a 
single value and which averaging method to use for merging.
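To make the distinction concrete, here is a Python sketch using the five rows from the ImaGene excerpt in this thread: every spot keeps its own coordinates, so all five are distinct entries, while two reporters happen to occur twice. Merging per reporter is shown as a separate, optional downstream step. The dictionary keys, the function names, and median as the default averaging method are illustrative assumptions, not what BASE itself does.

```python
from collections import defaultdict
from statistics import median

# Rows from the ImaGene excerpt: (field, meta row, meta col, row, col), reporter, ch1.
spots = [
    {"coord": ("A", 1, 1, 1, 1), "reporter": "orf19.5337", "ch1": 10.0631},
    {"coord": ("A", 1, 1, 1, 2), "reporter": "orf19.5337", "ch1": 9.1418},
    {"coord": ("A", 1, 1, 1, 3), "reporter": "orf19.5642", "ch1": 13.5228},
    {"coord": ("A", 1, 1, 1, 4), "reporter": "orf19.5642", "ch1": 12.3849},
    {"coord": ("A", 1, 1, 1, 5), "reporter": "orf19.3173", "ch1": 11.9102},
]


def reporters_with_replicates(spots):
    """Map each reporter spotted more than once to the coordinates of its spots."""
    coords = defaultdict(list)
    for spot in spots:
        coords[spot["reporter"]].append(spot["coord"])
    return {rep: cs for rep, cs in coords.items() if len(cs) > 1}


def merge_by_reporter(spots, key, average=median):
    """Optional downstream step: collapse spots sharing a reporter into one value."""
    values = defaultdict(list)
    for spot in spots:
        values[spot["reporter"]].append(spot[key])
    return {rep: average(vals) for rep, vals in values.items()}


# Import step: one entry per coordinate, nothing is merged.
assert len({spot["coord"] for spot in spots}) == len(spots)
print(sorted(reporters_with_replicates(spots)))  # ['orf19.5337', 'orf19.5642']
print(merge_by_reporter(spots, "ch1")["orf19.3173"])  # 11.9102
```

Passing a different `average` (e.g. `statistics.mean`) is how the choice of averaging method stays with the analysis rather than the importer.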

>
> We'll take a look at the Illumina plug-ins, but I'm sure that there is
> a more elegant solution.

Do the ImaGene data files contain spot coordinates or not? If not, 
then the Illumina case might be useful; otherwise it is not.

> What we are trying to do is,
> theoretically, simple, since our data is already
> background-subtracted and normalized (ImaGene does a perfectly
> acceptable job of that). We have two columns with ch1 and ch2
> normalized intensities in log2 and a Flag column showing spots that
> have to be filtered out. We were hoping to use BASE to help us
> organize our experiments before sending them off to MeV.
>
> In the BASE 2 demo server, the demoHyb1 bioassay is similar to our
> situation. When I open that item and click on the "Raw data" tab, it
> appears that the spot coordinates were imported and then used to
> produce columns entitled [Rep] Name and [Rep] ID that clearly come
> from duplicate spots. That's what we are trying to replicate.

The demo data set is a GenePix data set and for each spot the file 
contains the coordinates, reporter id and a lot of measured intensities 
and other values. So if your data is similar to this, then I think you 
don't need any special new importer. Use the generic "Raw data flat file 
importer" and make sure to map spot coordinates, reporter id and 
intensity columns that are in your data files.

Maybe you can post a few lines from the files you are working with so I 
don't have to guess...

/Nicklas



Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nantel Andre
That issue goes beyond the ImaGene format. When doing contact printing of 
microarrays it is still fairly common to spot the same probe 2 or more times as 
shown in the example here 
(http://www.digitalapoptosis.com/2005/10/12/dna-microarray/).

It is also very common for commercial arrays (Agilent for example) to have 
multiple copies of their control spots. 

We'll take a look at the Illumina plug-ins, but I'm sure that there is a more 
elegant solution. What we are trying to do is, theoretically, simple, since 
our data is already background-subtracted and normalized (ImaGene does a 
perfectly acceptable job of that). We have two columns with ch1 and ch2 
normalized intensities in log2 and a Flag column showing spots that have to be 
filtered out. We were hoping to use BASE to help us organize our experiments 
before sending them off to MeV.
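The filtering step described above is small enough to sketch. The Python below keeps only spots whose Flag is 0 and returns the gene id with the two log2 intensities. Treating 0 as "good" and any other value as "filter out" is an assumption about ImaGene's flag codes, the column names follow the excerpt posted in this thread, and the flagged row in the example is hypothetical (all rows in the excerpt have flag 0).

```python
def unflagged_intensities(rows, flag_col="Flag",
                          ch1_col="N. Signal Mean, ch1",
                          ch2_col="N. Signal Mean, ch2"):
    """Return (gene id, ch1, ch2) for rows whose flag is 0.

    Assumption: a flag of 0 marks a good spot; anything else is filtered out.
    Intensities are the already normalized log2 values, converted to float.
    """
    kept = []
    for row in rows:
        if int(row[flag_col]) != 0:
            continue  # flagged spot: drop it
        kept.append((row["GeneID"], float(row[ch1_col]), float(row[ch2_col])))
    return kept


rows = [
    {"GeneID": "orf19.5337", "Flag": "0",
     "N. Signal Mean, ch1": "10.0631", "N. Signal Mean, ch2": "10.2463"},
    {"GeneID": "orf19.5642", "Flag": "1",  # hypothetical flagged spot
     "N. Signal Mean, ch1": "13.5228", "N. Signal Mean, ch2": "13.3390"},
]
print(unflagged_intensities(rows))  # [('orf19.5337', 10.0631, 10.2463)]
```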

In the BASE 2 demo server, the demoHyb1 bioassay is similar to our situation. 
When I open that item and click on the "Raw data" tab, it appears that the 
spot coordinates were imported and then used to produce columns entitled [Rep] 
Name and [Rep] ID that clearly come from duplicate spots. That's what we are 
trying to replicate.

Thanks,


On 2011-11-21, at 12:36 PM, Nicklas Nordborg wrote:

> On 2011-11-21 17:08, Nantel Andre wrote:
>> Greetings,
>> 
>> We are currently trying to integrate BASE into our lab and writing the
>> necessary plug-ins to import normalized data from ImaGene. Let's just
>> say that the lack of documentation has been "challenging".
>> 
>> Right now we are trying to figure out how to deal with duplicate spots.
>> From the GenePix examples in the BASE 2 server we see cases of raw
>> bioassays with [raw] columns defined by the unique spot coordinates as
>> well as [rep] columns containing the duplicated GeneID classifiers. Any
>> suggestions on how to do this?
> 
> What do you mean by duplicate spots? Physically a spot is a spot and 
> there can of course only be one at a given position. However, there is 
> nothing in BASE that prevents two or more spots from referencing the 
> same gene/reporter. Then there are some platforms (for example Illumina 
> BeadArrays) that don't have spots in the classical meaning. In this 
> case one needs to construct a unique "feature id" for each "spot 
> equivalent" that one wants to measure. In the Illumina case, the unique 
> ID is provided by the "Illumicode" column in the data files. This value 
> is then mapped to gene/reporter annotations via a BGX file, and the 
> importer also calculates means, etc. for all entries with the same 
> "Illumicode" value. See 
> http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina for more 
> information about how the Illumina platform has been implemented.
> 
> I don't know what kind of data files are generated by the ImaGene 
> platform, so it is hard to advise on exactly how to do the same thing for 
> ImaGene.
> 
> /Nicklas
> 




Re: [base] Dealing with duplicate spots

2011-11-21 Thread Nicklas Nordborg
On 2011-11-21 17:08, Nantel Andre wrote:
> Greetings,
>
> We are currently trying to integrate BASE into our lab and writing the
> necessary plug-ins to import normalized data from ImaGene. Let's just
> say that the lack of documentation has been "challenging".
>
> Right now we are trying to figure out how to deal with duplicate spots.
> From the GenePix examples in the BASE 2 server we see cases of raw
> bioassays with [raw] columns defined by the unique spot coordinates as
> well as [rep] columns containing the duplicated GeneID classifiers. Any
> suggestions on how to do this?

What do you mean by duplicate spots? Physically a spot is a spot and 
there can of course only be one at a given position. However, there is 
nothing in BASE that prevents two or more spots from referencing the 
same gene/reporter. Then there are some platforms (for example Illumina 
BeadArrays) that don't have spots in the classical meaning. In this 
case one needs to construct a unique "feature id" for each "spot 
equivalent" that one wants to measure. In the Illumina case, the unique 
ID is provided by the "Illumicode" column in the data files. This value 
is then mapped to gene/reporter annotations via a BGX file, and the 
importer also calculates means, etc. for all entries with the same 
"Illumicode" value. See 
http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina for more 
information about how the Illumina platform has been implemented.
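The Illumina approach described above (group raw entries by a platform-specific feature id, look up the reporter for each id, and average) can be sketched in Python as follows. The ids, reporter names, and the plain dictionary standing in for the BGX lookup are all hypothetical; this only illustrates the idea and is not the Illumina importer, which is documented on the wiki page above.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_feature(entries, annotations):
    """Collapse raw (feature_id, intensity) entries into one row per feature id.

    `annotations` maps feature id -> reporter name; it is a hypothetical
    stand-in for the annotation-file lookup. Unknown ids get reporter None.
    """
    intensities = defaultdict(list)
    for feature_id, value in entries:
        intensities[feature_id].append(value)
    return {
        fid: {"reporter": annotations.get(fid), "n": len(vals), "mean": mean(vals)}
        for fid, vals in intensities.items()
    }


# Hypothetical entries and annotation lookup.
entries = [("1001", 100.0), ("1001", 110.0), ("1002", 50.0)]
annotations = {"1001": "GENE_A", "1002": "GENE_B"}
summary = summarize_by_feature(entries, annotations)
print(summary["1001"])  # {'reporter': 'GENE_A', 'n': 2, 'mean': 105.0}
```

Keeping the count `n` alongside the mean makes it visible how many raw entries contributed to each summarized feature.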

I don't know what kind of data files are generated by the ImaGene 
platform, so it is hard to advise on exactly how to do the same thing for 
ImaGene.

/Nicklas



[base] Dealing with duplicate spots

2011-11-21 Thread Nantel Andre
Greetings,

We are currently trying to integrate BASE into our lab and writing the 
necessary plug-ins to import normalized data from ImaGene. Let's just say that 
the lack of documentation has been "challenging".

Right now we are trying to figure out how to deal with duplicate spots. From 
the GenePix examples in the BASE 2 server we see cases of raw bioassays with 
[raw] columns defined by the unique spot coordinates as well as [rep] columns 
containing the duplicated GeneID classifiers. Any suggestions on how to do this?

Thanks,

André Nantel, M.Sc., Ph.D.
Senior Research Officer and Adjunct Professor
Project Manager, Microarray Lab

Biotechnology Research Institute
National Research Council of Canada
6100 Royalmount
Montreal, QC
Canada H4P 2R2

andre.nan...@nrc-cnrc.gc.ca

(514) 496-6370
http://www.nrc-cnrc.gc.ca/eng/services/bri/microarray.html








