Re: [Bioc-devel] mapIds methods for ExpressionSet and SummarizedExperiment

2016-01-06 Thread Ludwig Geistlinger
Dear Martin,

I finally found the time to make the code available via GitHub:

https://github.com/lgeistlinger/MapIds

and added you as a collaborator.
This is currently just a quick draft to give you an impression.
NA and duplicated mappings currently need to be removed to ensure
uniqueness of featureNames and rownames, respectively (via na.rm=TRUE and
dupl.rm=TRUE).

But there are, of course, other/better ways to handle NAs and duplicates,
e.g. by passing an appropriate value on to the 'multiVals' argument of
AnnotationDbi::mapIds().
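
A minimal sketch of the removal step described above, in base R only. The helper name cleanMapping and the toy identifiers are hypothetical and not taken from the MapIds repository; keeping the first original ID for each duplicated target is an assumption about what dupl.rm=TRUE might do.

```r
# Hypothetical helper: 'mapped' is a named character vector of the kind
# returned by AnnotationDbi::mapIds() (names = original IDs, values =
# mapped IDs, NA where no mapping was found).
cleanMapping <- function(mapped, na.rm = TRUE, dupl.rm = TRUE) {
    # drop original IDs without a mapping
    if (na.rm) mapped <- mapped[!is.na(mapped)]
    # assumption: keep only the first original ID per mapped ID
    if (dupl.rm) mapped <- mapped[!duplicated(mapped)]
    mapped
}

# Toy mapping with made-up identifiers
mapped <- c(idA = "1", idB = NA, idC = "2", idD = "1")
cleaned <- cleanMapping(mapped)
```

Note that 'multiVals' in AnnotationDbi::mapIds() addresses a different case, namely a single key matching several values; its documented choices include "first" (the default), "filter" and "asNA". Several keys mapping to the same value would still need a step like the one sketched here.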

Just let me know if you find this of any use or see things that could be
improved/extended.

Best,
Ludwig




Re: [Bioc-devel] mapIds methods for ExpressionSet and SummarizedExperiment

2016-01-06 Thread Morgan, Martin
Thanks Ludwig, I'll have a look. Martin



Re: [Bioc-devel] mapIds methods for ExpressionSet and SummarizedExperiment

2015-12-21 Thread Ludwig Geistlinger
Dear Martin,

OK, I am tidying that up and will provide it via GitHub for you.
BTW, mapping ranges to IDs and vice versa sounds very cool!

> The original intention of the annotation() slot in ExpressionSet was to
> include the microarray chip identifier, so that one references this when
> translating from probeset to gene identifiers.

One question on that, as I often find my functions asking for the organism
under study (which I believe is typically known when investigating an
expression dataset). While there are convenient ways to ask microarray
annotation packages for the organism under study (and thus infer it from
annotation(eset)), I wonder whether there is a similar slot for
SummarizedExperiment, e.g. an 'organism' slot? Or are there specific
reasons against that?

Best,
Ludwig






-- 
Dipl.-Bioinf. Ludwig Geistlinger

Lehr- und Forschungseinheit für Bioinformatik
Institut für Informatik
Ludwig-Maximilians-Universität München
Amalienstrasse 17, 2. Stock, Büro A201
80333 München

Tel.: 089-2180-4067
eMail: ludwig.geistlin...@bio.ifi.lmu.de

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] mapIds methods for ExpressionSet and SummarizedExperiment

2015-12-18 Thread Morgan, Martin
Hi Ludwig --

It would be really great to see what you've put together; can you make your 
code available somewhere, maybe via github?

I think the facilities already in Bioconductor include:

- select() and the OrganismDb (e.g., Homo.sapiens) packages

- (Recently introduced, in bioc-devel) GenomicFeatures::mapIds()

- GSEABase mapIdentifiers()

- The AnnotationFuncs package (some of this functionality might be redundant 
with select() / mapIds(); maybe your idea is a more refined version of this?)

- biomaRt, including the relatively under-known use of select() with mart 
objects.

I think a particularly valuable development (initial implementation in 
GenomicFeatures::mapIds()) is transparent mapping to / from genomic ranges.

The original intention of the annotation() slot in ExpressionSet was to include 
the microarray chip identifier, so that one references this when translating 
from probeset to gene identifiers.

Martin



This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.


Re: [Bioc-devel] mapIds methods for ExpressionSet and SummarizedExperiment

2015-12-18 Thread Morgan, Martin
Oops, those newly added functions in GenomicFeatures are

GenomicFeatures::mapRangesToIds
GenomicFeatures::mapIdsToRanges

Martin



[Bioc-devel] mapIds methods for ExpressionSet and SummarizedExperiment

2015-12-17 Thread Ludwig Geistlinger
Dear Bioc Team,

I have implemented mapIds methods that map featureNames (ExpressionSet) and
rownames (SummarizedExperiment) between major gene ID types such as
ENSEMBL and ENTREZ by passing them on to AnnotationDbi::mapIds.

Given an ExpressionSet/SummarizedExperiment and an organism under
investigation such as 'Homo sapiens', the methods check whether the
corresponding org.db package is available; if not, the package is
automatically installed and loaded.
Subsequently, the featureNames/rownames are mapped from the specified
from.id.type to the desired to.id.type, corresponding to keytypes of the
org.db package.
Options to deal with NA and duplicate mappings are also provided in order
to ensure that featureNames/rownames are unique after the mapping.
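
The workflow above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: orgPackages, orgPackageFor and mapFeatureIds are hypothetical names, the lookup table lists only two organisms, and the sketch stops with an error instead of installing a missing package automatically.

```r
# Hypothetical lookup from organism name to org.db annotation package;
# only two entries are listed for illustration.
orgPackages <- c("Homo sapiens" = "org.Hs.eg.db",
                 "Mus musculus" = "org.Mm.eg.db")

# Hypothetical helper: resolve the annotation package for one organism.
orgPackageFor <- function(organism) {
    pkg <- unname(orgPackages[organism])
    if (is.na(pkg))
        stop("no annotation package known for '", organism, "'")
    pkg
}

# Hypothetical helper: map a character vector of IDs between two keytypes
# of the org.db package via AnnotationDbi::mapIds().
mapFeatureIds <- function(ids, organism = "Homo sapiens",
                          from = "ENSEMBL", to = "ENTREZID",
                          orgPkg = orgPackageFor(organism)) {
    # assumption: fail instead of installing the package on the fly
    if (!requireNamespace(orgPkg, quietly = TRUE))
        stop("annotation package '", orgPkg, "' is not installed")
    # org.db packages export an annotation object named like the package
    orgDb <- getExportedValue(orgPkg, orgPkg)
    AnnotationDbi::mapIds(orgDb, keys = ids, column = to, keytype = from)
}

# Possible use with an ExpressionSet 'eset' (not run here):
# mapped <- mapFeatureIds(Biobase::featureNames(eset), "Homo sapiens")
```

The result is a named vector (names = original IDs), so NA and duplicated entries can be filtered before the new IDs are assigned back as featureNames or rownames.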

The advantage is that end users do not require knowledge of the Bioc
annotation infrastructure, but rather just need to provide the organism
under investigation in a format that is also convenient for non-Biocs.

I have not found anything similar in existing packages and I am wondering
whether this could be of general interest.

Best,
Ludwig
