Re: [Bioc-devel] IPI numbers in annotation packages

2015-10-05 Thread Marc Carlson
You need to scroll down that script a ways...  Look for 'yeast'.

On Mon, Oct 5, 2015 at 6:11 AM, James W. MacDonald <jmac...@uw.edu> wrote:

> Hi Marc,
>
> That script has this in it:
>
> ## For now just get data for the ones that we have traditionally supported
> ## I don't even know if the other species are available...
> speciesList = c("chipsrc_human.sqlite",
>   "chipsrc_rat.sqlite",
>   "chipsrc_chicken.sqlite",
>   "chipsrc_zebrafish.sqlite",
>   #  "chipsrc_worm.sqlite",
>   #  "chipsrc_fly.sqlite",
>   "chipsrc_mouse.sqlite",
>   "chipsrc_bovine.sqlite"
>   #  "chipsrc_arabidopsis.sqlite"  ## this is available and could be "activated"
>   ## But to activate arabidopsis, remember you have to pre-add the tables...
>   #  "chipsrc_canine.sqlite",
>   #  "chipsrc_rhesus.sqlite",
>   #  "chipsrc_chimp.sqlite",
>   #  "chipsrc_anopheles.sqlite"
>   )
>
> And there is no mention of yeast anywhere. If I search all the scripts for
> say 'INSERT INTO pfam', I get
>
> custom_anno/script/bindb.sql
> 328:INSERT INTO pfam
>
> pfam/script/srcdb_pfam.sql
> 202:-- INSERT INTO pfamb
>
> organism_annotation/script/bindb_yeast.sql
> 441:-- INSERT INTO pfam
>
> yeast/script/bindb.sql
> 241:-- INSERT INTO pfam
>
> The first one is just doing all the metadata tables, and the other three
> are in code blocks that are commented out. Is it possible that you used a
> script that didn't make it into svn?
>
> Jim
>
>
>
> On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson <mrj...@gmail.com> wrote:
>
>> Hi Jim,
>>
>> You asked me on Friday where the PFAM Ids for yeast came from and I
>> couldn't recall because at the moment I was at Seattle Children's (and thus
>> nowhere near my copy of my source code).  But I also said I would look into
>> it for you later (and I have).  Here is what my code tells me:  So ever
>> since IPI shut down, we have been getting the PFAM and IPI data from
>> UniProt.  There is a script in the UniProt.ws package
>> called processDataForBuild.R that is supposed to be called by the script
>> "src_build.sh" (it's the last thing that script does).  That code should
>> get the pfam data from yeast for you.  Please note that yeast required a
>> lot of special code to get it processed.  Nothing with yeast annotations is
>> ever easy.  It's like karmic accounting to compensate for all the bread and
>> beer.  ;)
>>
>> Let me know if you need any more explanations about what is in there.
>> Because of the crazy timing, before I left I built and pushed into devel a
>> fresh set of .DB0s and core packages (in late August) just in case it was
>> too crazy to do a refresh right now.  But it sounds like you won't need
>> that.
>>
>>
>>   Marc
>>
>>
>>
>> On Sun, Oct 4, 2015 at 6:27 AM, James W. MacDonald <jmac...@uw.edu>
>> wrote:
>>
>>> I am building the annotation db0 packages for the upcoming Bioconductor
>>> release, which are used to generate all the orgDb and chip annotation
>>> packages that we distribute. Up to the previous release we have always
>>> included IPI identifiers (as part of the table containing the PROSITE and
>>> PFAM IDs). Unfortunately, IPI <https://www.ebi.ac.uk/IPI> is no longer
>>> maintained (since 2011), and UniProt, which is where we got data for the
>>> last few releases, has now dropped support as well.
>>>
>>> Given that this annotation source is no longer maintained, I decided to
>>> exclude these IDs from the current build of the following db0 packages:
>>>
>>>    - rat.db0
>>>    - chicken.db0
>>>    - zebrafish.db0
>>>    - mouse.db0
>>>    - bovine.db0
>>>    - human.db0
>>>
>>> In addition, it is not clear to me (nor can Marc recall) where the data
>>> for PFAM in the yeast.db0 package comes from. Given that we are pretty far
>>> behind schedule for these packages, I have excluded that table as well.
>>>
>>> If this will break anybody's package, or if there are people who rely on
>>> these IDs, I can just parse them out of the last release and deprecate, so
>>> you will have the IDs for one more release. However, if nobody cares about
>>> such things, I will just go with what we have. Please speak up if this will
>>> affect you.
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
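
The search Jim describes (look for 'INSERT INTO pfam' in every script and check whether each hit sits on a commented-out line) could be sketched in base R roughly as below. This is only an illustration: find_inserts() and the two demo files are made up for the example and are not part of the annotation build scripts.

```r
# Hypothetical helper: report every line containing `pattern` in the .sql
# files under `dir`, and whether that line starts with an SQL comment ("--").
find_inserts <- function(dir, pattern = "INSERT INTO pfam") {
  files <- list.files(dir, pattern = "\\.sql$", recursive = TRUE,
                      full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep(pattern, lines, fixed = TRUE)
    if (!length(idx)) return(NULL)
    data.frame(file = basename(f), line = idx,
               commented = grepl("^[[:space:]]*--", lines[idx]),
               stringsAsFactors = FALSE)
  })
  # Drop files without hits before stacking the per-file results
  do.call(rbind, Filter(Negate(is.null), hits))
}

# Demo on two throwaway files (names and contents are invented)
demo_dir <- tempfile("sql_demo")
dir.create(demo_dir)
writeLines(c("CREATE TABLE pfam (id TEXT);",
             "INSERT INTO pfam",
             " SELECT * FROM src;"),
           file.path(demo_dir, "active.sql"))
writeLines(c("-- INSERT INTO pfam",
             "--  SELECT * FROM src;"),
           file.path(demo_dir, "disabled.sql"))

res <- find_inserts(demo_dir)
print(res)
```

Hits with commented = TRUE correspond to the disabled code blocks; only the remaining hits would actually populate a table.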



Re: [Bioc-devel] Txdb Issues - all exon names are NA's ?

2015-09-23 Thread Marc Carlson
Works for me.

 Marc


On Tue, Sep 22, 2015 at 6:03 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:

> Hi Marc,
>
> On 09/22/2015 05:39 PM, Marc Carlson wrote:
>
>> Herve is right. UCSC doesn't give us this information. And actually, I
>> think it's pretty rare to see exon names from anybody.   So it's weird
>> to me that they are a default return value for this method.
>>
>
> Ensembl does provide exon names/ids so any TxDb object that was created
> with makeTxDbFromBiomart("ensembl", ...) should have them:
>
>   library(GenomicFeatures)
>   txdb <- makeTxDbFromBiomart("ensembl", dataset="celegans_gene_ensembl")
>   exonsBy(txdb, use.names=TRUE)$Y74C9A.2a.2
>   # GRanges object with 4 ranges and 3 metadata columns:
>   #       seqnames         ranges strand |   exon_id          exon_name exon_rank
>   #          <Rle>      <IRanges>  <Rle> | <integer>        <character> <integer>
>   #   [1]        I [10413, 10585]      + |         1  WBGene00022276.e1         1
>   #   [2]        I [11618, 11689]      + |         6  WBGene00022276.e6         2
>   #   [3]        I [14951, 15160]      + |        11 WBGene00022276.e11         3
>   #   [4]        I [16473, 16842]      + |        14 WBGene00022276.e14         4
>   #   ---
>   #   seqinfo: 7 sequences (1 circular) from an unspecified genome
>
> Note that the *By() extractors don't let the user choose which column
> to return at the moment so that's why it was decided (a long time ago)
> to return exon internal ids *and* names (better more than less).
>
> H.
>
>
>>Marc
>>
>> On Tue, Sep 22, 2015 at 5:29 PM, Hervé Pagès <hpa...@fredhutch.org
>> <mailto:hpa...@fredhutch.org>> wrote:
>>
>> Hi Sonali,
>>
>> UCSC doesn't provide names for the exons of their gene models.
>> See the table where this data is coming from:
>>
>>
>>
>> https://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema
>>
>> The exon information is all coming from the exonStarts and exonEnds
>> columns. No exon names!
>>
>> H.
>>
>> PS: Maybe this would better be asked on the support site.
>>
>>
>> On 09/22/2015 04:44 PM, Arora, Sonali wrote:
>>
>>Hi everyone,
>>
>> I was trying to get the exons by gene from a txdb object but I
>> get NA's
>> for all exon_name's. Please advise.
>>
>>   > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>   > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>   > ebg2 <- exonsBy(txdb, by="gene")
>>   >
>>   > ebg2
>> GRangesList object of length 23459:
>> $1
>> GRanges object with 15 ranges and 2 metadata columns:
>>        seqnames               ranges strand   |   exon_id   exon_name
>>           <Rle>            <IRanges>  <Rle>   | <integer> <character>
>>    [1]    chr19 [58858172, 58858395]      -   |    250809        <NA>
>>    [2]    chr19 [58858719, 58859006]      -   |    250810        <NA>
>>    [3]    chr19 [58859832, 58860494]      -   |    250811        <NA>
>>    [4]    chr19 [58860934, 58862017]      -   |    250812        <NA>
>>    [5]    chr19 [58861736, 58862017]      -   |    250813        <NA>
>>    ...      ...                  ...    ... ...       ...         ...
>>   [11]    chr19 [58868951, 58869015]      -   |    250821        <NA>
>>   [12]    chr19 [58869318, 58869652]      -   |    250822        <NA>
>>   [13]    chr19 [58869855, 58869951]      -   |    250823        <NA>
>>   [14]    chr19 [58870563, 58870689]      -   |    250824        <NA>
>>   [15]    chr19 [58874043, 58874214]      -   |    250825        <NA>
>>
>> $10
>> GRanges object with 2 ranges and 2 metadata columns:
>>       seqnames               ranges strand | exon_id exon_name
>>   [1]     chr8 [18248755, 18248855]      + |  113603      <NA>
>>   [2]     chr8 [18257508, 18258723]      + |  113604      <NA>
>>
>> ...
>> <23457 more elements>
>> ---
>> seqinfo: 93 sequences (1 circula

Re: [Bioc-devel] Txdb Issues - all exon names are NA's ?

2015-09-22 Thread Marc Carlson
Herve is right. UCSC doesn't give us this information. And actually, I
think it's pretty rare to see exon names from anybody.   So it's weird to
me that they are a default return value for this method.

  Marc

On Tue, Sep 22, 2015 at 5:29 PM, Hervé Pagès  wrote:

> Hi Sonali,
>
> UCSC doesn't provide names for the exons of their gene models.
> See the table where this data is coming from:
>
>
>
> https://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema
>
> The exon information is all coming from the exonStarts and exonEnds
> columns. No exon names!
>
> H.
>
> PS: Maybe this would better be asked on the support site.
>
>
> On 09/22/2015 04:44 PM, Arora, Sonali wrote:
>
>>   Hi everyone,
>>
>> I was trying to get the exons by gene from a txdb object but I get NA's
>> for all exon_name's. Please advise.
>>
>>  > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>  > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>  > ebg2 <- exonsBy(txdb, by="gene")
>>  >
>>  > ebg2
>> GRangesList object of length 23459:
>> $1
>> GRanges object with 15 ranges and 2 metadata columns:
>>        seqnames               ranges strand   |   exon_id   exon_name
>>           <Rle>            <IRanges>  <Rle>   | <integer> <character>
>>    [1]    chr19 [58858172, 58858395]      -   |    250809        <NA>
>>    [2]    chr19 [58858719, 58859006]      -   |    250810        <NA>
>>    [3]    chr19 [58859832, 58860494]      -   |    250811        <NA>
>>    [4]    chr19 [58860934, 58862017]      -   |    250812        <NA>
>>    [5]    chr19 [58861736, 58862017]      -   |    250813        <NA>
>>    ...      ...                  ...    ... ...       ...         ...
>>   [11]    chr19 [58868951, 58869015]      -   |    250821        <NA>
>>   [12]    chr19 [58869318, 58869652]      -   |    250822        <NA>
>>   [13]    chr19 [58869855, 58869951]      -   |    250823        <NA>
>>   [14]    chr19 [58870563, 58870689]      -   |    250824        <NA>
>>   [15]    chr19 [58874043, 58874214]      -   |    250825        <NA>
>>
>> $10
>> GRanges object with 2 ranges and 2 metadata columns:
>>       seqnames               ranges strand | exon_id exon_name
>>   [1]     chr8 [18248755, 18248855]      + |  113603      <NA>
>>   [2]     chr8 [18257508, 18258723]      + |  113604      <NA>
>>
>> ...
>> <23457 more elements>
>> ---
>> seqinfo: 93 sequences (1 circular) from hg19 genome
>>  > testgr <- unlist(ebg2)
>>  > table(is.na(mcols(testgr)$exon_name))
>>
>>TRUE
>> 272776
>>  > sessionInfo()
>> R version 3.2.2 RC (2015-08-09 r68965)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats4    parallel  stats     graphics  grDevices utils
>> [7] datasets  methods   base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.1
>> [2] GenomicFeatures_1.21.29
>> [3] AnnotationDbi_1.31.18
>> [4] Biobase_2.29.1
>> [5] GenomicRanges_1.21.28
>> [6] GenomeInfoDb_1.5.16
>> [7] IRanges_2.3.21
>> [8] S4Vectors_0.7.18
>> [9] BiocGenerics_0.15.6
>>
>> loaded via a namespace (and not attached):
>>   [1] XVector_0.9.4              zlibbioc_1.15.0
>>   [3] GenomicAlignments_1.5.17   BiocParallel_1.3.52
>>   [5] tools_3.2.2                SummarizedExperiment_0.3.9
>>   [7] DBI_0.3.1                  lambda.r_1.1.7
>>   [9] futile.logger_1.4.1        rtracklayer_1.29.27
>>  [11] futile.options_1.0.0       bitops_1.0-6
>>  [13] RCurl_1.95-4.7             biomaRt_2.25.3
>>  [15] RSQLite_1.0.0              Biostrings_2.37.8
>>  [17] Rsamtools_1.21.17          XML_3.98-1.3
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>
>


Re: [Bioc-devel] rtracklayer bug?

2015-06-30 Thread Marc Carlson

Hi Arne,

So this time when I look at the bioc-devel email list, I don't see a 
record for this last name (or this email).  In fact the only way I could 
be sure it was you was that your post was the same...  ;) If you want to 
post from gmail, then you will need to subscribe the gmail address to 
the list here:


https://stat.ethz.ch/mailman/listinfo/bioc-devel



 Marc



On 06/30/2015 02:26 AM, Arne Müller wrote:

Hello,

I think there's a problem in the UCSCSession initializer in rtracklayer:

setMethod("initialize", "UCSCSession",
          function(.Object, url = "http://genome.ucsc.edu/cgi-bin/",
                   user = NULL, session = NULL, force = FALSE, ...)
          {
            .Object@url <- url
            .Object@views <- new.env()
            gwURL <- ucscURL(.Object, "gateway")
            if (force) {
                gwURL <- paste0(gwURL, '?redirect="manual"')
            }
            gw <- httpGet(gwURL, cookiefile = tempfile(), header = TRUE,
                          .parse=FALSE)
            if (grepl("redirectTd", gw)) {
                url <- sub(".*?a href=\"h([^[:space:]]+cgi-bin/).*",
                           "h\\1", gw)
                return(initialize(.Object, url, user=user, session=session,
                                  force=TRUE, ...))
            }
            cookie <- grep("Set-[Cc]ookie: hguid[^=]*=", gw)
            if (!length(cookie))
              stop("Failed to obtain 'hguid' cookie")
            hguid <- sub(".*Set-Cookie: (hguid[^=]*=[^;]*);.*", "\\1", gw)
            .Object@hguid <- hguid
            if (!is.null(user) && !is.null(session)) { ## bring in other session
              ucscGet(.Object, "tracks",
                      list(hgS_doOtherUser = "submit", hgS_otherUserName = user,
                           hgS_otherUserSessionName = session))
            }
            .Object
          })

Shouldn't '...' be passed to httpGet, which in turn passes it on to RCurl, i.e.

gw <- httpGet(gwURL, cookiefile = tempfile(), header = TRUE,
              .parse=FALSE, ...) ?

We run an internal instance of the UCSC genome browser and need to pass a
cookie to all http-requests. The problem is that

session = new('UCSCSession', url=myInternalURL, cookie=myAuthCookie)

does not pass the 'cookie' argument to httpGet.


Regards,


Arne
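
The point about '...' can be illustrated without rtracklayer at all. The following is a minimal base R sketch, assuming a stand-in function http_get_stub() that merely records what it receives; it is not rtracklayer's httpGet, and init_dropping() / init_forwarding() are hypothetical names for the two variants of the initializer.

```r
# Stand-in for a low-level HTTP helper (NOT rtracklayer's httpGet):
# it just returns the arguments it was given.
http_get_stub <- function(url, ..., header = TRUE) {
  list(url = url, header = header, extra = list(...))
}

# Variant 1: accepts '...' but never hands it on, so extra arguments
# such as 'cookie' are silently lost.
init_dropping <- function(url, force = FALSE, ...) {
  http_get_stub(url, header = TRUE)
}

# Variant 2: forwards '...', so extra arguments reach the helper.
init_forwarding <- function(url, force = FALSE, ...) {
  http_get_stub(url, header = TRUE, ...)
}

a <- init_dropping("http://example.org/cgi-bin/", cookie = "auth=abc")
b <- init_forwarding("http://example.org/cgi-bin/", cookie = "auth=abc")

length(a$extra)   # no extra arguments arrived
names(b$extra)    # the 'cookie' argument arrived
```

Whether forwarding '...' is the right fix inside rtracklayer depends on what else the initializer does with those arguments; the sketch only shows the mechanism.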



Re: [Bioc-devel] Changes in AnnotationDbi

2015-06-08 Thread Marc Carlson
OK Jim,

I will put very simple messages in (one liners) that will simply state 
whether the relationship between keys and the requested columns was 1:1, 
1:many, many:1, or many:many.   Hopefully this will represent an 
acceptable compromise.

  Marc
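
As a rough illustration of what such a one-line message would need to work out, here is a base R sketch that classifies the relationship between a vector of keys and the values returned for them. mapping_type() is a hypothetical helper written for this example, not the code in AnnotationDbi.

```r
# Hypothetical helper: classify key -> value relationships as
# "1:1", "1:many", "many:1" or "many:many".
mapping_type <- function(keys, values) {
  pairs <- unique(data.frame(key = keys, value = values,
                             stringsAsFactors = FALSE))
  pairs <- pairs[!is.na(pairs$value), ]
  # Does any single key map to more than one distinct value?
  key_has_many_values <- any(table(pairs$key) > 1)
  # Is any single value shared by more than one distinct key?
  value_has_many_keys <- any(table(pairs$value) > 1)
  left  <- if (value_has_many_keys) "many" else "1"
  right <- if (key_has_many_values) "many" else "1"
  paste0(left, ":", right)
}

t1 <- mapping_type(c("a", "b", "c"), c("1", "2", "3"))
t2 <- mapping_type(c("a", "a", "b"), c("1", "2", "3"))
t3 <- mapping_type(c("a", "b", "c"), c("1", "1", "2"))
t4 <- mapping_type(c("a", "a", "b"), c("1", "2", "2"))

message("mapping between keys and columns: ", t2)
```

Duplicate key/value pairs are collapsed first, so repeating the same key in the input does not by itself count as a "many" relationship.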


On 06/05/2015 08:37 AM, James W. MacDonald wrote:
 I agree that a warning is probably not the way to go, as it does imply 
 that there might have been something wrong with either the input or 
 output. Plus, not everybody understands the distinction between error 
 and warning.

 And having additional documentation can't possibly hurt. But that 
 assumes that most/some/all of the end users both peruse and understand 
 the documentation, which we all know is not the case. The main issue, 
 for me at least, is that a significant proportion of people seem to 
 think there is some sort of uniqueness imposed on things like Entrez 
 Gene IDs and Hugo symbols, etc. While that is the ultimate goal, we do 
 not have and maybe never will achieve unique IDs for each annotatable 
 object.

 I used to work for a PI who was a very smart, well informed 
 statistical geneticist who was absolutely shocked when I informed her 
 that a) there are SNPs in dbSNP that have more than one RS ID, and 
 that b.) there are RS IDs in dbSNP that have been assigned to multiple 
 SNPs. She just assumed that there was a one-to-one RS ID - SNP mapping.

 So this is to me the crux of the problem. It is perfectly valid to
 return one-to-many mappings, and that is what should be expected by
 those of us who already understand such things. But for those of us
 who are ignorant of the details, and those who assume uniqueness of
 IDs, it would be really nice if they got a message telling them
 something like

 Please note that there are one-to-many mappings between the input and
 output IDs, so the output is longer than your input vector. Please see
 ?select for more detail.

 And if the message is objectionable to some, you could give the option
 for people to set a global flag to shut it off. Something like

 if(!pleaseMakeItStop)
   message("message goes here")

 and they could set

 pleaseMakeItStop = TRUE in their .Rprofile

 Is that a reasonable compromise?

 Jim
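
One way the "global flag" idea could look in practice is with options() rather than a free-standing variable, so that nothing breaks when the flag was never set. This is only a sketch: the option name demo.quiet.mapping and the function note_one_to_many() are invented for the example.

```r
# Hypothetical notifier: emits a message when the output has more rows than
# the input had keys, unless the user switched it off via an option.
note_one_to_many <- function(n_in, n_out) {
  expanded <- n_out > n_in
  if (expanded && !isTRUE(getOption("demo.quiet.mapping"))) {
    message("Please note that there are one-to-many mappings: ", n_out,
            " rows returned for ", n_in, " input keys. See ?select.")
  }
  invisible(expanded)
}

note_one_to_many(3, 5)               # emits the note

options(demo.quiet.mapping = TRUE)   # e.g. placed in .Rprofile
note_one_to_many(3, 5)               # silent now

options(demo.quiet.mapping = NULL)   # restore the default for this demo
```

Using getOption() means an unset option simply behaves as "not quiet", which is what the pleaseMakeItStop variable would need an extra exists() check to achieve.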



 On Thu, Jun 4, 2015 at 6:06 PM, Marc Carlson <mcarl...@fredhutch.org> wrote:

 Hi Jim,

 I do agree that the warning was protective for that (this is why I
 put it there).

 But it was also annoying for many and a source of some confusion
 because when people see a warning() they think that something has
 gone wrong with the code that was just run.  And in this case the
 select method was actually doing exactly what it was supposed to
 be doing.  What it was actually warning you about was what you did
 separately in that assignment to fit2...  Which is the step right
 after the select method already did its work.  And I can
 understand why that seems a little bit confusing since you are
 basically telling someone to be careful with the data you just
 gave them.

 Now I could replace it with a message() I guess, but in cases like
 this where the warning is about something that happens outside of
 the function you are calling, shouldn't that probably be handled
 by documentation?  Or at least, that is the argument that finally
 persuaded me to remove it.  That and the fact that almost every
 call to select() ended up accompanied by the warning you
 mentioned, because it turns out that perfect 1:1 relationships are
 pretty rare for annotation data.  Very often, you are going to get
 back multiple results.

 But I didn't just remove the warning, I also supplied an
 alternative for people who have a real need for consistent 1:1
 mapping.

 The mapIds() method takes most of the same arguments as select,
 except that unlike select(), it only looks up one column and it
 always returns a vector that is the same size as the vector that
 came in.

 So for your example, you could do something like this pseudocode here:

 mapIds(chippackage, featureNames(eset), column="ENTREZID",
        keytype="PROBEID")

 And mapIds will follow a rule specified by the default value for
 the multiVals argument so that you can get back your results in a
 1:1 manner.  And if you don't like any of the options available
 for the multiVals argument, you can make your own function and
 pass it in.


 Anyhow please continue to let us know what you think?


  Marc







 On 06/04/2015 10:50 AM, James W. MacDonald wrote:

 In the last release, the warning message from select() telling
 people that
 their results include one-to-many mappings was removed. While
 some may find
 this warning annoying, I think silently returning something
 unexpected to
 our users is dangerous.

 In other words, for me

Re: [Bioc-devel] Changes in AnnotationDbi

2015-06-04 Thread Marc Carlson

Hi Jim,

I do agree that the warning was protective for that (this is why I put 
it there).


But it was also annoying for many and a source of some confusion because 
when people see a warning() they think that something has gone wrong 
with the code that was just run.  And in this case the select method was 
actually doing exactly what it was supposed to be doing.  What it was 
actually warning you about was what you did separately in that 
assignment to fit2...  Which is the step right after the select method 
already did its work.  And I can understand why that seems a little bit 
confusing since you are basically telling someone to be careful with the 
data you just gave them.


Now I could replace it with a message() I guess, but in cases like this 
where the warning is about something that happens outside of the 
function you are calling, shouldn't that probably be handled by 
documentation?  Or at least, that is the argument that finally persuaded 
me to remove it.  That and the fact that almost every call to select() 
ended up accompanied by the warning you mentioned, because it turns out 
that perfect 1:1 relationships are pretty rare for annotation data.  
Very often, you are going to get back multiple results.


But I didn't just remove the warning, I also supplied an alternative for 
people who have a real need for consistent 1:1 mapping.


The mapIds() method takes most of the same arguments as select, except 
that unlike select(), it only looks up one column and it always returns 
a vector that is the same size as the vector that came in.


So for your example, you could do something like this pseudocode here:

mapIds(chippackage, featureNames(eset), column="ENTREZID",
       keytype="PROBEID")


And mapIds will follow a rule specified by the default value for the 
multiVals argument so that you can get back your results in a 1:1 
manner.  And if you don't like any of the options available for the 
multiVals argument, you can make your own function and pass it in.
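
To make the "same size as the vector that came in" behaviour concrete, here is a base R sketch of a first-match lookup. It assumes the rule is "take the first hit for each key"; map_first() and the toy lookup table are hypothetical and are not the mapIds() implementation.

```r
# Hypothetical first-match lookup: one result per input key, NA when the
# key is not found, first hit when the key occurs several times.
map_first <- function(keys, table_keys, table_values) {
  out <- table_values[match(keys, table_keys)]  # match() returns first hit
  names(out) <- keys
  out
}

# Toy annotation table in which probe "p1" maps to two Entrez IDs
lookup <- data.frame(PROBEID  = c("p1", "p1", "p2", "p3"),
                     ENTREZID = c("10", "11", "20", "30"),
                     stringsAsFactors = FALSE)

ids <- map_first(c("p1", "p2", "p4"), lookup$PROBEID, lookup$ENTREZID)
ids
length(ids)   # always equals the number of input keys
```

Because the result is aligned with the input, it can be assigned next to the original features without any de-duplication step; the price is that the extra hits for "p1" are dropped.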



Anyhow please continue to let us know what you think?


 Marc






On 06/04/2015 10:50 AM, James W. MacDonald wrote:

In the last release, the warning message from select() telling people that
their results include one-to-many mappings was removed. While some may find
this warning annoying, I think silently returning something unexpected to
our users is dangerous.

In other words, for me it is a common practice to do something like this:

fit <- lmFit(eset, design)
fit2 <- eBayes(fit)
gns <- select(chippackage, featureNames(eset), c("ENTREZID","SYMBOL"))
gns <- gns[!duplicated(gns[,1]),]
fit2$genes <- gns

I add in the step where dups are removed because I already know they are
there. But a naive user might instead do

fit2$genes <- select(chippackage, featureNames(eset),
                     c("ENTREZID","SYMBOL"))

Which will work just fine, but then all the annotation (except for the
first few lines) will now be completely incorrect, and there wasn't a
warning to let the end user know that they may have made a mistake.
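
The misalignment described here can be reproduced with plain data frames. In the sketch below the lookup table stands in for what a one-to-many select() call might return; the probe IDs and symbols are invented.

```r
# Three features on the array, in the order they appear in the data
probes <- c("p1", "p2", "p3")

# Invented annotation in which "p1" maps to two symbols
lookup <- data.frame(PROBEID = c("p1", "p1", "p2", "p3"),
                     SYMBOL  = c("GENE_A", "GENE_A2", "GENE_B", "GENE_C"),
                     stringsAsFactors = FALSE)

# A one-to-many lookup returns more rows than there are features
gns <- lookup[lookup$PROBEID %in% probes, ]
n_rows <- nrow(gns)

# If the first length(probes) rows were used as-is, every row after the
# first duplicate would describe the wrong feature
misaligned <- gns$PROBEID[seq_along(probes)] != probes

# Removing duplicated probe IDs restores one row per feature, in order
gns_unique <- gns[!duplicated(gns$PROBEID), ]
aligned <- identical(gns_unique$PROBEID, probes)
```

Here n_rows is 4 for 3 features, and misaligned flags the second and third positions, which is the "all the annotation except the first few lines is incorrect" situation on a small scale.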

lmFit() will parse the featureData slot of an ExpressionSet and use those
data for annotation, so that gives some hypothetical protections, for those
who first put their annotation data into their ExpressionSet. However,
?eSet says:

  ‘featureData’: Contains variables describing features (i.e., rows
   in ‘assayData’) unique to this experiment. Use the
   ‘annotation’ slot to efficiently reference feature data
   common to the annotation package used in the experiment.
   Class: ‘AnnotatedDataFrame-class’

Which to me indicates that the featureData slot isn't really intended to
contain annotation data, but instead some unique information that pertains
to a given experiment. But maybe I misunderstand.

Is the featureData slot actually intended for annotation data? If not, what
is the intended pipeline for annotating data in an ExpressionSet? Am I
alone in being concerned about this?

Best,

Jim






Re: [Bioc-devel] AnnotationHubData Error: Access denied: 530

2015-04-13 Thread Marc Carlson
Hi Johannes,

We are already planning to upgrade those objects to have that 
information when they are downloaded...  Sonali is actually working on 
that right now.  She will probably have updated that information by the 
end of the week or so.  It's a lot of files to update, but this is 
already in progress.

So if you are willing to wait a few days you can probably save yourself 
some headaches...


  Marc




On 04/11/2015 12:13 PM, Rainer Johannes wrote:
 Hi Marc,

 you're right. I'll start with option 1. For that it would however be 
 really nice to have the seqinfo available in the GRanges as mentioned 
 in my previous mail. In the meantime I'll try to fetch the chrom 
 lengths myself but would be nice to have all that ready in the GRanges 
 at some point.

 cheers, jo

 On 11 Apr 2015, at 00:54, Marc Carlson <mcarl...@fredhutch.org> wrote:

 On 04/10/2015 12:18 PM, Rainer Johannes wrote:
 dear Sonali, Herve,

 On 10 Apr 2015, at 19:59, Hervé Pagès <hpa...@fredhutch.org> wrote:

 Hi Johannes, Sonali,

 On 04/10/2015 09:40 AM, Arora, Sonali wrote:
 Hi Rainer,

 Just to be clear - what do you want to be available from AnnotationHub()
 in the end?

 Currently the GTF files from Ensembl are already present inside the
 AnnotationHub

 library(AnnotationHub)
 ah = AnnotationHub()
 gtf <- query(ah, "GTF")
 gtf <- query(gtf, "Ensembl")
 gtf[1]
 gtf[[1]] # returned to you as a GRanges object.

 - why not get the GTF files directly from AnnotationHub instead of
 getting them from the ftp site? Then you can make your EnsDb classes
 from these GRanges.
 It will also make your recipe faster because you will not have to
 download the file and parse the object.

 A GRanges object is not the same as a GTF file and I guess Johannes
 wants access to the GTF file. Are these GTF files available on
 AnnotationHub?


 yes, you're right. I wanted access to the GTF file and most likely 
 understood the AnnotationHub idea wrong... my idea was to build a 
 recipe that takes as input the GTF file (as the 
 makeEnsemblGtfToGRanges) and generates from that the EnsDb SQLite 
 database file. I thought that these SQLite files would be generated 
 on the fly on the user's computer, but I guess that stuff is 
 processed once and stored on your servers, right?
 Hi Johannes,

 So you have several options actually.  We sometimes store the files in
 S3 and then send them down/cache them as requested and other times the
 hub can just point to an existing ftp site (and files get
 transformed/cached on the fly when users ask for them).  So you have
 three choices here:

 1) You could just write a function that takes in one of the processed
 GRanges objects and transforms it into an EnsDb object. This should be
 straightforward and is probably your easiest option since you won't have
 to write a recipe OR have any code included into the AnnotationHub.  You
 can basically just take advantage of the fact that these data are
 already there in the hub waiting to be used.
 2) You could write R code that transforms a GTF file into a sqlite file
 and ALSO a recipe to call that (and create metadata) for all the GTF
 files.  This will be more work than #1 since you will have to write both
 a recipe and port any code that you have for generating the DB files.
 But when you are done you would be able to have your resources come
 right out of the AnnotationHub.
 3) You could write R code to process a GRanges object into an EnsDb
 object and then also write a recipe so that your data resources can be
 served up directly from the AnnotationHub, but still take advantage of
 what is already there (GRanges).  No new data would need to be added to
 the hub since new metadata records could allow users to transform the
 data into EnsDb objects on the fly.  This is an elegant solution, but it
 will still take more effort than option #1.

 If I were you, I would start with option #1.  That way if (after I got
 that working) I still wanted things to be more elegant, then I could
 add a recipe (thus evolving the strategy into option #3).


  Marc




 @Johannes - Here is one alternative: You could take a different approach
 and implement some equivalent of makeTxDbFromGRanges() for EnsDb
 objects. So people could just do:

  library(ensembldb)
  ensdb <- makeEnsDbFromGRanges(gtf[[1]])

 like they can do right now with makeTxDbFromGRanges():

  library(GenomicFeatures)
  txdb <- makeTxDbFromGRanges(gtf[[1]])

 That way you don't need a recipe or try to add things to 
 AnnotationHub at all.


 that's a good idea, I will implement that too. just want to make 
 sure that I can get all data I'll need (also the genome build 
 version, Ensembl version etc from the GRanges, most likely I have to 
 guess that from the file name of the RData file).

 @Sonali - These GRanges objects I get from AnnotationHub have no genome
 information and their seqlevels are not sorted:

 seqinfo(gtf[[1

Re: [Bioc-devel] AnnotationHubData Error: Access denied: 530

2015-04-10 Thread Marc Carlson
On 04/10/2015 12:18 PM, Rainer Johannes wrote:
 dear Sonali, Herve,

 On 10 Apr 2015, at 19:59, Hervé Pagès <hpa...@fredhutch.org> wrote:

 Hi Johannes, Sonali,

 On 04/10/2015 09:40 AM, Arora, Sonali wrote:
 Hi Rainer,

 Just to be clear - what do you want to be available from AnnotationHub()
 in the end?

 Currently the GTF files from Ensembl are already present inside the
 AnnotationHub

 library(AnnotationHub)
 ah = AnnotationHub()
 gtf <- query(ah, "GTF")
 gtf <- query(gtf, "Ensembl")
 gtf[1]
 gtf[[1]] # returned to you as a GRanges object.

 - why not get the GTF files directly from AnnotationHub instead of
 getting them from the ftp site? Then you can make your EnsDb classes
 from these GRanges.
 It will also make your recipe faster because you will not have to
 download the file and parse the object.

 A GRanges object is not the same as a GTF file and I guess Johannes
 wants access to the GTF file. Are these GTF files available on
 AnnotationHub?


 yes, you're right. I wanted access to the GTF file and most likely understood 
 the AnnotationHub idea wrong... my idea was to build a recipe that takes as 
 input the GTF file (as the makeEnsemblGtfToGRanges) and generates from that 
 the EnsDb SQLite database file. I thought that these SQLite files would be 
 generated on the fly on the user's computer, but I guess that stuff is 
 processed once and stored on your servers, right?
Hi Johannes,

So you have several options actually.  We sometimes store the files in 
S3 and then send them down/cache them as requested and other times the 
hub can just point to an existing ftp site (and files get 
transformed/cached on the fly when users ask for them).  So you have 
three choices here:

1) You could just write a function that takes in one of the processed 
GRanges objects and transforms it into an EnsDb object. This should be 
straightforward and is probably your easiest option since you won't have 
to write a recipe OR have any code included into the AnnotationHub.  You 
can basically just take advantage of the fact that these data are 
already there in the hub waiting to be used.
2) You could write R code that transforms a GTF file into a sqlite file 
and ALSO a recipe to call that (and create metadata) for all the GTF 
files.  This will be more work than #1 since you will have to write both 
a recipe and port any code that you have for generating the DB files.  
But when you are done you would be able to have your resources come 
right out of the AnnotationHub.
3) You could write R code to process a GRanges object into an EnsDb 
object and then also write a recipe so that your data resources can be 
served up directly from the AnnotationHub, but still take advantage of 
what is already there (GRanges).  No new data would need to be added to 
the hub since new metadata records could allow users to transform the 
data into EnsDb objects on the fly.  This is an elegant solution, but it 
will still take more effort than option #1.

If I were you, I would start with option #1.  That way if (after I got 
that working) I still wanted things to be more elegant, then I could 
add a recipe (thus evolving the strategy into option #3).


  Marc




 @Johannes - Here is one alternative: You could take a different approach
 and implement some equivalent of makeTxDbFromGRanges() for EnsDb
 objects. So people could just do:

   library(ensembldb)
   ensdb - makeEnsDbFromGRanges(gtf[[1]])

 like they can do right now with makeTxDbFromGRanges():

   library(GenomicFeatures)
   txdb <- makeTxDbFromGRanges(gtf[[1]])

 That way you don't need a recipe or try to add things to AnnotationHub at all.


 That's a good idea, I will implement that too. I just want to make sure that I 
 can get all the data I'll need from the GRanges (including the genome build 
 version, Ensembl version, etc.); most likely I will have to guess those from 
 the file name of the RData file.

 @Sonali - These GRanges objects I get from AnnotationHub have no genome
 information and their seqlevels are not sorted:

   seqinfo(gtf[[1]])
   Seqinfo object with 22 sequences from an unspecified genome; no seqlengths:
     seqnames seqlengths isCircular genome
     X                NA         NA   <NA>
     9                NA         NA   <NA>
     8                NA         NA   <NA>
     7                NA         NA   <NA>
     6                NA         NA   <NA>
     ...             ...        ...    ...
     12               NA         NA   <NA>
     11               NA         NA   <NA>
     10               NA         NA   <NA>
     1                NA         NA   <NA>
     MT               NA         NA   <NA>

 I know it's easy enough to sort the seqlevels with sortSeqlevels() but
 what about having these things done by the recipe instead?
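
 For what it's worth, the clean-up being asked about might look roughly like
 the sketch below. It builds a small toy GRanges (not one of the hub objects)
 whose seqlevels are unsorted, sorts them with sortSeqlevels() and then stamps
 a genome on the result. The toy ranges and the "GRCh38" label are assumptions
 made up for illustration; a recipe would have to take the real build from the
 resource metadata.

```r
library(GenomicRanges)  # attaches GenomeInfoDb, which provides sortSeqlevels()

## Toy ranges whose seqlevels appear in an unsorted order (illustrative only)
gr <- GRanges(seqnames = c("X", "9", "10", "1", "MT"),
              ranges   = IRanges(start = c(100, 200, 300, 400, 500), width = 50))

## Put the seqlevels into their "natural" order
gr_sorted <- sortSeqlevels(gr)

## Record the genome build; "GRCh38" is an assumed value for this sketch
genome(gr_sorted) <- "GRCh38"

seqinfo(gr_sorted)
```

 Doing this once in the recipe would spare every user from repeating the same
 two calls after each download.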


 I also have a suggestion there: what if you also used 
 fetchChromLengthsFromEnsembl from the GenomicFeatures package? The GTF files 
 are from Ensembl anyway, so getting the seqinfo from there would make 
 sense... and I wouldn't have to fetch 

Re: [Bioc-devel] Feature Request--add host and port to makeTxDbPackageFromBiomart

2015-03-19 Thread Marc Carlson

This is done BTW.

 Marc

On 02/27/2015 02:43 PM, Marc Carlson wrote:

Hi Sean,

This seems like a solid suggestion.  I have put it into my queue.

 Marc



On 02/27/2015 04:41 AM, Sean Davis wrote:

Hi, Marc.

Since Ensembl has switched to GRCh38 for their most recent builds, to get
access to GRCh37 data now requires a different host and port for biomaRt.
These are exposed in the makeTxDbFromBiomart, but not the accompanying
functionality to directly make a package.  Would it make sense to add host
and port as arguments for the latter?

Thanks,
Sean
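
For readers landing on this thread, the request boils down to something like
the sketch below: pointing makeTxDbFromBiomart() at the GRCh37 archive through
its host and port arguments. The wrapper name make_grch37_txdb, the archive
host name, the port value and the default dataset are assumptions for
illustration, written against GenomicFeatures as it was at the time of this
thread; check the current documentation before relying on any of them.

```r
## Sketch only: build a TxDb against the GRCh37 archive of Ensembl.
## make_grch37_txdb() is a hypothetical helper; host, port and dataset
## defaults are assumed values, not taken from the thread.
make_grch37_txdb <- function(dataset = "hsapiens_gene_ensembl",
                             host = "grch37.ensembl.org",
                             port = 80) {
  if (!requireNamespace("GenomicFeatures", quietly = TRUE))
    stop("the GenomicFeatures package is required")
  GenomicFeatures::makeTxDbFromBiomart(
    biomart = "ENSEMBL_MART_ENSEMBL",
    dataset = dataset,
    host    = host,
    port    = port
  )
}

## Needs network access, so it is left commented out here:
## txdb <- make_grch37_txdb()
```

The package-making function discussed above would presumably take the same two
arguments and pass them straight through.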


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




Re: [Bioc-devel] keys function of org.Pf.plasmo.db gives an error.

2015-03-18 Thread Marc Carlson

Hi Paolo,

The ORF type has never been available for that package.  This is a bug 
in the columns method (which I will now fix).  Thanks for reporting it.


 Marc



On 03/18/2015 03:00 AM, Paolo Martini wrote:

Dear Bioconductor,

I am working with the annotation package org.Pf.plasmo.db.

I tried to get the keys from the ORF column.


library(org.Pf.plasmo.db)
columns(org.Pf.plasmo.db)

  [1] "ORF"         "ENZYME"      "PATH"        "SYMBOL"      "GENENAME"
  [6] "GO"          "EVIDENCE"    "ONTOLOGY"    "GOALL"       "EVIDENCEALL"
 [11] "ONTOLOGYALL" "ALIAS2ORF"


keys(org.Pf.plasmo.db, "ORF")

Error in sqliteSendQuery(con, statement, bind.data) :
   error in statement: no such table: sgd


I tried both the stable and devel versions, but neither of them worked.

To my knowledge sgd is related to S. cerevisiae.

Is the ORF keytype available for Malaria?

Thanks a lot.


sessionInfo()

R Under development (unstable) (2015-03-16 r67994)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.10

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] org.Pf.plasmo.db_3.1.0 RSQLite_1.0.0          DBI_0.3.1
[4] AnnotationDbi_1.29.17  GenomeInfoDb_1.3.13    IRanges_2.1.43
[7] S4Vectors_0.5.22       Biobase_2.27.2         BiocGenerics_0.13.6






Re: [Bioc-devel] keys function of org.Pf.plasmo.db gives an error.

2015-03-18 Thread Marc Carlson

Ok I spoke a little too quickly earlier.  Looking back, there should
indeed be a field from columns() called 'ORF' and it is basically used
here as the central gene ID for this single package.  But the code for
keys (and columns) was getting this 'ORF' conflated with the other 'ORF'
used in resources from SGD (which this has nothing to do with since the
data all comes from plasmoDB).

Anyhow, I have fixed the software bugs and pushed a patch to release and
devel.  It should be up in about a day.

  Marc



On 03/18/2015 10:23 AM, Marc Carlson wrote:

Hi Paolo,

The ORF type has never been available for that package.  This is a bug 
in the columns method (which I will now fix).  Thanks for reporting it.


 Marc





Re: [Bioc-devel] Feature Request--add host and port to makeTxDbPackageFromBiomart

2015-02-27 Thread Marc Carlson

Hi Sean,

This seems like a solid suggestion.  I have put it into my queue.

 Marc



On 02/27/2015 04:41 AM, Sean Davis wrote:

Hi, Marc.

Since Ensembl has switched to GRCh38 for their most recent builds, to get
access to GRCh37 data now requires a different host and port for biomaRt.
These are exposed in the makeTxDbFromBiomart, but not the accompanying
functionality to directly make a package.  Would it make sense to add host
and port as arguments for the latter?

Thanks,
Sean



Re: [Bioc-devel] OrganismDb and associated TxDb

2015-02-13 Thread Marc Carlson

Hi Vince,

First of all thank you for using OrganismDb objects.  You raise some 
interesting points though about keeping these APIs better synchronized 
that I feel point to some deficiencies in the current design.  I spoke 
with Herve about this and we are puzzling over possibly using 
inheritance to make this a bit easier for maintenance.



 Marc



On 02/13/2015 10:01 AM, Vincent Carey wrote:

Gviz has a nice way of working with TxDb instances to derive gene models.
It can be cumbersome to refer to a TxDb instance, and the Homo.sapiens
OrganismDb instance is very convenient to work with.

I do not see any straightforward way to extract a reference to a TxDb from
Homo.sapiens.  I could traverse the graph slot but class?OrganismDb
makes no reference to this.

In summary, I think it would be good to document the OrganismDb API and
to think about preferences for using OrganismDb as opposed to
TxDb and OrgDb (org.Hs.eg.db) whenever possible.

BTW I attempted to 'patch' Gviz by substituting OrganismDb for TxDb --
there are only two references to TxDb in Gviz ... and it would seem that
the necessary operations apply to OrganismDb just as well as to TxDb.
But the APIs are not in sync ... I ran into seqlevels0 ... and that is
something of a mystery.



Re: [Bioc-devel] Package submission with library requirement

2015-01-30 Thread Marc Carlson

Hi Avinash,

So the argument for the importance of reproducible research *definitely* 
resonates with us here as it is a major goal of ours.  However while the 
decision to use the same library as your paper helps to make the 
immediate work more reproducible, it simultaneously hampers others from 
benefiting from that because of the engineering problems that it creates 
for end users.


Ultimately, this project has a longstanding commitment to try and 
provide not only a way for your previous work to be validated, but also 
a way for others to build upon and eventually extend that previous 
work.  And both of these goals are crucial if your package is to be 
valuable to the greater scientific community over the long term.


Anyhow I will try and work with you on our issue tracker to see if we 
can find a way to resolve this with you.  Thanks for contributing!



 Marc


On 01/26/2015 08:14 AM, avinash sahu wrote:

Hi Dan,

I have now included the source code of rsampl.h in the GOAL package.
Although rlecuyer is a good candidate for a random number generator, I
am currently avoiding it because I want the results of our submitted
manuscript to be completely reproducible. I can reproduce the results using
the ransampl library by setting the seed that I have stored. Changing to another
random generator library would mean that I have to recheck that the results of the
manuscript are reproducible, and possibly change some of them, which is not
possible at this stage. I will reserve that inclusion for the future. I
have resubmitted the GOAL package.

thanks
avi

On Fri, Jan 23, 2015 at 9:37 PM, Levi Waldron levi.wald...@hunter.cuny.edu
wrote:


On Fri, Jan 23, 2015 at 1:58 PM, Dan Tenenbaum dtene...@fredhutch.org
wrote:

However, you should consider using Rlecuyer as it has no external
dependencies (see Levi's post to this thread). Then your package should
build on windows.

I think so too - it's also a standard solution in R, implemented
natively in r-core's parallel library and suggested by the snow
library.  I used it in my pensim library before transitioning to
parallel, and have tested its streams on hyperthreaded CPUs and
clusters.

--
Levi Waldron
Assistant Professor of Biostatistics
City University of New York School of Public Health, Hunter College
2180 3rd Ave Rm 538
New York NY 10035-4003
phone: 212-396-7747
www.waldronlab.org



Re: [Bioc-devel] Replacing deprecated org.Hs.egCHR and friends

2015-01-13 Thread Marc Carlson

Hi Peter,

I would add that you can see a listing of all the currently 
pre-manufactured TxDb packages here:


http://www.bioconductor.org/packages/devel/BiocViews.html#___TxDb

And for convenience you can also use an OrganismDb package to connect 
the contents of the TxDb package with the older org packages.  You can 
learn more about those (and the other annotation resources) here:


http://www.bioconductor.org/help/workflows/annotation/annotation/

Hope this helps you to be better acquainted!


 Marc



On 01/13/2015 07:30 AM, James W. MacDonald wrote:

Hi Peter,

This isn't a devel question. Next time please ask this sort of thing on the
support site.

As for the message, it seems pretty clear to me. The org.Hs.eg.db package
doesn't have the chromosomal location data any more, but the relevant TxDb
package does have those data, in a much more useful format. The message
can't be any more explicit than that, as there is more than one TxDb
package for human.

You could have hypothetically gone to the annotation data page (
http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData)
and searched for, say 'TxDb', in which case you would see three packages
with names like TxDb.Hsapiens.UCSC.hg19.knownGene. Which one you decide to
use is dependent on the build/source you care about.

And if you are completely unfamiliar with these packages, you need to read
the GenomicFeatures vignette.
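
As a rough illustration of what the deprecation message is pointing at, the
sketch below asks a TxDb object for the chromosome of a set of Entrez gene
IDs through the select() interface. The helper name chromosomes_for_genes is
made up for this example, and the gene IDs in the usage lines are arbitrary;
which TxDb package you pass in depends on the build/source you care about.

```r
## Hypothetical helper: look up chromosomes for Entrez gene IDs in a TxDb.
## "TXCHROM" and "GENEID" are column/keytype names that TxDb objects expose
## through the select() interface.
chromosomes_for_genes <- function(txdb, gene_ids) {
  AnnotationDbi::select(txdb,
                        keys    = as.character(gene_ids),
                        columns = "TXCHROM",
                        keytype = "GENEID")
}

## Usage (requires the annotation package to be installed):
## library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## chromosomes_for_genes(TxDb.Hsapiens.UCSC.hg19.knownGene, c("1", "10"))
```

Unlike org.Hs.egCHR, the answer here is tied to an explicit genome build,
which is the point of the change.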

Best,

Jim



On Tue, Jan 13, 2015 at 12:34 AM, Peter Langfelder 
peter.langfel...@gmail.com wrote:


Hi all,

can anyone please explain or point me to an explanation of how to
replace org.Hs.egCHR and friends that appear to be deprecated in the
devel version? The deprecation message isn't very helpful. Thanks!

x = org.Hs.egCHR
Warning message:
In (function ()  :
   org.Hs.egCHR is deprecated. Please use an appropriate TxDb object or
   package for this kind of data.

sessionInfo()

R Under development (unstable) (2014-11-24 r67057)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
  [1] org.Hs.eg.db_3.0.0    RSQLite_1.0.0         DBI_0.3.1
  [4] AnnotationDbi_1.29.12 GenomeInfoDb_1.3.12   IRanges_2.1.35
  [7] S4Vectors_0.5.16      Biobase_2.27.1        BiocGenerics_0.13.4
 [10] BiocInstaller_1.17.3

loaded via a namespace (and not attached):
[1] tools_3.2.0



Re: [Bioc-devel] FW: GO offspring consistency

2014-12-05 Thread Marc Carlson

Hi Jelle,

Thank you for your patience in waiting for my answer here.  It took me a 
lot longer to properly test and validate this than I initially expected.


So if you look at amigo you can see these graph views that show you what 
the current terms up and downstream of a given GO term should be:


http://amigo.geneontology.org/amigo/term/GO:0006915

vs. its offspring term:

http://amigo.geneontology.org/amigo/term/GO:0042981

And you can see (if you click on the inferred tree view for GO:0006915) 
that GO:0042981 is actually listed there as an offspring term.


Which just leaves us with the mystery of why:

all(subsetapt %in% setapt)

Would ever return false?


Now to do some more digging, if we carry your example one step further 
we can do this to extract the specific terms that have this surprising 
result:


subsetapt[!subsetapt %in% setapt]


And let's look closer at the very 1st result (out of 3) that we see: 
GO:0035602.


So now we would expect that: GO:0006915 -> GO:0042981 -> GO:0035602

Especially since the very latest amigo diagrams show this set of 
relationships for this term.


http://amigo.geneontology.org/amigo/term/GO:0035602

But if we look more closely at this term we can notice something unusual 
about it.  Specifically if you look at the Graph views you will see that 
it has a 'part of' rather than an 'is a' relationship to the rest of the 
DAG.  An examination of the other two non-compliant terms indicates that 
they too have this kind of relationship:


http://amigo.geneontology.org/amigo/term/GO:0044336

http://amigo.geneontology.org/amigo/term/GO:0044337


Also of interest is the fact that the highest level term you tested 
(GO:0006915) has a broader kind of relationship to the rest of the 
DAG.  Now please hold onto those thoughts while I tell you another 
important fact.


http://amigo.geneontology.org/amigo/term/GO:0006915


The contents of the GOBPOFFSPRING mapping are ultimately derived from 
the graph_path table that you can find here:


http://geneontology.org/page/lead-database-schema#go-optimisations.table.graph-path

And they are indeed a faithful representation of what is in that table 
(from GO).  That is, the source files both when I made the latest GO.db 
package for the October release and now have the same properties for 
their set of relationships as you pointed out.  So for our 1st example, 
in both places you will find that GO:0035602 is listed as having an 
implied link when you ask for GO:0042981 but not when you ask for 
GO:0006915.


So the very unsatisfying answer to your question is that the terms have 
this relationship because that is what the data at GO say. :P


But the (hopefully) more satisfying answer is that the kind of 
relationships that these terms have to each other creates implications 
for whether or not they can be transitively associated in the GO 
graph_path table.  That is, the child term GO:0035602 is not able to 
be implicitly linked to GO:0006915 because that term has a 'regulates' 
relationship to the offspring terms and *also* because GO:0035602 has 
a 'part of' relationship (instead of an 'is a' relationship) to its 
parent terms.  And those issues don't crop up between the other terms in 
this part of the graph.
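
To make the idea concrete, here is a tiny base-R sketch of how an offspring
table built from typed relationships can end up non-transitive. Everything in
it is hypothetical: the three terms A, B and C stand in for GO:0006915,
GO:0042981 and GO:0035602, and the compose() rules are a deliberately
simplified assumption of mine (GO's real rule set is richer), keeping only
the feature that matters here: 'part of' followed by 'regulates' yields no
inferred link.

```r
## Toy edges (child -> parent) with hypothetical term names
edges <- data.frame(
  child  = c("B", "C"),
  parent = c("A", "B"),
  rel    = c("regulates", "part_of"),
  stringsAsFactors = FALSE
)

## Relation obtained by following `first` (child -> mid) and then `second`
## (mid -> ancestor). NA means "no link can be inferred". Simplified rules,
## assumed for illustration only.
compose <- function(first, second) {
  if (first == "is_a") return(second)
  if (second == "is_a") return(first)
  if (first == "part_of" && second == "part_of") return("part_of")
  NA_character_
}

## All offspring of `term` that can be reached through inferable links
offspring <- function(term, edges) {
  found <- edges[edges$parent == term, c("child", "rel")]
  queue <- found
  while (nrow(queue) > 0) {
    cur   <- queue[1, ]
    queue <- queue[-1, , drop = FALSE]
    kids  <- edges[edges$parent == cur$child, ]
    for (i in seq_len(nrow(kids))) {
      r <- compose(kids$rel[i], cur$rel)
      if (!is.na(r) && !(kids$child[i] %in% found$child)) {
        new   <- data.frame(child = kids$child[i], rel = r,
                            stringsAsFactors = FALSE)
        found <- rbind(found, new)
        queue <- rbind(queue, new)
      }
    }
  }
  found$child
}

"B" %in% offspring("A", edges)                      # B is offspring of A
all(offspring("B", edges) %in% offspring("A", edges))  # but B's offspring are not
```

With these toy rules the first check is TRUE and the second is FALSE, which is
the same pattern as in the GOBPOFFSPRING example from the original question.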


I hope this explains things better for you,


 Marc




On 12/02/2014 04:29 AM, jelle.goe...@radboudumc.nl wrote:

  Hi All,

When working with the GO.db package we ran into a seeming inconsistency in the 
GOBPOFFSPRING object. It seems there that a term's offspring may have offspring 
that is not offspring of the term itself. This seems inconsistent with the DAG 
structure of gene ontology.


library(GO.db)
xx <- as.list(GOBPOFFSPRING)
setapt <- xx[["GO:0006915"]] #apoptosis
subsetapt <- xx[["GO:0042981"]] #offspring of apoptosis
"GO:0042981" %in% setapt

[1] TRUE

all(subsetapt %in% setapt)

[1] FALSE

Is there something wrong or are we misunderstanding the GOBPOFFSPRING object?

Best wishes,

Jelle


Re: [Bioc-devel] About Hg38 BSgenome

2014-12-02 Thread Marc Carlson

Hi Raffaele,

You are in luck today because while we normally do *not* have mechanisms 
to harmonize the non-standard chromosome names, for this specific case 
Herve wrote some code to handle it.  So you want to look at this:


library(GenomeInfoDb)
?fetchExtendedChromInfoFromUCSC


 Marc



On 12/02/2014 07:15 AM, Julian Gehring wrote:

Hi Raffaele,

Ignore my last post completely, it was overly optimistic:

The 'BSgenome.Hsapiens.NCBI.GRCh38' package contains the genomic
sequence that is identical between GRCh38 and hg38.  The naming of the
chromosomes is different.  For the toplevel chromosomes, the names can
be easily converted:

   library(BSgenome.Hsapiens.NCBI.GRCh38)
   library(TxDb.Hsapiens.UCSC.hg38.knownGene)

   bs = BSgenome.Hsapiens.NCBI.GRCh38
   seqlevelsStyle(bs) = "UCSC" ## convert to UCSC style

   seqlevels(BSgenome.Hsapiens.NCBI.GRCh38)

   seqlevels(bs)
   seqlevels(TxDb.Hsapiens.UCSC.hg38.knownGene)

However, this does not work for the non-toplevel chrs, e.g.:
'HSCHR19KIR_RP5_B_HAP_CTG3_1' does not have a corresponding sequence in
the 'TxDb.Hsapiens.UCSC.hg38.knownGene' (and also won't be converted).
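
A small sketch of the same limitation without loading a whole BSgenome: it
runs a handful of NCBI-style names through mapSeqlevels() from GenomeInfoDb
and then keeps only those that received a UCSC name. The vector of names is
hand-picked for illustration (the last one is the scaffold mentioned above),
and I am assuming that names without a known mapping come back as NA.

```r
library(GenomeInfoDb)

## A few NCBI-style sequence names; the last one is a non-toplevel scaffold
ncbi_names <- c("1", "2", "X", "MT", "HSCHR19KIR_RP5_B_HAP_CTG3_1")

## Ask for the UCSC-style equivalents
ucsc_names <- mapSeqlevels(ncbi_names, "UCSC")
ucsc_names

## Keep only the names that could actually be translated
mappable <- ucsc_names[!is.na(ucsc_names)]
mappable
```

Anything that drops out at the last step is a sequence for which a plain
style switch will not be enough.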

Best
Julian


Julian Gehring (12/02/14 15:44):


Hi Raffaele,
You can find it under the name
   BSgenome.Hsapiens.NCBI.GRCh38
   
   http://bioconductor.org/packages/release/data/annotation/html/BSgenome.Hsapiens.NCBI.GRCh38.html
The naming of the chromosomes has been harmonized between UCSC and GRCh with 
the new release, so there should be no need for two versions at the genome 
level.
Best
Julian
On Tue, Dec 2, 2014 at 15:12, Raffaele Adolfo Calogero  wrote:
Dear Bioc Team,
I am the maintainer of chimera package.
Recently some of the users asked for the possibility to use chimera with
fusions detected on hg38 human genome.
I checked for the availability of hg38 as a BSgenome package but I did not find it in
the Bioc repository, although TxDb.Hsapiens.UCSC.hg38.knownGene is there. I
would like to know whether a release of hg38 as a BSgenome package is planned, maybe
in the next Bioc release.
If it is not planned, could you please suggest what I should read to build it?
Cheers
Raffaele



[Bioc-devel] Google hangout on Wed December 10th for new package authors

2014-12-02 Thread Marc Carlson

Hello new package authors,

Based on the number of new software packages being submitted to the 
project it seems that Bioconductor is more popular than ever.  Last 
release we added a hundred and ten new packages (a new record).


A lot of the popularity of this project is because Bioconductor packages 
have to live up to certain minimal standards (Nature Genetics thinks so 
too, e.g., http://www.nature.com/ng/journal/v46/n1/full/ng.2869.html). 
For example every Bioconductor package is expected to:


1) provide complete documentation so that new users will know how to use 
it
2) contain working examples that are run when the package is checked by 
the build system so that failures can be detected early.
3) cooperate with related packages within the project so as to 
facilitate code reuse and support reproducible research.


We hope you will agree that having such package guidelines is a big win 
for the whole community.


To help *you* contribute to Bioconductor, we are going to have a Google 
hangout (on air) to allow you to tune in, listen to some tips from 
Bioconductor package reviewers and then open up the forum for questions.


Webinar Invitation: Contributing your package to Bioconductor: 
guidelines and overview

Date: December 10, 2014
Time: 8:00 AM PST /11:00 AM  EST

Please 'tune in' December 10th at 8AM PST for a Google Hangout to 
discuss new package contributions.  And learn how to maximize the value 
of your package contribution to the Bioconductor community.




Re: [Bioc-devel] AnnotationDbi::loadDb() now requires dbType and dbPackage?

2014-10-10 Thread Marc Carlson
Actually they are required in one sense.  Just not required of an end 
user who would be calling the function (so not for that manual page).


But they are required in a separate internal sense (which is what the 
error is referring to): the database that is loaded must contain 
metadata that specifies this information. But these are really just 
implementation details that only people who make such 
databases ever really need to know about.  The error in this case just 
happens to be the 1st one that is hit when your package tried to load a 
database file that was recently renamed. By the time you read this post 
though, the problem caused by that test database resource being renamed 
has already been fixed.
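
For anyone curious what that internal metadata looks like, here is a minimal
sketch that opens an annotation sqlite file directly and reads its metadata
table. The helper name read_db_metadata is made up for this example, and the
query assumes the table is called 'metadata' with 'name' and 'value' columns,
which is how the TxDb files I have looked at are laid out; treat that layout
as an assumption rather than a documented contract.

```r
## Hypothetical helper: read the name/value metadata stored inside an
## annotation sqlite file (table and column names are assumed).
read_db_metadata <- function(sqlite_file) {
  con <- DBI::dbConnect(RSQLite::SQLite(), sqlite_file)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, "SELECT name, value FROM metadata")
}

## Usage, with the sample file name used later in this thread (the file may
## be named differently in other GenomicFeatures versions):
## f <- system.file("extdata", "UCSC_knownGene_sample.sqlite",
##                  package = "GenomicFeatures")
## meta <- read_db_metadata(f)
## meta[meta$name %in% c("Db type", "Supporting package"), ]
```

The two rows picked out in the usage lines are the ones that correspond to
dbType and dbPackage.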


 Marc



On 10/10/2014 08:28 AM, Leonardo Collado Torres wrote:

On Fri, Oct 10, 2014 at 11:24 AM, Leonardo Collado Torres
lcoll...@jhu.edu wrote:

Hi,

I think that the docs for ?loadDb (AnnotationDbi) need to be updated
as described below.


According to ?loadDb in AnnotationDbi 1.27.19


dbType
dbType - not required

dbPackage
dbPackage - not required


However, R CMD check for derfinder 0.99.5 and R CMD build for
GenomicFeatures 1.17.21 are failing due to dbType not being specified.
See 
http://bioconductor.org/checkResults/devel/bioc-LATEST/derfinder/oaxaca-checksrc.html
and  
http://bioconductor.org/checkResults/devel/bioc-LATEST/GenomicFeatures/zin1-buildsrc.html
So I guess that the documentation needs to be updated or something
went wrong after GenomicFeatures 1.17.20 (with AnnotationDbi 1.17.21)

Err, I meant 1.27.19 here


because I could get it to work then (see further below).

Anyhow, I'll change the example in derfinder::makeGenomicState and
specify dbType and dbPackage




library('GenomicFeatures')
samplefile <- system.file('extdata', 'UCSC_knownGene_sample.sqlite',
                          package='GenomicFeatures')

old <- loadDb(samplefile)
new <- loadDb(samplefile, dbType = 'TxDb', dbPackage = 'GenomicFeatures')

## For some reason they are not identical.
## My guess is that each one has a different connection, while
## everything else is the same.


identical(old, new)

[1] FALSE

## However, by 'eye' they look the same


old

TxDb object:
| Db type: TranscriptDb
| Supporting package: GenomicFeatures
| Data source: UCSC
| Genome: hg18
| Genus and Species: Homo sapiens
| UCSC Table: knownGene
| Resource URL: http://genome.ucsc.edu/
| Type of Gene ID: Entrez Gene ID
| Full dataset: no
| miRBase build ID: NA
| transcript_nrow: 135
| exon_nrow: 544
| cds_nrow: 324
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2012-04-13 14:47:54 -0700 (Fri, 13 Apr 2012)
| GenomicFeatures version at creation time: 1.9.4
| RSQLite version at creation time: 0.11.1
| DBSCHEMAVERSION: 1.0

new

TxDb object:
| Db type: TranscriptDb
| Supporting package: GenomicFeatures
| Data source: UCSC
| Genome: hg18
| Genus and Species: Homo sapiens
| UCSC Table: knownGene
| Resource URL: http://genome.ucsc.edu/
| Type of Gene ID: Entrez Gene ID
| Full dataset: no
| miRBase build ID: NA
| transcript_nrow: 135
| exon_nrow: 544
| cds_nrow: 324
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2012-04-13 14:47:54 -0700 (Fri, 13 Apr 2012)
| GenomicFeatures version at creation time: 1.9.4
| RSQLite version at creation time: 0.11.1
| DBSCHEMAVERSION: 1.0


devtools::session_info()

Session info--
  setting  value
  version  R version 3.1.1 (2014-07-10)
  system   x86_64, darwin10.8.0
  ui       AQUA
  language (EN)
  collate  en_US.UTF-8
  tz       America/New_York

Packages--
  package           * version  date       source
  AnnotationDbi     * 1.27.19  2014-10-05 Bioconductor
  base64enc           0.1.2    2014-06-26 CRAN (R 3.1.0)
  BatchJobs           1.4      2014-09-24 CRAN (R 3.1.1)
  BBmisc              1.7      2014-06-21 CRAN (R 3.1.0)
  Biobase           * 2.25.1   2014-10-09 Bioconductor
  BiocGenerics      * 0.11.5   2014-09-13 Bioconductor
  BiocParallel        0.99.25  2014-10-02 Bioconductor
  biomaRt             2.21.5   2014-10-07 Bioconductor
  Biostrings          2.33.14  2014-09-09 Bioconductor
  bitops              1.0.6    2013-08-17 CRAN (R 3.1.0)
  brew                1.0.6    2011-04-13 CRAN (R 3.1.0)
  checkmate           1.4      2014-09-03 CRAN (R 3.1.1)
  codetools           0.2.9    2014-08-21 CRAN (R 3.1.1)
  DBI                 0.3.1    2014-09-24 CRAN (R 3.1.1)
  devtools            1.6.1    2014-10-07 CRAN (R 3.1.1)
  digest              0.6.4    2013-12-03 CRAN (R 3.1.0)
  fail                1.2      2013-09-19 CRAN (R 3.1.0)
  foreach             1.4.2    2014-04-11 CRAN (R 3.1.0)
  futile.logger       1.3.7    2014-01-23 CRAN (R 3.1.0)
  futile.options      1.0.0    2010-04-06 CRAN (R 3.1.0)
  GenomeInfoDb      * 1.1.25   2014-10-02 Bioconductor
  GenomicAlignments   1.1.30   

Re: [Bioc-devel] new error(?) related to annotation: illuminaHumanv1CHR is deprecated

2014-09-23 Thread Marc Carlson
Hi Vince,

You raise an important point that a common use of the chipDb objects 
will become overly complicated with this change, especially since chip 
platforms should really have an implicit genome that they were designed 
for from the get-go.  And since annotation packages are being built 
right now, there isn't time to address all of these problems optimally, so 
I am going to put these deprecations on ice until sometime after the release.

Thanks for your feedback,


  Marc



On 09/22/2014 08:03 PM, Vincent Carey wrote:
 Thanks for the clarification.  Isn't there a way via active bindings to
 preserve the interfaces conferred by e.g., illuminaHumanv1CHRLOC, so that
 queries to the object (no longer a Bimap) succeed with the endorsed
 metadata?  the chipDb packages would be revised to use a new protocol for
 these queries that go through TxDb.

 On Mon, Sep 22, 2014 at 8:38 PM, Marc Carlson mcarl...@fhcrc.org wrote:

 Hi Vince,

 So if you wanted to do this manually, then the thing you would
 want to do is to get a gene ID from the probe and to take that to
 a TranscriptDb object (again: that is if you wanted to do it
 manually).  Alternatively, if you had an OrganismDb object then
 this association would be handled for you (where it would be
 spelled out explicitly).  The explicit nature is what we are after
 here since where a gene is expected to be (chromosome wise) can
 depend on the build of genome you are using.  As people move
 between standard genomes and eventually to custom ones, we needed
 to decouple this kind of data from the organism packages (which
 are only ever intended to hold gene-centered data).
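
 The manual route described above might be sketched like this: first map the
 probe IDs to Entrez gene IDs with the chip package, then take those gene IDs
 to a TxDb object whose genome build is explicit. The function name
 probe_to_chromosome is made up for this example, and the packages named in
 the usage lines are only one possible pairing (an assumption on my part).

```r
## Hypothetical helper: probe ID -> gene ID -> chromosome, with the genome
## build made explicit by the choice of TxDb object.
probe_to_chromosome <- function(probe_ids, chip_db, txdb) {
  ## Step 1: manufacturer probe ID -> Entrez gene ID (from the chip package)
  probe2gene <- AnnotationDbi::select(chip_db,
                                      keys    = probe_ids,
                                      columns = "ENTREZID",
                                      keytype = "PROBEID")
  gene_ids <- unique(probe2gene$ENTREZID[!is.na(probe2gene$ENTREZID)])

  ## Step 2: Entrez gene ID -> chromosome (from the TxDb for a given build)
  gene2chr <- AnnotationDbi::select(txdb,
                                    keys    = gene_ids,
                                    columns = "TXCHROM",
                                    keytype = "GENEID")

  ## Join the two tables, keeping probes whose gene has no TxDb entry
  merge(probe2gene, gene2chr,
        by.x = "ENTREZID", by.y = "GENEID", all.x = TRUE)
}

## Usage (requires both annotation packages to be installed):
## library(illuminaHumanv1.db)
## library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## probes <- head(keys(illuminaHumanv1.db, keytype = "PROBEID"))
## probe_to_chromosome(probes, illuminaHumanv1.db,
##                     TxDb.Hsapiens.UCSC.hg19.knownGene)
```

 Swapping in a different TxDb is then all it takes to get the answer for a
 different build.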

  Marc



 On 09/21/2014 08:21 AM, Vincent Carey wrote:

 On Sun, Sep 21, 2014 at 11:07 AM, Martin Morgan mtmor...@fhcrc.org wrote:

 On 09/21/2014 07:44 AM, Vincent Carey wrote:

 this is coming out of the build system for GGtools ... not easy to find
 as the problem seems to cause emission of megabytes of warnings

 illuminaHumanv1CHR is deprecated as the data is better accessed from
 another location. Please use an appropriate TxDb object or package for
 this kind of data.

 i don't see the deprecation in the doc for illuminaHumanv1.db and i
 cannot get a get() to throw it.  i also don't see this on the devel
 version package landing page

 Marc will likely reply on Monday. But the intention is that the CHR
 bimaps in *db packages Marc curates are being deprecated. The
 deprecation itself occurs in AnnotationDbi, I think. The reason is the
 lack of provenance for this information -- what genome build does it
 refer to? -- and its availability from other sources (i.e., the TxDb
 packages) with provenance.

 Nice to hear about the streamlining and improved provenance.  I confess
 I don't see how to get a probe-chr mapping out of TxDb -- is there
 something new in there?  A select operation that can resolve queries
 about manufacturer identifiers?

 illuminaHumanv1CHR

 illuminaHumanv1CHR is deprecated as the data is better accessed from
 another location. Please use an appropriate TxDb object or package for
 this kind of data.
 CHR map for chip illuminaHumanv1 (object of class ProbeAnnDbBimap)

 I think this is currently a message, but should be a warning.

 AnnotationDbi is not building successfully, so its biocLite() version
 and landing page are not in sync with an svn checkout (used by the build
 system); to replicate on your own system requires an svn install, at
 least until AnnotationDbi builds successfully.

 OK, so I can get the message now.  But I think more details need to be
 supplied if we are to drop references to *CHR.

 I guess the megabytes of warnings come from code in GGtools or
 elsewhere; maybe there's a

 Indeed.  Perhaps unrelated to this.

 convenient way of aggregating them (hopefully before throwing the
 warning, since that can be quite expensive).



Re: [Bioc-devel] deprecated org.Hs.egCHRLOC UPDATE: the as.list() behaves differently in stable and devel configuration

2014-09-23 Thread Marc Carlson

Hi Raffaele,

This problem should be resolved in devel at this time.  Please update to 
the latest version of AnnotationDbi (1.27.14) and try again.


 Marc



On 09/23/2014 02:36 PM, calogero UNITO wrote:

Hi Vincent,
I have further investigated the error I get in the vignette of the chimera
devel package.
It is related to the libraries used to access org.Hs.eg.db in the
devel branch.

In the presence of the following libraries (stable branch):
[1] org.Hs.eg.db_2.14.0  RSQLite_0.11.4   DBI_0.3.0
[4] AnnotationDbi_1.26.0 GenomeInfoDb_1.0.2   Biobase_2.24.0
[7] BiocGenerics_0.10.0  BiocInstaller_1.14.2

If I extract start and end position for the chromosome location from
org.Hs.eg:
  chr.tmps <- as.list(org.Hs.egCHRLOC)
  chr.tmpe <- as.list(org.Hs.egCHRLOCEND)

as.numeric(chr.tmps[1:3])
[1] -58858172  18248755 -43248163
as.numeric(chr.tmpe[1:3])
[1] -58864865  18258723 -43280376

I get different numbers for the start and end of a gene.

If I use the libraries from the devel branch:
[1] org.Hs.eg.db_2.14.0   RSQLite_0.11.4 DBI_0.3.0
[4] AnnotationDbi_1.27.13 GenomeInfoDb_1.1.19 IRanges_1.99.28
[7] S4Vectors_0.2.4   Biobase_2.25.0 BiocGenerics_0.11.5

   chr.tmps <- as.list(org.Hs.egCHRLOC)
org.Hs.egCHRLOC is deprecated as the data is better accessed from
another location. Please use an appropriate TxDb object or package for
this kind of data.
   chr.tmpe <- as.list(org.Hs.egCHRLOCEND)
org.Hs.egCHRLOC is deprecated as the data is better accessed from
another location. Please use an appropriate TxDb object or package for
this kind of data.

as.numeric(chr.tmps[1:3])
[1] -58858172  18248755 -43248163
   as.numeric(chr.tmpe[1:3])
[1] -58858172  18248755 -43248163

The values associated to start and end of the gene are the same.
This is actually the reason why I get errors in the vignette of chimera
package.
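
A quick way to confirm the symptom described above is to compare the
reported start and end values element-wise.  The following is only an
illustrative sketch in base R: the numbers are copied from the output
quoted above, and the helper name same_as_start is made up for this
example.

```r
# Values copied from the as.numeric() output quoted in this message.
starts      <- c(-58858172, 18248755, -43248163)
ends_stable <- c(-58864865, 18258723, -43280376)  # stable branch
ends_devel  <- c(-58858172, 18248755, -43248163)  # devel branch

# Hypothetical helper: TRUE wherever the reported end equals the reported start.
same_as_start <- function(starts, ends) {
  !is.na(starts) & !is.na(ends) & starts == ends
}

same_as_start(starts, ends_stable)  # no element matches in the stable output
same_as_start(starts, ends_devel)   # every element matches in the devel output
```

With real data the two vectors would come from as.numeric() applied to the
CHRLOC and CHRLOCEND lists, as in the message above.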





R 3.1.1:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

Then I installed the basic configuration needed to use org.Hs.eg.db in
the actual stable release:

source("http://bioconductor.org/biocLite.R")
biocLite("org.Hs.eg.db")
library(org.Hs.eg.db)

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets methods
[8] base

other attached packages:
[1] org.Hs.eg.db_2.14.0  RSQLite_0.11.4   DBI_0.3.0
[4] AnnotationDbi_1.26.0 GenomeInfoDb_1.0.2   Biobase_2.24.0
[7] BiocGenerics_0.10.0  BiocInstaller_1.14.2

loaded via a namespace (and not attached):
[1] IRanges_1.22.10 stats4_3.1.1    tools_3.1.1



Then I used  the devel release packages:

library(BiocInstaller)
useDevel()
library(org.Hs.eg.db)

sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] org.Hs.eg.db_2.14.0   RSQLite_0.11.4        DBI_0.3.0
[4] AnnotationDbi_1.27.13 GenomeInfoDb_1.1.19   IRanges_1.99.28
[7] S4Vectors_0.2.4       Biobase_2.25.0        BiocGenerics_0.11.5



On 23/09/14 12:41, Vincent Carey wrote:

Of note, this is not an error and seems at this time not even to be a
warning.
A message is emitted indicating the deprecation, so we have a release to
figure out how to deal with the fact that the *CHR/*CHRLOC entities will go
away
in the next release.

There are various possible workarounds.  Some more commentary will be
forthcoming.
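
One possible workaround, sketched here under assumptions rather than taken
from the thread: gene start and end coordinates can be read from a TxDb
package with genes().  The choice of TxDb.Hsapiens.UCSC.hg19.knownGene and
the helper name gene_coords are assumptions made for illustration; use the
TxDb that matches the genome build you actually work with.

```r
# Hypothetical helper: coordinates for a set of gene IDs taken from a TxDb.
gene_coords <- function(txdb, gene_ids) {
  gr <- GenomicFeatures::genes(txdb)   # one range per gene, named by gene ID
  gr <- gr[names(gr) %in% gene_ids]
  # Resulting columns: seqnames, start, end, width, strand, gene_id
  BiocGenerics::as.data.frame(gr)
}

# The usage part only runs if the (assumed) TxDb package is installed.
if (requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene", quietly = TRUE)) {
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
  print(gene_coords(txdb, c("1", "10", "100")))
}
```

Note that the TxDb reports the strand in its own column, whereas the
*CHRLOC maps encode antisense-strand locations as negative numbers, so
code that relied on the sign would need a small adjustment.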


On Tue, Sep 23, 2014 at 3:52 AM, calogero UNITO raffaele.calog...@unito.it
wrote:


Hi,
I am the maintainer of the chimera package and I am getting the following
error in the devel version:

org.Hs.egCHRLOC is deprecated as the data is better accessed from
 another location. Please use an appropriate TxDb object or package for
 this kind of data.

Could you please tell me which package I should use instead of
org.Hs.eg.db?

Cheers
Raf

--

Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
MBC Centro di Biotecnologie Molecolari
Via Nizza 52, Torino 10126
Tel.   ++39 0116706457
Fax    ++39 0112366457
Mobile ++39 827080
email: raffaele.calog...@unito.it
  raffaele.calog...@gmail.com
www:   http://www.bioinformatica.unito.it



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel







Re: [Bioc-devel] new error(?) related to annotation: illuminaHumanv1CHR is deprecated

2014-09-22 Thread Marc Carlson

Hi Vince,

So if you wanted to do this manually, then the thing you would want to 
do is to get a gene ID from the probe and to take that to a TranscriptDb 
object (again: that is if you wanted to do it manually).  Alternatively, 
if you had an OrganismDb object then this association would be handled 
for you (where it would be spelled out explicitly).  The explicit nature 
is what we are after here since where a gene is expected to be 
(chromosome wise) can depend on the build of genome you are using.  As 
people move between standard genomes and eventually to custom ones, we 
needed to decouple this kind of data from the organism packages (which 
are only ever intended to hold gene-centered data).
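
The manual route described above (probe -> gene ID -> TxDb) could look
roughly like the following.  This is a sketch under assumptions, not code
from the thread: the helper name probe_to_chrom is made up, and pairing
illuminaHumanv1.db with the hg19 knownGene TxDb is only an example; pick
the TxDb for the genome build you actually use.

```r
# Hypothetical helper: probe ID -> Entrez gene ID -> chromosome.
probe_to_chrom <- function(probes, chipdb, txdb) {
  # Step 1: probe ID -> Entrez gene ID from the chip annotation package
  p2g <- AnnotationDbi::select(chipdb, keys = probes,
                               columns = "ENTREZID", keytype = "PROBEID")
  genes <- unique(stats::na.omit(p2g$ENTREZID))
  if (length(genes) == 0) {
    p2g$TXCHROM <- NA_character_
    return(p2g)
  }
  # Step 2: gene ID -> chromosome from the TxDb (the build is explicit here)
  g2c <- AnnotationDbi::select(txdb, keys = genes,
                               columns = "TXCHROM", keytype = "GENEID")
  merge(p2g, unique(g2c), by.x = "ENTREZID", by.y = "GENEID", all.x = TRUE)
}

# The usage part only runs if both (assumed) packages are installed.
if (requireNamespace("illuminaHumanv1.db", quietly = TRUE) &&
    requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene", quietly = TRUE)) {
  chipdb <- illuminaHumanv1.db::illuminaHumanv1.db
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
  probes <- head(AnnotationDbi::keys(chipdb, keytype = "PROBEID"))
  print(probe_to_chrom(probes, chipdb, txdb))
}
```

The point of the two explicit steps is that the genome build is chosen by
whoever supplies the TxDb, instead of being implied by the chip package.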


 Marc


On 09/21/2014 08:21 AM, Vincent Carey wrote:

On Sun, Sep 21, 2014 at 11:07 AM, Martin Morgan mtmor...@fhcrc.org wrote:


On 09/21/2014 07:44 AM, Vincent Carey wrote:


this is coming out of the build system for GGtools ... not easy to find as
the problem seems to cause emission of megabytes of warnings

illuminaHumanv1CHR is deprecated as the data is better accessed from
another location. Please use an appropriate TxDb object or package for
this kind of data.

i don't see the deprecation in the doc for illuminaHumanv1.db and i cannot
get a get() to throw it.  i also don't see this on the devel version
package landing page


Marc will likely reply on Monday. But the intention is that the CHR bimaps
in *db packages Marc curates are being deprecated. The deprecation itself
occurs in AnnotationDbi, I think. The reason is the lack of provenance for
this information -- what genome build does it refer to? -- and its
availability from other sources (i.e., the TxDb packages) with provenance.



Nice to hear about the streamlining and improved provenance.  I confess I
don't see how to get a probe-chr mapping out of TxDb -- is there something
new in there?  A select operation that can resolve queries about
manufacturer identifiers?



illuminaHumanv1CHR

illuminaHumanv1CHR is deprecated as the data is better accessed from
   another location. Please use an appropriate TxDb object or package for
   this kind of data.
CHR map for chip illuminaHumanv1 (object of class ProbeAnnDbBimap)

I think this is currently a message, but should be a warning.

AnnotationDbi is not building successfully, so its biocLite() version and
landing page are not in sync with an svn checkout (used by the build
system); to replicate on your own system requires an svn install, at least
until AnnotationDbi builds successfully.


OK, so I can get the message now.  But I think more details need to be
supplied if we are to drop references to *CHR.



I guess the megabytes of warnings come from code in GGtools or elsewhere;
maybe there's a


Indeed.  Perhaps unrelated to this.

convenient way of aggregating them (hopefully before throwing the warning,

since that can be quite expensive).
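
As a rough illustration of the aggregation idea, here is a caller-side
sketch in base R.  It is not what AnnotationDbi does, and it collects the
messages after they are signalled rather than before; the names
with_aggregated_messages and noisy are made up for this example.

```r
# Evaluate `expr`, muffle every message it signals, and return the value
# together with a count of each distinct message.
with_aggregated_messages <- function(expr) {
  seen <- character()
  value <- withCallingHandlers(
    expr,
    message = function(m) {
      # conditionMessage() keeps the trailing newline added by message()
      seen <<- c(seen, sub("\n$", "", conditionMessage(m)))
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, counts = table(seen))
}

# Hypothetical stand-in for code that triggers the deprecation message
# once per call.
noisy <- function(i) {
  message("illuminaHumanv1CHR is deprecated")
  i * 2
}

res <- with_aggregated_messages(vapply(1:5, noisy, numeric(1)))
res$value
res$counts
```

Signalling each condition still has a cost, so this only tidies the output;
avoiding the expense altogether would have to happen in the code that
emits the message.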

Martin






--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793




[Bioc-devel] The release is fast approaching. Information about upcoming deadlines.

2014-09-17 Thread Marc Carlson

Hello package contributors,

Please note that next Thursday is the deadline for submitting new 
packages if you want them to make it into the upcoming October release.  
You can see the release schedule here:


http://www.bioconductor.org/developers/release-schedule/

Please also take note of the upcoming deadlines on October 2nd and 6th 
(for existing packages).



Thanks again for participating!


Marc



[Bioc-devel] important announcement

2014-09-12 Thread Marc Carlson

Hello,

This is a second warning that in less than a week we plan to roll out
the new support site for Bioconductor.

*Important* Once the support site is 'live', posts to the Bioconductor
mailing list will receive an automatic reply indicating that it is no
longer in service and directing you to the new site. This change
affects the 'bioconductor' mailing list; the 'bioc-devel' mailing list
will continue to function as before.

  Marc



[Bioc-devel] Announcement about the new support site

2014-09-04 Thread Marc Carlson

Hello,

Thank you to those who have participated in the Beta for our new
support site.  The beta period is now over, and we are getting ready
for a formal launch of the site during the week of September 15th.

*Important* Once the support site is 'live', posts to the Bioconductor
mailing list will receive an automatic reply indicating that it is no
longer in service and directing you to the new site. This change
affects the 'bioconductor' mailing list; the 'bioc-devel' mailing list
will continue to function as before.


  Marc



Re: [Bioc-devel] Please help us try out our new support site!

2014-08-19 Thread Marc Carlson

Hi Stephanie,

There are two kinds of tag tracking on offer and they should be 
independent of each other.  So on the site when you log in, there is a 
tab called 'My Tags'.  That tab is supposed to list any posts that are 
tagged with your tags of interest and you can look at them whenever you 
log in.  In contrast, the 'Watched Tags' field should actually email you 
whenever someone hits one of the tags you list in that field.  So you 
might want these two lists to have slightly different contents depending 
on how often you want to get emailed etc.


Also related to this, we have already retroactively tagged and imported 
the past 11+ years of older posts that mention bioconductor package 
names or biocViews terms.  So you should be able to put tags for your 
packages of interest into 'My Tags' and then see related older posts 
listed under your My Tags tab right away.



  Marc


On 08/19/2014 09:51 AM, Stephanie M. Gogarten wrote:
Are tags in My Tags automatically Watched?  Or should I enter those 
tags in both fields?


I really like the option to get email when my packages are mentioned.  
I think it will mean that users get help faster, since those of us who 
are not constantly watching the mailing list will see relevant 
questions right away.


Stephanie

On 8/18/14 12:15 PM, Marc Carlson wrote:

Hello!

This is a message to announce the beta test for our new support site.
We hope to replace the regular Bioconductor mailing list with this
site soon and we have imported the past 11+ years of mailing list
discussion into this new site.  If you would like to help us test it
out you can do that by logging in here:

https://support.bioconductor.org

For those of you who have posted to the bioconductor mailing list
before, you will probably want to recover your well earned reputation
from previous posts and answers.  To do that you will need to scroll
to the bottom of the log in page and click the link that says 'Forgot
Password?'.  This should get you started with your mailing list email
address which will already be linked to your previous posts.

And if you have never posted, then you can start a new account from
that same page.

As you explore the beta, you may come across things that you would
like to see changed or that you feel are not working right. This site
is based on a fork of Biostars, and we ask that you please post such
questions to our github repository for this:

https://github.com/Bioconductor/support.bioconductor.org/issues

We aspire to switch over to this new site in early September, but we are
leaving the schedule flexible depending on how well the beta site
works.  Also: please note that posts made to the new site during the
beta will disappear after the test period.  We want you to help us
test it, but this is not the live deployment phase quite yet.

I expect there will probably be some other questions about this big
transition. So please ask them as needed and we will try to answer
them the best we can.


   Marc



[Bioc-devel] Please help us try out our new support site!

2014-08-18 Thread Marc Carlson

Hello!

This is a message to announce the beta test for our new support site.
We hope to replace the regular Bioconductor mailing list with this
site soon and we have imported the past 11+ years of mailing list
discussion into this new site.  If you would like to help us test it
out you can do that by logging in here:

https://support.bioconductor.org

For those of you who have posted to the bioconductor mailing list
before, you will probably want to recover your well earned reputation
from previous posts and answers.  To do that you will need to scroll
to the bottom of the log in page and click the link that says 'Forgot
Password?'.  This should get you started with your mailing list email
address which will already be linked to your previous posts.

And if you have never posted, then you can start a new account from
that same page.

As you explore the beta, you may come across things that you would
like to see changed or that you feel are not working right.  This site
is based on a fork of Biostars, and we ask that you please post such
questions to our github repository for this:

https://github.com/Bioconductor/support.bioconductor.org/issues

We aspire to switch over to this new site in early September, but we are
leaving the schedule flexible depending on how well the beta site
works.  Also: please note that posts made to the new site during the
beta will disappear after the test period.  We want you to help us
test it, but this is not the live deployment phase quite yet.

I expect there will probably be some other questions about this big
transition. So please ask them as needed and we will try to answer
them the best we can.


  Marc



Re: [Bioc-devel] Please help us try out our new support site!

2014-08-18 Thread Marc Carlson

Hi,

So CodersCrowd looks like a pretty neat tool.  But here what we are 
trying to accomplish is something that is less ambitious. Basically 
there are a host of problems that crop up from using a mailing list at 
the scale that we currently use the main bioconductor mailing list.  And 
here we are hoping that our new support site will help out with some of 
them.  To give you an idea about what we were thinking here I will list 
just a few of these problems (in no particular order):


1 - Scale:  With 3500 subscribers to the bioconductor mailing list, 
there is a lot of traffic.  This means that a lot of people will either 
only guest post or they will post and then immediately unsubscribe.  
This basically means that a lot of the people who could most make use of 
this information are currently having trouble getting access to it.  
Having a web site means that we can make sure these questions (and their 
answers) are easily searchable and can be read by anyone.  The other 
side of this change is that if you are carefully answering questions, 
your well earned reputation and your careful answers should now 
find a wider audience.


2 - Repetition:  A lot of times the same questions get asked over and 
over again.  This is bad for everyone, but is especially annoying for 
our package authors who sometimes have to spend a lot of their time 
answering similar questions over and over again.  Our hope is that 
by capturing your responses in a searchable format the 1st time, more 
users will be able to discover your hard work and thus benefit from it 
later on.


3 - Mistakes:  Sometimes some spam or something embarrassing will get 
through.  And with mailing lists everything that happens is written in 
permanent ink.  We would rather that we were able to delete spam from 
the public record and that when appropriate you were able to amend 
statements so that they better reflected what you intended.  It's also 
our hope that by amending your answers to questions, you can keep your 
answers current instead of crafting new responses from scratch each time.


Anyhow, those are just some of the problems that we were hoping to 
address.  I hope that the new site helps with these.



 Marc



On 08/18/2014 03:56 PM, Aniba, Radhouane wrote:

Hi Marc,

I am a bit surprised to see that move to a Biostars-like website. Why another 
Q&A website?
Why don't you consider a sandbox-like website such as CodersCrowd, which already 
has a Docker image of R and Bioconductor where users can reproduce their bugs? 
Just saying ...

R & Bioconductor is more a deep programming problem solving kind of 
interaction, and so should be the support for it, not just a copy and paste (fork) 
of Biostars (I have nothing against Biostars, btw).

That's my personal opinion of course :)

Rad
On Aug 18, 2014, at 12:15 PM, Marc Carlson mcarl...@fhcrc.org wrote:


Hello!

This is a message to announce the beta test for our new support site.
We hope to replace the regular Bioconductor mailing list with this
site soon and we have imported the past 11+ years of mailing list
discussion into this new site.  If you would like to help us test it
out you can do that by logging in here:

https://support.bioconductor.org

For those of you who have posted to the bioconductor mailing list
before, you will probably want to recover your well earned reputation
from previous posts and answers.  To do that you will need to scroll
to the bottom of the log in page and click the link that says 'Forgot
Password?'.  This should get you started with your mailing list email
address which will already be linked to your previous posts.

And if you have never posted, then you can start a new account from
that same page.

As you explore the beta, you may come across things that you would
like to see changed or that you feel are not working right.  This site
is based on a fork of Biostars, and we ask that you please post such
questions to our github repository for this:

https://github.com/Bioconductor/support.bioconductor.org/issues

We aspire to switch over to this new site in early September, but we are
leaving the schedule flexible depending on how well the beta site
works.  Also: please note that posts made to the new site during the
beta will disappear after the test period.  We want you to help us
test it, but this is not the live deployment phase quite yet.

I expect there will probably be some other questions about this big
transition. So please ask them as needed and we will try to answer
them the best we can.


  Marc



[Bioc-devel] Question about which new organism resources to create

2014-05-06 Thread Marc Carlson

Hi everyone,

As many of you already know we have long provided organism annotation 
packages that give gene based annotations for selected organisms.  And 
we intend to keep doing that.  But these days there is also a lot of 
other data at NCBI that could be used to make gene based databases for 
other organisms.  And at the same time, there is also greater and 
greater demand for annotations from other organisms too.  So I aim to 
make organism based gene databases for a wider range of organisms.  
However instead of just making more packages, I intend to put these DBs 
into the AnnotationHub.  You can get an idea about what access will be 
like by looking at the inparanoid8 objects that were put in for the last 
release.


library(AnnotationHub)
ah = AnnotationHub()
hs8 = ah$inparanoid8.Orthologs.hom.Homo_sapiens.inp8.sqlite
hs8
columns(hs8)
k = head(keys(hs8, 'TOXOPLASMA_GONDII'))
select(hs8, k, 'HOMO_SAPIENS', 'TOXOPLASMA_GONDII')
## etc.

Anyhow my reason for posting is that I am now looking at all the NCBI 
data that could be used for annotation packages and trying to decide 
what to include.  About half of the 14 thousand potential critters in 
the NCBI dataset only have about one gene annotated.  I am guessing that 
it is not worth anyone's time to pre-process those organisms that have 
only one gene.  Or is it?  If you think it might be, now would probably 
be a good time to speak up.


How many annotations do you guys want/expect in an organism package 
before it becomes annoying that you even downloaded it?


Thanks in advance for your opinions,


  Marc



Re: [Bioc-devel] RE : AnnotationDbi and select function

2014-03-12 Thread Marc Carlson

Also,

There is nothing wrong with using GENEID the way that you initially 
did.  It was just a small bug that prevented some internal subsetting 
from working properly and that is now fixed.


It just happened that GENEID was equivalent to ENTREZID in this case.  
And that ends up making it a slower choice just because the software has 
to do more work (in case GENEID is something else).  So since you know 
that these are in fact ENTREZIDs, you can take Jim's suggestion as a 
shortcut and thus get a little performance boost.


But GENEID is still a less specific thing to request than ENTREZID (since 
GENEID could potentially be another kind of ID).  So the two things (GENEID 
and ENTREZID) are not always the same kind of thing.  They just happened to 
both be ENTREZID in *this* case.  In a different scenario GENEID from 
the associated TranscriptDb might be something like an Ensembl gene ID.  
And then to use a shortcut would mean using ENSEMBL instead of ENTREZID 
to do the shortcut...


In contrast: GENEID should normally always work (but it should also be a 
tiny bit slower).


Sorry if you know all this stuff, but I think it's better to be explicit 
than to say too little.



  Marc



On 03/12/2014 02:53 PM, Marc Carlson wrote:
I just checked a fix in for this bug to GenomicFeatures (which happens 
to be where the problem was).  It should percolate out to the build 
system soon.


 Marc


On 03/12/2014 02:19 PM, Servant Nicolas wrote:

Hi guys,

Thanks for your feedback.
Indeed I put GENEID because it is used in the txdb database.


library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
columns(txdb)

 [1] CDSID      CDSNAME    CDSCHROM   CDSSTRAND  CDSSTART
 [6] CDSEND     EXONID     EXONNAME   EXONCHROM  EXONSTRAND
[11] EXONSTART  EXONEND    GENEID     TXID       EXONRANK
[16] TXNAME     TXCHROM    TXSTRAND   TXSTART    TXEND

I will move to ENTREZID which is much faster!
I'm glad it could help
Nicolas


From: bioc-devel-boun...@r-project.org 
[bioc-devel-boun...@r-project.org] on behalf of Marc Carlson 
[mcarl...@fhcrc.org]

Sent: Wednesday, 12 March 2014 20:18
To: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] AnnotationDbi and select function

Thanks Nicolas!  That's a good bug.  I will work on a fix.  The reason
why James's workaround here functions is because the number of databases
that it has to query is fewer by one.  It is also faster for this
reason.  So when you say GENEID you mean the IDs used in the associated
txdb database which means that these have to be checked against that DB
(and anything related to it extracted) and then merged with the results
of the symbol information by joining on the foreign key for these two
DBs.  So that's actually much more complex than just extracting all the
same data from just the org package even though the end result (in this
case) is the same.  The bug is probably happening in the associated
merge step.

   Marc



On 03/12/2014 10:06 AM, James W. MacDonald wrote:

Hi Nicolas,

On 3/12/2014 12:39 PM, Servant Nicolas wrote:

Dear all,

I get an error using the select function from the AnnotationDbi
package.
I am trying to convert some gene IDs into symbols, but for some strange
reason it crashes.


library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
isActiveSeq(txdb)[seqlevels(txdb)] <- FALSE
isActiveSeq(txdb)[c("chr16", "chr1")] <- TRUE
geneGR <- exonsBy(txdb, "gene")
library(Homo.sapiens)
symbol <- select(Homo.sapiens, keys = names(geneGR),
                 keytype = "GENEID", columns = "SYMBOL")
Error in head(select(Homo.sapiens, keys = names(geneGR)[1:1001],
keytype = "GENEID",  :
error in evaluating the argument 'x' in selecting a method for
function 'head': Error in res[,
.reverseColAbbreviations(x, cnames), drop = FALSE] :

length(geneGR)

[1] 3269
## The first 1K work

symbol <- select(Homo.sapiens, keys = names(geneGR)[1:1000],
                 keytype = "GENEID", columns = "SYMBOL")

## The 1K+1 does not !

symbol <- select(Homo.sapiens, keys = names(geneGR)[1:1001],
                 keytype = "GENEID", columns = "SYMBOL")

Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
incorrect number of dimensions

It looks like I cannot convert more than 1K elements ?? Any reason
for that ?
Thank you very much
Nicolas

Not sure what 'GENEID' is in this context - it appears to be Entrez
Gene. But anyway, if you use ENTREZID instead, it works fine:


symbol <- select(Homo.sapiens, names(geneGR), "SYMBOL", "ENTREZID")
symbol <- select(Homo.sapiens, names(geneGR), "GENEID", "ENTREZID")

Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
   incorrect number of dimensions

symbol <- select(Homo.sapiens, names(geneGR)[1:1000], "GENEID", "ENTREZID")

symbol <- select(Homo.sapiens, names(geneGR)[1:1001], "GENEID", "ENTREZID")
Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
   incorrect number of dimensions

Best,

Jim




sessionInfo()

R Under development (unstable) (2014-03-05

Re: [Bioc-devel] Update policy on experiment data and annotation packages

2013-10-16 Thread Marc Carlson

Hi Julian,

This is a complicated issue for us and we have to choose our next move 
carefully since we don't have unlimited resources. Especially not with 
respect to time.  But I wanted to let you know that we appreciate your 
comment and that we are still thinking about it.


  Marc



On 10/10/2013 03:05 AM, Julian Gehring wrote:

Hi,

What is the consensus on updating data in experiment data and 
annotation packages?


The bioc website [1] does not state any differences between the two 
package types in terms of updating their content.  From the bioc core, 
I have the information that (a) experimental data packages should 
represent 'frozen' data and not get updated over release cycles, while 
(b)  annotation packages should get updated with every release cycle. 
Should we add this information to the website?


I'm curious what this means for experimental data that accumulates 
over time, i.e. data from big consortia, as represented by e.g. 
'curatedOvarianData', 'SomaticCancerAlterations', and others. Should 
one create a new package with each release cycle (indicating 
the data version in the package name, as the 'SNPlocs*' packages) to 
ensure reproducibility?  Or update an annotation package with each 
release, and try to ensure backwards compatibility within the package 
itself?


Best wishes
Julian


[1] http://bioconductor.org/developers/package-guidelines/#package-types



[Bioc-devel] Courtesy message about the upcoming release

2013-10-04 Thread Marc Carlson

Hello everyone,

This is a courtesy message to remind all package developers that next 
Wednesday the 9th of October is the deadline for all packages to pass 
the build system without any errors or warnings.


Please have a look at our build system for the development branch and 
make sure that the packages you maintain are not causing any errors or 
warnings:


http://www.bioconductor.org/checkResults/2.13/bioc-LATEST/


Also you can see our release schedule here if you have questions about 
the dates:


http://www.bioconductor.org/developers/release-schedule/



  Marc



Re: [Bioc-devel] isActiveSeq deprecated

2013-09-18 Thread Marc Carlson
I actually considered this, but I opted to do it this way just for the 
sake of being consistent (which was my whole mission for implementing 
seqlevels in here in the 1st place).  Now I could make it more 
convenient here and break consistency with how it is used elsewhere, but 
what do people prefer?


Consistent or convenient?


  Marc



On 09/18/2013 10:40 AM, Hervé Pagès wrote:

Hi Marc,

Wouldn't it make sense to just ignore the 'force' arg when
dropping the seqlevels of a TranscriptDb?

The 'force' argument is FALSE by default and this prevents
seqlevels<- from shrinking GRanges or other vector-like objects
when the user tries to drop seqlevels that are in use.
Internally seqlevels<- calls seqlevelsInUse() to get the
seqlevels currently in use and see if they intersect with
the seqlevels to drop.

In the TranscriptDb situation, people always have to use
'force=TRUE' to drop seqlevels, regardless of whether the
levels to drop are in use or not (the seqlevelsInUse()
getter not being defined for TranscriptDb objects, I suspect
seqlevels<- doesn't look at this).

So maybe 'force' could just be ignored for TranscriptDb objects?
That would make seqlevels<- a little bit more user-friendly on
those objects.

Thanks,
H.


On 09/13/2013 10:38 AM, Marc Carlson wrote:

Hi Florian,

Yes we are trying to make things more uniform.  seqlevels() lets you
rename as well as deactivate chromosomes you want to ignore, so it was
really redundant with isActiveSeq().  So we are moving away from
isActiveSeq() just so that users have less to learn about.  The reason
why isActiveSeq was different from seqlevels was just because it was
born for a TranscriptDb (which is based on an annotation database)
instead of being born on a GRanges object.  So seqlevels was the more
general tool.
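
For illustration, restricting a TxDb to a few chromosomes with seqlevels
might look like the sketch below.  It is written against a recent
GenomicFeatures/GenomeInfoDb and is an assumption about current behaviour,
not code from this thread: the versions discussed here needed force=TRUE
in the replacement call, and seqlevels0() may not exist in them.  The
helper name restrict_chroms and the choice of TxDb package are made up for
the example.

```r
# Hypothetical helper: keep only the given chromosomes active in a TxDb.
restrict_chroms <- function(txdb, chroms) {
  unknown <- setdiff(chroms, GenomeInfoDb::seqlevels0(txdb))
  if (length(unknown) > 0)
    stop("unknown seqlevels: ", paste(unknown, collapse = ", "))
  GenomeInfoDb::seqlevels(txdb) <- chroms
  invisible(txdb)
}

# The usage part only runs if the (assumed) TxDb package is installed.
if (requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene", quietly = TRUE)) {
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
  txdb <- restrict_chroms(txdb, c("chr1", "chr16"))
  print(GenomeInfoDb::seqlevels(txdb))
  # Put the original seqlevels back when done
  GenomeInfoDb::seqlevels(txdb) <- GenomeInfoDb::seqlevels0(txdb)
}
```

This plays the role of the isActiveSeq(txdb)[...] <- TRUE/FALSE idiom with
the same accessor that is used for GRanges objects.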

Marc



On 09/13/2013 07:24 AM, Hahne, Florian wrote:

Hi Marc,
I saw these warnings in Gviz, but they stem from GenomicFeatures

Warning messages:
1: 'isActiveSeq' is deprecated.
Use 'seqlevels' instead.
See help(Deprecated) and help(GenomicFeatures-deprecated).
2: 'isActiveSeq' is deprecated.
Use 'seqlevels' instead.
See help(Deprecated) and help(GenomicFeatures-deprecated).
3: 'isActiveSeq<-' is deprecated.
Use 'seqlevels' instead.
See help(Deprecated) and help(GenomicFeatures-deprecated).
4: 'isActiveSeq<-' is deprecated.
Use 'seqlevels' instead.
See help(Deprecated) and help(GenomicFeatures-deprecated).
5: 'isActiveSeq' is deprecated.
Use 'seqlevels' instead.
See help(Deprecated) and help(GenomicFeatures-deprecated).
6: 'isActiveSeq<-' is deprecated.
Use 'seqlevels' instead.
See help(Deprecated) and help(GenomicFeatures-deprecated).

So has the whole idea of active chromosomes in the data base been
dropped? I could not find anything in the change notes. Do I get it
right that you can now do
seqlevels(txdb, force=TRUE) <- "chr1"
if you just want the first chromosome to be active?

Florian





___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel







Re: [Bioc-devel] isActiveSeq deprecated

2013-09-16 Thread Marc Carlson
Thanks Florian,

I just checked in a fix for this.  Please let me know if you find any 
other quirks.

   Marc


On 09/16/2013 05:33 AM, Hahne, Florian wrote:
 Hey Marc,
 I think your move towards seqlevels is not quite working yet:

 samplefile <- system.file("extdata", "UCSC_knownGene_sample.sqlite",
                           package="GenomicFeatures")
 txdb <- loadDb(samplefile)
 ## This works fine
 fiveUTRsByTranscript(txdb)
 ## This breaks
 seqlevels(txdb, force=TRUE) <- "chr6"
 fiveUTRsByTranscript(txdb)
 Error in relist(x, f) :
   shape of 'skeleton' is not compatible with 'NROW(flesh)'

 Deep in the guts of this you are trying to build a GRanges object with 
 NAs as seqlevels, and it doesn't really like that.

 Florian

  sessionInfo()
 R version 3.0.1 RC (2013-05-12 r62736)
 Platform: i386-apple-darwin12.3.0/i386 (32-bit)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] parallel  grid  stats graphics  grDevices utils datasets
 [8] methods   base

 other attached packages:
 [1] GenomicFeatures_1.13.37 AnnotationDbi_1.23.23   Biobase_2.21.7
 [4] GenomicRanges_1.13.43   XVector_0.1.4           IRanges_1.19.36
 [7] BiocGenerics_0.7.5      Gviz_1.5.11             BiocInstaller_1.11.4

 loaded via a namespace (and not attached):
  [1] biomaRt_2.17.2      Biostrings_2.29.18  biovizBase_1.9.2
  [4] bitops_1.0-6        BSgenome_1.29.1     cluster_1.14.4
  [7] colorspace_1.2-2    DBI_0.2-7           dichromat_2.0-0
 [10] Hmisc_3.12-2        labeling_0.2        lattice_0.20-23
 [13] munsell_0.4.2       plyr_1.8            RColorBrewer_1.0-5
 [16] RCurl_1.95-4.1      rpart_4.1-3         Rsamtools_1.13.39
 [19] RSQLite_0.11.4      rtracklayer_1.21.11 scales_0.2.3
 [22] stats4_3.0.1        stringr_0.6.2       tools_3.0.1
 [25] XML_3.98-1.1        zlibbioc_1.7.0



 From: Marc Carlson <mcarl...@fhcrc.org>
 Date: Friday, September 13, 2013 7:38 PM
 To: Florian Hahne <florian.ha...@novartis.com>
 Cc: bioc-devel@r-project.org
 Subject: Re: isActiveSeq deprecated

 Hi Florian,

 Yes we are trying to make things more uniform. seqlevels() lets
 you rename as well as deactivate chromosomes you want to ignore,
 so it was really redundant with isActiveSeq().  So we are moving
 away from isActiveSeq() just so that users have less to learn
 about.  The reason why isActiveSeq was different from seqlevels
 was just because it was born for a TranscriptDb (which is based on
 an annotation database) instead of being born on a GRanges
 object.  So seqlevels was the more general tool.

   Marc



 On 09/13/2013 07:24 AM, Hahne, Florian wrote:
 Hi Marc,
 I saw these warnings in Gviz, but they stem from GenomicFeatures

 Warning messages:
 1: 'isActiveSeq' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 2: 'isActiveSeq' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 3: 'isActiveSeq<-' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 4: 'isActiveSeq<-' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 5: 'isActiveSeq' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 6: 'isActiveSeq<-' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").

 So has the whole idea of active chromosomes in the data base been
 dropped? I could not find anything in the change notes. Do I get
 it right that you can now do
 seqlevels(txdb, force=TRUE) <- "chr1"
 if you just want the first chromosome to be active?

 Florian






Re: [Bioc-devel] isActiveSeq deprecated

2013-09-13 Thread Marc Carlson
Hi Florian,

Yes we are trying to make things more uniform.  seqlevels() lets you 
rename as well as deactivate chromosomes you want to ignore, so it was 
really redundant with isActiveSeq().  So we are moving away from 
isActiveSeq() just so that users have less to learn about.  The reason 
why isActiveSeq was different from seqlevels was just because it was 
born for a TranscriptDb (which is based on an annotation database) 
instead of being born on a GRanges object.  So seqlevels was the more 
general tool.

   Marc



On 09/13/2013 07:24 AM, Hahne, Florian wrote:
 Hi Marc,
 I saw these warnings in Gviz, but they stem from GenomicFeatures

 Warning messages:
 1: 'isActiveSeq' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 2: 'isActiveSeq' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 3: 'isActiveSeq<-' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 4: 'isActiveSeq<-' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 5: 'isActiveSeq' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").
 6: 'isActiveSeq<-' is deprecated.
 Use 'seqlevels' instead.
 See help("Deprecated") and help("GenomicFeatures-deprecated").

 So has the whole idea of active chromosomes in the data base been 
 dropped? I could not find anything in the change notes. Do I get it 
 right that you can now do
 seqlevels(txdb, force=TRUE) <- "chr1"
 if you just want the first chromosome to be active?

 Florian





Re: [Bioc-devel] svn for annotation packages

2013-09-13 Thread Marc Carlson
Yes I try to get updates on everything every time.

   Marc



On 09/12/2013 10:38 AM, Kasper Daniel Hansen wrote:
 Ok, sounds good.

 This is especially nice to know for the annotation packages which are 
 hand created as opposed to being created by some script.

 Kasper


 On Thu, Sep 12, 2013 at 1:22 PM, Marc Carlson <mcarl...@fhcrc.org> wrote:

 Hi Kasper,

 You should get an email from me in the coming weeks with
 instructions regarding the upcoming release.  If you need
 something changed before then, please send me an email.


   Marc




 On 09/11/2013 06:17 PM, Dan Tenenbaum wrote:

 Annotation packages are not in svn. Send your changes to Marc.
 Dan

 Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:

 What is the url?  Or should I not work from subversion, if I want to update
 a package?  The HOWTO is a bit unclear.

 (Want to work on IlluminaHumanMethylation450kannotation.ilmn_v1.2)

 Best,
 Kasper



Re: [Bioc-devel] svn for annotation packages

2013-09-12 Thread Marc Carlson

Hi Kasper,

You should get an email from me in the coming weeks with instructions 
regarding the upcoming release.  If you need something changed before 
then, please send me an email.



  Marc



On 09/11/2013 06:17 PM, Dan Tenenbaum wrote:

Annotation packages are not in svn. Send your changes to Marc.
Dan

Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:


What is the url?  Or should I not work from subversion, if I want to update
a package?  The HOWTO is a bit unclear.

(Want to work on IlluminaHumanMethylation450kannotation.ilmn_v1.2)

Best,
Kasper



Re: [Bioc-devel] heavy vignette

2013-07-23 Thread Marc Carlson
Hi Carles,

In general there are three steps to consider in turn:

1) Look at the repository of experiment data packages.  If there are 
existing packages there with data that you can use then you probably 
want to use those.

2) Look for ways to make the data smaller and still get your 
testing/examples done.  Maybe you don't need 30 CEL files?  Maybe you 
only need the ExpressionSet object that they eventually result in?

3) If you still really need all 30 raw files, then it sounds like you 
might need to make a new package to hold them.  If that is the case, 
then please document them carefully.  This is so that others who come 
along can find them at step 1 above...


   Marc



On 07/23/2013 03:27 AM, Hernandez Ferrer, Carles wrote:
 Hello to everyone,


 Related to vignettes creation.

 I'm developing an R package to preprocess raw Affymetrix data for two other 
 packages (MAD and inveRsion). The idea is to publish it in Bioconductor, so an 
 executable vignette is required, but to test the package functionality I 
 need ~30 CEL files (from approx. 45.5 MB to 70.0 MB per file) for 
 each of the 4 supported technologies.

 How would you recommend that I develop the vignette or store the needed data?


 Carles Hernandez-Ferrer
 Centre for Research in Environmental Epidemiology - CREAL
 Parc de Recerca Biomèdica de Barcelona - PRBB
 Doctor Aiguader, 88 | 0800a3 Barcelona, Spain
 chernan...@creal.cat | 93 214 75 78
 www.creal.cat






Re: [Bioc-devel] Fwd: [BioC] Problem reading VCF file using readVcf from package VariantAnnotation

2013-04-30 Thread Marc Carlson
Hi Michael,

Yes.

library(Homo.sapiens)
cols(Homo.sapiens)
txs <- transcripts(Homo.sapiens, columns=c("SYMBOL"))

exs <- exons(Homo.sapiens, columns=c("SYMBOL"))


   Marc


On 04/30/2013 03:07 PM, Michael Lawrence wrote:
 Hi Marc,

 Do you know if it is easy yet to get the gene symbols returned as a 
 result of e.g. a transcripts() or exons() call?

 Michael


 On Tue, Apr 30, 2013 at 2:16 PM, Marc Carlson <mcarl...@fhcrc.org> wrote:

 Related to this:

 I have added getters for seqinfo (and friends) for the OrganismDb
 objects.  I have not added the setters yet though since that
 requires some refactoring of what an OrganismDb object actually is
 internally.  But I intend to do this also.

   Marc




 On 04/25/2013 09:32 AM, Valerie Obenchain wrote:

 Hi Vince, Kasper,

 cc'ing Herve and Marc.

 I think we have a couple of things going on so I wanted to
 clarify. The 'genome' argument to readVcf() is assigned to the
 GRanges in rowData with the genome<- setter. This is where
 .normargGenome() gets called.

 setReplaceMethod("genome", "Seqinfo",
     function(x, value)
     {
         x@genome <- .normargGenome(value, seqnames(x))
         x
     }
 )

 If the 'genome' replacement value is named, the name(s) must
 match the seqnames, not the build. So we aren't talking about
 matching compatible builds,

 fl <- system.file("extdata", "ex2.vcf",
                   package="VariantAnnotation")
 vcf <- readVcf(fl, c(b37="hg19"))  ## this is wrong
 vcf <- readVcf(fl, c(hg19="hg19")) ## also wrong

 Instead the name must be the seqname, the value is the build,

 vcf <- readVcf(fl, c("20"="hg19"))  ## correct
 vcf <- readVcf(fl, "hg19")  ## also correct

 This requirement for 'genome' is not well documented on
 ?readVcf or ?Seqinfo. We can fix that.
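
 The naming rule above can be sketched in base R. This is only an
 illustration: normalize_genome() is a hypothetical stand-in and not the
 real .normargGenome(), and the single seqname "20" is assumed from the
 readVcf() calls above.

```r
# Hypothetical stand-in for the rule described above; NOT the real
# .normargGenome(), only the naming requirement it enforces.
normalize_genome <- function(value, seqnames) {
  if (is.null(names(value))) {
    # unnamed value: recycle the build across all sequences
    return(setNames(rep_len(as.character(value), length(seqnames)), seqnames))
  }
  bad <- setdiff(names(value), seqnames)
  if (length(bad) > 0L)
    stop("names of 'genome' must be seqnames, got: ",
         paste(bad, collapse = ", "))
  out <- setNames(rep(NA_character_, length(seqnames)), seqnames)
  out[names(value)] <- value
  out
}

seqs <- "20"                                   # assumed single seqname
ok_named   <- normalize_genome(c("20" = "hg19"), seqs)
ok_unnamed <- normalize_genome("hg19", seqs)
bad_named  <- tryCatch(normalize_genome(c(b37 = "hg19"), seqs),
                       error = function(e) conditionMessage(e))
```

 The named and unnamed forms end up equivalent, while a build name used
 as the vector name is rejected because it is not a seqname.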

 The second thing is the issue of a flexible mapping between
 seqinfo metadata for different institutions. Herve and Marc
 have worked on this in AnnotationDbi. They can explain more
 about the 'SeqnameStyle' and how it might be used more widely.


 Val


 On 04/25/2013 06:54 AM, Kasper Daniel Hansen wrote:

 An official comment on this
 http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19
 with some more info in this discussion

 
 https://groups.google.com/a/soe.ucsc.edu/forum/?fromgroups=#!topic/genome/hFp-dGG9gBs
 


 Essentially it seems the b37 has been patched and this patched release is
 not reflected in hg19 but may be (I don't know) reflected in the b37
 download from NCBI

 Kasper


 On Thu, Apr 25, 2013 at 9:49 AM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:

 I agree with Vincent.  I have seen code from Herve in a package with some
 standardization of chromosome names, and this code could perhaps be used
 more widely so we don't have all the problems with chr1 vs chr01 vs 1.

 However, in this particular case, if Ulrich is actually interested in the
 mitochondrial genome, he has a problem.

 hg19, which is the genome version from UCSC, is considered equal to NCBI's
 b37.  However, as far as I understand, UCSC screwed up with the
 mitochondrial genome and used an old version for their hg19. So the error
 message is in many ways right here: the two genomes are slightly different
 because they have different mitochondrial genomes.

 Kasper




[Bioc-devel] Bioconductor and the Google Summer of Code

2013-04-09 Thread Marc Carlson

Hello everyone,

This year Bioconductor is participating in the Google Summer of Code.
Interested parties can see our GSOC page here:

https://google-melange.appspot.com/gsoc/org/google/gsoc2013/bioconductor

Also, we know that we only have a few ideas on our ideas list this year.
It's because we want to start small this year and only take on a few
projects under careful mentorship.  But, that doesn't mean that we don't
want to hear from the community about what you would like to see in the
future.  And of course we would also like to hear from any students out
there who might want to participate!  So if you are a student who wants to
participate, please email our special list set up for this purpose (
gsoc-b...@lists.fhcrc.org ) and tell us about yourself and your interests.


  Marc



[Bioc-devel] Next Wednesday is the deadline for all new package submissions.

2013-03-08 Thread Marc Carlson

Hello everyone,

If you are planning to submit a package to the project in time for the 
upcoming release, please be sure to do so by next Wednesday (the 13th).  
You can see the deadline in our release schedule here:


http://www.bioconductor.org/developers/release-schedule/


Thanks again,


  Marc



Re: [Bioc-devel] I would like to publish a bioconductor package.

2013-03-04 Thread Marc Carlson

Hi Davide,

Lots of good advice here.  The main goal with two packages is to 
minimize dependencies for the experiment data package as this is 
presumed to be the less specialized package.  Whenever you have a 
package ready please be sure to follow the instructions on 
the link that Herve provided.


Thanks in advance for your interest in contributing to the project,


  Marc



On 03/04/2013 08:19 AM, Kasper Daniel Hansen wrote:

This is a kind of a chicken and egg problem.

If the data in the experimental data package is in base R containers
(like just a matrix etc), it is pretty clear: the data package does
not depend on anything and the methods package either suggests or
depends on the data package.

However, in most cases, the data will be in some container (S4)
defined in the methods package.  In that case I usually let the data
package depend on the methods package and I let the methods package
suggest the data package.  Then you need to start each example with
something like

if(require(DATAPACKAGE)) {
   CODE
}

I have done this for minfi/minfiData and bsseq/bsseqData
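
The guard above can be sketched as follows. Note the swap: this uses
requireNamespace() rather than require(), and 'someExampleData' is a
placeholder name, not a real package. It is a sketch of the pattern under
those assumptions, nothing more.

```r
# 'someExampleData' is a placeholder name, not a real package.
data_pkg <- "someExampleData"

run_example <- function(pkg) {
  # requireNamespace() returns FALSE (rather than erroring) when pkg is absent
  if (requireNamespace(pkg, quietly = TRUE)) {
    "example ran"        # the real example code would go here
  } else {
    "example skipped"
  }
}

status_missing <- run_example(data_pkg)
status_present <- run_example("stats")   # 'stats' ships with R
```

Either way the example code is skipped, not failed, when the suggested
data package is not installed, which is the point of only suggesting it.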

Kasper


On Mon, Mar 4, 2013 at 10:15 AM, Davide Rambaldi <davide.ramba...@ieo.eu> wrote:

You can solve the package size issue by putting your example data in a separate 
experiment data package 
(http://www.bioconductor.org/packages/release/data/experiment/).

Stephanie


I fixed the package size issue with a secondary experiment data package 
(flowFitExampleData)

It is not clear to me how to fix the dependencies between the 2 packages:

My setup (I am trying to duplicate the affy/affydata setup…):

flowFit/DESCRIPTION

Suggests: flowFitExampleData


flowFitExampleData/DESCRIPTION

Depends: flowFit


And a lot of (maybe unnecessary?)

if (require(flowFitExampleData))

in the examples

  Is this correct?

Davide

P.S.:

  I tested the package on OSX and Linux with R 3.0 unstable for BUILD and CHECK 
and it's OK… (me vs inconsolata.sty: 1-0)

  For Windows, well, I will try to do it … maybe I will ask for more help ...



On Feb 27, 2013, at 5:25 PM, Stephanie M. Gogarten wrote:


You can solve the package size issue by putting your example data in a separate 
experiment data package 
(http://www.bioconductor.org/packages/release/data/experiment/).

Stephanie

On 2/27/13 3:03 AM, Davide Rambaldi wrote:

Hi all,

I am working on a library called flowFit; the purpose of this library is to 
analyze the FACS data coming from proliferation tracking dye studies.

The library depends on the flowCore and flowViz Bioconductor libraries and uses 
minpack.lm (Levenberg-Marquardt algorithm) to fit a set of peaks over the FACS 
data.

A typical experimental pipeline:

1) Acquire with FACS a sample of unlabelled cells
2) Acquire with FACS a sample of labeled and unstimulated cells (the Parent 
Population)
3) Acquire with FACS a sample of labeled and stimulated cells (the 
Proliferative Population)

In R we can use the flowCore functions to transform the raw data and to gate 
the population of interest. Once we have gated the correct population, with 2 
commands of flowFit you can perform the fitting:


library(flowFit)
parent <- parentFitting(QuahAndParish[[1]], "FITC-A")
fitting <- proliferationFitting(QuahAndParish[[2]], "FITC-A",
    parent@parentPeakPosition, parent@parentPeakSize)

The function can also generate some graphical output with:


plot(fitting)

To demonstrate the correctness of the fitting I have made some in silico 
simulations and a retrospective analysis of the data from the paper:

New and improved methods for measuring lymphocyte proliferation in vitro and in 
vivo using CFSE-like fluorescent dyes, Benjamin J.C. Quah ⁎, Christopher R. Parish, 
Journal of Immunological Methods (2012)

In this paper, the same population of lymphocytes (proliferation with the same 
growth conditions) was stained with 3 different proliferation tracking dyes: if 
the fitting algorithm is working as expected, we expect to estimate the same % 
of cells per generation in the 3 samples.

Comparing the 3 samples, we didn't see any significant difference in the 
estimated % of cells per generation, which suggests that the algorithm 
is correctly estimating the % of cells per generation.

I have posted a graphical output example with the Quah and Parish data (pdf) 
here:

http://dl.dropbox.com/u/40644496/QuahAndPArishOut.pdf

The dataset will be included in the library (in the data subdir).

Actually I am writing the vignette (I am following the guidelines in 
http://www.bioconductor.org/developers/package-guidelines/) and fixing some 
graphical bugs (like the legend oversized …).

The package passes R CMD build and R CMD check (time: 86 seconds) with no errors 
on OSX and Linux (I have to find a Windows machine somewhere ...); I still have 
to test with the R-devel version of R.

The library is bigger than expected (4.2 Mb) because the example datasets (FCS 
files converted in .Rdata) are big (3.7M) and I don't know how 

Re: [Bioc-devel] makeTranscriptDBFromGFF v. Flybase GFF

2013-02-11 Thread Marc Carlson

Hi Malcolm,

Not too much that hasn't been mentioned before.  So I bet that many 
people can probably walk past this one.


Both GFF and GTF files have many of the same things that come up when 
you use them.  They both are being used for things today (like 
transcriptomes) which represent a pretty specific use case.  And both 
these file formats were designed a while ago now, and some kinds of 
information (like exon rank) that are completely crucial for doing 
something like a transcriptome are therefore still optional when making 
a GFF or GTF file.  Also, because these file formats are very flexible 
and general in their specification, it is possible for them to be either 
overly sparse, OR overly loaded with unnecessary stuff (depending on 
what you were planning to use them for).  So it is completely possible 
that the ensembl file may be smaller and yet still contain what you 
need.  Or it might not be smaller.  You will simply have to check it and 
see how it compares.


If you are using my function makeTranscriptDbFromGFF() from the 
GenomicFeatures package, it will try to check and see if the file has 
all the required information for you as it processes it into a 
TranscriptDb object.  If you are calling this, the only thing you really 
have to be extra careful about is the exon rank attribute.  This 
function can guess at that information for you, but I am betting you 
don't want that if you can avoid it (which is why you will get a warning 
if this happens).  So for these data, you really want to point to an 
attribute that has that information (if that is possible).


In addition to seeing problems where a file will have too much or too 
little information, you will also sometimes see a file that is formatted 
in some peculiar way that requires you to translate it into a more 
typical looking GFF or GTF file.  This can happen to you because as I 
mentioned above the file formats are fairly general and open to some 
interpretation by those who write them out.  In general I think the most 
important piece of advice is that you should always look at GFF or GTF 
files in person before you try to use them, because you can't really be 
too sure about what kind of information will be in there unless you do.
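
As a rough illustration of looking at a file before using it, here is a
base R sketch that checks whether exon records carry an exon rank
attribute. The two GTF records are made up, and the attribute name
exon_number is an assumption (it is what Ensembl-style GTF files use);
other sources may name it differently or omit it.

```r
# Two made-up GTF records for illustration; real files have many more.
gtf_lines <- c(
  'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; exon_number "1";',
  'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
)

# quote = "" keeps the double quotes inside the attributes column intact
gtf <- read.delim(text = gtf_lines, header = FALSE, quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attributes"))

# Which exon records carry the attribute we would use as the exon rank?
exons     <- gtf[gtf$feature == "exon", ]
has_rank  <- grepl('exon_number "', exons$attributes, fixed = TRUE)
n_missing <- sum(!has_rank)
```

If n_missing is not zero, the exon rank would have to be guessed for at
least some transcripts, which is the situation the warning is about.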


The bottom line is that both ensembl and flybase are reputable places to 
get data from.  But because they are different places, they may produce 
dramatically different looking GFF or GTF files.



Also related to this, please be sure to use the very latest version of 
makeTranscriptDbFromGFF from the devel branch, as I have made some 
improvements for performance since the release.



I hope this helps,



  Marc




On 02/11/2013 03:13 PM, Cook, Malcolm wrote:

Marc et. al.,

A colleague of mine (cc:ed) is experiencing memory bloat using 
makeTranscriptDBFromGFF on dmel GFF from Flybase.org

I told him of my success in using Ensembl's GTF-ization but that I would check 
in with you (et al).

So

Do you have any advice/warnings/gotchas/toldyasos/caveats re: applying 
makeTranscriptDbFromGFF to Flybase?

Thanks!

Cheers,

Malcolm





Re: [Bioc-devel] [GenomicFeatures] no pkgName found. makeTxDbPackage() called after txdb created from GFF3 file

2013-02-09 Thread Marc Carlson

Hi Malcolm,

In general I have found ensembl to be really great and I expect that 
their gtf files are probably fine.  Usually the exon rank is the 1st 
thing you will see left out when a gtf file is cutting corners, and you 
are correct that they seem to be including that. I ran the one for Homo 
sapiens through makeTranscriptDbFromGFF() and everything appears to be in 
working order.


I wanted to warn Tengfei about this because I worry that most people 
will be surprised to learn that the gtf file format comes with fewer 
guarantees about the data included than they might have expected.  I 
also mentioned it because I noticed that his function call to 
makeTranscriptDbFromGFF() did not specify an exonRankAttributeName, 
which strongly implies to me that his file might not have had 
that information present.  The assumption was that if he had that 
information, he would have supplied that argument so that he could make 
use of it.  But another possibility is that Tengfei just didn't need 
that information at all, in which case this will all just be another 
(possibly unwarranted) public service message.  If that is the case, I 
apologise for the noise.



  Marc



On 02/08/2013 06:19 PM, Cook, Malcolm wrote:

> Hi Tengfei,
>
> Yes that looks like an oversight.  Thanks for reporting that!  I will
> extend makeTxDbPackage so that it's more accommodating of these newer
> TranscriptDbs.  If you want to help me out, you could call saveDb() on
> your gmax189 object and send me the .sqlite file that you save it to.
>
> Also, if you have any alternate options for importing your data (other
> than using GFF or GTF): I think you probably should consider it.  The
> file specifications for these filetypes are missing key details and so
> you can very easily get a legal GFF or GTF file that is actually
> missing important details from its contents.  For example, they can
> commonly lack information about the order of the exons for a given
> transcript, which can render them difficult (or impossible) to use for
> transcript work.   But for these specifications, that information is
> optional.

Marco, do you have any comment on ensembl GTF (which has exon order) in this 
regard?

Thanks,

Malcolm

>
>    Marc


Re: [Bioc-devel] [GenomicFeatures] no pkgName found. makeTxDbPackage() called after txdb created from GFF3 file

2013-02-08 Thread Marc Carlson

Hi Tengfei,

Yes that looks like an oversight.  Thanks for reporting that!  I will 
extend makeTxDbPackage so that it's more accommodating of these newer 
TranscriptDbs.  If you want to help me out, you could call saveDb() on 
your gmax189 object and send me the .sqlite file that you save it to.


Also, if you have any alternate options for importing your data (other 
than using GFF or GTF): I think you probably should consider it.  The 
file specifications for these filetypes are missing key details and so 
you can very easily get a legal GFF or GTF file that is actually 
missing important details from its contents.  For example, they can 
commonly lack information about the order of the exons for a given 
transcript, which can render them difficult (or impossible) to use for 
transcript work.   But for these specifications, that information is 
optional.



  Marc



On 02/06/2013 09:46 PM, Tengfei Yin wrote:

Dear all,

I am trying to build a txdb object from gff3 for soybean data and try to
make it a package. Code used like this

gmax189 <- makeTranscriptDbFromGFF("~/Gmax_189_gene_exons.gff3",
                                   format = "gff3", species = "Glycine max",
                                   dataSource = "http://www.phytozome.org/")
makeTxDbPackage(txdb = gmax189,
                version = "0.9.1",
                maintainer = "Tengfei Yin",
                author = "Tengfei Yin",
                destDir = ".",
                license = "Artistic-2.0")

Error message:
Error in gsub("_", "", pkgName) :
   error in evaluating the argument 'x' in selecting a method for function
'gsub': Error: object 'pkgName' not found


Looks like my dataSource should be either BioMart or UCSC, otherwise no
pkgname will be produced in function .makePackageName?

Or should I build annotation package in some other ways?

Thanks a lot

Tengfei

my sessionInfo


sessionInfo()

R Under development (unstable) (2013-01-21 r61728)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicFeatures_1.11.8 AnnotationDbi_1.21.10  Biobase_2.19.2
[4] GenomicRanges_1.11.28  IRanges_1.17.31        BiocGenerics_0.5.6

loaded via a namespace (and not attached):
 [1] biomaRt_2.15.0     Biostrings_2.27.10 bitops_1.0-5       BSgenome_1.27.1
 [5] DBI_0.2-5          RCurl_1.95-3       Rsamtools_1.11.15  RSQLite_0.11.2
 [9] rtracklayer_1.19.9 stats4_3.0.0       tools_3.0.0        XML_3.95-0.1
[13] zlibbioc_1.5.0






[Rd] question about assignment warnings for replacement methods

2011-04-05 Thread Marc Carlson

Hi,

I have seen several packages that with the most recent version of R are 
giving a warning like this:


Assignments in \usage in documentation object 'marginalData-methods':
marginalData(object) = value

I assume that this is to prevent people from making assignments in their 
usage statements (which seems completely understandable).  But what 
about the case above?  This is a person who just wants to show the 
proper usage for a replacement method.  IOW they just want to write 
something that looks like what you actually do when you use a 
replacement method.  They just want to show users how to do something 
like this:


replacementMethod(object) <- newValue


So is that really something that should not be allowed in a usage 
statement?
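
For readers unfamiliar with the idiom in question, here is a minimal base-R sketch of a plain replacement function and the call form it enables. The `marginalData<-` defined here is a stand-in written for this example and operates on a plain list; it is not the method from the package that triggered the warning.

```r
# A replacement function is an ordinary function whose name ends in "<-"
# and whose final argument is named `value`.
# This definition is a stand-in for illustration only.
`marginalData<-` <- function(object, value) {
  object$marginal <- value
  object  # a replacement function must return the modified object
}

x <- list()
# R evaluates the next line as: x <- `marginalData<-`(x, value = c(0.2, 0.8))
marginalData(x) <- c(0.2, 0.8)
print(x$marginal)
```

The assignment-style call on `x` is the form a usage statement for such a function would naturally want to show.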



  Marc

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] question about assignment warnings for replacement methods

2011-04-05 Thread Marc Carlson

Thank you for the clarifications Duncan.

  Marc


On 04/05/2011 11:15 AM, Duncan Murdoch wrote:

On 05/04/2011 1:51 PM, Marc Carlson wrote:

Hi,

I have seen several packages that with the most recent version of R are
giving a warning like this:

Assignments in \usage in documentation object 'marginalData-methods':
marginalData(object) = value

I assume that this is to prevent people from making assignments in their
usage statements (which seems completely understandable).  But what
about the case above?  This is a person who just wants to show the
proper usage for a replacement method.  IOW they just want to write
something that looks like what you actually do when you use a
replacement method.  They just want to show users how to do something
like this:

replacementMethod(object) <- newValue


So is that really something that should not be allowed in a usage
statement?


If replacementMethod was a replacement function, then

replacementMethod(object) <- newValue

is supposed to be fine.  But if it is an S3 method, it should be

\method{replacementMethod}{class}(object) <- newValue

and if it is an S4 method I think it should be

\S4method{replacementMethod}{signature_list}(object) <- newValue

(though the manual suggests using the S3 style, I'm not sure how 
literally to take it).
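
To make the S3 and S4 cases concrete, the sketch below defines one replacement method of each kind in R. All names (`scale3`, `scale4`, `ruler`, `Ruler4`) are hypothetical and exist only for this example; the comments note which of the \usage forms above would apply to each, following the description given here rather than any further authority.

```r
library(methods)

# S3: a generic named "<name><-" plus a method "<name><-.<class>".
# Per the note above, \usage would use \method{scale3}{ruler}(object) <- value
`scale3<-` <- function(object, value) UseMethod("scale3<-")
`scale3<-.ruler` <- function(object, value) {
  object$scale <- value
  object
}
r3 <- structure(list(scale = 1), class = "ruler")
scale3(r3) <- 10

# S4: a generic named "<name><-" plus a method set with setReplaceMethod().
# Per the note above, \usage would use \S4method{scale4}{Ruler4}(object) <- value
setClass("Ruler4", representation(scale = "numeric"))
setGeneric("scale4<-", function(object, value) standardGeneric("scale4<-"))
setReplaceMethod("scale4", "Ruler4", function(object, value) {
  object@scale <- value
  object
})
r4 <- new("Ruler4", scale = 1)
scale4(r4) <- 10

print(r3$scale)
print(r4@scale)
```

In both cases the user-facing call is the same assignment form; only the documentation markup differs.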


Duncan Murdoch



