Re: [Bioc-devel] AnnotationHub: cleanup

2015-09-17 Thread Vincent Carey
I followed part of this interchange with interest.  I would love to see
very wide adoption and appreciation of AnnotationHub and what I will
describe does not seem to constitute important obstacles to this, but I
have to confess that aspects of the model and grammar are confusing to me.

I use "cache" mainly as a noun.  And in computing applications, IMHO, a
cache is something to be hidden far from the active interface.  In
AnnotationHub "cache" names an important function and a key data structure
for annotation archiving.

What I understand is (2.1.40):

  ah = AnnotationHub()  # creates object for file and database access;
                        # will update db if appropriate
  cache(ah)             # will offer to acquire all available hub resources
                        # for local caching; upon decline will provide
                        # a named vector of paths

  > cache(ah)
  download 40503 resources? [y/n] n
   AH5086  AH5087
   "/Users/stvjc/.AnnotationHub/5086"  "/Users/stvjc/.AnnotationHub/5087"
  AH14108 AH15146


I am not sure this vector is going to get much use.  Maybe a negative
response should return NULL?

The help page says:

  ‘cache(x)’ and ‘cache(x) <- value’: Adds (downloads) all resources in
  ‘x’, or removes all local resources corresponding to the
  records in ‘x’ from the cache.

"download" seems like a reasonable name for part of this functionality.
"cache<-" seems to be concerned mainly with deletion.  I can certainly
define private alternate terms for these tasks in my .Rprofile, but I do
think a closer correspondence of function name to action could pay off.




Re: [Bioc-devel] AnnotationHub: cleanup

2015-09-17 Thread Kasper Daniel Hansen
I believe I agree with Vince.  I also hope that most users won't have to
reset their cache or ever think about it, but I had to deal with it as a
possible source of problems (not sure that was the problem, but as we all
know it can be hard to diagnose over the internet with limited
information).

Best,
Kasper


Re: [Bioc-devel] AnnotationHub: cleanup

2015-09-15 Thread Kasper Daniel Hansen
On Tue, Sep 15, 2015 at 12:25 AM, Morgan, Martin <
martin.mor...@roswellpark.org> wrote:

> Hi Kasper -- we'll try to act on these, but some comments / looking for
> clarification...
>
> > -Original Message-
> > From: Bioc-devel [mailto:bioc-devel-boun...@r-project.org] On Behalf Of
> > Kasper Daniel Hansen
> > Sent: Monday, September 14, 2015 10:45 PM
> > To: bioc-devel@r-project.org
> > Subject: [Bioc-devel] AnnotationHub: cleanup
> >
> > I currently have the `pleasure` of dealing with students who have
> > problems with installing AnnotationHub and/or downloading resources.
> > Here are some comments including some possible bug reports.
>
> I hope this is on the whole a positive experience, and we'll do what we
> can to make it better.
>

Well, I love the package and I love it even more having prepared material
on it.  And the people who complain are of course enriched for people who
have problems - no way to know if it just works for most people.

And of course right now it is more troublesome since I prepared the class
using R-3.2.1 and then 3.2.2 was released just before we started and had
the http -> https change which is an obvious suspect when people have
download problems :)

> > 1) I think it is extremely dangerous that `cache(ahub)` starts by asking
> > to download all resources!  May I suggest this only happens with a
> > specific setting like `cache(ahub, download=TRUE)` or something similar.
> >
> > 2) `cache(ahub)` deletes all cached information, except the sqlite
> > database.  Could we get a way to remove everything?
> >
> > 3) While I can understand the difference between cache and hubCache, I
> > would suggest that hubCache(ahub) = NULL removes all cached material
> > including the sqlite database.
>
> For each of the above the envisioned use case was that  'hub' is a subset,
> eg.,
>
>   subhub = query(hub, c("homo", "ensembl", "81"))
>
> and the user wanted to manipulate all records in the sub-hub.
> cache(subhub) asks whether to really download if the size of the (sub)hub
> is greater than hubOption("MAX_DOWNLOADS"), which by default is 10; it
> seems like asking is the same as requiring an argument? fileName(subhub)
> may be closer to what you're looking for...? It gives the path to the
> file, or NA if it is not in the cache.
>
> For cache(subhub) = NULL it wouldn't make sense to delete 5 resources AND
> the sqlite file for the entire hub.
>
> The sqlite file can be discovered with dbfile(hub) / dbfile(subhub), and
> removed with file.remove(dbfile(subhub)). In some ways it wasn't
> envisioned that this manual manipulation would be a common use case (!).


Ok.  Let me perhaps rephrase my wish list:
1) Some easy way to reset the entire cache, with emphasis on easy.
This is most likely to be used by beginners.  How it's done, I don't care
too much about.  And I suggest a heading in ?AnnotationHub called something
like "Flushing the cache".
2) It seems natural that there is a way (for problem reporting) to report
which resources are cached, which is (again) easy and does not involve
download.  I don't care if it is cache() or some other name.
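
[Editorial sketch, not AnnotationHub code.]  The two wishes above (report
what is cached without downloading anything, and reset the cache easily)
can be illustrated with base R alone.  The helper names list_cached() and
flush_cache() are hypothetical, and the layout (one file per resource plus
one index file ending in ".sqlite3", all in a single directory) is an
assumption made for illustration.

```r
# Hypothetical helpers; base R only, no AnnotationHub calls.
# Assumed layout: cache_dir holds one file per resource plus one
# index file whose name ends in ".sqlite3".

list_cached <- function(cache_dir) {
  files <- list.files(cache_dir, full.names = TRUE)
  # Report resource files only; the index is not a resource.
  files[!grepl("\\.sqlite3$", files)]
}

flush_cache <- function(cache_dir, keep_index = TRUE) {
  targets <- if (keep_index) {
    list_cached(cache_dir)
  } else {
    list.files(cache_dir, full.names = TRUE)
  }
  removed <- file.remove(targets)
  # Return the paths that were actually deleted, invisibly.
  invisible(targets[removed])
}

# Demonstration on a throwaway directory.
demo_dir <- file.path(tempdir(), "hub-cache-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("5086", "5087", "index.sqlite3")))

cached_before <- basename(list_cached(demo_dir))
flush_cache(demo_dir)                      # resources go, index stays
left_after_partial <- list.files(demo_dir)
flush_cache(demo_dir, keep_index = FALSE)  # now the index goes too
left_after_full <- list.files(demo_dir)
```

Keeping the index by default mirrors the distinction drawn in this thread
between removing resources and removing the sqlite file; a single
keep_index = FALSE call is the "remove everything" case.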

> > 4) It seems that AnnotationHub in the release version of Bioconductor
> > defaults to using https://.  Wasn't full support for https://
> > introduced in R 3.2.2; if so, it seems to be a critical bug that it is
> > using https://
>
> AnnotationHub uses httr::GET and ultimately curl::curl_fetch_disk rather
> than native R support, so what R does is not directly relevant. From ?curl
>
>  Drop-in replacement for base 'url' that supports https, ftps,
>  gzip, deflate, etc. Default behavior is identical to 'url', but
>  request can be fully configured by passing a custom 'handle'.
>
> So I wonder what the actual problem is?
>

Interesting.  Well, at least one user is behind a proxy and uses the tips
in ?download.file to set a proxy server.  Perhaps that doesn't work with
httr?  I don't know.  But there is more than one person with problems.

> > 5) Perhaps it should be considered that the default hubCache path is
> > versioned, perhaps with Bioc version, perhaps with something else.  This
> > might cause problems for people running multiple versions of R.
>
> The data base is supposed to handle versioning, so if you've populated the
> cache with Bioc 3.2 and are now accessing the cache with Bioc 3.1, only the
> 3.1 resources are visible. The hope was to avoid multiple copies of these
> possibly large resources.


That sounds pretty nifty.. I was thinking re-design of the database issues.


> 6) I strongly suggest that the output printed when retrieving a

[Bioc-devel] AnnotationHub: cleanup

2015-09-14 Thread Kasper Daniel Hansen
I currently have the `pleasure` of dealing with students who have problems
with installing AnnotationHub and/or downloading resources.  Here are some
comments including some possible bug reports.

1) I think it is extremely dangerous that `cache(ahub)` starts by asking to
download all resources!  May I suggest this only happens with a specific
setting like `cache(ahub, download=TRUE)` or something similar.

2) `cache(ahub)` deletes all cached information, except the sqlite
database.  Could we get a way to remove everything?

3) While I can understand the difference between cache and hubCache, I
would suggest that hubCache(ahub) = NULL removes all cached material
including the sqlite database.

4) It seems that AnnotationHub in the release version of Bioconductor
defaults to using https://.  Wasn't full support for https:// introduced in
R 3.2.2; if so, it seems to be a critical bug that it is using https://

5) Perhaps it should be considered that the default hubCache path is
versioned, perhaps with Bioc version, perhaps with something else.  This
might cause problems for people running multiple versions of R.

6) I strongly suggest that the output printed when retrieving an
AnnotationHub resource includes the download url.

7) If you run AnnotationHub without having GenomicRanges / rtracklayer
installed, it downloads the resource and then pangs out with an error.  To
me it seems more natural to pang out with an error immediately, especially
since when it works, it appears from message printing that loading the
library happens prior to download.

Best,
Kasper
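
[Editorial sketch, not AnnotationHub code.]  Point 7 above (fail before
the download when a required package is missing) can be illustrated as
follows.  fetch_with_check() and its arguments are hypothetical names;
requireNamespace() is base R; download_fn stands in for whatever actually
retrieves the resource.

```r
# Hypothetical wrapper: verify required packages before any download.
fetch_with_check <- function(needs, download_fn) {
  installed <- vapply(needs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    stop("install before retrieving this resource: ",
         paste(needs[!installed], collapse = ", "))
  }
  # Only reached when every required package is available.
  download_fn()
}

# Demonstration with a stand-in download function that counts its calls.
download_calls <- 0
fake_download <- function() {
  download_calls <<- download_calls + 1
  "resource"
}

ok <- fetch_with_check("stats", fake_download)
err <- tryCatch(
  fetch_with_check(c("stats", "noSuchPackage12345"), fake_download),
  error = function(e) conditionMessage(e)
)
```

In the second call the stand-in download is never invoked, which is the
ordering asked for: the error comes first and no bandwidth is spent.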


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] AnnotationHub: cleanup

2015-09-14 Thread Morgan, Martin
Hi Kasper -- we'll try to act on these, but some comments / looking for 
clarification...

> -Original Message-
> From: Bioc-devel [mailto:bioc-devel-boun...@r-project.org] On Behalf Of
> Kasper Daniel Hansen
> Sent: Monday, September 14, 2015 10:45 PM
> To: bioc-devel@r-project.org
> Subject: [Bioc-devel] AnnotationHub: cleanup
> 
> I currently have the `pleasure` of dealing with students who have problems
> with installing AnnotationHub and/or downloading resources.  Here are some
> comments including some possible bug reports.

I hope this is on the whole a positive experience, and we'll do what we can to 
make it better.

> 
> 1) I think it is extremely dangerous that `cache(ahub)` starts by asking to
> download all resources!  May I suggest this only happens with a specific
> setting like `cache(ahub, download=TRUE)` or something similar.

> 
> 2) `cache(ahub)` deletes all cached information, except the sqlite database.
> Could we get a way to remove everything?
> 
> 3) While I can understand the difference between cache and hubCache, I
> would suggest that hubCache(ahub) = NULL removes all cached material
> included the sqlite database.

For each of the above the envisioned use case was that  'hub' is a subset, eg.,

  subhub = query(hub, c("homo", "ensembl", "81"))

and the user wanted to manipulate all records in the sub-hub. cache(subhub)
asks whether to really download if the size of the (sub)hub is greater than
hubOption("MAX_DOWNLOADS"), which by default is 10; it seems like asking is the
same as requiring an argument? fileName(subhub) may be closer to what you're
looking for...? It gives the path to the file, or NA if it is not in the cache.

For cache(subhub) = NULL it wouldn't make sense to delete 5 resources AND the
sqlite file for the entire hub.

The sqlite file can be discovered with dbfile(hub) / dbfile(subhub), and
removed with file.remove(dbfile(subhub)). In some ways it wasn't envisioned
that this manual manipulation would be a common use case (!).
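
[Editorial sketch, not AnnotationHub code.]  The prompting rule described
above, combined with the explicit opt-in suggested in point 1, might look
like this.  needs_confirmation() and cache_subset() are hypothetical
names, the threshold of 10 simply mirrors the default quoted for
hubOption("MAX_DOWNLOADS"), and confirm stands in for the interactive
[y/n] question.

```r
# Ask only when the (sub)hub is larger than the threshold.
needs_confirmation <- function(n_resources, max_downloads = 10) {
  n_resources > max_downloads
}

# ids: resource identifiers in the (sub)hub; cached: ids present locally.
# With download = FALSE nothing is fetched; the function only reports.
cache_subset <- function(ids, cached, download = FALSE,
                         confirm = function(n) FALSE) {
  missing_ids <- setdiff(ids, cached)
  if (!download) {
    return(list(fetch = character(0), missing = missing_ids))
  }
  if (needs_confirmation(length(ids)) && !confirm(length(ids))) {
    # User declined: fetch nothing, report what is still missing.
    return(list(fetch = character(0), missing = missing_ids))
  }
  list(fetch = missing_ids, missing = character(0))
}

small  <- cache_subset(c("AH1", "AH2"), cached = "AH1", download = TRUE)
report <- cache_subset(c("AH1", "AH2"), cached = "AH1")
big    <- cache_subset(paste0("AH", 1:40), cached = character(0),
                       download = TRUE)
```

Here report never touches the network, small proceeds without a question
because it is under the threshold, and big fetches nothing because the
default confirm declines.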

> 
> 4) It seems that AnnotationHub in the release version of Bioconductor
> defaults to using https://.  Wasn't full support for https:// introduced in R
> 3.2.2; if so, it seems to be a critical bug that it is using https://

AnnotationHub uses httr::GET and ultimately curl::curl_fetch_disk rather than 
native R support, so what R does is not directly relevant. From ?curl

 Drop-in replacement for base 'url' that supports https, ftps,
 gzip, deflate, etc. Default behavior is identical to 'url', but
 request can be fully configured by passing a custom 'handle'.

So I wonder what the actual problem is?

> 5) Perhaps it should be considered that the default hubCache path is
> versioned, perhaps with Bioc version, perhaps with something else.  This
> might cause problems for people running multiple versions of R.

The data base is supposed to handle versioning, so if you've populated the 
cache with Bioc 3.2 and are now accessing the cache with Bioc 3.1, only the 3.1 
resources are visible. The hope was to avoid multiple copies of these possibly 
large resources.

> 6) I strongly suggest that the output printed when retrieving an
> AnnotationHub resource includes the download url.

Ok something that's easy to do! Sometimes this will be cryptic (when the 
resource is cached in the AnnotationHub server, rather than being retrieved 
from the original source)

> 7) If you run AnnotationHub without having GenomicRanges / rtracklayer
> installed, it downloads the resource and then pangs out with an error.  To me
> it seems more natural to pang out with an error immediately, especially since
> when it works, it appears from message printing that loading the library
> happens prior to download.

I guess by 'run AnnotationHub' you mean retrieve a specific resource?

The import recipes generally start by require()ing the necessary libraries. I 
spotted a couple of recipes that didn't follow this convention (for 2bit and 
chain file resources from rtracklayer; none that involved GenomicRanges). Are 
there specific examples?

Martin
