Re: [Bioc-devel] as.list of a GRanges

2018-02-19 Thread Hervé Pagès

Hi Renan,

Most packages affected by these changes are packages that loop on
the individual ranges of a GRanges object. They generally don't
call as.list() directly but use something like lapply(), vapply(),
sapply(), Map(), Reduce(), etc... All these functions indeed call
as.list() internally on the supplied object before looping on it.
Just to clarify, when I say I found about a dozen Bioconductor packages
in the entire software repo where as.list() was used on a GRanges
object, I'm counting all the packages that use it explicitly or
implicitly. This includes signeR, which I had on my list of packages
to fix.
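
The implicit as.list() call can be seen in base R alone. Below is a minimal sketch that uses a hypothetical `toy_ranges` S3 class as a stand-in for a GRanges object; the class, the call counter and the as.list() method are illustrative assumptions and are not part of GenomicRanges.

```r
# Count how often as.list() gets invoked on our toy object.
calls <- 0L
as_list_toy <- function(x, ...) {
  calls <<- calls + 1L   # record every (implicit) conversion
  unclass(x)             # hand back the underlying plain list
}
# Register the method for the hypothetical class "toy_ranges".
registerS3method("as.list", "toy_ranges", as_list_toy)

# Three toy ranges stored as c(start, end) pairs.
ranges <- structure(list(c(1L, 10L), c(5L, 8L), c(20L, 25L)),
                    class = "toy_ranges")

# Neither call mentions as.list(), yet both trigger it internally.
widths <- vapply(ranges, function(r) r[2] - r[1] + 1L, integer(1))
hull   <- Reduce(function(a, b) c(min(a[1], b[1]), max(a[2], b[2])), ranges)

print(calls)
```

The counter ends up at 2 because vapply() and Reduce() each convert the object once before looping, which is why a change to as.list() reaches code that never calls it by name.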

BTW in this particular instance, I would recommend doing

reduce(granges, drop.empty.ranges=TRUE)

instead of

Reduce(union, as(granges, "GRangesList"))

reduce() walks on the individual ranges of the supplied object at
the C level so is much faster than performing a binary union in
an R loop. It should also be more memory efficient.
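
A small sketch comparing the two approaches on toy data, assuming the GenomicRanges package is installed (the snippet does nothing otherwise). The three ranges are made up for illustration, and the default arguments of reduce() are used.

```r
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GenomicRanges))

  # Three toy ranges on one chromosome; the first two overlap.
  granges <- GRanges("chr1", IRanges(start = c(1, 5, 20), end = c(10, 8, 25)))

  # Vectorized merge of overlapping ranges, done in one call.
  merged_fast <- reduce(granges)

  # Same result via repeated pairwise unions driven by an R-level loop.
  merged_slow <- Reduce(union, as(granges, "GRangesList"))

  print(merged_fast)
  stopifnot(identical(start(merged_fast), start(merged_slow)),
            identical(end(merged_fast), end(merged_slow)))
}
```

On a handful of ranges the difference is invisible; the gap only matters once the object holds many ranges, since the second form performs one union() per range.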

Cheers,
H.


On 02/16/2018 09:02 AM, Renan Valieris wrote:

FWIW, this change also affects code that doesn't call as.list() explicitly.

For example, Reduce(union, granges): Reduce() is implemented in base and
will call as.list() on its argument if it isn't already a vector.

I understand it wasn't intended to be used this way, but with this in mind
there are more packages potentially affected by the change.

On Fri, Feb 16, 2018 at 1:25 PM, Nathan Sheffield 
wrote:


For what it's worth, my package (LOLA) was one that used as.list on a
GRanges or GRangesList, and those calls were broken by changes to devel.
Since I was also pushing changes at the time, I assumed the devel build
errors were due to my updates. I spent quite a bit of time trying to
figure out what was wrong before I realized the breakage was caused not by
my updates but by upstream changes in GRanges. Eventually I tracked the
errors down to as.list (and ultimately found other errors, which we
discussed earlier on this list). My conclusion is that, from my
perspective, using the deployed bioc devel as a way to test what a
refactoring will break doesn't seem like the ideal way to go. I had assumed
that other packages wouldn't typically push changes that break my package's
build, so this devalued the role of the devel builds and reduced my
confidence in them: now when I see an error I may assume it's something
else and wait a few days instead of diving right in to try to solve the
problem.

I like the idea of temporarily restoring as.list with a deprecation
message -- also, as a general development philosophy going forward in terms
of testing on devel. This would have saved me a lot of time troubleshooting
in this instance.

Just my 2 cents.

-Nathan



On 02/16/2018 02:57 AM, Bernat Gel wrote:


Hi Hervé and others,

Thanks for the responses.

I wouldn't call as.list() of a GRanges an "obscure behaviour" but more a
"works as expected, even if not clearly documented" behaviour.

In any case I can change the code to as(gr, "GRangesList") as suggested.

Thanks again for the responses and discussion :)

Bernat


*Bernat Gel Moreno*
Bioinformatician

Hereditary Cancer Program
Program of Predictive and Personalized Medicine of Cancer (PMPPC)
Germans Trias i Pujol Research Institute (IGTP)

Campus Can Ruti
Carretera de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Barcelona, Spain

Tel: (+34) 93 554 3068
Fax: (+34) 93 497 8654
b...@igtp.cat
www.germanstrias.org

On 02/15/2018 at 11:19 PM, Hervé Pagès wrote:


On 02/15/2018 01:57 PM, Michael Lawrence wrote:




On Thu, Feb 15, 2018 at 1:45 PM, Hervé Pagès wrote:

 On 02/15/2018 11:53 AM, Cook, Malcolm wrote:

 Hi,

 Can I ask, is this change under discussion in current release or
 so far in Bioconductor devel only (my assumption)?


 Bioconductor devel only.


> On 02/15/2018 08:37 AM, Michael Lawrence wrote:
> > So is as.list() no longer supported for GRanges objects? I have found it
> > useful in places.
>
> Very few places. I found a dozen of them in the entire software repo.

 However there are probably more in the wild...


 What as.list() was doing on a GRanges object was not documented. Relying
 on some kind of obscure undocumented feature is never a good idea.


There's 

Re: [Bioc-devel] as.list of a GRanges

2018-02-19 Thread Hervé Pagès

On 02/19/2018 06:43 AM, Michael Lawrence wrote:



On Mon, Feb 19, 2018 at 2:10 AM, Bernat Gel wrote:


Hi Hervé,

I completely agree with the goal of having the semantics of
list-like operations standardised and documented to avoid surprises,
and if doing so requires changing the current use of as.list, I'm
perfectly OK with that. I had not seen the strange behaviour with
IRanges,



Just want to point out that it's important to keep in mind that many of 
our users never use IRanges directly, so consistency is not an absolute 
requirement.


Even if you only use GRanges objects, it's confusing that lapply()
works on them but not mapply(). The ongoing changes will also
address inconsistencies within the GRanges API, not just the
inconsistencies between the GRanges and IRanges APIs.

H.



so I was not aware of the problem.

In any case, thanks for fixing (and simplifying) karyoploteR. In
retrospect I don't know why I didn't use simple vectorization!
So, thanks


Bernat

On 02/17/2018 at 04:19 AM, Hervé Pagès wrote:

Hi Bernat,

On 02/15/2018 11:57 PM, Bernat Gel wrote:

Hi Hervé and others,

Thanks for the responses.

I wouldn't call as.list() of a GRanges an "obscure behaviour"
but more a "works as expected, even if not clearly
documented" behaviour.


Most users/developers will probably agree that as.list() worked
as expected on a GRanges object. But then they'll be surprised
and confused when they use it on an IRanges object and discover
that it does something completely different. The current effort
is to bring more consistency between GRanges and IRanges objects
and to have their list-like semantics aligned and documented so
there will be no more such surprise.


In any case I can change the code to as(gr, "GRangesList")
as suggested.


I went ahead and fixed karyoploteR. This is karyoploteR 1.5.2. Make
sure to resync your GitHub repo by following the instructions here:



https://bioconductor.org/developers/how-to/git/sync-existing-repositories/




Note that the loop on the GRanges object (via the call to Map())
was not needed and could be replaced with a solution that uses
proper vectorization.

Best,
H.


Thanks again for the responses and discussion :)

Bernat



Re: [Rd] [parallel] fixes load balancing of parLapplyLB

2018-02-19 Thread Henrik Bengtsson
Hi, I'm trying to understand the rationale for your proposed amount of
splitting and more precisely why that one is THE one.

If I put labels on your example numbers in one of your previous posts:

 nbrOfElements <- 97
 nbrOfWorkers <- 5

With these, there are two extremes in how you can split up the
processing in chunks such that all workers are utilized:

(A) Each worker, called multiple times, processes one element each time:

> nbrOfElements <- 97
> nbrOfWorkers <- 5
> nbrOfChunks <- nbrOfElements
> sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[30] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[59] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[88] 1 1 1 1 1 1 1 1 1 1


(B) Each worker, called once, processes multiple elements:

> nbrOfElements <- 97
> nbrOfWorkers <- 5
> nbrOfChunks <- nbrOfWorkers
> sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)
[1] 20 19 19 19 20

I understand that neither of these two extremes may be the best when
it comes to orchestration overhead and load balancing. Instead, the
best might be somewhere in-between, e.g.

(C) Each worker, called multiple times, processing multiple elements:

> nbrOfElements <- 97
> nbrOfWorkers <- 5
> nbrOfChunks <- nbrOfElements / nbrOfWorkers
> sapply(parallel:::splitList(1:nbrOfElements, nbrOfChunks), length)
 [1] 5 5 5 5 4 5 5 5 5 5 4 5 5 5 5 4 5 5 5 5

However, there are multiple alternatives between the two extremes, e.g.

> nbrOfChunks <- scale * nbrOfElements / nbrOfWorkers

So, is there a reason why you argue for scale = 1.0 to be optimal?
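
To make the question concrete, here is a rough sketch of how the chunk sizes change with such a scale factor. It uses the exported `parallel::splitIndices()` rather than the internal `splitList()`, and it rounds the chunk count up with `ceiling()`; both the helper name `chunk_sizes()` and the rounding are assumptions of this sketch, not something taken from the patch.

```r
library(parallel)

nbrOfElements <- 97
nbrOfWorkers  <- 5

# Chunk sizes for a given scale factor (hypothetical helper).
chunk_sizes <- function(scale) {
  # Round up so that the number of chunks is an explicit integer.
  nbrOfChunks <- ceiling(scale * nbrOfElements / nbrOfWorkers)
  lengths(splitIndices(nbrOfElements, nbrOfChunks))
}

# Fewer, larger chunks for scale < 1; more, smaller chunks for scale > 1.
sizes <- lapply(c(0.5, 1, 2), chunk_sizes)
print(sizes)
```

Whatever the scale, the chunk sizes always add up to the 97 elements; only the granularity of the scheduling changes.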

FYI, In future.apply::future_lapply(X, FUN, ...) there is a
'future.scheduling' scale factor(*) argument where default
future.scheduling = 1 corresponds to (B) and future.scheduling = +Inf
to (A).  Using future.scheduling = 4 achieves the amount of
load-balancing you propose in (C).   (*) Different definition from the
above 'scale'. (Disclaimer: I'm the author)

/Henrik


Re: [Bioc-devel] rsvg on mac

2018-02-19 Thread Alexey Sergushichev
Valerie, thanks. Will try to ask there.

However, after looking through the mailing list it looks like R-devel
builds for OS X aren't trivial and aren't part of CRAN...

--
Alexey

On Mon, Feb 19, 2018 at 9:05 PM, Obenchain, Valerie <
valerie.obench...@roswellpark.org> wrote:

> Hi,
>
> There has been some discussion of the devel Mac binaries on the R-SIG-Mac
> mailing list. That list would be the best place to ask this question.
>
> https://stat.ethz.ch/pipermail/r-sig-mac/2018-January/thread.html
>
> Valerie
>
>
>
> On 02/19/2018 08:32 AM, Alexey Sergushichev wrote:
>
> Valerie,
>
> Are there any estimates on how often CRAN OS X builds happen? There are
> still no builds for rsvg and other packages...
>
> Thanks,
> Alexey
>
> On Wed, Feb 7, 2018 at 8:09 PM, Obenchain, Valerie wrote:
>
>> Hi Kevin,
>>
>> CRAN binaries for El Capitan in devel aren't available. You can see this
>> on the rsvg landing page:
>>
>> https://cran.r-project.org/web/packages/rsvg/index.html
>>
>> Nothing we can do until CRAN makes them available.
>>
>> Valerie
>>
>>
>> On 02/07/2018 08:29 AM, Kevin Horan wrote:
>>
>>
>> The ChemmineR build is failing on the mac due to a new dependency not
>> being available, the package "rsvg". Would it be possible to install
>> that on the mac build machine? Thanks.
>>
>> http://bioconductor.org/checkResults/devel/bioc-LATEST/ChemmineR/merida2-install.html
>>
>>
>> Kevin
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] rsvg on mac

2018-02-19 Thread Obenchain, Valerie
Hi,

There has been some discussion of the devel Mac binaries on the R-SIG-Mac 
mailing list. That list would be the best place to ask this question.

https://stat.ethz.ch/pipermail/r-sig-mac/2018-January/thread.html

Valerie


On 02/19/2018 08:32 AM, Alexey Sergushichev wrote:
Valerie,

Are there any estimates on how often CRAN OS X builds happen? There are still 
no builds for rsvg and other packages...

Thanks,
Alexey

On Wed, Feb 7, 2018 at 8:09 PM, Obenchain, Valerie wrote:
Hi Kevin,

CRAN binaries for El Capitan in devel aren't available. You can see this on the 
rsvg landing page:

https://cran.r-project.org/web/packages/rsvg/index.html

Nothing we can do until CRAN makes them available.

Valerie


On 02/07/2018 08:29 AM, Kevin Horan wrote:


The ChemmineR build is failing on the mac due to a new dependency not
being available, the package "rsvg". Would it be possible to install
that on the mac build machine? Thanks.

http://bioconductor.org/checkResults/devel/bioc-LATEST/ChemmineR/merida2-install.html


Kevin


Re: [Rd] [parallel] fixes load balancing of parLapplyLB

2018-02-19 Thread Christian Krause
Dear R-Devel List,

I have installed R 3.4.3 with the patch applied on our cluster and ran a 
*real-world* job of one of our users to confirm that the patch works to my 
satisfaction. Here are the results.

The original was a series of jobs, all essentially doing the same stuff using 
bootstrapped data, so for the original there is more data and I show the 
arithmetic mean with standard deviation. The confirmation with the patched R 
was only a single instance of that series of jobs.

## Job Efficiency

The job efficiency is defined as (this is what the `qacct-efficiency` tool 
below does):

```
efficiency = cputime / cores / wallclocktime * 100%
```

In simpler words: how well did the job utilize its CPU cores. It shows the 
percentage of time the job was actually doing stuff, as opposed to the 
difference:

```
wasted = 100% - efficiency
```

... which, essentially, tells us how much of the resources were wasted, i.e. 
CPU cores just idling, without being used by anyone. We care a lot about that 
because, for our scientific computing cluster, wasted resources is like burning 
money.
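
As a quick illustration of the two formulas above, here is a small R sketch. The function name `job_efficiency()` and the input numbers are made up for the example; they are not values from our accounting database.

```r
# Job efficiency as defined above: CPU time per core relative to wallclock time.
job_efficiency <- function(cputime, cores, wallclocktime) {
  cputime / cores / wallclocktime * 100
}

# Hypothetical accounting values in seconds.
eff    <- job_efficiency(cputime = 3000, cores = 4, wallclocktime = 1000)
wasted <- 100 - eff

print(c(efficiency = eff, wasted = wasted))
```

With these inputs the four cores were busy for three quarters of the wallclock time, so a quarter of the reserved capacity sat idle.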

### original

This takes the entire series from our job accounting database, filters for the
successful jobs, calculates the efficiency and then shows the average and
standard deviation of the efficiency:

```
$ qacct -j 4433299 | qacct-success | qacct-efficiency | meansd
n=945 ∅ 61.7276 ± 7.78719
```

This takes the entire series from our job accounting database, filters for the
successful jobs, calculates the efficiency and does a sort of histogram-like
binning before calculating the mean and standard deviation (to get a more
detailed impression of the distribution when the standard deviation of the
previous command is comparatively high):

```
$ qacct -j 4433299 | qacct-success | qacct-efficiency | meansd-bin -w 10 | sort -gk1 | column -t
10  -  20  ->  n=3    ∅  19.216667           ±  0.9112811494447459
20  -  30  ->  n=6    ∅  26.418              ±  2.665996374091058
30  -  40  ->  n=12   ∅  35.115834           ±  2.8575783082671196
40  -  50  ->  n=14   ∅  45.35285714285715   ±  2.98623361591005
50  -  60  ->  n=344  ∅  57.114593023255814  ±  2.1922005551774415
60  -  70  ->  n=453  ∅  64.29536423841049   ±  2.8334788433963856
70  -  80  ->  n=108  ∅  72.95592592592598   ±  2.5219474143639276
80  -  90  ->  n=5    ∅  81.526              ±  1.2802265424525452
```

I have attached an example graph from our monitoring system of a single 
instance in my previous mail. There you can see that the load balancing does 
not actually work, i.e. it behaves the same as `parLapply`. This is reflected
in the job efficiency.

### patch applied

This is the single instance I used to confirm that the patch works:

```
$ qacct -j 4562202 | qacct-efficiency
97.36
```

The graph from our monitoring system is attached. As you can see, the load 
balancing works to a satisfying degree and the efficiency is well above 90% 
which was what I had hoped for :-)

## Additional Notes

The list used in this job's `parLapplyLB` call is 5812 elements long. With the 
`splitList`-chunking from the patch, you'll get 208 lists of about 28 elements 
(208 chunks of size 28). The job ran on 28 CPU cores and had a wallclock time 
of 120351.590 seconds, i.e. 33.43 hours. Thus, the function we apply to our 
list takes about 580 seconds per list element, i.e. about 10 minutes. I 
suppose, for that runtime, we would get even better load balancing if we would 
reduce the chunk size even further, maybe even down to 1, thus getting our 
efficiency even closer to 100%.
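
The numbers in this paragraph can be re-derived roughly as follows. This sketch assumes that the patched chunking produces about one chunk per `elements / cores` (rounded up) and that all cores were busy for the whole wallclock time; the variable names are only for illustration.

```r
n_elements <- 5812
n_cores    <- 28
wallclock  <- 120351.590                       # seconds, from the accounting data

# Assumed chunking rule: one chunk per (elements / cores), rounded up.
n_chunks    <- ceiling(n_elements / n_cores)
chunk_size  <- n_elements / n_chunks           # average elements per chunk

hours       <- wallclock / 3600                # job runtime in hours
# Core-seconds spent per list element, ignoring idle time.
per_element <- wallclock * n_cores / n_elements

print(c(chunks = n_chunks, chunk_size = chunk_size,
        hours = hours, seconds_per_element = per_element))
```

Because idle time is ignored, the per-element figure is a slight overestimate, which is fine for the order-of-magnitude argument made here.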

Of course, for really short-running functions, a higher chunk size may be more 
efficient because of the overhead. In our case, the overhead is negligible and 
that is why the low chunk size works really well. In contrast, for smallish 
lists with short-running functions, you might not even need load balancing and 
`parLapply` suffices. It only becomes an issue, when the runtime of the 
function is high and / or varying.

In our case, the entire runtime of the entire series of jobs was:

```
$ qacct -j 4433299 | awk '$1 == "wallclock" { sum += $2 } END { print sum, "seconds" }'
4.72439e+09 seconds
```

That's about 150 years on a single core or 7.5 years on a 20 core server! Our 
user was constantly using about 500 cores, so this took about 110 days. If you 
compare this to my 97% efficiency example, the jobs could have been finished in 
75 days instead ;-)
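
For reference, a small sketch of the unit conversions behind the first figures, following the reading above in which the summed wallclock time is treated as single-core time. A year is taken as 365 days and the variable names are only for illustration.

```r
total_seconds   <- 4.72439e9          # summed wallclock time from qacct
seconds_per_day <- 24 * 3600

years_single_core <- total_seconds / seconds_per_day / 365
years_20_cores    <- years_single_core / 20
days_500_cores    <- total_seconds / 500 / seconds_per_day

print(round(c(years_single_core = years_single_core,
              years_20_cores    = years_20_cores,
              days_500_cores    = days_500_cores), 1))
```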

## Upcoming Patch

If this patch gets applied to the R code base (and I hope it will :-)) my 
colleague and I will submit another patch that adds the chunk size as an 
optional parameter to all of the load balancing functions. With that 
parameter, users of these functions *can* decide for themselves which chunk 
size they prefer for their code. As mentioned before, the most efficient chunk 
size depends on the runtime of the function being applied, which is the only 
thing R does not know and users really should be allowed to specify explicitly. 
The default of this new optional parameter would be 

Re: [Bioc-devel] R version check in BiocChech

2018-02-19 Thread Vincent Carey
On Mon, Feb 19, 2018 at 11:27 AM, Alexey Sergushichev 
wrote:

> Kevin,
>
> > It does not request users to make R-devel a _requirement_ of their
> package.
>
> Sadly it does for new packages. New packages submitted to Bioconductor 3.7
> are _required_ to have an R >= 3.5 dependency, otherwise BiocCheck will
> result in a warning
> (https://github.com/Bioconductor/BiocCheck/blob/be9cd6e36d95f8bf873b52427d2a97fce6fbb9b9/R/checks.R#L23)
> and warnings aren't allowed for new package submissions.
>
> > Here, I think the decision here boils down to how far back in terms of R
> versions the developer is willing to support the package. I suppose one
> could state R≥2.3 if they're confident about it.
>
> That's the problem: this is true for packages already in Bioconductor, but
> it's not true for new package submissions.
>
> Aaron,
>
> > Personally, I haven't found it to be particularly difficult to update R,
> > or to run R-devel in parallel with R 3.4, even without root privileges.
>
> I find it much harder for a normal user to install R-devel (and update it
> properly, because it's a development version) and running
> 'devtools::install_github("blabla/my_package")'.
>
> > I think many people underappreciate the benefits of moving to the latest
> > version of R.
>
> Don't you think it should be a developer's choice whether to use such new
> features or ignore them and have a potentially bigger audience?
>

It _is_ the developer's choice.  But a developer of packages for the
Bioconductor project commits to using R-devel during certain pre-release
phases, depending on proximity in time to a point release of R.  (See
http://bioconductor.org/developers/how-to/useDevel/ for full details.)
BiocCheck verifies that this commitment is met.
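
For readers unfamiliar with how such a requirement is declared, here is a rough base R sketch of reading the R dependency from a DESCRIPTION file and comparing it with the running R. The package name and field values are hypothetical, and this is not how BiocCheck itself implements the check.

```r
# A hypothetical DESCRIPTION, given inline instead of read from disk.
desc_lines <- c("Package: myPackage",
                "Version: 0.99.0",
                "Depends: R (>= 3.5.0), methods")

con  <- textConnection(desc_lines)
desc <- read.dcf(con)
close(con)

# Pull the version out of the "R (>= x.y.z)" entry of the Depends field.
required <- unname(sub(".*R \\(>= *([0-9.]+)\\).*", "\\1", desc[1, "Depends"]))

# Compare against the R version running this code.
meets_requirement <- getRversion() >= package_version(required)
message("Requires R >= ", required, "; running ", getRversion(),
        "; satisfied: ", meets_requirement)
```

Lowering the declared version in a side branch changes only the `Depends` line; it does not change which R version the package was actually developed and tested against.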


>
> > Enforcing version consistency avoids heartache during release and
> > debugging.
>
> But it's a developer's heartache. As I said, it even can't be attributed to
> Bioconductor at all, as it's not possible to install the package from
> bioc-devel, unless you have the corresponding R version.
>
>
> --
> Alexey
>
>
>
> On Mon, Feb 19, 2018 at 6:38 PM, Aaron Lun  wrote:
>
> > I'll just throw in my two cents here.
> >
> > I think many people underappreciate the benefits of moving to the latest
> > version of R. If you inspect the R-devel NEWS file, there's a couple of
> > nice fixes/features that a developer might want to take advantage of:
> >
> > - sum() doesn't give NAs upon integer overflow anymore.
> > - New ...elt(n) and ...length() functions for dealing with ellipses.
> > - ALTREP support for 1:n sequences (wow!)
> > - zero length subassignment in a non-zero index fails correctly.
> >
> > The previous 3.4.0 release also added support for more DLLs being loaded
> > at once, which was otherwise causing headaches in workflows. And 3.4.2
> > had a bug fix to LAPACK, which did result in a few user-level changes in
> > some packages like edgeR. So there are considerable differences between
> > the versions of R, especially if one is a package developer.
> >
> > Enforcing version consistency avoids heartache during release and
> > debugging. There's a choice between users getting annoyed about having
> > to update R, and then updating R, and everything working as a result; or
> > everyone (developers/users) wasting some time figuring out whether a bug
> > in a package is due to the code in the package itself or the version of
> > R. The brief annoyance in the first option is better than the chronic
> > grief of the second option, especially given that the solution to the
> > problem in the second option would be to update R anyway.
> >
> > Personally, I haven't found it to be particularly difficult to update R,
> > or to run R-devel in parallel with R 3.4, even without root privileges.
> >
> > -Aaron
> >
> > On 19/02/18 14:55, Kevin RUE wrote:
> > > Hi Alexey,
> > >
> > > I do agree with you that there is no harm in testing against other
> > > versions of R. In a way, that is even good practice, considering that
> > > many HPC users do not always have access to the latest version of R,
> > > and that Travis is making this fairly easy.
> > >
> > > Now, with regard to your latest reply, I am wondering whether we're
> > > having confusion here between the "R≥x.x" requirement, and the
> > > version(s) of R that you use to develop/test your package (the version
> > > of R installed on your own machine).
> > >
> > > First, I think the "R≥x.x" does not have an explicit rule.
> > > To me, the point of this requirement is to declare the oldest version
> > > of R that the package has been tested/validated for. This does not
> > > necessarily have to be the _next_ version of R (see the core Bioc
> > > package S4Vectors:
> > > https://bioconductor.org/packages/release/bioc/html/S4Vectors.html,
> > > and I am sure there are older requirements in other packages).
> > > Here, I think the decision here boils down to how far back 

Re: [Bioc-devel] R version check in BiocChech

2018-02-19 Thread Aaron Lun
>  > Personally, I haven't found it to be particularly difficult to update R,
>  > or to run R-devel in parallel with R 3.4, even without root privileges.
> 
> I find it much harder for a normal user to install R-devel (and update 
> it properly, because it's a development version) and running 
> 'devtools::install_github("blabla/my_package")'.

There seem to be two issues here.

The first is regarding the usability of your specific package. For this, 
Kevin's suggestion (and what you are already doing) is pretty 
reasonable. It's just a branch with a single altered commit (>= 3.5 to 
 >= 3.4); it costs nothing, and you can delete it later.

However, this "solution" will only last until the next BioC release, at 
which point biocLite() will only work on R 3.5.*. So, sooner or later, 
your users will have to update their versions of R.

Which leads us to the second question. Should Bioconductor, as a 
project, enforce the use of the latest R version? The core team will 
have better things to say than me on this topic, but for me, the answer 
is an unqualified yes. We get the latest features, bugfixes and 
improvements; a considerable set of benefits, IMHO.

>  > I think many people underappreciate the benefits of moving to the latest
>  > version of R.
> 
> Don't you think it should be a developer's choice whether to use such 
> new features or ignore them and have a potentially bigger audience?

It's true that a developer might not need the latest cutting-edge 
features in the latest version of R. But they should incorporate bug 
fixes to the underlying infrastructure, or changes to existing 
functionality that result in different behaviour.

Of course, it would be difficult to ask every developer to read through 
the NEWS to see if the changes affect their package. It is much easier 
for everyone to just use the latest version of R; then we only have to 
deal with bugs in the latest version, not previously solved ones.

And besides; let's say, hypothetically, BioC didn't have a R version 
requirement. Unless you're using a quite restricted subset of packages, 
you'll encounter a package somewhere that requires the latest R version. 
In my workflows, I know that I load at least 100 packages; only one of 
them needs to have R (>= 3.5) to force me to upgrade anyway.

>  > Enforcing version consistency avoids heartache during release and
>  > debugging.
> 
> But it's a developer's heartache. As I said, it even can't be attributed 
> to Bioconductor at all, as it's not possible to install the package from 
> bioc-devel, unless you have the corresponding R version.

Yes, that's the point. To paraphrase what I tell my colleagues:

Bugs in a BioC-release package with R 3.4 = my problem
Bugs in a BioC-devel package with R 3.5 = my problem
Bugs in a BioC-devel package with R 3.4 = not my problem

From my perspective, the version requirements in biocLite() ensure that 
the user is doing things properly; and if they follow the rules, any 
bugs are therefore the fault of my package. If the users don't follow 
the rules, they're on their own - but at least they know what the rules 
are, because it's pretty inconvenient to break them.

Cheers,

Aaron


Re: [Bioc-devel] rsvg on mac

2018-02-19 Thread Alexey Sergushichev
Valerie,

Are there any estimates on how often CRAN OS X builds happen? There are
still no builds for rsvg and other packages...

Thanks,
Alexey

On Wed, Feb 7, 2018 at 8:09 PM, Obenchain, Valerie <
valerie.obench...@roswellpark.org> wrote:

> Hi Kevin,
>
> CRAN binaries for El Capitan in devel aren't available. You can see this
> on the rsvg landing page:
>
> https://cran.r-project.org/web/packages/rsvg/index.html
>
> Nothing we can do until CRAN makes them available.
>
> Valerie
>
>
> On 02/07/2018 08:29 AM, Kevin Horan wrote:
>
>
> The ChemmineR build is failing on the mac due to a new dependency not
> being available, the package "rsvg". Would it be possible to install
> that on the mac build machine? Thanks.
>
> http://bioconductor.org/checkResults/devel/bioc-LATEST/ChemmineR/merida2-install.html
>
>
> Kevin
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] R version check in BiocCheck

2018-02-19 Thread Alexey Sergushichev
Kevin,

> It does not request users to make R-devel a _requirement_ of their
> package.

Sadly it does for new packages. New packages submitted to Bioconductor 3.7
are _required_ to have an R >= 3.5 dependency, otherwise BiocCheck will result
in a warning (
https://github.com/Bioconductor/BiocCheck/blob/be9cd6e36d95f8bf873b52427d2a97fce6fbb9b9/R/checks.R#L23)
and warnings aren't allowed for new package submissions.
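
For illustration only, the effect of such a check can be sketched in a few
lines of R. This is not BiocCheck's code: the helper name r_dependency_ok and
the sample Depends strings are made up here, and only the version comparison
is shown.

```r
# Hypothetical helper (not part of BiocCheck): does a DESCRIPTION
# "Depends" field require at least a given version of R?
r_dependency_ok <- function(depends, required = "3.5") {
  # Look for an entry such as "R (>= 3.4)"; the second capture group
  # holds the version string.
  pattern <- "(^|[, ])R[[:space:]]*\\(>=[[:space:]]*([0-9.]+)\\)"
  m <- regmatches(depends, regexec(pattern, depends))[[1]]
  if (length(m) < 3L) {
    return(FALSE)  # no "R (>= x.y)" entry at all
  }
  numeric_version(m[3]) >= numeric_version(required)
}

r_dependency_ok("R (>= 3.4), methods")   # FALSE: the case that gets a warning
r_dependency_ok("R (>= 3.5), methods")   # TRUE
```

With a helper like this, a package declaring R (>= 3.4) is the one that would
be flagged, which is the situation described above.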

> Here, I think the decision here boils down to how far back in terms of R
> versions the developer is willing to support the package. I suppose one
> could state R≥2.3 if they're confident about it.

That's the problem: this is true for packages already in Bioconductor, but
it's not true for new package submissions.

Aaron,

> Personally, I haven't found it to be particularly difficult to update R,
> or to run R-devel in parallel with R 3.4, even without root privileges.

I find it much harder for a normal user to install R-devel (and update it
properly, because it's a development version) than to run
'devtools::install_github("blabla/my_package")'.

> I think many people underappreciate the benefits of moving to the latest
> version of R.

Don't you think it should be a developer's choice whether to use such new
features or ignore them and have a potentially bigger audience?

> Enforcing version consistency avoids heartache during release and
> debugging.

But it's a developer's heartache. As I said, it even can't be attributed to
Bioconductor at all, as it's not possible to install the package from
bioc-devel, unless you have the corresponding R version.


--
Alexey



On Mon, Feb 19, 2018 at 6:38 PM, Aaron Lun wrote:

> I'll just throw in my two cents here.
>
> I think many people underappreciate the benefits of moving to the latest
> version of R. If you inspect the R-devel NEWS file, there's a couple of
> nice fixes/features that a developer might want to take advantage of:
>
> - sum() doesn't give NAs upon integer overflow anymore.
> - New ...elt(n) and ...length() functions for dealing with ellipses.
> - ALTREP support for 1:n sequences (wow!)
> - zero length subassignment in a non-zero index fails correctly.
>
> The previous 3.4.0 release also added support for more DLLs being loaded
> at once, which was otherwise causing headaches in workflows. And 3.4.2
> had a bug fix to LAPACK, which did result in a few user-level changes in
> some packages like edgeR. So there are considerable differences between
> the versions of R, especially if one is a package developer.
>
> Enforcing version consistency avoids heartache during release and
> debugging. There's a choice between users getting annoyed about having
> to update R, and then updating R, and everything working as a result; or
> everyone (developers/users) wasting some time figuring out whether a bug
> in a package is due to the code in the package itself or the version of
> R. The brief annoyance in the first option is better than the chronic
> grief of the second option, especially given that the solution to the
> problem in the second option would be to update R anyway.
>
> Personally, I haven't found it to be particularly difficult to update R,
> or to run R-devel in parallel with R 3.4, even without root privileges.
>
> -Aaron
>
> On 19/02/18 14:55, Kevin RUE wrote:
> > Hi Alexey,
> >
> > I do agree with you that there is no harm in testing against other version
> > of R. In a way, that is even good practice, considering that many HPC users
> > do not always have access to the latest version of R, and that Travis is
> > making this fairly easy.
> >
> > Now, with regard to your latest reply, I am wondering whether we're having
> > confusion here between the "R≥x.x" requirement, and the version(s) of R
> > that you use to develop/test your package (the version of R installed on
> > your own machine).
> >
> > First, I think the "R≥x.x" does not have an explicit rule.
> > To me, the point of this requirement is to declare the oldest version of R
> > that the package has been tested/validated for. This does not necessarily
> > have to be the _next_ version of R (see the core Bioc package S4Vectors:
> > https://bioconductor.org/packages/release/bioc/html/S4Vectors.html, and I
> > am sure there are older requirements in other packages).
> > Here, I think the decision here boils down to how far back in terms of R
> > versions the developer is willing to support the package. I suppose one
> > could state R≥2.3 if they're confident about it.
> >
> > On a separate note, going back to the Bioc guideline that I initially
> > highlighted ("Package authors should develop against the version of *R* that
> > will be available to users when the *Bioconductor* devel branch becomes the
> > *Bioconductor* release branch."), this rather refers to the forward-looking
> > guideline that the cutting-edge version of any R package should be
> > compatible with the cutting edge version of R, and that developers should
> > be

Re: [Bioc-devel] R version check in BiocCheck

2018-02-19 Thread Aaron Lun
I'll just throw in my two cents here.

I think many people underappreciate the benefits of moving to the latest 
version of R. If you inspect the R-devel NEWS file, there's a couple of 
nice fixes/features that a developer might want to take advantage of:

- sum() doesn't give NAs upon integer overflow anymore.
- New ...elt(n) and ...length() functions for dealing with ellipses.
- ALTREP support for 1:n sequences (wow!)
- zero length subassignment in a non-zero index fails correctly.
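
As a small illustration of the second item, here is a sketch of how the new
ellipsis helpers can be used. The function name last_arg is invented for this
example, and it needs a version of R that provides ...length() and ...elt()
(R-devel at the time of writing, i.e. what becomes R 3.5).

```r
# Hypothetical example function: return the last argument passed via `...`.
# Needs R >= 3.5.0 for ...length() and ...elt().
last_arg <- function(...) {
  n <- ...length()          # number of arguments in `...`, none are evaluated
  if (n == 0L) return(NULL)
  ...elt(n)                 # evaluate and return only the n-th argument
}

last_arg(1, "a", 3L)                      # returns 3L
last_arg(stop("never evaluated"), "ok")   # returns "ok"; stop() is never run
```

Before these functions existed, the usual route was list(...), which
evaluates every argument.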

The previous 3.4.0 release also added support for more DLLs being loaded 
at once, which was otherwise causing headaches in workflows. And 3.4.2 
had a bug fix to LAPACK, which did result in a few user-level changes in 
some packages like edgeR. So there are considerable differences between 
the versions of R, especially if one is a package developer.

Enforcing version consistency avoids heartache during release and 
debugging. There's a choice between users getting annoyed about having 
to update R, and then updating R, and everything working as a result; or 
everyone (developers/users) wasting some time figuring out whether a bug 
in a package is due to the code in the package itself or the version of 
R. The brief annoyance in the first option is better than the chronic 
grief of the second option, especially given that the solution to the 
problem in the second option would be to update R anyway.

Personally, I haven't found it to be particularly difficult to update R, 
or to run R-devel in parallel with R 3.4, even without root privileges.

-Aaron

On 19/02/18 14:55, Kevin RUE wrote:
> Hi Alexey,
> 
> I do agree with you that there is no harm in testing against other version
> of R. In a way, that is even good practice, considering that many HPC users
> do not always have access to the latest version of R, and that Travis is
> making this fairly easy.
> 
> Now, with regard to your latest reply, I am wondering whether we're having
> confusion here between the "R≥x.x" requirement, and the version(s) of R
> that you use to develop/test your package (the version of R installed on
> your own machine).
> 
> First, I think the "R≥x.x" does not have an explicit rule.
> To me, the point of this requirement is to declare the oldest version of R
> that the package has been tested/validated for. This does not necessarily
> have to be the _next_ version of R (see the core Bioc package S4Vectors:
> https://bioconductor.org/packages/release/bioc/html/S4Vectors.html, and I
> am sure there are older requirements in other packages).
> Here, I think the decision here boils down to how far back in terms of R
> versions the developer is willing to support the package. I suppose one
> could state R≥2.3 if they're confident about it.
> 
> On a separate note, going back to the Bioc guideline that I initially
> highlighted ("Package authors should develop against the version of *R* that
> will be available to users when the *Bioconductor* devel branch becomes the
> *Bioconductor* release branch."), this rather refers to the forward-looking
> guideline that the cutting-edge version of any R package should be
> compatible with the cutting edge version of R, and that developers should
> be working with R-devel to ensure this.
> In other words, this only refers to the version of R that the developer
> should have installed on their own machine. It does not request users to
> make R-devel a _requirement_ of their package.
> 
> I hope this addresses your question better, and I am curious to hear if
> anyone else has an opinion or precisions to weigh in on this topic.
> 
> Best,
> Kevin
> 
> 
> On Mon, Feb 19, 2018 at 12:19 PM, Alexey Sergushichev 
> wrote:
> 
>> Hello Kevin,
>>
>> Well, bioc-devel packages are tested against bioc-devel (and R-3.5) in any
>> case. What I'm saying is that aside from testing the package against
>> bioc-devel, I can as well test against bioc-release too on my own. If the
>> package doesn't work with bioc-devel it shouldn't pass bioc-devel checks,
>> if the package is properly developed and has a good test coverage. So I see
>> no problem in allowing developers to test against other versions, on top of
>> developing against bioc-devel. And as it's only possible to install the
>> package from github and not from Bioconductor, the developer alone is
>> responsible for the package to work properly.
>>
>> I can't really see a scenario, where requiring R >= 3.5 helps to improve
>> the package quality.
>>
>>> A short-term workaround can be to create a git branch (e.g. "3.4").
>>
>> That's the way I'm doing too, but supporting two branches different only
>> in R version looks ridiculous and unnecessary.
>>
>> --
>> Alexey
>>
>>
>>
>>
>>
>> On Mon, Feb 19, 2018 at 12:48 PM, Kevin RUE  wrote:
>>
>>> Dear Alexey,
>>>
>>> The reason is somewhat implicitly given at
>>> https://www.bioconductor.org/developers/how-to/useDevel/ :
>>> "Package authors should develop against the version of *R* 

Re: [Rd] readLines interaction with gsub different in R-dev

2018-02-19 Thread Tomas Kalibera

Thank you for the report and analysis. Now fixed in R-devel.
Tomas

On 02/17/2018 08:24 PM, William Dunlap via R-devel wrote:

I think the problem in R-devel happens when there are non-ASCII characters
in any
of the strings passed to gsub.

txt <- vapply(list(as.raw(c(0x41, 0x6d, 0xc3, 0xa9, 0x6c, 0x69, 0x65)),
as.raw(c(0x41, 0x6d, 0x65, 0x6c, 0x69, 0x61))), rawToChar, "")
txt
#[1] "Amélie" "Amelia"
Encoding(txt)
#[1] "unknown" "unknown"
gsub(perl=TRUE, "(\\w)(\\w)", "<\\L\\1\\U\\2>", txt)
#[1] ...
gsub(perl=TRUE, "(\\w)(\\w)", "<\\L\\1\\U\\2>", txt[1])
#[1] ...
gsub(perl=TRUE, "(\\w)(\\w)", "<\\L\\1\\U\\2>", txt[2])
#[1] ...

I can change the Encoding to "latin1" or "UTF-8" and get similar results
from gsub.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Feb 17, 2018 at 7:35 AM, Hugh Parsonage 
wrote:


| Confirmed for R-devel (current) on Ubuntu 17.10.  But ... isn't the regexp
| you use wrong, ie isn't R-devel giving the correct answer?

No, I don't think R-devel is correct (or at least consistent with the
documentation). My interpretation of gsub("(\\w)", "\\U\\1", entry,
perl = TRUE) is "Take every word character and replace it with itself,
converted to uppercase."

Perhaps my example was too minimal. Consider the following:

R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
[1] "A"

R> gsub("(\\w)", "\\1", entry, perl = TRUE)
[1] "author: Amélie"   # OK, but very different to 'A', despite only not specifying uppercase

R> gsub("(\\w)", "\\U\\1", "author: Amelie", perl = TRUE)
[1] "AUTHOR: AMELIE"  # OK, but very different to 'A',

R> gsub("^(\\w+?): (\\w)", "\\U\\1\\E: \\2", entry, perl = TRUE)
  "AUTHOR"  # Where did everything after the first group go?

I should note the following example too:
R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE, useBytes = TRUE)
[1] "AUTHOR: AMéLIE"  # latin1 encoding


A call to `readLines` (possibly `scan()` and `read.table` and friends)
is essential.
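
For reference, the documented behaviour on plain ASCII input, where no
encoding question arises, can be sketched as follows. The test string is
chosen for this example; the first call mirrors the "author: Amelie" case
above.

```r
# ASCII-only input, so the encoding issue discussed in this thread does not arise.
x <- "author: amelie"

# \\U upper-cases the rest of the replacement.
upper_all <- gsub("(\\w)", "\\U\\1", x, perl = TRUE)
upper_all
# [1] "AUTHOR: AMELIE"

# \\E ends the case conversion, so only the first group is upper-cased.
upper_key <- gsub("^(\\w+?): (\\w)", "\\U\\1\\E: \\2", x, perl = TRUE)
upper_key
# [1] "AUTHOR: amelie"
```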




On 18 February 2018 at 02:15, Dirk Eddelbuettel  wrote:

On 17 February 2018 at 21:10, Hugh Parsonage wrote:
| I was told to re-raise this issue with R-dev:
|
| In the documentation of R-dev and R-3.4.3, under ?gsub
|
| > replacement
| >... For perl = TRUE only, it can also contain "\U" or "\L" to
| > convert the rest of the replacement to upper or lower case and "\E" to end
| > case conversion.
|
| However, the following code runs differently:
|
| tempf <- tempfile()
| writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
| entry <- readLines(tempf, encoding = "UTF-8")
| gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
|
|
| "AUTHOR: AMÉLIE"  # R-3.4.3
|
| "A"  # R-dev

Confirmed for R-devel (current) on Ubuntu 17.10.  But ... isn't the regexp
you use wrong, ie isn't R-devel giving the correct answer?

R> tempf <- tempfile()
R> writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
R> entry <- readLines(tempf, encoding = "UTF-8")
R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
[1] "A"
R> gsub("(\\w+)", "\\U\\1", entry, perl = TRUE)
[1] "AUTHOR"
R> gsub("(.*)", "\\U\\1", entry, perl = TRUE)
[1] "AUTHOR: AMÉLIE"
R>

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Bioc-devel] R version check in BiocCheck

2018-02-19 Thread Kevin RUE
Hi Alexey,

I do agree with you that there is no harm in testing against other version
of R. In a way, that is even good practice, considering that many HPC users
do not always have access to the latest version of R, and that Travis is
making this fairly easy.

Now, with regard to your latest reply, I am wondering whether we're having
confusion here between the "R≥x.x" requirement, and the version(s) of R
that you use to develop/test your package (the version of R installed on
your own machine).

First, I think the "R≥x.x" does not have an explicit rule.
To me, the point of this requirement is to declare the oldest version of R
that the package has been tested/validated for. This does not necessarily
have to be the _next_ version of R (see the core Bioc package S4Vectors:
https://bioconductor.org/packages/release/bioc/html/S4Vectors.html, and I
am sure there are older requirements in other packages).
Here, I think the decision here boils down to how far back in terms of R
versions the developer is willing to support the package. I suppose one
could state R≥2.3 if they're confident about it.

On a separate note, going back to the Bioc guideline that I initially
highlighted ("Package authors should develop against the version of *R* that
will be available to users when the *Bioconductor* devel branch becomes the
*Bioconductor* release branch."), this rather refers to the forward-looking
guideline that the cutting-edge version of any R package should be
compatible with the cutting edge version of R, and that developers should
be working with R-devel to ensure this.
In other words, this only refers to the version of R that the developer
should have installed on their own machine. It does not request users to
make R-devel a _requirement_ of their package.

I hope this addresses your question better, and I am curious to hear if
anyone else has an opinion or precisions to weigh in on this topic.

Best,
Kevin


On Mon, Feb 19, 2018 at 12:19 PM, Alexey Sergushichev 
wrote:

> Hello Kevin,
>
> Well, bioc-devel packages are tested against bioc-devel (and R-3.5) in any
> case. What I'm saying is that aside from testing the package against
> bioc-devel, I can as well test against bioc-release too on my own. If the
> package doesn't work with bioc-devel it shouldn't pass bioc-devel checks,
> if the package is properly developed and has a good test coverage. So I see
> no problem in allowing developers to test against other versions, on top of
> developing against bioc-devel. And as it's only possible to install the
> package from github and not from Bioconductor, the developer alone is
> responsible for the package to work properly.
>
> I can't really see a scenario, where requiring R >= 3.5 helps to improve
> the package quality.
>
> > A short-term workaround can be to create a git branch (e.g. "3.4").
>
> That's the way I'm doing too, but supporting two branches different only
> in R version looks ridiculous and unnecessary.
>
> --
> Alexey
>
>
>
>
>
> On Mon, Feb 19, 2018 at 12:48 PM, Kevin RUE  wrote:
>
>> Dear Alexey,
>>
>> The reason is somewhat implicitly given at
>> https://www.bioconductor.org/developers/how-to/useDevel/ :
>> "Package authors should develop against the version of *R* that will be
>> available to users when the *Bioconductor* devel branch becomes the
>> *Bioconductor* release branch."
>>
>> In other words, developing against the _next_ version of R ensures that
>> all packages in development are tested in the environment where they will
>> be released to the general community. In particular, that environment
>> includes the latest devel version of all Bioconductor packages, that will
>> become their next release version.
>> If developers were allowed to develop and test their package in the
>> _current_ version of R, there is no guarantee that those packages would
>> still work when they are made available with the _next_ version of R (e.g.
>> if one of their dependencies is about to introduce some breaking changes).
>> That could cause all sorts of trouble in the first builds on the next
>> Bioconductor release, which is meant to be a place storing stable working
>> code.
>>
>> Overall, you will do yourself and your users a favor developing with the
>> _next_ version of R, as this is a forward-looking strategy, as explained
>> above. In contrast, the short-term benefit of developing with the _current_
>> version of R is largely outweighed by the risk of wasting time looking at
>> code that is about to be deprecated.
>>
>> A short-term workaround can be to create a git branch (e.g. "3.4"), where
>> the R version requirement is downgraded. Then, you can always keep
>> developing against R-devel on your master branch and back-port the more
>> recent commit to the "3.4" branch by typing "git rebase master 3.4" in your
>> shell.
>> A recent example of this situation can be found in the discussion here as
>> a branch to the original repository 

Re: [Bioc-devel] as.list of a GRanges

2018-02-19 Thread Michael Lawrence
On Mon, Feb 19, 2018 at 2:10 AM, Bernat Gel  wrote:

> Hi Hervé,
>
> I completely agree with the goal of having the semantics of list-like
> operations standardised and documented to avoid surprises, and if doing so
> requires changing the current use of as.list, I'm perfectly OK with that. I had
> not seen the strange behaviour with IRanges,


Just want to point out that it's important to keep in mind that many of our
users never use IRanges directly, so consistency is not an absolute
requirement.

so I was not aware of the problem.
>
> In any case, thanks for fixing (and simplifying) karyoploteR. In
> retrospect, I don't know why I didn't use simple vectorization! So, thanks
>
>
> Bernat
>
> *Bernat Gel Moreno*
> Bioinformatician
>
> Hereditary Cancer Program
> Program of Predictive and Personalized Medicine of Cancer (PMPPC)
> Germans Trias i Pujol Research Institute (IGTP)
>
> Campus Can Ruti
> Carretera de Can Ruti, Camí de les Escoles s/n
> 08916 Badalona, Barcelona, Spain
>
> Tel: (+34) 93 554 3068
> Fax: (+34) 93 497 8654
> 08916 Badalona, Barcelona, Spain
> b...@igtp.cat 
> www.germanstrias.org 
>
> 
>
>
>
>
>
>
>
> El 02/17/2018 a las 04:19 AM, Hervé Pagès escribió:
>
>> Hi Bernat,
>>
>> On 02/15/2018 11:57 PM, Bernat Gel wrote:
>>
>>> Hi Hervé and others,
>>>
>>> Thanks for the responses.
>>>
>>> I wouldn't call as.list() of a GRanges an "obscure behaviour" but more a
>>> "works as expected, even if not clearly documented" behaviour.
>>>
>>
>> Most users/developers will probably agree that as.list() worked
>> as expected on a GRanges object. But then they'll be surprised
>> and confused when they use it on an IRanges object and discover
>> that it does something completely different. The current effort
>> is to bring more consistency between GRanges and IRanges objects
>> and to have their list-like semantics aligned and documented so
>> there will be no more such surprise.
>>
>>
>>> In any case I can change the code to as(gr, "GRangesList") as suggested.
>>>
>>
>> I went ahead and fixed karyoploteR. This is karyoploteR 1.5.2. Make
>> sure to resync your GitHub repo by following the instructions here:
>>
>>
>> https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
>>
>> Note that the loop on the GRanges object (via the call to Map())
>> was not needed and could be replaced with a solution that uses
>> proper vectorization.
>>
>> Best,
>> H.
>>
>>
>>> Thanks again for the responses and discussion :)
>>>
>>> Bernat
>>>
>>>
>>> *Bernat Gel Moreno*
>>> Bioinformatician
>>>
>>> Hereditary Cancer Program
>>> Program of Predictive and Personalized Medicine of Cancer (PMPPC)
>>> Germans Trias i Pujol Research Institute (IGTP)
>>>
>>> Campus Can Ruti
>>> Carretera de Can Ruti, Camí de les Escoles s/n
>>> 08916 Badalona, Barcelona, Spain
>>>
>>> Tel: (+34) 93 554 3068
>>> Fax: (+34) 93 497 8654
>>> 08916 Badalona, Barcelona, Spain
>>> b...@igtp.cat 
>>> www.germanstrias.org
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> El 02/15/2018 a las 11:19 PM, Hervé Pagès escribió:
>>>
 On 02/15/2018 01:57 PM, Michael Lawrence wrote:

>
>
> On Thu, Feb 15, 2018 at 1:45 PM, Hervé Pagès wrote:
>
> On 02/15/2018 11:53 AM, Cook, Malcolm wrote:
>
> Hi,
>
> Can I ask, is this change under discussion in current release
> or
> so far in Bioconductor devel only (my assumption)?
>
>
> Bioconductor devel only.
>
>
>> On 02/15/2018 08:37 AM, Michael Lawrence wrote:
>> > So is as.list() no longer supported for GRanges objects?
> I have found it
>> > useful in places.
>>
>> Very few places. I found a dozen of them in the entire
> software repo.
>
> However there are probably more in the wild...
>
>
> What as.list() was doing on a GRanges object was not documented.
> Relying
> on some kind of obscure undocumented feature is never a good idea.
>
>
> There's just too much that is documented implicitly through inherited
> behaviors, or where we say things like "this data structure behaves as one
> would expect given base R". It's not fair to claim that those features 

Re: [Bioc-devel] R version check in BiocChech

2018-02-19 Thread Alexey Sergushichev
Hello Kevin,

Well, bioc-devel packages are tested against bioc-devel (and R-3.5) in any
case. What I'm saying is that aside from testing the package against
bioc-devel, I can as well test against bioc-release too on my own. If the
package doesn't work with bioc-devel it shouldn't pass bioc-devel checks,
if the package is properly developed and has a good test coverage. So I see
no problem in allowing developers to test against other versions, on top of
developing against bioc-devel. And as it's only possible to install the
package from github and not from Bioconductor, the developer alone is
responsible for the package to work properly.

I can't really see a scenario, where requiring R >= 3.5 helps to improve
the package quality.

> A short-term workaround can be to create a git branch (e.g. "3.4").

That's the way I'm doing too, but supporting two branches different only in
R version looks ridiculous and unnecessary.

--
Alexey





On Mon, Feb 19, 2018 at 12:48 PM, Kevin RUE  wrote:

> Dear Alexey,
>
> The reason is somewhat implicitly given at
> https://www.bioconductor.org/developers/how-to/useDevel/ :
> "Package authors should develop against the version of *R* that will be
> available to users when the *Bioconductor* devel branch becomes the
> *Bioconductor* release branch."
>
> In other words, developing against the _next_ version of R ensures that
> all packages in development are tested in the environment where they will
> be released to the general community. In particular, that environment
> includes the latest devel version of all Bioconductor packages, that will
> become their next release version.
> If developers were allowed to develop and test their package in the
> _current_ version of R, there is no guarantee that those packages would
> still work when they are made available with the _next_ version of R (e.g.
> if one of their dependencies is about to introduce some breaking changes).
> That could cause all sorts of trouble in the first builds on the next
> Bioconductor release, which is meant to be a place storing stable working
> code.
>
> Overall, you will do yourself and your users a favor developing with the
> _next_ version of R, as this is a forward-looking strategy, as explained
> above. In contrast, the short-term benefit of developing with the _current_
> version of R is largely outweighed by the risk of wasting time looking at
> code that is about to be deprecated.
>
> A short-term workaround can be to create a git branch (e.g. "3.4"), where
> the R version requirement is downgraded. Then, you can always keep
> developing against R-devel on your master branch and back-port the more
> recent commit to the "3.4" branch by typing "git rebase master 3.4" in your
> shell.
> A recent example of this situation can be found in the discussion here as
> a branch to the original repository
> https://github.com/csoneson/iSEE/pull/124 and here as a fork
> https://github.com/mdshw5/iSEE/commit/6fb98192a635a6222491b66fb0474dc38f922495
>
> I hope this helps.
>
> Best wishes,
> Kevin
>
>
> On Mon, Feb 19, 2018 at 8:02 AM, Alexey Sergushichev 
> wrote:
>
>> Dear Bioconductor community,
>>
>> I wonder, what is the reason behind requirement for dependency R >= 3.5
>> (currently) for new packages?
>>
>> As a developer I really want an installation of my package to be as easy as
>> possible and want my package to be easily installed from github. So
>> currently, when I develop a package I put a R >= 3.4 in my DESCRIPTION and
>> test it using Travis against bioc-release. Then, before submission
>> to Bioconductor, I have to change R >= 3.4 dependency to R >= 3.5, so that
>> the package passes BiocCheck. However, most users don't have R-devel
>> installed, so they have R 3.4 in the best case, and for these users I
>> create another repository branch with R >= 3.4 dependency.
>>
>> Overall, it is quite bothersome and it doesn't really make sense to me to
>> restrict potential users in this way. Am I the only one who has issues
>> with this? Am I missing something? Or maybe this check could be removed?
>>
>> Best,
>> Alexey Sergushichev
>>
>>
>
>



Re: [Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-19 Thread Tomas Kalibera


I think it is as Kevin described in an earlier response - the garbled 
output is because a UTF-8 encoded string is assumed to be native 
encoding (which happens not to be UTF-8 on the platform where this is 
observed) and converted again to UTF-8.


I think the documentation is consistent with the observed behavior


   tmp <- 'é'
   tmp <- iconv(tmp, to = 'UTF-8')
   print(Encoding(tmp))
   print(charToRaw(tmp))
   tmpfilepath <- tempfile()
   writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE)

[1] "UTF-8"
[1] c3 a9

Raw text as hex: c3 83 c2 a9
useBytes=TRUE in writeLines means that the UTF-8 string will be passed 
byte-by-byte to the connection. encoding="UTF-8" tells the connection to 
convert the bytes to UTF-8 (from native encoding). So the second step is 
converting a string which is assumed to be in native encoding, but in 
fact it is in UTF-8.
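
The double conversion can be reproduced without any file or connection, as a
rough sketch. Here iconv() stands in for the conversion done by the
connection, and latin1 is only an assumption standing in for a non-UTF-8
native encoding.

```r
# "é" as UTF-8 bytes, deliberately left without an encoding mark.
x <- rawToChar(as.raw(c(0xc3, 0xa9)))
charToRaw(x)
# [1] c3 a9

# Assumption: latin1 stands in for a non-UTF-8 native encoding.
# Treating the UTF-8 bytes as latin1 and converting them to UTF-8
# encodes each byte a second time.
twice <- iconv(x, from = "latin1", to = "UTF-8")
charToRaw(twice)
# [1] c3 83 c2 a9
```

The resulting bytes match the "c3 83 c2 a9" seen in the report.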


The documentation describes "useBytes=TRUE" as for expert use only. It
can be useful for avoiding unnecessary conversions in some special
cases, but one then has to make sure that no more conversions are
attempted (so use "" as the encoding in "file", for instance). In short,
the advice would be to not use useBytes=TRUE with writeLines, but to
rely on the default behavior.


Tomas


On 02/17/2018 11:24 PM, Kevin Ushey wrote:

Of course, right after writing this e-mail I tested on my Windows
machine and did not see what I expected:


charToRaw(before)

[1] c3 a9

charToRaw(after)

[1] e9

so obviously I'm misunderstanding something as well.

Best,
Kevin

On Sat, Feb 17, 2018 at 2:19 PM, Kevin Ushey  wrote:

From my understanding, translation is implied in this line of ?file (from the
Encoding section):

 The encoding of the input/output stream of a connection can be specified
 by name in the same way as it would be given to iconv: see that help page
 for how to find out what encoding names are recognized on your platform.
 Additionally, "" and "native.enc" both mean the ‘native’ encoding, that is
 the internal encoding of the current locale and hence no translation is
 done.

This is also hinted at in the documentation in ?readLines for its 'encoding'
argument, which has a different semantic meaning from the 'encoding' argument
as used with R connections:

 encoding to be assumed for input strings. It is used to mark character
 strings as known to be in Latin-1 or UTF-8: it is not used to re-encode
 the input. To do the latter, specify the encoding as part of the
 connection con or via options(encoding=): see the examples.

It might be useful to augment the documentation in ?file with something like:

 The 'encoding' argument is used to request the translation of strings when
 writing to a connection.

and, perhaps to further drive home the point about not translating when
encoding = "native.enc":

 Note that R will not attempt translation of strings when encoding is
 either "" or "native.enc" (the default, as per getOption("encoding")).
 This implies that attempting to write, for example, UTF-8 encoded content
 to a connection opened using "native.enc" will retain its original UTF-8
 encoding -- it will not be translated.

It is a bit surprising that 'native.enc' means "do not translate" rather than
"attempt translation to the encoding associated with the current locale", but
those are the semantics and they are not bound to change.

This is the code I used to convince myself of that case:

 conn <- file(tempfile(), encoding = "native.enc", open = "w+")

 before <- iconv('é', to = "UTF-8")
 cat(before, file = conn, sep = "\n")
 after <- readLines(conn)

 charToRaw(before)
 charToRaw(after)

with output:

 > charToRaw(before)
 [1] c3 a9
 > charToRaw(after)
 [1] c3 a9

Best,
Kevin


On Thu, Feb 15, 2018 at 9:16 AM, Ista Zahn wrote:

On Thu, Feb 15, 2018 at 11:19 AM, Kevin Ushey wrote:

I suspect your UTF-8 string is being stripped of its encoding before
write, and so assumed to be in the system native encoding, and then
re-encoded as UTF-8 when written to the file. You can see something
similar with:

 > tmp <- 'é'
 > tmp <- iconv(tmp, to = 'UTF-8')
 > Encoding(tmp) <- "unknown"
 > charToRaw(iconv(tmp, to = "UTF-8"))
 [1] c3 83 c2 a9

It's worth saying that:

 file(..., encoding = "UTF-8")

means "attempt to re-encode strings as UTF-8 when writing to this
file". However, if you already know your text is UTF-8, then you
likely want to avoid opening a connection that might attempt to
re-encode the input. Conversely (assuming I'm understanding the
documentation correctly)

 file(..., encoding = "native.enc")

means "assume that strings are in the native encoding, and hence
translation is unnecessary". Note that it does not mean "attempt to
translate strings to the native encoding".

If all that is true I think ?file needs some 

Re: [Bioc-devel] as.list of a GRanges

2018-02-19 Thread Bernat Gel

Hi Hervé,

I completely agree with the goal of having the semantics of list-like
operations standardised and documented to avoid surprises, and if the
current use of as.list must be changed to achieve that, I'm perfectly ok
with it. I had not seen the strange behaviour with IRanges, so I was not
aware of the problem.


In any case, thanks for fixing (and simplifying) karyoploteR. In
retrospect I don't know why I didn't use simple vectorization! So, thanks.



Bernat

*Bernat Gel Moreno*
Bioinformatician

Hereditary Cancer Program
Program of Predictive and Personalized Medicine of Cancer (PMPPC)
Germans Trias i Pujol Research Institute (IGTP)

Campus Can Ruti
Carretera de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Barcelona, Spain

Tel: (+34) 93 554 3068
Fax: (+34) 93 497 8654
b...@igtp.cat
www.germanstrias.org

On 02/17/2018 at 04:19 AM, Hervé Pagès wrote:

Hi Bernat,

On 02/15/2018 11:57 PM, Bernat Gel wrote:

Hi Hervé and others,

Thanks for the responses.

I wouldn't call as.list() of a GRanges an "obscure behaviour" but more
a "works as expected, even if not clearly documented" behaviour.


Most users/developers will probably agree that as.list() worked
as expected on a GRanges object. But then they'll be surprised
and confused when they use it on an IRanges object and discover
that it does something completely different. The current effort
is to bring more consistency between GRanges and IRanges objects
and to have their list-like semantics aligned and documented so
there will be no more such surprises.



In any case I can change the code to as(gr, "GRangesList") as suggested.


I went ahead and fixed karyoploteR. This is karyoploteR 1.5.2. Make
sure to resync your GitHub repo by following the instructions here:


https://bioconductor.org/developers/how-to/git/sync-existing-repositories/ 



Note that the loop on the GRanges object (via the call to Map())
was not needed and could be replaced with a solution that uses
proper vectorization.

Best,
H.



Thanks again for the responses and discussion :)

Bernat


On 02/15/2018 at 11:19 PM, Hervé Pagès wrote:

On 02/15/2018 01:57 PM, Michael Lawrence wrote:



On Thu, Feb 15, 2018 at 1:45 PM, Hervé Pagès wrote:


    On 02/15/2018 11:53 AM, Cook, Malcolm wrote:

    Hi,

    Can I ask, is this change under discussion in current release or
    so far in Bioconductor devel only (my assumption)?


    Bioconductor devel only.


    > On 02/15/2018 08:37 AM, Michael Lawrence wrote:
    > > So is as.list() no longer supported for GRanges objects? I have found it
    > > useful in places.
    >
    > Very few places. I found a dozen of them in the entire software repo.

    However there are probably more in the wild...


    What as.list() was doing on a GRanges object was not documented. Relying
    on some kind of obscure undocumented feature is never a good idea.


There's just too much that is documented implicitly through 
inherited behaviors, or where we say things like "this data 
structure behaves as one would expect given base R". It's not fair 
to claim that those features are undocumented. Our documentation is 
not complete enough to use it as an excuse.


It's not fair to suggest that this is a widely used feature either.

I've identified all the places in the 1500 software packages where
this was used, and, as I said, there were very few places. BTW I
fixed most of them but my plan is to fix all of them. Some of the
code that is outside the Bioc package corpus might be affected but
it's fair to assume that this will be a very rare occurrence. This can
be mitigated by temporarily restoring as.list() on GRanges, with a
deprecation message, and waiting 1 more devel cycle to replace it with
the new behavior. I chose to disable it for now, on purpose, so I can
identify packages that break (the build 

Re: [Bioc-devel] R version check in BiocCheck

2018-02-19 Thread Kevin RUE
Dear Alexey,

The reason is somewhat implicitly given at
https://www.bioconductor.org/developers/how-to/useDevel/ :
"Package authors should develop against the version of *R* that will be
available to users when the *Bioconductor* devel branch becomes the
*Bioconductor* release branch."

In other words, developing against the _next_ version of R ensures that all
packages in development are tested in the environment where they will be
released to the general community. In particular, that environment includes
the latest devel version of all Bioconductor packages, that will become
their next release version.
If developers were allowed to develop and test their package in the
_current_ version of R, there is no guarantee that those packages would
still work when they are made available with the _next_ version of R (e.g.
if one of their dependencies is about to introduce some breaking changes).
That could cause all sorts of trouble in the first builds on the next
Bioconductor release, which is meant to be a place storing stable working
code.

Overall, you will do yourself and your users a favor developing with the
_next_ version of R, as this is a forward-looking strategy, as explained
above. In contrast, the short-term benefit of developing with the _current_
version of R is largely outweighed by the risk of wasting time looking at
code that is about to be deprecated.

A short-term workaround can be to create a git branch (e.g. "3.4"), where
the R version requirement is downgraded. Then, you can keep developing
against R-devel on your master branch and back-port the more recent
commits to the "3.4" branch by typing "git rebase master 3.4" in your
shell.
A recent example of this situation can be found in this discussion, as a
branch of the original repository:
https://github.com/csoneson/iSEE/pull/124
and here, as a fork:
https://github.com/mdshw5/iSEE/commit/6fb98192a635a6222491b66fb0474dc38f922495

I hope this helps.

Best wishes,
Kevin


On Mon, Feb 19, 2018 at 8:02 AM, Alexey Sergushichev 
wrote:

> Dear Bioconductor community,
>
> I wonder, what is the reason behind requirement for dependency R >= 3.5
> (currently) for new packages?
>
> As a developer I really want an installation of my package to be as easy as
> possible and want my package to be easily installed from github. So
> currently, when I develop a package I put an R >= 3.4 in my DESCRIPTION and
> test it using Travis against bioc-release. Then, before submission
> to Bioconductor, I have to change R >= 3.4 dependency to R >= 3.5, so that
> the package passes BiocCheck. However, most users don't have R-devel
> installed, so they have R 3.4 in the best case, and for these users I
> create another repository branch with R >= 3.4 dependency.
>
> Overall, it is quite bothersome and it doesn't really make sense to me to
> restrict potential users in this way. Am I the only one who has issues
> with this? Am I missing something? Or maybe this check could be removed?
>
> Best,
> Alexey Sergushichev


[Bioc-devel] R version check in BiocCheck

2018-02-19 Thread Alexey Sergushichev
Dear Bioconductor community,

I wonder, what is the reason behind requirement for dependency R >= 3.5
(currently) for new packages?

As a developer I really want an installation of my package to be as easy as
possible and want my package to be easily installed from github. So
currently, when I develop a package I put an R >= 3.4 in my DESCRIPTION and
test it using Travis against bioc-release. Then, before submission
to Bioconductor, I have to change R >= 3.4 dependency to R >= 3.5, so that
the package passes BiocCheck. However, most users don't have R-devel
installed, so they have R 3.4 in the best case, and for these users I
create another repository branch with R >= 3.4 dependency.
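
To make the dependency in question concrete, here is a minimal sketch of
how an "R (>= x.y.z)" entry in a DESCRIPTION file can be read and compared
with the running R version. The package name, version and Depends line are
made up for illustration, and this is not a description of how BiocCheck
itself performs the check.

```r
# Hypothetical DESCRIPTION content; package name and versions are made up.
desc_path <- tempfile()
writeLines(c(
  "Package: examplePkg",
  "Version: 0.99.0",
  "Depends: R (>= 3.5.0), methods"
), desc_path)

# read.dcf() returns a character matrix; take the single Depends entry.
depends <- read.dcf(desc_path, fields = "Depends")[1, 1]

# Extract the version inside "R (>= x.y.z)".
r_dep <- regmatches(depends, regexpr("R \\(>= [0-9.]+\\)", depends))
required <- numeric_version(sub("R \\(>= ([0-9.]+)\\)", "\\1", r_dep))

# TRUE when the running R satisfies the declared requirement.
meets_requirement <- getRversion() >= required
print(required)
print(meets_requirement)
```

On R 3.4 the last value would be FALSE for this file, which is why the
package could not be installed there without lowering the requirement.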

Overall, it is quite bothersome and it doesn't really make sense to me to
restrict potential users in this way. Am I the only one who has issues
with this? Am I missing something? Or maybe this check could be removed?

Best,
Alexey Sergushichev
