Re: [Bioc-devel] differences between petty and perceval (OS X 10.6.8 build machines for release/devel)

2014-06-16 Thread Michael Stadler
Dear Dan, Martin and Nate,

Thank you for looking into it. I guess that is pointing to a problem
within bowtie.

It looks like the EXC_BAD_ACCESS you see on petty in ebwt.h is not
reproducible on the other Mac or Linux machines we tried. Is it possible
to run valgrind on petty? That may confirm/rule out if the memory
(de-)allocation issues reported on Linux are related.

I would like to submit a bug report to the bowtie developers, but am
reluctant to do that without being able to reproduce the problem or test
potential fixes. I would have the option of going through Rbowtie build
cycles, but would have to rely on the assumption that petty will keep
hitting this hiccup even with modified bowtie code. The minor
differences between bowtie 1.0.1 and bowtie 1.0.1-bug-312 argue against
that.

I am tempted to stay with the current situation:
  - OS X before 10.9 needs to use Rbowtie = 1.4.4
    (based on bowtie 1.0.1)
  - OS X 10.9 onwards and everything else uses Rbowtie = 1.4.5
    (based on bowtie 1.0.1 / patched bug-312).

Thanks again for your efforts,
Michael


On 14.06.2014 01:31, Dan Tenenbaum wrote:
 Hi Michael,
 
 
 
 - Original Message -
 From: Michael Stadler michael.stad...@fmi.ch
 To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel@r-project.org
 Sent: Friday, June 13, 2014 12:32:52 AM
 Subject: differences between petty and perceval (OS X 10.6.8 build machines 
 for release/devel)

 Hi Dan,

 I'm cc'ing the list; maybe somebody else has experienced differences
 between petty and perceval.

 Rbowtie release (1.4.5) is not building under OS X 10.6.8 (petty).

 Rbowtie release (1.4.5) and development (1.5.5) are virtually
 identical
 (only DESCRIPTION and NEWS differ).

 The development version builds without problems on perceval, but the
 release version fails on petty:
 http://bioconductor.org/checkResults/devel/bioc-LATEST/Rbowtie/perceval-buildsrc.html
 http://bioconductor.org/checkResults/release/bioc-LATEST/Rbowtie/petty-buildsrc.html

 The only difference I can make out from the node info pages is that
 perceval has an additional section on the C++11 compiler that is
 lacking from petty's NodeInfo page.

 Unfortunately, I cannot reproduce the issue, both Rbowtie 1.4.5 and
 1.5.5 build successfully under OS X 10.6.8 and 10.7.5 using
 llvm-gcc-4.2.

 Do you have any idea what else could be different between petty and
 perceval?
 
 Martin and Nate and I took a look at this. I managed to come up with a bowtie 
 command line that would reliably reproduce the segfault on petty.
 
 Then we ran that under gdb (and turned off compiler optimizations) and came 
 up with this, which may or may not help you:
 
 petty:vignettes biocbuild$ gdb --args \
   '/Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rbowtie/bowtie' \
   -y -S -k 10 -m 10 -v 2 -r -p 4 --best --strata 'doit/refsIndex/index' \
   'doit/SpliceMapTemp_876c378e20ac/25mers.map' \
   'doit/SpliceMapTemp_876c378e20ac/25mers.map_unsorted'
 GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Mon Aug 15 16:03:10 UTC 2011)
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ... done
 
 (gdb) run
 Starting program: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rbowtie/bowtie -y -S -k 10 -m 10 -v 2 -r -p 4 --best --strata doit/refsIndex/index doit/SpliceMapTemp_876c378e20ac/25mers.map doit/SpliceMapTemp_876c378e20ac/25mers.map_unsorted
 Reading symbols for shared libraries ++. done
 
 Program received signal EXC_BAD_ACCESS, Could not access memory.
 Reason: KERN_INVALID_ADDRESS at address: 0x23d0d92d
 [Switching to process 36144 thread 0x20f]
 0x000478b1 in Ebwt<seqan::String<seqan::SimpleType<unsigned char, seqan::_Dna>, seqan::Alloc<void> > >::rowL (this=0xbfffda10, l=@0xa300e14) at ebwt.h:1816
 1816        return unpack_2b_from_8b(l.side(this->_ebwt)[l._by], l._bp);
 (gdb) l
 1811    inline int Ebwt<TStr>::rowL(const SideLocus& l) const {
 1812        // Extract and return appropriate bit-pair
 1813    #ifdef SIXTY4_FORMAT
 1814        return (((uint64_t*)l.side(this->_ebwt))[l._by >> 3] >> ((((l._by & 7) << 2) + l._bp) << 1)) & 3;
 1815    #else
 1816        return unpack_2b_from_8b(l.side(this->_ebwt)[l._by], l._bp);
 1817    #endif
 1818    }
 1819
 1820    /**
 (gdb) p this->_ebwt
 $1 = (uint8_t *) 0x4804a00 "\b2"
 (gdb) p *this->_ebwt
 $2 = 8 '\b'
 (gdb) p l._by
 $3 = 45
 (gdb) p l.side
 $4 = SideLocus::side(unsigned char const*) const
 (gdb) p l.side(this->_ebwt)
 $5 = (uint8_t *) 0x23d0d900 <Address 0x23d0d900 out of bounds>
 (gdb) p l.side(this->_ebwt)[l._by]
 Cannot access memory at address 0x23d0d92d
 (gdb) p this->_ebwt
 $6 = (uint8_t *) 0x4804a00 "\b2"
 (gdb) 
 
 Running 

[Bioc-devel] question about affy::plotLocation

2014-06-16 Thread Kristóf Jakab

Dear BiocDevelR!

I work a lot with the excellent *affy package* of Rafael A. Irizarry,
and I find it very useful.


I have had a slightly strange experience with its *plotLocation function*.
It seems that *I have to mirror the Y coordinates* to plot properly.
Perhaps this is because CEL file reading starts from the top, while
plotting starts from the bottom.


I'm not sure I'm right; can you check that I haven't made a mistake?
If I am right, I suggest a (simple) solution for this.

I attach two plots made from a GEO GSM CEL file (see script).
First I plotted all gene names (probe sets) on the CEL file image;
second, I plotted them again after mirroring the Y coordinates.
As you can see in the raw plot, there are points on the chip name
(printed by BioB spots).


I attach my plotting script too, and a potential correction for
affy::plotLocation. (I've tried it, and it seems good.)


Yours sincerely,
Kristóf Jakab

I've linked 2 files to this email:
geo_testing_spot_locations_mirrored.png (6.0 MB)
https://www.box.com/shared/ow3q5sn3fpmyz3u8w533
geo_testing_spot_locations_raw.png (6.1 MB)
https://www.box.com/shared/3sj9i3lpkixkq85qar0r


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] question about affy::plotLocation - scripts

2014-06-16 Thread Kristóf Jakab
It seems I can't send attachments, so I am copying the code here.


test_plotLocation_affy.R

#!/usr/bin/env Rscript
#kristof.ja...@hegelab.org

# MAKE AFFYBATCH
#--
# download CEL file
library(GEOquery)
getGEOSuppFiles("GSM229005")

#--
# read CEL file
library(affy)
geoS <- ReadAffy(filenames=paste("GSM229005","GSM229005.CEL.gz", sep="/"))

# PLOTTING TO PNG
#--
# raw
png(filename="geo_testing_spot_locations_raw.png",height=744*10,width=744*10,res=1200)

## image (log scale intensities)
image(geoS,transfo=log)
## perfect matches
l <- indexProbes(geoS, which="pm", geneNames(geoS))
lapply(l,function(li){
   xy <- indices2xy(li, abatch=geoS)
   plotLocation(xy,col="tomato",pch=18,cex=0.075)
})
## mismatches
l <- indexProbes(geoS, which="mm", geneNames(geoS))
lapply(l,function(li){
   xy <- indices2xy(li, abatch=geoS)
   plotLocation(xy,col="aquamarine",pch=18,cex=0.075)
})
dev.off()

#--
# mirrored
png(filename="geo_testing_spot_locations_mirrored.png",height=744*10,width=744*10,res=1200)

## image (log scale intensities)
image(geoS,transfo=log)
## perfect matches
l <- indexProbes(geoS, which="pm", geneNames(geoS))
lapply(l,function(li){
   xy <- indices2xy(li, abatch=geoS)
   xy <- cbind(x=xy[,1],y=(743-xy[,2])) # mirroring
   plotLocation(xy,col="tomato",pch=18,cex=0.075)
})
## mismatches
l <- indexProbes(geoS, which="mm", geneNames(geoS))
lapply(l,function(li){
   xy <- indices2xy(li, abatch=geoS)
   xy <- cbind(x=xy[,1],y=(743-xy[,2])) # mirroring
   plotLocation(xy,col="aquamarine",pch=18,cex=0.075)
})
dev.off()


correction_for_plotLocation.R

plotLocation <- function(x, col="green", pch=22, ...) {
   if (is.list(x)) {
      x <- cbind(unlist(lapply(x, function(x) x[,1])),
                 unlist(lapply(x, function(x) x[,2])))
   }
   # mirroring a 744x744 matrix, numbered from 0 to 743
   points(x[,1], 743-x[,2], pch=pch, col=col, ...)
}




Re: [Bioc-devel] question about affy::plotLocation - scripts

2014-06-16 Thread James W. MacDonald

Hi Kristóf,


Thanks for pointing this out. It's apparent almost nobody ever uses this 
code, as it has been in the affy package since pretty much the beginning 
(2002), and you are the first to notice this.


Unfortunately, hard-coding the number of rows isn't the answer, since 
Affy arrays have different dimensions. Probably the best fix is to add 
an additional required argument 'affybatch' that we can use to extract 
the chip dimensions from.


Best,

Jim








--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



[Bioc-devel] filterVcf: why require a filter?

2014-06-16 Thread Michael Lawrence
Hi,

I was trying to use filterVcf just to filter a VCF by a range, via 'which'
in ScanVcfParam, without any filters, but it failed with:

Error in filterVcf(tbx, genome = genome, destination = destination, ...,
(from #2) :
  no 'prefilters' or 'filters' specified

Why not allow identity, i.e., where the filter is inherent in the
restricted query?

Michael



Re: [Bioc-devel] evaluation of C post-increments changed in GCC 4.8.2

2014-06-16 Thread Robert Castelo

hi Nathaniel cc Dan,

thanks a lot for clearing up completely the entire story. I'm afraid 
that one or two cycles ago of our conversation i did a simple reply 
instead of a reply-all and the bioc-devel list wasn't included anymore 
in the recipients of these emails.


since what you say below sounds like a relevant piece of information for 
anyone working with C code i'm cc'ing the bioc-devel list again.


cheers,
robert.

On 6/16/14 11:15 PM, Nathaniel Hayden wrote:
Hi, Robert. You are correct. zin2 and petty failed to emit warnings 
for the problematic code. After some digging we discovered that for 
gcc, any optimization level above 0 prevents emission of the 
-Wsequence-point warning in this case. But the optimizations must stay 
for production code.


As a follow-up to the recommendations before about flags to use during 
package development, we have added content to the Package Guidelines 
page on our website: 
http://www.bioconductor.org/developers/package-guidelines/#c-code


The failure of some build machines to emit the warning under 
production conditions underscores the importance of the original 
recommendation to enable as many warnings as possible during development.


Thanks for bringing it up!
Nate

On Mon 16 Jun 2014 07:42:36 AM PDT, Robert Castelo wrote:


hi Nathaniel,

On 06/14/2014 01:01 AM, Nathaniel Hayden wrote:


Hi, Robert. You're welcome.

It sounds like something isn't happening, but you think it
should. Could you be more precise about what you expect to happen (the
conditions that *should* lead to the warning, but do not)? There are
lots of variables floating around:
- devel or release? (I see similar commits to devel and release so
unclear which I should look at; current devel version looks like it
fails before it has a chance to give the warning.)



yes, this was an unrelated error, which actually Dan warned me about
and for which i sent a fix yesterday. the situation i was describing
was occurring in both, devel and release, but both are fixed by now.



- it sounds like you're talking about a Mavericks machine in the Bioc
build system; can you confirm which one?

Both the devel and release Mavericks build machines use clang, and
both linux machines (zin1/zin2) use gcc with -Wall.



so, for instance, the release version from VariantFiltering 1.0.1 was
giving these warnings i was talking about:

Found the following significant warnings:
methods-WeightMatrix.c:256:19: warning: unsequenced modification and
access to 'q' [-Wunsequenced]
methods-WeightMatrix.c:638:17: warning: unsequenced modification and
access to 'q' [-Wunsequenced]

*only* in the R CMD check from 'morelia' and not from 'petty' or
'zin2', while all three machines in principle have the -Wall option
activated.

currently, because i submitted the fix, version 1.0.2 does not give
these warnings anymore. however, i have just committed a new version
to the release branch, 1.0.3, that has this problem back in line 256:

while ((*q++=tolower(*q)));

and should recreate the odd situation i saw, that only 'morelia' warns
about this line, but not 'petty' or 'zin2'.
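
For readers following the thread, here is a minimal sketch of why that line draws the warning and of one well-defined way to write the same loop. In `*q++ = tolower(*q)` the increment of `q` on the left and the read of `*q` on the right are not ordered relative to each other, so the result is undefined. The function name `lowercase_in_place` is hypothetical and is not taken from VariantFiltering; only the one-line loop quoted above comes from the thread.

```c
#include <ctype.h>

/*
 * Lower-case a NUL-terminated string in place.
 *
 * Illustrative rewrite of:   while ((*q++ = tolower(*q)));
 * Here the read of *q and the increment of q happen in separate
 * full expressions, so the order of evaluation is well defined.
 */
void lowercase_in_place(char *q)
{
    for (; *q != '\0'; ++q) {
        /* tolower() expects a value representable as unsigned char (or EOF),
           hence the cast before the call */
        *q = (char) tolower((unsigned char) *q);
    }
}
```

Calling `lowercase_in_place(buf)` on a writable buffer holding "ACGT" should leave "acgt" in it. Going by the discussion in this thread, compiling during development with warnings enabled (for example `-Wall -Wextra -pedantic`) is the way to catch the original form on machines that do report it.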


thanks!
robert.





Thanks,
Nate

On 06/13/2014 12:54 AM, Robert Castelo wrote:


hi Nathaniel,

thanks for the very clear examples. after all, probably it is just my
package which may have this problem. one further question below..

On 06/12/2014 07:12 PM, Nathaniel Hayden wrote:


Hi, Robert. C++ is my area so I can't speak as knowledgeably about C,


[...]


I confirm that using your test file gcc 4.6.3 indeed warns about
unsequenced shenanigans with -Wall: 'warning: operation on ‘p’ may be
undefined [-Wsequence-point]'. I would add it's also a good idea during
the development cycle to use -Wextra and -pedantic flags. (You can read
about them here: http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html)



the strange thing is that the only machine in the BioC build pipeline
that warned about this in my package was the one running Mac OSX
Mavericks with gcc 4.8.2, and not also the Linux zin2, which is
running gcc 4.6.3

you can see it if the 1.0.1 version of VariantFiltering is still in
the check report.

anyway, i'll use those options during development and that should
spare me this kind of problem in the future.

thanks!
robert.











Re: [Rd] index.search

2014-06-16 Thread Martin Maechler
 Adrian Dușa dusa.adr...@unibuc.ro
 on Mon, 16 Jun 2014 08:33:59 +0300 writes:

 On Mon, Jun 16, 2014 at 6:37 AM, Gabriel Becker
 gmbec...@ucdavis.edu wrote:
 [...]  You can. This is valid R source, so the parser
 will understand it
 
 expr = parse(text = example(deMorgan, package = "QCA",
 give.lines = TRUE))
 
 You can then evaluate some or all of that expression
 using either R's own eval function or, e.g. Hadley
 Wickham's evaluate package (for your particular usecase
 evaluate will be easier I think).

 Oh, I see...! In that case I can use it, of course.  Did
 install the evaluate package, although one would expect
 some better documentation (no examples at all, especially
 at the main evaluate function).


 [...]
 index.search is an unexported function, which means that
 it is subject to change in how it behaves without notice
 or even externally available reasons. You can get it via
 :::, but again, it's really not the right tool here, and
 not safe to use in general in code you expect to keep
 working.

 Yes, I figured that much.  Of course it's not meant to be
 used in any decently working code, but I learn heavily by
 simply looking at these sort of (hidden) R functions.

 Thanks again, Adrian

Apropos "not the right tool": I'm a bit astonished that nobody
mentioned the fact that R already provides the tool to
automatically compare all example outputs with a previous
version (of the package's example outputs):

*THE* manual (every package writer should know about,
 re-read/browse about once a year, and search in for such questions):

Writing R Extensions, section Package subdirectories

(e.g. on the CRAN master in Vienna,
 http://cran.r-project.org/doc/manuals/R-exts.html#Package-subdirectories )
says

|If directory 'tests' has a subdirectory 'Examples' containing a file
|'PKG-Ex.Rout.save', this is compared to the output file for running the
|examples when the latter are checked. 

So: After an 'R CMD check PKG' you only need to take and
keep the  PKG-Ex.Rout  file that is produced (in the
PKG.Rcheck/ directory), and save it as
PKG/tests/Examples/PKG-Ex.Rout.save (the location the manual
names above); from then on, every time you run R CMD check PKG  the
comparison will be made.

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] index.search

2014-06-16 Thread Adrian Dușa
Oh my... this is so simple, why didn't I think of that...?
Thanks a lot Martin, beautiful,
Adrian





-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391



Re: [Rd] index.search

2014-06-16 Thread Brian Lee Yung Rowe
Thanks for the great insight. I love that there's always something else to 
learn in R. 

•••••
Brian Lee Yung Rowe
Founder, Zato Novo
Professor, M.S. Data Analytics, CUNY



Re: [Rd] index.search

2014-06-16 Thread Duncan Murdoch



It's also worth mentioning that there is something similar to test for 
changes to vignettes:


If there is a target output file .Rout.save in the vignette source 
directory, the output from running the code in that vignette is 
compared with the target output file and any differences are reported 
(but not recorded in the log file). 


The slightly surprising thing is that R CMD check doesn't produce 
vignette.Rout; the file that is compared to vignette.Rout.save is 
vignette.log.


Duncan Murdoch



Re: [Rd] index.search

2014-06-16 Thread Adrian Dușa
On Mon, Jun 16, 2014 at 10:32 AM, Martin Maechler
maech...@stat.math.ethz.ch wrote:
 [...]

 Apropos "not the right tool": I'm a bit astonished that nobody
 mentioned the fact that R already provides the tool to
 automatically compare all example outputs with a previous
 version (of the package's example outputs):

As appealing as this is, while trying to figure out a solution of my
own (until Martin's email), I think I've succeeded in creating a
rather useful function which allows fine grained control over each and
every line of code in the examples sections:

#
helpfiles <- c(
    "allExpressions",
    "calibrate",
    "createMatrix",
    "deMorgan",
    "demoChart",
    "eqmcc",
    "factorize",
    "findSubsets",
    "findSupersets",
    "findTh",
    "getRow",
    "pof",
    "solveChart",
    "superSubset",
    "truthTable"
)

testQCAmaybe <- function() {
    # one list element per help file, named after the help file
    results <- vector(mode="list", length=length(helpfiles))
    names(results) <- helpfiles

    for (i in seq_along(helpfiles)) {
        Rdfile <- file.path(find.package("QCA"), paste(helpfiles[i],
                            ".Rd", sep=""))
        # extract the example code of the Rd file and parse it
        commands <- parse(text=capture.output(tools::Rd2ex(Rdfile)))

        # store the printed output of each command separately
        results[[i]] <- vector(mode="list", length=length(commands))
        names(results[[i]]) <- commands
        for (j in seq_along(commands)) {
            results[[i]][[j]] <-
                suppressWarnings(capture.output(eval(commands[j])))
        }
    }
    return(results)
}
#

Using all.equal() over the entire list, or sequentially over parts of
it, quickly identifies sources of difference.

I hope this helps someone,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391



Re: [Rd] what is the current correct repos structure for mac osx binaries?

2014-06-16 Thread Skye Bender-deMoll

Dear R-devel,

Apologies for the confusing typo in the reported paths in my previous 
question, and thanks to Simon for providing the answer that the default 
repository type on the mac is now "mac.binary.mavericks", not 
"mac.binary" as the docs for install.packages state.


Perhaps the docs for install.packages could be updated to something like:


...
type  character, indicating the type of package to download and install

Possible values are (currently) "source", "mac.binary.BUILD_NAME" and 
"win.binary". The BUILD_NAME on OSX is determined internally by ???.

...


I'm still not quite clear how the CRAN-like repository should be 
structured for OSX.  CRAN seems to include .tgz packages in both


http://cran.r-project.org/bin/macosx/contrib/3.1/

and

http://cran.r-project.org/bin/macosx/mavericks/contrib/3.1/

The directory contents are not identical, but both include packages 
built as recently as today.  Is bin/macosx/contrib/3.1/  a snowleopard 
build?  Do I need to maintain two directories as well?


It seems like if I put my packages in

http://foo/bin/macosx/contrib/3.1/

the mavericks machines won't find them.  But if I put the packages in

http://foo/bin/macosx/mavericks/contrib/3.1/

people with the snowleopard build won't find them.  Perhaps this is the 
desired behavior if the mavericks binaries are not snowleopard compatible?


thanks again for your help,
 -skye





On 06/13/2014 05:22 PM, Simon Urbanek wrote:
> On Jun 13, 2014, at 5:41 PM, Skye Bender-deMoll skyeb...@skyeome.net wrote:
>
>> Dear R-developers,
>>
>> As part of our package building process, we maintain internal CRAN-like
>> repositories of our packages.  This has worked pretty well, but we are
>> running into issues with R 3.1 and OSX mavericks.
>>
>> Specifically, machines with osx mavericks seem to, by default, expect
>> packages to be located under a 'mavericks' sub-directory, but this is not
>> the location reported when generating a mac.binary appropriate contrib url.
>>
>> contrib.url('foo')
>> [1] "foo/bin/macosx/mavericks/contrib/3.1/"
>>
>> If I ask where the mac binaries are on a linux machine (AND on mac
>> mavericks machines) I get
>>
>> contrib.url('foo',type='mac.binary')
>> [1] "foo/bin/macosx/mavericks/contrib/3.1/"
>
> I don't think that is true. On all machines (Linux, OS X, ...) I get
>
> contrib.url('foo', type='mac.binary')
> [1] "foo/bin/macosx/contrib/3.1"
>
> Note that the type for the mavericks build is "mac.binary.mavericks", so on
> all machines you also get
>
> contrib.url('foo',type='mac.binary.mavericks')
> [1] "foo/bin/macosx/mavericks/contrib/3.1"
>
> The only difference is in the defaults for pkgType: they differ by the build,
> but the repo structure is fixed and consistent across all platforms.
>
> Cheers,
> Simon
>
>> But the OSX machine gives an error and fails to locate the packages if they
>> are located at foo/bin/macosx/contrib/3.1/
>>
>> So where are the mac binaries supposed to be located in a CRAN-like
>> repository so that they can be installed on a mac with the default install
>> command?  And is there a way for a non-mac machine (i.e. our linux deploy
>> server) to determine that directory other than
>> contrib.url(, type='mac.binary') ?
>>
>> thanks for your help,
>> -skye



[Rd] model.frame and parent environment

2014-06-16 Thread Therneau, Terry M., Ph.D.
Someone has reported a problem with predict.coxph that I can't seem to solve.  The 
underlying issue is with model.frame.coxph; the same issue is also found in lm so I'll use 
that for the example.


--

 test <- data.frame(y = 1:10 + runif(10), x=1:10)

 myfun <- function(formula, nd) {
     fit <- lm(formula, data=nd, model=FALSE)
     model.frame(fit)
 }

 myfun(test)
Error in is.data.frame(data) : object 'nd' not found



1. The key line, in both model.frame.coxph and model.frame.lm is
eval(fcall, env, parent.frame())

and it appears (at least to me) that the parent.frame() part of this is effectively ignored 
when fcall is itself a reference to model.frame.  I'd like to understand this better.



2. The modeling functions coxph and survreg in the survival package default to model=FALSE, 
originally in mimicry of lm and glm; I don't know when R changed the default to model=TRUE 
for lm and glm.  One possible response to my question would be advice to change my 
routine's defaults too.  I'm somewhat reluctant since I work with a few very large data 
sets, but would entertain that discussion as well.   I'd still like to understand how 
model.frame could be made to work under the current regimen.


Terry Therneau



Re: [Rd] what is the current correct repos structure for mac osx binaries?

2014-06-16 Thread Simon Urbanek

On Jun 16, 2014, at 1:18 PM, Skye Bender-deMoll skyeb...@skyeome.net wrote:

> Dear R-devel,
>
> Apologies for the confusing typo in the reported paths in my previous
> question, and thanks to Simon for providing the answer that the default
> repository type on the mac is now "mac.binary.mavericks", not "mac.binary"
> as the docs for install.packages state.
 

That is incorrect. The default varies by the distribution you use. For the 
regular binary based on 10.6+ it is "mac.binary". For the special Mavericks 
distribution it is "mac.binary.mavericks". 


> Perhaps the docs for install.packages could be updated to something like:
>
> ...
> type  character, indicating the type of package to download and install
>
> Possible values are (currently) "source", "mac.binary.BUILD_NAME" and
> "win.binary". The BUILD_NAME on OSX is determined internally by ???.
> ...
>
> I'm still not quite clear how the CRAN-like repository should be structured
> for OSX.  CRAN seems to include .tgz packages in both
>
> http://cran.r-project.org/bin/macosx/contrib/3.1/
>
> and
>
> http://cran.r-project.org/bin/macosx/mavericks/contrib/3.1/
>
> The directory contents are not identical, but both include packages built as
> recently as today.  Is bin/macosx/contrib/3.1/  a snowleopard build?  Do I
> need to maintain two directories as well?
>
> It seems like if I put my packages in
>
> http://foo/bin/macosx/contrib/3.1/
>
> the mavericks machines won't find them.  But if I put the packages in
>
> http://foo/bin/macosx/mavericks/contrib/3.1/
>
> people with the snowleopard build won't find them.  Perhaps this is the
> desired behavior if the mavericks binaries are not snowleopard compatible?
 

Yes, the Mavericks build is incompatible with the Snow Leopard build; that's why 
there are two separate distributions and two separate repositories.
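
As a rough illustration for a private repository, both directories can be derived on any platform (including a Linux deploy server) with contrib.url(). This is only a sketch: the repository root http://foo is the placeholder used earlier in this thread, and the version component of each URL follows the R version that runs the code.

```r
# Placeholder repository root, as used earlier in the thread
repo <- "http://foo"

# The two OS X binary types discussed above
types <- c("mac.binary", "mac.binary.mavericks")

# contrib.url() maps each type to its directory under the repository root
urls <- vapply(types,
               function(tp) utils::contrib.url(repo, type = tp),
               character(1))
print(urls)
```

A repository that serves both builds would then need a package index in each of the two directories.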

Cheers,
Simon






Re: [Rd] model.frame and parent environment

2014-06-16 Thread Prof Brian Ripley

On 16/06/2014 19:35, Therneau, Terry M., Ph.D. wrote:

> Someone has reported a problem with predict.coxph that I can't seem to
> solve.  The underlying issue is with model.frame.coxph; the same issue
> is also found in lm so I'll use that for the example.
>
> --
>
>   test <- data.frame(y = 1:10 + runif(10), x=1:10)
>
>   myfun <- function(formula, nd) {
>       fit <- lm(formula, data=nd, model=FALSE)
>       model.frame(fit)
>   }
>
>   myfun(test)
> Error in is.data.frame(data) : object 'nd' not found


You have specified formula = test and given no value for nd.  Is that 
really what you intended?  It is undocumented that it works for lm().






> 1. The key line, in both model.frame.coxph and model.frame.lm is
>      eval(fcall, env, parent.frame())
>
> and it appears (at least to me) that the parent.frame() part of this is
> effectively ignored when fcall is itself a reference to model.frame.
> I'd like to understand this better.


Way back (ca R 1.2.0) an advocate of lexical scoping changed 
model.frame.lm to refer to an environment not a data frame for 'env'. 
That pretty fundamental change means that your sort of example is not a 
recommended way to do this: you are mixing scoping models.
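
The effect of 'env' being an environment can be seen in a small sketch outside of any modelling code. Per ?eval, the enclos argument is only consulted when envir is a list, pairlist or data frame; when envir is an environment it is not used. The names lookup(), helper() and formula_env are invented for this illustration.

```r
lookup <- function() {
    nd <- "value from the calling frame"   # local, like 'nd' inside myfun()
    helper()
}

helper <- function() {
    expr <- quote(nd)
    # Stand-in for environment(formula): an environment that cannot see 'nd'
    formula_env <- new.env(parent = baseenv())

    # envir is a list: lookup falls through to enclos, the caller's frame
    from_list <- eval(expr, list(), parent.frame())

    # envir is an environment: enclos is ignored, so 'nd' is not found
    from_env <- tryCatch(eval(expr, formula_env, parent.frame()),
                         error = function(e) conditionMessage(e))

    list(from_list = from_list, from_env = from_env)
}

res <- lookup()
print(res)
```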



> 2. The modeling functions coxph and survreg in the survival package default to
> model=FALSE, originally in mimicry of lm and glm; I don't know when R
> changed the default to model=TRUE for lm and glm.  One possible response

I am not sure R ever did: model = TRUE was the default 16 years ago at 
the beginning of the CVS/SVN archive.



> to my question would be advice to change my routine's defaults too.  I'm
> somewhat reluctant since I work with a few very large data sets, but
> would entertain that discussion as well.   I'd still like to understand
> how model.frame could be made to work under the current regimen.


For smaller problems using model = TRUE is the most robust solution.  As 
the components of the model frame can be changed after fitting, there is 
no way to guarantee to recreate the model frame, so to be sure you need 
to store it.
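
A small sketch of the stored-frame route, reusing the data from the example in this thread. The wrapper name myfun_stored() is made up here. With model = TRUE the fit carries its own model frame, so model.frame() returns the stored copy and does not re-evaluate the original call.

```r
test <- data.frame(y = 1:10 + runif(10), x = 1:10)

# Hypothetical variant of myfun() that keeps the model frame in the fit
myfun_stored <- function(formula, nd) {
    fit <- lm(formula, data = nd, model = TRUE)
    # No extra arguments, so the stored fit$model is returned;
    # 'nd' never has to be looked up again
    model.frame(fit)
}

mf <- myfun_stored(y ~ x, test)
print(dim(mf))
```

The cost is one extra copy of the model frame per fit, which is the concern raised above for very large data sets.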


If you call myfun(y ~ x, test) it will look for 'nd' in the global 
environment, the environment of the formula.  One way to get that to 
work more often is something like


myfun <- function(formula, nd) {
     qnd <- substitute(nd)
     fit <- lm(formula, data=nd, model=FALSE)
     fit$call$data <- qnd
     model.frame(fit)
}

so it looks for the value of 'nd' instead.
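
A quick check of the version above, under the assumption that the data frame is passed by name and is visible from the environment of the formula (here both are created in the same place). It is a sketch of one favourable case, not a general guarantee.

```r
test <- data.frame(y = 1:10 + runif(10), x = 1:10)

myfun <- function(formula, nd) {
    qnd <- substitute(nd)                # the expression the caller passed, e.g. 'test'
    fit <- lm(formula, data = nd, model = FALSE)
    fit$call$data <- qnd                 # record the caller's expression, not 'nd'
    model.frame(fit)                     # re-evaluates the call from the formula's environment
}

mf <- myfun(y ~ x, test)
print(head(mf, 3))
```

Because the recorded call now refers to 'test', which the environment of the formula can see, the model frame can be rebuilt. It would still fail if the argument were an expression that only makes sense inside some other function.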


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] model.frame and parent environment

2014-06-16 Thread Therneau, Terry M., Ph.D.
I had a typo in the prior example when transcribing from R to the message, now corrected 
below.  (The call to myfun).

  My apologies for that.  Corrected message below.

Someone has reported a problem with predict.coxph that I can't seem to solve.  The
underlying issue is with model.frame.coxph; the same issue is also found in lm so I'll use
that for the example.

--

test <- data.frame(y = 1:10 + runif(10), x=1:10)

myfun <- function(formula, nd) {
    fit <- lm(formula, data=nd, model=FALSE)
    model.frame(fit)
}

myfun(y ~ x, test)
Error in is.data.frame(data) : object 'nd' not found


1. The key line, in both model.frame.coxph and model.frame.lm is
eval(fcall, env, parent.frame())

and it appears (at least to me) that the parent.frame() part of this is effectively ignored
when fcall is itself a reference to model.frame.  I'd like to understand this better.


2. The modeling functions coxph and survreg in the survival package default to model=FALSE,
originally in mimicry of lm and glm; I don't know when R changed the default to model=TRUE
for lm and glm.  One possible response to my question would be advice to change my
routine's defaults too.  I'm somewhat reluctant since I work with a few very large data
sets, but would entertain that discussion as well.   I'd still like to understand how
model.frame could be made to work under the current regimen.

Terry Therneau
