Re: [Rd] Where to drop a python script?

2013-10-30 Thread Paul Gilbert
The old convention was that it went in the exec/ directory, but as you 
can see at 
http://cran.at.r-project.org/doc/manuals/r-devel/R-exts.html#Non_002dR-scripts-in-packages 
 it can be in inst/anyName/. A minor convenience of exec/ is that the 
directory has the same name in source and when installed, whereas 
inst/anyName gets moved to anyName/, so debugging can be a tiny bit 
easier with exec/.


Having just put a package (TSjson) on CRAN with a python script, here 
are a few other pointers for getting it on CRAN:


-SystemRequirements: should indicate if a particular version of python 
is needed, and any non-default modules that are needed. (My package does 
not work with Python 3 because some modules are not available.) Some of 
the libraries have changed, so it could be a bit tricky to make 
something work easily with both 2 and 3.


-You need a README to explain how to install Python. (If you look at or 
use mine, please let me know if you find problems.)


-The Linux and Sun CRAN test machines have Python 2 whereas winbuilder 
has Python 3. Be prepared to explain that the package will not work on 
one or the other.
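
A minimal sketch of a run-time check that could complement the SystemRequirements field: it looks for a usable interpreter before any script is called. The function name find_interpreter and the list of candidate commands are assumptions made for illustration; they are not taken from TSjson.

```r
# Hypothetical helper: return the path of the first interpreter found on the
# PATH, or "" if none of the candidates is available.
find_interpreter <- function(candidates = c("python", "python2", "python3")) {
  paths <- Sys.which(candidates)  # "" for commands that are not found
  found <- paths[!is.na(paths) & nzchar(paths)]
  if (length(found) == 0L) "" else unname(found[1L])
}

python_path <- find_interpreter()
if (!nzchar(python_path)) {
  message("No Python interpreter found; see the package README.")
}
```

A package could call such a helper once and fail early with a pointer to the README, rather than letting the later system() call fail with a less helpful message.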


An alternative to system() is pipe().
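
As a rough illustration of the pipe() route, the sketch below reads a command's standard output through a connection. The helper name read_command_output is made up for this example, and "echo hello" merely stands in for the real script invocation.

```r
# Hypothetical helper: run a shell command and return its output lines.
read_command_output <- function(command) {
  con <- pipe(command, open = "r")  # start the command, read from its stdout
  on.exit(close(con))               # always release the connection
  readLines(con)
}

out <- read_command_output("echo hello")
```

Compared with system(), the connection can also be read incrementally, which may help when the script produces a lot of output.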

Paul

On 13-10-30 03:15 PM, Dirk Eddelbuettel wrote:


On 30 October 2013 at 13:54, Jonathan Greenberg wrote:
| R-developers:
|
| I have a small python script that I'd like to include in an R package I'm
| developing, but I'm a bit unclear about which subfolder it should go in.  R
| will be calling the script via a system() call.  Thanks!

Up to you as you control the path. As "Writing R Extensions" explains,
everything below the (source) directory inst/ will get installed.  I like
inst/extScripts/ (or similar) as it denotes that it is an external script.

As an example, the gdata package has Perl code for xls reading/writing below a
directory inst/perl/ -- and I think there are more packages doing this.

Dirk




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Huge performance difference between implicit and explicit print

2013-10-30 Thread Gabriel Becker
Hadley,

As far as I can tell from a quick look, it is because implicit printing
uses a different mechanism which does a fair bit more work.

From comments in print.c in the R sources:

*  print.default()  -> do_printdefault (with call tree below)
 *
 *  auto-printing   ->  PrintValueEnv
 *  -> PrintValueRec
 *  -> call print() for objects
 *  Note that auto-printing does not call print.default.
 *  PrintValue, R_PV are similar to auto-printing.

PrintValueEnv includes, among other things, checks for functions, S4
objects, and S3 objects before constructing (in C code) an R call to print
for S3 objects and to show for S4 objects, and evaluating it using Rf_eval.
So there is an extra trip to the R evaluator.

I imagine that extra work is where the hangup is but that is a
slightly-informed guess as I haven't done any detailed timings or checks.

Basically my understanding of the processes is as follows:

print(df)
print call is evaluated, S3 dispatch happens, print.default in C is called,
result printed to terminal, print call returns

df
expression "df" evaluated, auto-print initiated, type of object returned by
expression is determined, print call is constructed in C code, print call
is evaluated in C code, THEN all the stuff above happens.
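
A small sketch that makes the two paths visible: both end up in the same S3 method, and only the route taken to reach it differs. The class name demo_obj and the calls counter are invented for this illustration; nothing here measures or explains the timing difference itself.

```r
# Record every invocation of the print method for a toy S3 class.
calls <- character()

print.demo_obj <- function(x, ...) {
  calls <<- c(calls, "print.demo_obj")
  cat("<demo_obj>\n")
  invisible(x)
}

obj <- structure(list(), class = "demo_obj")

print(obj)  # explicit: ordinary S3 dispatch from R code
obj         # implicit: auto-printed only when evaluated at top level
```

When this is run at the console, the method should be reached through both routes; when the code is evaluated via source() without echo, the bare obj line prints nothing, because auto-printing is a feature of top-level evaluation.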

I dunno if that helps or not as I can't speak to how to change/fix it atm.

~G



On Wed, Oct 30, 2013 at 3:22 PM, Hadley Wickham  wrote:

> Hi all,
>
> Can anyone help me understand why an implicit print (i.e. just typing
> df at the console), is so much slower than an explicit print (i.e.
> print(df)) in the example below?  I see the difference in both Rstudio
> and in a terminal.
>
> # Construct large df as quickly as possible
> dummy <- 1:18e6
> df <- lapply(1:10, function(x) dummy)
> names(df) <- letters[1:10]
> class(df) <- c("myobj", "data.frame")
> attr(df, "row.names") <- .set_row_names(18e6)
>
> print.myobj <- function(x, ...) {
>   print.data.frame(head(x, 2))
> }
>
> start <- proc.time(); df; flush.console(); proc.time() - start
> #  user  system elapsed
> # 0.408   0.557   0.965
> start <- proc.time(); print(df); flush.console(); proc.time() - start
> #  user  system elapsed
> # 0.019   0.002   0.020
>
> sessionInfo()
> # R version 3.0.2 (2013-09-25)
> # Platform: x86_64-apple-darwin10.8.0 (64-bit)
> #
> # locale:
> # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> #
> # attached base packages:
> # [1] stats graphics  grDevices utils datasets  methods   base
>
> Thanks!
>
> Hadley
>
> --
> Chief Scientist, RStudio
> http://had.co.nz/
>
>



-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis



Re: [Rd] Huge performance difference between implicit and explicit print

2013-10-30 Thread Gabor Grothendieck
On Wed, Oct 30, 2013 at 6:22 PM, Hadley Wickham  wrote:
> Hi all,
>
> Can anyone help me understand why an implicit print (i.e. just typing
> df at the console), is so much slower than an explicit print (i.e.
> print(df)) in the example below?  I see the difference in both Rstudio
> and in a terminal.
>
> # Construct large df as quickly as possible
> dummy <- 1:18e6
> df <- lapply(1:10, function(x) dummy)
> names(df) <- letters[1:10]
> class(df) <- c("myobj", "data.frame")
> attr(df, "row.names") <- .set_row_names(18e6)
>
> print.myobj <- function(x, ...) {
>   print.data.frame(head(x, 2))
> }
>
> start <- proc.time(); df; flush.console(); proc.time() - start
> #  user  system elapsed
> # 0.408   0.557   0.965
> start <- proc.time(); print(df); flush.console(); proc.time() - start
> #  user  system elapsed
> # 0.019   0.002   0.020

If I change print(df) to print.data.frame(df) it hangs.

R version 3.0.2 Patched (2013-10-06 r64031) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)



[Rd] Huge performance difference between implicit and explicit print

2013-10-30 Thread Hadley Wickham
Hi all,

Can anyone help me understand why an implicit print (i.e. just typing
df at the console), is so much slower than an explicit print (i.e.
print(df)) in the example below?  I see the difference in both Rstudio
and in a terminal.

# Construct large df as quickly as possible
dummy <- 1:18e6
df <- lapply(1:10, function(x) dummy)
names(df) <- letters[1:10]
class(df) <- c("myobj", "data.frame")
attr(df, "row.names") <- .set_row_names(18e6)

print.myobj <- function(x, ...) {
  print.data.frame(head(x, 2))
}

start <- proc.time(); df; flush.console(); proc.time() - start
#  user  system elapsed
# 0.408   0.557   0.965
start <- proc.time(); print(df); flush.console(); proc.time() - start
#  user  system elapsed
# 0.019   0.002   0.020

sessionInfo()
# R version 3.0.2 (2013-09-25)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base

Thanks!

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/



Re: [Rd] Where to drop a python script?

2013-10-30 Thread Marc Schwartz

On Oct 30, 2013, at 1:54 PM, Jonathan Greenberg  wrote:

> R-developers:
> 
> I have a small python script that I'd like to include in an R package I'm
> developing, but I'm a bit unclear about which subfolder it should go in.  R
> will be calling the script via a system() call.  Thanks!
> 
> --j


See Writing R Extensions Manual, section 1.1.7:

  
http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Non_002dR-scripts-in-packages

If you want to see a package example, my WriteXLS package uses Perl, but the 
concepts will be the same:

  https://github.com/marcschwartz/WriteXLS

If you look at WriteXLS.R around line 130, you can see an example of getting 
the $PATH to the included Perl scripts that I use, which are in the 'inst/Perl' 
folder. Further down around line 230, is where the script is called via 
system(). Note the use of shQuote() for some arguments.
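
The pattern described above (locate the installed script, quote the arguments, run it) can be sketched as follows. This is not the WriteXLS code: the helper name run_pkg_script, its arguments, and the use of system2() rather than system() are assumptions made for this example.

```r
# Hypothetical helper: run a script that a package ships under inst/<subdir>/.
run_pkg_script <- function(pkg, subdir, script, args = character(),
                           interpreter = "python") {
  # After installation, inst/<subdir>/<script> lives at <pkg>/<subdir>/<script>.
  path <- system.file(subdir, script, package = pkg)
  if (!nzchar(path)) {
    stop("script '", script, "' not found in package '", pkg, "'")
  }
  exe <- Sys.which(interpreter)
  if (is.na(exe) || !nzchar(exe)) {
    stop("interpreter '", interpreter, "' not found on the PATH")
  }
  # system2() does not quote arguments itself, so quote them here.
  system2(exe, c(shQuote(path), shQuote(args)), stdout = TRUE)
}

# A missing script is reported as an R error instead of a shell failure.
res <- tryCatch(run_pkg_script("stats", "noSuchDir", "nope.py"),
                error = function(e) conditionMessage(e))
```

A call for a real package would look like run_pkg_script("mypkg", "python", "myscript.py", args = "input file.csv"), where all three names are placeholders.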

Regards,

Marc Schwartz



Re: [Rd] Where to drop a python script?

2013-10-30 Thread Dirk Eddelbuettel

On 30 October 2013 at 13:54, Jonathan Greenberg wrote:
| R-developers:
| 
| I have a small python script that I'd like to include in an R package I'm
| developing, but I'm a bit unclear about which subfolder it should go in.  R
| will be calling the script via a system() call.  Thanks!

Up to you as you control the path. As "Writing R Extensions" explains,
everything below the (source) directory inst/ will get installed.  I like
inst/extScripts/ (or similar) as it denotes that it is an external script.

As an example, the gdata package has Perl code for xls reading/writing below a
directory inst/perl/ -- and I think there are more packages doing this.

Dirk


-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com



[Rd] Where to drop a python script?

2013-10-30 Thread Jonathan Greenberg
R-developers:

I have a small python script that I'd like to include in an R package I'm
developing, but I'm a bit unclear about which subfolder it should go in.  R
will be calling the script via a system() call.  Thanks!

--j

-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
259 Computing Applications Building, MC-150
605 East Springfield Avenue
Champaign, IL  61820-6371
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007



Re: [Rd] unique(1:3,nmax=1) freezes R

2013-10-30 Thread Joris Meys
I can reproduce this bug.

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252
 LC_MONETARY=Dutch_Belgium.1252
[4] LC_NUMERIC=C   LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] grDevices datasets  splines   graphics  stats tcltk utils
methods   base

other attached packages:
[1] svSocket_0.9-55 TinnR_1.0-5 R2HTML_2.2.1Hmisc_3.12-2
 Formula_1.1-1
[6] survival_2.37-4

loaded via a namespace (and not attached):
[1] cluster_1.14.4  fortunes_1.5-0  grid_3.0.1  lattice_0.20-23
rpart_4.1-3
[6] svMisc_0.9-69   tools_3.0.1


On Wed, Oct 30, 2013 at 9:05 AM, Helske Jouni  wrote:

> Dear all,
> I was playing around with factor contrasts, and found the argument nmax on
> function factor.  When using nmax=1, R froze completely, and I had to close
> it from task manager. After some debugging, I found that the problem is
> actually in unique-function, where the internal unique function is called:
>
> .Internal(unique(x, incomparables, fromLast, nmax))
>
> More generally, it looks like unique(x,nmax=k) freezes R when the length of x
> is larger than 2 and k=1, and when nmax [...] unique.default(1:5, nmax = 3) : hash table is full".
>
> Of course using nmax=1 doesn't make much sense, but maybe some check would
> be in place before calling internal unique?
>
> Best regards,
>
> Jouni Helske
>
>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [Rd] Imports, importFrom slow (for Matrix)

2013-10-30 Thread Martin Maechler
> Gábor Csárdi 
> on Tue, 29 Oct 2013 10:31:14 -0400 writes:

> Oh, you mean to put Matrix:: in the functions that need
> Matrix, right, of course. Then yes, this could be a
> solution. I have some issue with some new class
> definitions, but I can probably work them out.

> Gabor

otherwise, please contact  matrix-auth...@r-project.org 
(me being one of the two).

To the whole issue in the 'Subject':  Yes, indeed, importing from
Matrix, i.e., loading the Matrix namespace, is slow, notably compared to many
other parts of R, including its startup.

I have not had time to investigate, but I even have a vague
feeling that it got slower than it used to be a
couple of R (and Matrix) versions ago.

Note that your timings below are a bit biased because you use
Rscript which unfortunately does not pre-[load + attach] the
'methods' package.  But then your bias is only about 0.1 seconds
from my measurements (the time increases from 0.35 to 0.45; not
sure if this is still a good enough reason to omit  'methods'
from Rscript by default). 

And ... yes, R is free (aka "libre") software, and proposals for
changes that speed up loadNamespace("Matrix") without having
to change the R code inside Matrix much are highly appreciated.
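
The advice quoted below (use Matrix:: where needed, and keep a local copy if the function is used many times) could be sketched roughly as follows. The helper make_lazy_import is hypothetical, and the demo uses stats::median only so that it runs without Matrix installed.

```r
# Hypothetical helper: look up an exported function on first use and cache it,
# so the providing namespace is loaded only when the function is first called.
make_lazy_import <- function(pkg, name) {
  cached <- NULL
  function(...) {
    if (is.null(cached)) {
      if (!requireNamespace(pkg, quietly = TRUE)) {
        stop("package '", pkg, "' is needed for this function")
      }
      cached <<- getExportedValue(pkg, name)  # the "local copy"
    }
    cached(...)
  }
}

lazy_median <- make_lazy_import("stats", "median")
result <- lazy_median(c(1, 3, 2))
```

For the case discussed here, the equivalent would be make_lazy_import("Matrix", "sparseMatrix"): the cost of loading the Matrix namespace is then paid on the first call rather than when the importing package is loaded.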

Martin Maechler, 
ETH Zurich



> On Tue, Oct 29, 2013 at 10:25 AM, Gábor Csárdi
>  wrote:
>> Unfortunately that seems to be (almost) just as slow.
>> 
>> ~$ time Rscript -e 'Matrix::summary; ls()' > /dev/null
>> real 0m2.785s
>> user 0m2.668s
>> sys 0m0.112s
>> 
>> Gabor
>> 
>> On Tue, Oct 29, 2013 at 10:11 AM, Prof Brian Ripley
>>  wrote:
>>> On 29/10/2013 14:03, Gábor Csárdi wrote:
 
>>>> Dear All,
>>>>
>>>> before its latest version my package had 'Imports:
>>>> Matrix' in its DESCRIPTION file, but it did not import
>>>> anything in NAMESPACE. Rather, some functions
>>>> explicitly loaded Matrix, as needed. The reason
>>>> for this was that importing Matrix is really slow, and
>>>> only very few igraph functions need it. (I guess Matrix
>>>> is slow because of the many registered names, but that
>>>> is another question.)
>>>>
>>>> # Empty session:
>>>> ~$ time Rscript -e 'ls()' > /dev/null
>>>> real 0m0.251s
>>>> user 0m0.196s
>>>> sys 0m0.049s
>>>>
>>>> # Without importing from Matrix:
>>>> ~$ time Rscript -e 'library(igraph); ls()' > /dev/null
>>>> Loading required package: methods
>>>> real 0m0.419s
>>>> user 0m0.363s
>>>> sys 0m0.049s
>>>>
>>>> # Adding importFrom(Matrix, sparseMatrix) to NAMESPACE:
>>>> ~$ time Rscript -e 'library(igraph); ls()' > /dev/null
>>>> Loading required package: methods
>>>> real 0m2.963s
>>>> user 0m2.844s
>>>> sys 0m0.115s
>>>>
>>>> This solution was fine with me, especially because
>>>> other packages depending on igraph and using Matrix
>>>> through igraph worked fine on the CRAN build servers,
>>>> as igraph brought Matrix with it. (The build servers
>>>> don't have recommended packages like Matrix available
>>>> by default.)
>>>>
>>>> Recently, R CMD check does not allow me to list Matrix
>>>> in Imports without importing something from it. This is
>>>> understandable, because it is an inconsistency after
>>>> all, but caused some headache for me.
>>>>
>>>> A 3s loading time for a package is IMHO much longer
>>>> than ideal, especially given that loading R itself is ten
>>>> times faster. So I definitely don't want to import from
>>>> Matrix right now.
>>>>
>>>> The solution I settled with was to include Matrix in
>>>> 'Suggests', and then load it selectively, as before. Now
>>>> some packages depending on igraph are failing on the
>>>> CRAN build servers, which don't have Matrix installed
>>>> for these packages. (Luckily they are probably not
>>>> failing for users, because most users do have the
>>>> recommended packages.)
>>>>
>>>> In summary, it would be great to speed up imports.
>>>>
>>>> Another solution would be some mechanism that allows me
>>>> to import from a package as needed, not at the package
>>>> loading time. Something like a delayed importFrom().
>>> 
>>> 
>>> That is what Matrix:: does.  There is nothing like
>>> enough here for us to tell why it would not suffice for
>>> you.  (If you want to import something occasionally and
>>> then use it very many times, make a local copy.)
>>> 
 
>>>> Just wanted to bring up this issue.
>>>>
>>>> Best, Gabor
 
>>> 
>>> 
>>> --
>>> Brian D. Ripley, rip...@stats.ox.ac.uk Professor of
>>> Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford, Tel: +44 1865 272861 (

Re: [Rd] Imports, importFrom slow (for Matrix)

2013-10-30 Thread Martin Maechler
> Gábor Csárdi 
> on Tue, 29 Oct 2013 10:31:14 -0400 writes:

> Oh, you mean to put Matrix:: in the functions that need
> Matrix, right, of course. Then yes, this could be a
> solution. I have some issue with some new class
> definitions, but I can probably work them out.

> Gabor

otherwise, please contact  matrix-auth...@r-project.org 
(me being one of the two).

To the whole issue in the 'Subject':  Yes, indeed, importing from
Matrix, i.e., loading the Matrix namespace, is slow, notably compared to many
other parts of R, including its startup.

I have not had time to investigate, but I even have a vague
feeling that it got considerably slower than it used to be a
couple of R (and Matrix) versions ago.

Note that your timings below are a bit biased because you use
Rscript which unfortunately does not pre-[load + attach] the
'methods' package.
But your bias is only about 0.1 second from my measurements
(the time increases from 0.35 to 0.45); not sure if this is still
a good enough reason to omit 'methods' from Rscript.

Martin



> On Tue, Oct 29, 2013 at 10:25 AM, Gábor Csárdi
>  wrote:
>> Unfortunately that seems to be (almost) just as slow.
>> 
>> ~$ time Rscript -e 'Matrix::summary; ls()' > /dev/null
>> real 0m2.785s
>> user 0m2.668s
>> sys 0m0.112s
>> 
>> Gabor
>> 
>> On Tue, Oct 29, 2013 at 10:11 AM, Prof Brian Ripley
>>  wrote:
>>> On 29/10/2013 14:03, Gábor Csárdi wrote:
 
>>>> Dear All,
>>>>
>>>> before its latest version my package had 'Imports:
>>>> Matrix' in its DESCRIPTION file, but it did not import
>>>> anything in NAMESPACE. Rather, some functions
>>>> explicitly loaded Matrix, as needed. The reason
>>>> for this was that importing Matrix is really slow, and
>>>> only very few igraph functions need it. (I guess Matrix
>>>> is slow because of the many registered names, but that
>>>> is another question.)
>>>>
>>>> # Empty session:
>>>> ~$ time Rscript -e 'ls()' > /dev/null
>>>> real 0m0.251s
>>>> user 0m0.196s
>>>> sys 0m0.049s
>>>>
>>>> # Without importing from Matrix:
>>>> ~$ time Rscript -e 'library(igraph); ls()' > /dev/null
>>>> Loading required package: methods
>>>> real 0m0.419s
>>>> user 0m0.363s
>>>> sys 0m0.049s
>>>>
>>>> # Adding importFrom(Matrix, sparseMatrix) to NAMESPACE:
>>>> ~$ time Rscript -e 'library(igraph); ls()' > /dev/null
>>>> Loading required package: methods
>>>> real 0m2.963s
>>>> user 0m2.844s
>>>> sys 0m0.115s
>>>>
>>>> This solution was fine with me, especially because
>>>> other packages depending on igraph and using Matrix
>>>> through igraph worked fine on the CRAN build servers,
>>>> as igraph brought Matrix with it. (The build servers
>>>> don't have recommended packages like Matrix available
>>>> by default.)
>>>>
>>>> Recently, R CMD check does not allow me to list Matrix
>>>> in Imports without importing something from it. This is
>>>> understandable, because it is an inconsistency after
>>>> all, but caused some headache for me.
>>>>
>>>> A 3s loading time for a package is IMHO much longer
>>>> than ideal, especially given that loading R itself is ten
>>>> times faster. So I definitely don't want to import from
>>>> Matrix right now.
>>>>
>>>> The solution I settled with was to include Matrix in
>>>> 'Suggests', and then load it selectively, as before. Now
>>>> some packages depending on igraph are failing on the
>>>> CRAN build servers, which don't have Matrix installed
>>>> for these packages. (Luckily they are probably not
>>>> failing for users, because most users do have the
>>>> recommended packages.)
>>>>
>>>> In summary, it would be great to speed up imports.
>>>>
>>>> Another solution would be some mechanism that allows me
>>>> to import from a package as needed, not at the package
>>>> loading time. Something like a delayed importFrom().
>>> 
>>> 
>>> That is what Matrix:: does.  There is nothing like
>>> enough here for us to tell why it would not suffice for
>>> you.  (If you want to import something occasionally and
>>> then use it very many times, make a local copy.)
>>> 
 
>>>> Just wanted to bring up this issue.
>>>>
>>>> Best, Gabor
 
>>> 
>>> 
>>> --
>>> Brian D. Ripley, rip...@stats.ox.ac.uk Professor of
>>> Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford, Tel: +44 1865 272861 (self) 1
>>> South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG,
>>> UK Fax: +44 1865 272595
>>> 

[Rd] unique(1:3,nmax=1) freezes R

2013-10-30 Thread Helske Jouni
Dear all,
I was playing around with factor contrasts, and found the argument nmax on 
function factor.  When using nmax=1, R froze completely, and I had to close it 
from task manager. After some debugging, I found that the problem is actually 
in unique-function, where the internal unique function is called:

.Internal(unique(x, incomparables, fromLast, nmax))

More generally, it looks like unique(x,nmax=k) freezes R when the length of x is 
larger than 2 and k=1, and when nmax [...]
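
The report, as quoted in full in Joris Meys's reply, suggests adding a check before the internal unique is called. A minimal sketch of such a guard at the R level is shown below. The wrapper name checked_unique and the threshold of 2 are assumptions based only on the nmax=1 case described here; this is not a patch to R itself.

```r
# Hypothetical wrapper: reject nmax values like the one reported to hang R
# before handing over to unique().
checked_unique <- function(x, ..., nmax = NA) {
  if (length(nmax) != 1L) {
    stop("'nmax' must have length 1")
  }
  if (!is.na(nmax) && (!is.numeric(nmax) || nmax < 2)) {
    stop("'nmax' must be NA or a number >= 2")  # assumed threshold
  }
  unique(x, ..., nmax = nmax)
}

ok  <- checked_unique(c(1, 2, 2, 3))
bad <- tryCatch(checked_unique(1:3, nmax = 1),
                error = function(e) conditionMessage(e))
```

Such a wrapper only covers the nmax=1 case; an nmax that is larger but still below the number of distinct values would still reach the internal code.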