[Rd] 'parallel' package changes '.Random.seed'

2014-03-06 Thread Henric Winell

Hi,

I've implemented parallelization in one of my packages using the 
'parallel' package -- many thanks for providing it!


In my package I'm importing 'parallel', so I added it to the 'Imports:'
field of the DESCRIPTION file and added an 'importFrom(parallel, ...)'
directive to the NAMESPACE file.


Parallelization works nicely, but my package no longer passes any part
of its (unparallelized) checks that depend on random number generation;
e.g., the simulated data in the check suite are no longer the same as
before parallelization was added.  This seems to be due to 'parallel'
changing '.Random.seed' when its namespace is loaded:


> set.seed(1)
> rs1 <- .Random.seed
> rnorm(1)
[1] -0.6264538
> set.seed(1)
> rs2 <- .Random.seed
> identical(rs1, rs2)
[1] TRUE
> loadNamespace("parallel")
<environment: namespace:parallel>
> rs3 <- .Random.seed
> identical(rs1, rs3)
[1] FALSE
> rnorm(1)
[1] -0.3262334
> set.seed(1)
> rs4 <- .Random.seed
> identical(rs1, rs4)
[1] TRUE

I've taken a look at the 'parallel' source code, and in a few places a 
call to 'runif(1)' is issued.  So, what effectively seems to happen when 
'parallel' is loaded is


> set.seed(1)
> runif(1)
[1] 0.2655087
> rnorm(1)
[1] -0.3262334

which reproduces the above.  But is this really necessary?  And more 
importantly (at least to me):  Can it somehow be avoided?


The current state of affairs is a bit unfortunate, since it implies that 
a user just by loading the new parallelized version of my package can no 
longer reproduce any subsequent results depending on random number 
generation (unless a call to 'set.seed' was issued *after* attaching my 
package).


I'd be most grateful for any help that you're able to provide here. 
Many thanks!


Kind regards,
Henric Winell



> sessionInfo()

R Under development (unstable) (2014-01-26 r64897)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.1.0 parallel_3.1.0 tools_3.1.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 'parallel' package changes '.Random.seed'

2014-03-06 Thread Henric Winell

Comments below.

On 2014-03-06 11:17, Henric Winell wrote:

> [...]
>
> I've taken a look at the 'parallel' source code, and in a few places a
> call to 'runif(1)' is issued.  So, what effectively seems to happen when
> 'parallel' is loaded is
>
>   > set.seed(1)
>   > runif(1)
>   [1] 0.2655087
>   > rnorm(1)
>   [1] -0.3262334


Some digging reveals that this is due to no port number for the socket 
connection being set by default, in which case 'parallel' picks a random 
port in the 11000-11999 range using 'runif(1L)'.  So, by setting 
R_PARALLEL_PORT the '.Random.seed' object is no longer touched:


> Sys.setenv(R_PARALLEL_PORT = 11500)
> set.seed(1)
> rs1 <- .Random.seed
> loadNamespace("parallel")
<environment: namespace:parallel>
> rs2 <- .Random.seed
> identical(rs1, rs2)
[1] TRUE

This is handled in the 'initDefaultClusterOptions' function in 'snow.R', 
where line 88 has


port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)

It seems to me that we can tread more carefully here.  I've attached a 
trivial patch that


1. Checks if '.Random.seed' exists
2. If TRUE:  a) save '.Random.seed'
             b) make the call above
             c) reset '.Random.seed' to its state in a)
   If FALSE: a) make the call above
             b) remove '.Random.seed'

In due course I hope someone is interested enough to review it.
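
The save-and-restore steps listed above can also be sketched outside the
patch as a small wrapper.  This is only an illustration under stated
assumptions: 'with_preserved_seed' is a hypothetical helper name, not
something provided by 'parallel' or base R, and it assumes the RNG state
lives in '.Random.seed' in the global environment.

```r
# Hypothetical helper (an assumption, not part of 'parallel'): evaluate
# `expr`, then put the global RNG state back exactly as it was found.
with_preserved_seed <- function(expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  on.exit({
    if (had_seed) {
      # A seed existed before: restore it.
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      # No seed existed before: remove the one created by `expr`.
      rm(list = ".Random.seed", envir = globalenv())
    }
  })
  expr  # lazy evaluation: the argument is evaluated here
}

set.seed(1)
before <- .Random.seed
port <- with_preserved_seed(
  11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time()) / 300) %% 1)
)
after <- .Random.seed
identical(before, after)
```

Using 'on.exit' means the state is also restored if the wrapped
expression signals an error, which the inline version in the patch does
not attempt.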


Henric Winell








Index: snow.R
===
--- snow.R	(revision 65125)
+++ snow.R	(working copy)
@@ -84,8 +84,16 @@
     rscript <- file.path(R.home("bin"), "Rscript")
     port <- Sys.getenv("R_PARALLEL_PORT")
     port <- if (identical(port, "random")) NA else as.integer(port)
-    if (is.na(port))
-        port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+    if (is.na(port)) {
+        if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
+            seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
+            port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+            assign(".Random.seed", seed, envir = .GlobalEnv, inherits = FALSE)
+        } else {
+            port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+            rm(.Random.seed, seed, envir = .GlobalEnv, inherits = FALSE)
+        }
+    }
     options <- list(port = as.integer(port),
                     timeout = 60 * 60 * 24 * 30, # 30 days
                     master =  Sys.info()["nodename"],


Re: [Rd] 'parallel' package changes '.Random.seed'

2014-03-06 Thread Prof Brian Ripley

On 06/03/2014 10:17, Henric Winell wrote:

> [...]
>
> I've taken a look at the 'parallel' source code, and in a few places a
> call to 'runif(1)' is issued.  [...]  But is this really necessary?


Yes, in the places it is used.  Two are to do with setting up parallel 
streams when called, and the other is only called if R_PARALLEL_PORT is 
unset.


So set R_PARALLEL_PORT.

But your presumptions are wrong: R is perfectly entitled to use its 
random number generator, as is other code running in the R interpreter. 
 Once your call returns you cannot expect the session state to remain 
unchanged.
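
A minimal sketch of the advice above for a test script, assuming the
tests only need reproducibility from the point where data are simulated.
The port value 11500 is an arbitrary choice borrowed from earlier in
this thread, not a recommended setting.

```r
# Fix the port before 'parallel' is loaded, so that its start-up code has
# no reason to draw a random number for it (11500 is an arbitrary example).
Sys.setenv(R_PARALLEL_PORT = "11500")
invisible(loadNamespace("parallel"))

# Independently of that, seed *after* all packages are loaded, so nothing
# that ran during loading can affect the simulated data.
set.seed(1)
x <- rnorm(3)
```

Seeding after loading does not depend on any particular package leaving
the RNG alone, which is the point being made about session state.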



> And more importantly (at least to me):  Can it somehow be avoided?
>
> [...]
>
> > sessionInfo()
> R Under development (unstable) (2014-01-26 r64897)


See what the posting guide says about updating before posting 





--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[Rd] makepredictcall

2014-03-06 Thread Therneau, Terry M., Ph.D.
An issue came up with the rms package today that makepredictcall would solve, and I was 
going to suggest it to the author.  But looking in the help documents I couldn't find any 
reference to it.  There is a manual page, but it does not give much aid in creating code 
for a new transformation function. Did I miss something?


If not, I'd be willing to draft a paragraph about that which could be added to the 
extensions document.  I figured it out, somehow, for the pspline function of the survival 
package.  To whom should such a draft be submitted?  The naresid function would be another
useful addition.


Terry Therneau



[Rd] Create dataframe in C from table and return to R

2014-03-06 Thread Sandip Nandi
Hi,

I am trying to create a data frame in C and send it back to R.  Can anyone
point me to the part of the R source code that does this?  Let me explain
the problem I am having.


My simple implementation is like this:

#include <R.h>
#include <Rinternals.h>

SEXP formDF(void) {

    SEXP dfm, df, dfint, dfStr, lsnm, res;
    const char *ab[3] = {"aa", "vv", "gy"};
    int sn[3] = {99, 89, 12};
    int i;

    PROTECT(df = allocVector(VECSXP, 2));
    PROTECT(dfint = allocVector(INTSXP, 3));
    PROTECT(dfStr = allocVector(STRSXP, 3));
    PROTECT(lsnm = allocVector(STRSXP, 2));

    /* column names */
    SET_STRING_ELT(lsnm, 0, mkChar("int"));
    SET_STRING_ELT(lsnm, 1, mkChar("string"));

    for (i = 0; i < 3; i++) {
        SET_STRING_ELT(dfStr, i, mkChar(ab[i]));
        INTEGER(dfint)[i] = sn[i];
    }
    SET_VECTOR_ELT(df, 0, dfint);
    SET_VECTOR_ELT(df, 1, dfStr);
    setAttrib(df, R_NamesSymbol, lsnm);
    //PROTECT(dfm=LCONS(dfm,list3(dfm,R_MissingArg,mkFalse())));

    /* Build and evaluate data.frame(df); df stays protected until the end. */
    PROTECT(dfm = lang2(install("data.frame"), df));
    PROTECT(res = eval(dfm, R_GlobalEnv));

    UNPROTECT(6);
    return res;
}


It works fine, but I want it the other way round.

The output is

> print(result)
  int string
1  99     aa
2  89     vv
3  12     gy


I want it transposed, like

dft <- as.data.frame(t(result))

Can I do the transpose from C itself?  Which part of the code should I
look at?

What is my objective?

Reading rows of a table and creating a data frame out of them.  R is
embedded in a database, so I cannot call ODBC and need to implement that
part myself.  The database gives me an API that only returns a whole row
at a time.

Thanks,
Sandip



[Rd] version numbers for CRAN submissions that give warnings/notes

2014-03-06 Thread Michael Friendly
It often happens that I submit a new revision of a package, say
mypkg-1.0-10, from R-Forge to CRAN after running R CMD check locally and
looking at the log files on R-Forge.  But R-Forge has the devel checks
disabled, and I get an email from CRAN pointing out some new warning or
note I'm asked to correct.

OK, I correct this and commit a new rev to R-Forge.  But, is it still
required to bump the version number to mypkg-1.0-11 before resubmitting
to CRAN, even though mypkg-1.0-10 did not make it there?

To do so means also modifying the DESCRIPTION, NEWS and
mypkg-package.Rd files even for a minor warning or note.

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.  Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [Rd] Create dataframe in C from table and return to R

2014-03-06 Thread Duncan Murdoch

On 06/03/2014 1:47 PM, Sandip Nandi wrote:

> [...]
>
> I want it transposed, like
>
> dft <- as.data.frame(t(result))
>
> Can I do the transpose from C itself?  Which part of the code should I
> look at?


What you are asking for isn't a normal data frame.  Data frame columns
are vectors, each of a single type.  You want one row to hold strings and
the other row to hold integers, so every column would have to mix types.
You can't do that with simple atomic columns, and you probably don't want
to mess with the alternative (which is to have your columns be lists),
because no user will know how to deal with that.
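
A short R sketch of the point above, using the values from the original
post.  The object names ('rows', 'cols') are illustrative assumptions;
the second half shows one possible way to assemble ordinary typed columns
from row-wise records, which is not necessarily how it would be done
from C.

```r
result <- data.frame(int = c(99L, 89L, 12L),
                     string = c("aa", "vv", "gy"),
                     stringsAsFactors = FALSE)

# t() goes through as.matrix(), and a matrix has a single type, so the
# integers are converted to character along with the strings.
m <- t(result)
dft <- as.data.frame(m, stringsAsFactors = FALSE)
sapply(dft, class)

# Row-wise records (as a row-at-a-time API might deliver them) can instead
# be gathered into one typed vector per field.
rows <- list(list(int = 99L, string = "aa"),
             list(int = 89L, string = "vv"),
             list(int = 12L, string = "gy"))
cols <- data.frame(
  int    = vapply(rows, function(r) r$int, integer(1)),
  string = vapply(rows, function(r) r$string, character(1)),
  stringsAsFactors = FALSE
)
```

Here 'cols' keeps an integer column and a character column, whereas
every column of 'dft' is character.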


Duncan Murdoch



Re: [Rd] makepredictcall

2014-03-06 Thread Prof Brian Ripley
See the developer site, e.g. 
http://developer.r-project.org/model-fitting-functions.txt .


That is where specialized info is (and this is specialized).
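
For readers who want a concrete starting point, here is a minimal sketch
of a 'makepredictcall' method for a new transformation.  The function
'center' and the class name "centered" are hypothetical and exist only
for this illustration; the method body follows the same pattern as the
default method's handling of 'scale', and the explicit registration is
an assumption made so that dispatch works wherever the code is run.

```r
# Hypothetical transformation: centre x at a data-dependent value and
# record that value as an attribute of the result.
center <- function(x, at = mean(x)) {
  structure(x - at, at = at, class = "centered")
}

# makepredictcall method: rewrite center(x) into
# center(x, at = <value used when fitting>) for use at prediction time.
makepredictcall.centered <- function(var, call) {
  if (as.character(call)[1L] != "center") return(call)
  call$at <- attr(var, "at")
  call
}

# Register the method explicitly so dispatch from inside 'stats' finds it.
registerS3method("makepredictcall", "centered", makepredictcall.centered,
                 envir = asNamespace("stats"))

d <- data.frame(x = 1:4, y = c(2, 4, 6, 8))
fit <- lm(y ~ center(x), data = d)

# The stored prediction variables now carry the training-time centre.
attr(terms(fit), "predvars")

# Without the method, center(10) would be re-centred at 10 itself.
p <- predict(fit, newdata = data.frame(x = 10))
```

In a package, the method would normally be registered through an
'S3method(makepredictcall, centered)' directive in the NAMESPACE file
rather than by calling 'registerS3method' directly.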



Re: [Rd] version numbers for CRAN submissions that give warnings/notes

2014-03-06 Thread Duncan Murdoch

On 06/03/2014 2:39 PM, Michael Friendly wrote:

> [...]
>
> OK, I correct this and commit a new rev to R-Forge.  But, is it still
> required to bump the version number to mypkg-1.0-11 before resubmitting
> to CRAN, even though mypkg-1.0-10 did not make it there?
>
> To do so means also modifying the DESCRIPTION, NEWS and
> mypkg-package.Rd files even for a minor warning or note.



That sounds like a question about CRAN policy, so I think you'll need to 
write to c...@r-project.org for an answer.  But I would assume it could 
cause confusion if you submitted another identically named tarball, and 
I'd recommend bumping the version number.


Duncan Murdoch



Re: [Rd] version numbers for CRAN submissions that give warnings/notes

2014-03-06 Thread Prof Brian Ripley

On 06/03/2014 20:22, Duncan Murdoch wrote:

> On 06/03/2014 2:39 PM, Michael Friendly wrote:
>> It often happens that I submit a new revision of a package, say
>> mypkg-1.0-10, from R-Forge to CRAN after running R CMD check locally
>> and looking at the log files on R-Forge.
>> But R-Forge has the devel checks disabled, and I get an email from CRAN
>> pointing out some new warning or note I'm asked to correct.


So do as the CRAN policies ask, and check with R-devel locally (or on 
winbuilder).  CRAN does not run R-Forge and suggestions should be made 
to its management.




>> OK, I correct this and commit a new rev to R-Forge.  But, is it still
>> required to bump the version number to mypkg-1.0-11 before resubmitting
>> to CRAN, even though mypkg-1.0-10 did not make it there?
>>
>> To do so means also modifying the DESCRIPTION, NEWS and
>> mypkg-package.Rd files even for a minor warning or note.


Not really: much more courteous to follow the policies and get it right 
in the first place.



> That sounds like a question about CRAN policy, so I think you'll need to
> write to c...@r-project.org for an answer.  But I would assume it could
> cause confusion if you submitted another identically named tarball, and
> I'd recommend bumping the version number.


That's the correct advice.  CRAN does not in general currently insist 
that each submission has a new number, but enough maintainers get 
confused that it is recommended (and for maintainers with a track record 
of confusion, insisted on).





[Rd] A question about multiple(?) out of order ReleaseObject

2014-03-06 Thread Saptarshi Guha
Hello,

This is a question that probably reveals my lack of understanding.
In a C function (call it cfunc), I created a SEXP, called S, and then
called R_PreserveObject on S.

I returned the SEXP to the calling R function (call it rfunc).  Note, I
didn't call R_ReleaseObject on S.

v <- .Call(cfunc)

So, are the following statements correct?

1. S is 'doubly' protected from the GC, both by being associated with 'v'
and because it has been added to the precious list (via a call to
R_PreserveObject without R_ReleaseObject being called).

2. I have another C function called cfunc2.  In cfunc2, I call
R_ReleaseObject on S.  S, however, is still protected from the GC, because
it is associated with 'v'.

Are (1) and (2) correct?

I have not used PROTECT/UNPROTECT, because if I return from cfunc without
the equivalent number of unprotects, I get 'unbalanced stack' warnings.  I'd
rather not have to worry about that because I intend to balance it later.

Regards
Saptarshi



Re: [Rd] A question about multiple(?) out of order ReleaseObject

2014-03-06 Thread Gabriel Becker
Saptarshi,

R_PreserveObject and R_ReleaseObject are, as far as I know, intended for
the rare situations where you need to maintain a SEXP in C that is not
pointed to by any R level symbols, and across e.g. .Calls.

If you are returning an object to R (v in this case) and intend to do
.Call(cfunc2, v) later, there is no benefit at all to having called
R_PreserveObject on it. The only case where your object would lose
protection (v and all other R symbols being rm'd/going out of scope) would
also cause you to lose your only reference to the SEXP, so by R_Preserve'ing
all you will have done is create an unreachable protected pointer.

If you are keeping a static pointer to the SEXP down in C code that R can't
see, then R_PreserveObject would be appropriate, but the situations where
doing that is a good idea are rare (though they do exist).
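
As an R-level analogy only (it does not exercise R_PreserveObject or
the C API), the following sketch illustrates the reachability argument
above: an object bound to a variable such as 'v' is not collected, and
it becomes collectable once the last binding is removed.  The names
'state' and 'mark_finalized' are assumptions made for this example.

```r
# Record whether the finalizer has run, in a place the finalizer can see.
state <- new.env()
state$finalized <- FALSE
mark_finalized <- function(e) state$finalized <- TRUE

# 'v' plays the role of the value returned by .Call(cfunc).
v <- new.env()
reg.finalizer(v, mark_finalized)

invisible(gc())
still_alive <- !state$finalized   # TRUE: the binding 'v' keeps it reachable

# Once the last reference is gone, the collector is free to reclaim the
# object; when the finalizer actually runs is up to the collector.
rm(v)
invisible(gc())
state$finalized
```

The same reasoning is why an extra R_PreserveObject adds nothing while
'v' exists, and leaves an unreachable preserved object once 'v' is gone
and R_ReleaseObject has not been called.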

HTH,
~G





-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis



Re: [Rd] A question about multiple(?) out of order ReleaseObject

2014-03-06 Thread Saptarshi Guha
Hello,

However, I do need some sort of protection.
(pseudo code)

SEXP a = Rf_allocVector(STRSXP,)
protect a
for i = 1 to length of vector
  SET_STRING_ELT(a,i, Rf_mkChar(...))
end
unprotect a
return a

I _need_ that protect because in the for loop I also call some R functions
and need the object 'a' to be protected.

However, as I pointed out,
1. I replaced the protect with R_PreserveObject
2. I removed the unprotect

I can guarantee that some time later R_ReleaseObject will be called on 'a'.

So ultimately, whether this is good design or not, the question remains:
given that 'a' is in the precious list and 'a' is assigned to 'v', if
R_ReleaseObject is called on 'a', will 'a' still be assigned to 'v' and
therefore not get GC'd?

In rudimentary tests, it doesn't appear to cause seg faults.  But is it safe?

Cheers
Thanks
Saptarshi






[Rd] Repost: (apologies for HTML post) A question about multiple(?) out of order ReleaseObject

2014-03-06 Thread Saptarshi Guha
Apologies, I am resending this because my emails seem to go in HTML form.




[Rd] Many apologies: last post: A question about multiple(?) out of order ReleaseObject

2014-03-06 Thread Saptarshi Guha
Apologies, I am resending this because my emails seem to go in HTML form.
(I haven't as yet figured gmail web interface)

Re: [Rd] A question about multiple(?) out of order ReleaseObject

2014-03-06 Thread Gabriel Becker
Saptarshi,

'a' will be protected throughout the entirety of any R functions you call in
the loop. Do you have evidence that this is not the case? The protect stack
is last in, first out, so assuming balance, even if UNPROTECT() is called
underneath the R functions you are calling, it won't touch 'a', only the
things that those functions put onto the stack after 'a'. There is still no
reason to use PreserveObject/ReleaseObject.
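
As a rough illustration of the last-in, first-out argument, here is a toy
model of a protection stack in plain C. It is not R's implementation: the
names toy_protect, toy_unprotect, toy_callee and toy_loop_keeps_a_protected
are made up for this sketch and only stand in for PROTECT, UNPROTECT, an R
function called inside the loop, and the loop itself. It shows that a
balanced callee can only pop what it pushed, so an entry pushed earlier
stays on the stack.

```c
#include <stddef.h>

/* Toy model of a last-in, first-out protection stack.
   NOT R's implementation; all names here are hypothetical. */
#define TOY_STACK_MAX 16

static const void *toy_stack[TOY_STACK_MAX];
static int toy_top = 0;

/* Push a pointer (models PROTECT). Returns 0 on success, -1 if full. */
static int toy_protect(const void *p) {
    if (toy_top >= TOY_STACK_MAX) return -1;
    toy_stack[toy_top++] = p;
    return 0;
}

/* Pop the n most recently pushed entries (models UNPROTECT(n)). */
static void toy_unprotect(int n) {
    while (n-- > 0 && toy_top > 0) toy_top--;
}

/* Is p currently somewhere on the stack? */
static int toy_is_protected(const void *p) {
    for (int i = 0; i < toy_top; i++)
        if (toy_stack[i] == p) return 1;
    return 0;
}

/* A balanced callee: pushes its own temporary and pops it again. */
static void toy_callee(void) {
    static int tmp;
    if (toy_protect(&tmp) == 0)
        toy_unprotect(1);   /* removes tmp only, nothing below it */
}

/* Models the loop from the question. Returns 1 if 'a' stayed on the
   stack across every call made inside the loop, 0 otherwise. */
static int toy_loop_keeps_a_protected(int iterations) {
    static int a;           /* stands in for the string vector */
    int ok = 1;
    if (toy_protect(&a) != 0) return 0;
    for (int i = 0; i < iterations; i++) {
        toy_callee();
        if (!toy_is_protected(&a)) ok = 0;
    }
    toy_unprotect(1);       /* balance before returning */
    return ok;
}
```

In this model the stack is empty again once toy_loop_keeps_a_protected
returns, which corresponds to returning from a .Call with a balanced stack.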

HTH,
~G


 On Thu, Mar 6, 2014 at 2:48 PM, Gabriel Becker gmbec...@ucdavis.edu wrote:

 Saptarshi,

 R_PreserveObject and R_ReleaseObject are, as far as I know, intended for
 the rare situations where you need to maintain a SEXP in C that is not
 pointed to by any R level symbols, and across e.g. .Calls.

 If you are returning an object to R (v in this case) and intend to do
 .Call(cfunc2, v) later, there is no benefit at all to having called
 R_PreserveObject on it. The only case where your object would lose
 protection (v and all other R symbols being rm'd/going out of scope) would
 also cause you to lose your only reference to the SEXP, so by R_Preserve'ing
 all you will have done is create an unreachable protected pointer.

 If you are keeping a static pointer to the SEXP down in C code that R
 can't see, then R_PreserveObject would be appropriate, but the situations
 where doing that is a good idea are rare (though they do exist).

 HTH,
 ~G








-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis



Re: [Rd] A question about multiple(?) out of order ReleaseObject

2014-03-06 Thread Simon Urbanek
On Mar 6, 2014, at 5:32 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote:

 Hello,
 
 This  is a question that probably reveals my lack of understanding.
 In a C function (call it cfunc), i created a SEXP, called S, and then
 called R_PreserveObject on S.
 
 I returned the SEXP to the calling R function (call it rfunc). Note, I
 didn't call
 R_ReleaseObject on S.
 
 v <- .Call(cfunc)
 
 So, are the following  statements correct
 
 1.  S is 'doubly' protected from the GC by  being associated both with 'v'
 and because it has been added to the precious list (via a call to
 R_PreserveObject without ReleaseObject being called)
 

yes


 2. I have another C function called cfunc2. In cfunc2, I call
 R_ReleaseObject on S.  S , however, is still protected from the GC, because
 it is associated with 'v'
 

yes (assuming the binding to v still exists at that point). Note, however, that
in such a case your R_PreserveObject() is pointless since you don't need to
protect it on exit (that's in fact the convention - return results are never
protected).


 Is (1) and (2) correct?
  
 I have not used R_protect/unprotect, because if I return from cfunc without 
 the equivalent number of unprotects, i get 'unbalanced stack' warnings. I'd 
 rather not have to worry about that because i intend to balance it later.
 

Normally, you should not keep the result of a function protected since it means
you *have* to guarantee the unprotect at a later point. That is in general
impossible to guarantee unless you have another object that is holding a
reference that will be cleared by an explicitly registered finalizer. So unless
that is the case, you are creating an explicit leak = bad. If you don't have a
stack-like design, you can always use an explicitly managed object for the
lifetime (personally, I prefer that) since all chained objects are protected by
design, or use REPROTECT.
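
The answers to (1) and (2), and the leak described in the last paragraph,
can be pictured with a small reachability model. This is a hedged sketch in
plain C, not R's garbage collector: toy_obj and the toy_* functions are
hypothetical names, with 'preserved' standing in for membership of the
precious list and 'bound' for an R-level binding such as 'v'.

```c
#include <stdbool.h>

/* Toy model: an object may be reclaimed only when no root refers to it.
   NOT R's collector; all names here are hypothetical. */
typedef struct {
    bool preserved;  /* on the "precious list" (models R_PreserveObject) */
    bool bound;      /* an R-level variable such as 'v' refers to it */
} toy_obj;

static void toy_preserve(toy_obj *o) { o->preserved = true;  }
static void toy_release(toy_obj *o)  { o->preserved = false; }
static void toy_bind(toy_obj *o)     { o->bound = true;  }  /* v <- .Call(cfunc) */
static void toy_unbind(toy_obj *o)   { o->bound = false; }  /* rm(v) */

/* Would a collector be allowed to reclaim the object? */
static bool toy_collectable(const toy_obj *o) {
    return !o->preserved && !o->bound;
}

/* Questions (1) and (2): preserve in cfunc, bind to 'v', release in
   cfunc2. Returns true if the object is still kept alive afterwards. */
static bool toy_alive_after_release(void) {
    toy_obj s = { false, false };
    toy_preserve(&s);
    toy_bind(&s);
    toy_release(&s);
    return !toy_collectable(&s);   /* still held by the binding */
}

/* The leak: preserved, binding removed, release never called.
   Returns true if the object can never be reclaimed in this state. */
static bool toy_leaks_without_release(void) {
    toy_obj s = { false, false };
    toy_preserve(&s);
    toy_bind(&s);
    toy_unbind(&s);
    return !toy_collectable(&s);   /* held only by the precious list */
}
```

Both functions return true in this model: the first because the binding
alone keeps the object alive, the second because nothing reachable from R
is left to trigger the release.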

Cheers,
Simon




Re: [Rd] A question about multiple(?) out of order ReleaseObject

2014-03-06 Thread Saptarshi Guha
Hello Simon,

Thanks much for the replies. It makes sense now and I'm on the right
track. I'll explain why this might happen.

I've written a package called rterra [1] which essentially allows the
R user to write extensions in Lua. The extensions are JIT compiled via
LuaJIT. For deterministic performance the user can write their extension
in Terra.

To make memory management easier for the extension writer, I wanted
automatic memory management. Consider the following code sample, which
calls an extension written in Lua to compute a sum by traversing some JSON.

1. In R, the extension is called as

l <- terra(computeCountsByDayAndCountry, values, c("2013-01-01","2014-12-31"))

2. In Lua, the code looks like

function computeCountsByDayAndCountry(jsonstr,dateRange)
   local x,y = R.Robj(jsonstr),R.Robj(dateRange)
   local dstart,dend = ffi.string(y[0]),ffi.string(y[1])
   local f =  R.Robj{type='str', length =#x}
   R.autoProtect(f) -- line 'A'
   for index = 0, #x-1 do
      local ok,jc  = pcall(cjson.decode,ffi.string(x[index][0]))
      if ok then
         f[index] = cjson.encode( _computeSearchCountsCountry(jc,dstart,dend))
      else
         f[index] = cjson.encode({})
      end
   end
   return f
end

When this returns, the SEXP contained in 'f' is associated with 'l' (in step 1).

LuaJIT has garbage collection (see http://luajit.org/ext_ffi_api.html):
when 'f' above (line 'A') gets garbage collected, a finalizer is run.
The call to autoProtect

1. calls R_PreserveObject on f.sexp
2. sets the finalizer to call R_ReleaseObject.

When the user calls an extension written in Lua again, 'f' will get
garbage collected by Lua (assuming it has no references in Lua).

And yes, there will be a memory leak *iff* the user *never* calls an
extension written in Lua again; otherwise no.
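
The autoProtect scheme described above (preserve now, release from a
finalizer later) can be sketched as a toy model in plain C. This is not the
rterra code and not the LuaJIT or R API: toy_handle, toy_auto_protect and
toy_collect are hypothetical names, and the integer counter merely stands
in for the precious list.

```c
#include <stddef.h>

/* Toy model of "preserve now, release from a finalizer later".
   NOT rterra, LuaJIT or R code; all names here are hypothetical. */
typedef struct toy_handle {
    int  *preserved;                        /* stands in for the precious list */
    void (*finalizer)(struct toy_handle *); /* run when the handle is collected */
} toy_handle;

/* Finalizer: undo the preserve exactly once. */
static void toy_release_on_finalize(toy_handle *h) {
    if (*h->preserved > 0) (*h->preserved)--;
    h->finalizer = NULL;                    /* never release twice */
}

/* Step 1: preserve the object. Step 2: register the releasing finalizer. */
static void toy_auto_protect(toy_handle *h, int *preserved) {
    h->preserved = preserved;
    (*preserved)++;
    h->finalizer = toy_release_on_finalize;
}

/* Models the host language's collector reclaiming the handle. */
static void toy_collect(toy_handle *h) {
    if (h->finalizer != NULL) h->finalizer(h);
}
```

If toy_collect is never called, the counter stays above zero, which is the
leak case discussed above; once it runs, the preserve is undone.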

And of course, the user can choose not to call autoProtect and instead
manage protection themselves, i.e. using R.protect and R.unprotect.

Thanks for the insight

[1] http://people.mozilla.org/~sguha/blog/2013/08/01/rterra_first_post.html

Cheers
Saptarshi
