[Rd] 'parallel' package changes '.Random.seed'
Hi,

I've implemented parallelization in one of my packages using the 'parallel' package -- many thanks for providing it!

In my package I'm importing 'parallel' and so added it to the DESCRIPTION file's 'Imports:' tag and also added an 'importFrom(parallel, ...)' statement in the NAMESPACE file.

Parallelization works nicely, but my package no longer passes any parts of its (unparallelized) checks that depend on random number generation, e.g., the simulated data in the check suite are no longer the same as before parallelization was added.

This seems to be due to 'parallel' changing '.Random.seed' when loading its name space:

    > set.seed(1)
    > rs1 <- .Random.seed
    > rnorm(1)
    [1] -0.6264538
    > set.seed(1)
    > rs2 <- .Random.seed
    > identical(rs1, rs2)
    [1] TRUE
    > loadNamespace("parallel")
    <environment: namespace:parallel>
    > rs3 <- .Random.seed
    > identical(rs1, rs3)
    [1] FALSE
    > rnorm(1)
    [1] -0.3262334
    > set.seed(1)
    > rs4 <- .Random.seed
    > identical(rs1, rs4)
    [1] TRUE

I've taken a look at the 'parallel' source code, and in a few places a call to 'runif(1)' is issued. So, what effectively seems to happen when 'parallel' is loaded is

    > set.seed(1)
    > runif(1)
    [1] 0.2655087
    > rnorm(1)
    [1] -0.3262334

which reproduces the above.

But is this really necessary? And more importantly (at least to me): Can it somehow be avoided?

The current state of affairs is a bit unfortunate, since it implies that a user just by loading the new parallelized version of my package can no longer reproduce any subsequent results depending on random number generation (unless a call to 'set.seed' was issued *after* attaching my package).

I'd be most grateful for any help that you're able to provide here. Many thanks!

Kind regards,
Henric Winell

    > sessionInfo()
    R Under development (unstable) (2014-01-26 r64897)
    Platform: x86_64-redhat-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] compiler_3.1.0 parallel_3.1.0 tools_3.1.0

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] 'parallel' package changes '.Random.seed'
Comments below.

On 2014-03-06 11:17, Henric Winell wrote:
> [...]
> I've taken a look at the 'parallel' source code, and in a few places a call to 'runif(1)' is issued. So, what effectively seems to happen when 'parallel' is loaded is
>
>     > set.seed(1)
>     > runif(1)
>     [1] 0.2655087
>     > rnorm(1)
>     [1] -0.3262334

Some digging reveals that this is due to no port number for the socket connection being set by default, in which case 'parallel' picks a random port in the 11000-11999 range using 'runif(1L)'. So, by setting R_PARALLEL_PORT the '.Random.seed' object is no longer touched:

    > Sys.setenv(R_PARALLEL_PORT = 11500)
    > set.seed(1)
    > rs1 <- .Random.seed
    > loadNamespace("parallel")
    <environment: namespace:parallel>
    > rs2 <- .Random.seed
    > identical(rs1, rs2)
    [1] TRUE

This is handled in the 'initDefaultClusterOptions' function in 'snow.R', where line 88 has

    port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)

It seems to me that we can tread more carefully here. I've attached a trivial patch that

1. Checks if '.Random.seed' exists
2. If TRUE:
   a) save '.Random.seed'
   b) make the call above
   c) reset '.Random.seed' to its state in a)
   If FALSE:
   a) make the call above
   b) remove '.Random.seed'

In due course I hope someone is interested enough to review it.

Henric Winell

> which reproduces the above.
>
> But is this really necessary? And more importantly (at least to me): Can it somehow be avoided?
> [...]

Index: snow.R
===================================================================
--- snow.R	(revision 65125)
+++ snow.R	(working copy)
@@ -84,8 +84,16 @@
     rscript <- file.path(R.home("bin"), "Rscript")
     port <- Sys.getenv("R_PARALLEL_PORT")
     port <- if (identical(port, "random")) NA else as.integer(port)
-    if (is.na(port))
-        port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+    if (is.na(port)) {
+        if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
+            seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
+            port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+            assign(".Random.seed", seed, envir = .GlobalEnv, inherits = FALSE)
+        } else {
+            port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+            rm(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
+        }
+    }
     options <- list(port = as.integer(port),
                     timeout = 60 * 60 * 24 * 30, # 30 days
                     master = Sys.info()["nodename"],
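The save-and-restore steps listed in the message above can also be written as a small wrapper. This is only a sketch under stated assumptions: the helper name with_preserved_seed() is hypothetical and is not part of 'parallel'; it simply evaluates an expression and then puts the global RNG state back the way it was found.

```r
# Hypothetical helper (an assumption for illustration, not part of 'parallel'):
# evaluate `expr`, then restore the global RNG state to what it was before.
with_preserved_seed <- function(expr) {
    had_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    if (had_seed)
        old_seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    on.exit({
        if (had_seed) {
            # A seed existed: reset it to the saved state.
            assign(".Random.seed", old_seed, envir = .GlobalEnv)
        } else if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
            # No seed existed before: remove the one created by `expr`.
            rm(".Random.seed", envir = .GlobalEnv)
        }
    })
    force(expr)  # the promise is evaluated here, inside the guarded region
}

# Usage: draw a random port without disturbing the caller's RNG stream.
set.seed(1)
before <- get(".Random.seed", envir = .GlobalEnv)
port <- with_preserved_seed(
    11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time()) / 300) %% 1)
)
after <- get(".Random.seed", envir = .GlobalEnv)
identical(before, after)
```

Using on.exit() means the state is restored even if the wrapped expression signals an error, which the inline if/else form does not guarantee.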
Re: [Rd] 'parallel' package changes '.Random.seed'
On 06/03/2014 10:17, Henric Winell wrote:
> [...]
> I've taken a look at the 'parallel' source code, and in a few places a call to 'runif(1)' is issued. So, what effectively seems to happen when 'parallel' is loaded is
> [...]
> which reproduces the above.
>
> But is this really necessary?

Yes, in the places it is used. Two are to do with setting up parallel streams when called, and the other is only called if R_PARALLEL_PORT is unset. So set R_PARALLEL_PORT.

But your presumptions are wrong: R is perfectly entitled to use its random number generator, as is other code running in the R interpreter. Once your call returns you cannot expect the session state to remain unchanged.

> And more importantly (at least to me): Can it somehow be avoided?
> [...]
> sessionInfo()
> R Under development (unstable) (2014-01-26 r64897)

See what the posting guide says about updating before posting.

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
[Rd] makepredictcall
An issue came up with the rms package today that makepredictcall would solve, and I was going to suggest it to the author. But looking in the help documents I couldn't find any reference to it. There is a manual page, but it does not give much aid in creating code for a new transformation function. Did I miss something?

If not, I'd be willing to draft a paragraph about that which could be added to the extensions document. I figured it out, somehow, for the pspline function of the survival package. Submit such draft to ?

The naresid function would be another useful addition.

Terry Therneau
[Rd] Create dataframe in C from table and return to R
Hi,

I am trying to create a dataframe in C and send it back to R. Can anyone point me to the part of the source code where this is done? Let me explain the problem I am having. My simple implementation is like this:

    SEXP formDF(void)
    {
        SEXP dfm, df, dfint, dfStr, lsnm;
        const char *ab[3] = {"aa", "vv", "gy"};
        int sn[3] = {99, 89, 12};
        const char *listnames[2] = {"int", "string"};
        int i;

        PROTECT(df = allocVector(VECSXP, 2));
        PROTECT(dfint = allocVector(INTSXP, 3));
        PROTECT(dfStr = allocVector(STRSXP, 3));
        PROTECT(lsnm = allocVector(STRSXP, 2));
        SET_STRING_ELT(lsnm, 0, mkChar("int"));
        SET_STRING_ELT(lsnm, 1, mkChar("string"));
        for (i = 0; i < 3; i++) {
            SET_STRING_ELT(dfStr, i, mkChar(ab[i]));
            INTEGER(dfint)[i] = sn[i];
        }
        SET_VECTOR_ELT(df, 0, dfint);
        SET_VECTOR_ELT(df, 1, dfStr);
        setAttrib(df, R_NamesSymbol, lsnm);
        //PROTECT(dfm=LCONS(dfm,list3(dfm,R_MissingArg,mkFalse(;
        /* df stays protected while the call is built and evaluated */
        dfm = PROTECT(lang2(install("data.frame"), df));
        SEXP res = PROTECT(eval(dfm, R_GlobalEnv));
        UNPROTECT(6);
        return res;
    }

It works fine, but I want it the other way round. The output is

    > print(result)
      int string
    1  99     aa
    2  89     vv
    3  12     gy

I want it transposed, like

    dft <- as.data.frame(t(result))

Can I do the transpose from C itself? Which part of the code should I look at?

What is my objective? Reading rows of a table and creating a dataframe out of them. R is embedded in a database, so I cannot call ODBC and need to implement that part myself. The database gives me an API that only returns a whole row at once.

Thanks,
Sandip
[Rd] version numbers for CRAN submissions that give warnings/notes
It often happens that I submit a new revision of a package, say mypkg-1.0-10, from R-Forge to CRAN after running R CMD check locally and looking at the log files on R-Forge. But R-Forge has the devel checks disabled, and I get an email from CRAN pointing out some new warning or note I'm asked to correct.

OK, I correct this and commit a new rev to R-Forge. But, is it still required to bump the version number to mypkg-1.0-11 before resubmitting to CRAN, even though mypkg-1.0-10 did not make it there? To do so means also modifying the DESCRIPTION, NEWS and mypkg-package.Rd files even for a minor warning or note.

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
Re: [Rd] Create dataframe in C from table and return to R
On 06/03/2014 1:47 PM, Sandip Nandi wrote:
> I am trying to create a dataframe in C and send it back to R.
> [...]
> It works fine, but I want it the other way round. The output is
>
>     > print(result)
>       int string
>     1  99     aa
>     2  89     vv
>     3  12     gy
>
> I want it transposed, like
>
>     dft <- as.data.frame(t(result))
>
> Can I do the transpose from C itself? Which part of the code should I look at?
> [...]

What you are asking for isn't a normal dataframe. Dataframe columns are vectors all of a type. You want the first row to be a string, the second row to be an integer. You can't do that with simple atomic columns, and you probably don't want to mess with the alternative (which is to have your columns be lists), because no user will know how to deal with that.

Duncan Murdoch
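The point about column types can be checked at the R level before writing any C. The sketch below rebuilds the posted data frame in R (the object names dft and dfl are just illustrations) and shows that transposing coerces every value to character, while the list-column alternative keeps the types at the cost of a much less convenient object.

```r
# The data frame from the question, rebuilt in R for illustration.
result <- data.frame(int = c(99L, 89L, 12L),
                     string = c("aa", "vv", "gy"),
                     stringsAsFactors = FALSE)

# t() goes through as.matrix(), and a matrix holds a single type,
# so the integers are converted to character strings.
dft <- as.data.frame(t(result), stringsAsFactors = FALSE)
vapply(dft, class, character(1))

# The list-column alternative: each column is a list holding one
# integer and one string, so the original types survive.
dfl <- data.frame(V1 = I(list(99L, "aa")),
                  V2 = I(list(89L, "vv")),
                  V3 = I(list(12L, "gy")),
                  row.names = c("int", "string"))
class(dfl$V1[[1]])
class(dfl$V1[[2]])
```

Most modelling and summary functions expect atomic columns, so an object like dfl usually has to be reshaped again before it is useful.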
Re: [Rd] makepredictcall
See the developer site, e.g. http://developer.r-project.org/model-fitting-functions.txt . That is where specialized info is (and this is specialized).

On 06/03/2014 18:19, Therneau, Terry M., Ph.D. wrote:
> An issue came up with the rms package today that makepredictcall would solve, and I was going to suggest it to the author. But looking in the help documents I couldn't find any reference to it. There is a manual page, but it does not give much aid in creating code for a new transformation function. Did I miss something?
> [...]

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
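For readers who have not met makepredictcall, here is a minimal sketch of a method for a new transformation function. Everything named centered below is hypothetical and invented for illustration; it is not the rms or survival code discussed in this thread. The idea is that the transformation stores its data-dependent parameter as an attribute, and the method copies that attribute into the call saved in the terms object so that predict() reuses it on new data.

```r
# Hypothetical transformation: centre x and remember the centre used.
centered <- function(x, center = mean(x)) {
    structure(matrix(x - center, ncol = 1L),
              center = center,
              class = c("centered", "matrix"))
}

# makepredictcall method: rewrite centered(x) as centered(x, center = <value>)
# so that prediction uses the centre estimated from the fitting data.
makepredictcall.centered <- function(var, call) {
    if (as.character(call)[1L] != "centered") return(call)
    call$center <- attr(var, "center")
    call
}

# Register the method explicitly so dispatch does not depend on where
# this script happens to be evaluated.
registerS3method("makepredictcall", "centered", makepredictcall.centered,
                 envir = asNamespace("stats"))

d <- data.frame(x = 1:10,
                y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2))
fit <- lm(y ~ centered(x), data = d)

# The stored call now carries the fitted centre.
attr(terms(fit), "predvars")

# Prediction at a single new point uses mean(d$x), not mean(20).
pred <- predict(fit, newdata = data.frame(x = 20))
expected <- coef(fit)[[1]] + coef(fit)[[2]] * (20 - mean(d$x))
all.equal(unname(pred), expected)
```

Without the method, centered() would be re-evaluated on the new data alone, so a single new observation would be centred on itself and every prediction would collapse to the intercept.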
Re: [Rd] version numbers for CRAN submissions that give warnings/notes
On 06/03/2014 2:39 PM, Michael Friendly wrote:
> It often happens that I submit a new revision of a package, say mypkg-1.0-10, from R-Forge to CRAN after running R CMD check locally and looking at the log files on R-Forge. But R-Forge has the devel checks disabled, and I get an email from CRAN pointing out some new warning or note I'm asked to correct.
>
> OK, I correct this and commit a new rev to R-Forge. But, is it still required to bump the version number to mypkg-1.0-11 before resubmitting to CRAN, even though mypkg-1.0-10 did not make it there? To do so means also modifying the DESCRIPTION, NEWS and mypkg-package.Rd files even for a minor warning or note.

That sounds like a question about CRAN policy, so I think you'll need to write to c...@r-project.org for an answer. But I would assume it could cause confusion if you submitted another identically named tarball, and I'd recommend bumping the version number.

Duncan Murdoch
Re: [Rd] version numbers for CRAN submissions that give warnings/notes
On 06/03/2014 20:22, Duncan Murdoch wrote:
> On 06/03/2014 2:39 PM, Michael Friendly wrote:
>> It often happens that I submit a new revision of a package, say mypkg-1.0-10, from R-Forge to CRAN after running R CMD check locally and looking at the log files on R-Forge. But R-Forge has the devel checks disabled, and I get an email from CRAN pointing out some new warning or note I'm asked to correct.

So do as the CRAN policies ask, and check with R-devel locally (or on winbuilder). CRAN does not run R-Forge and suggestions should be made to its management.

>> OK, I correct this and commit a new rev to R-Forge. But, is it still required to bump the version number to mypkg-1.0-11 before resubmitting to CRAN, even though mypkg-1.0-10 did not make it there? To do so means also modifying the DESCRIPTION, NEWS and mypkg-package.Rd files even for a minor warning or note.

Not really: much more courteous to follow the policies and get it right in the first place.

> That sounds like a question about CRAN policy, so I think you'll need to write to c...@r-project.org for an answer. But I would assume it could cause confusion if you submitted another identically named tarball, and I'd recommend bumping the version number.

That's the correct advice. CRAN does not in general currently insist that each submission has a new number, but enough maintainers get confused that it is recommended (and for maintainers with a track record of confusion, insisted on).

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
[Rd] A question about multiple(?) out of order ReleaseObject
Hello,

This is a question that probably reveals my lack of understanding.

In a C function (call it cfunc), I created a SEXP, called S, and then called R_PreserveObject on S. I returned the SEXP to the calling R function (call it rfunc). Note, I didn't call R_ReleaseObject on S.

    v <- .Call(cfunc)

So, are the following statements correct?

1. S is 'doubly' protected from the GC by being associated both with 'v' and because it has been added to the precious list (via a call to R_PreserveObject without R_ReleaseObject being called).

2. I have another C function called cfunc2. In cfunc2, I call R_ReleaseObject on S. S, however, is still protected from the GC, because it is associated with 'v'.

Are (1) and (2) correct?

I have not used PROTECT/UNPROTECT, because if I return from cfunc without the equivalent number of unprotects, I get 'unbalanced stack' warnings. I'd rather not have to worry about that because I intend to balance it later.

Regards
Saptarshi
Re: [Rd] A question about multiple(?) out of order ReleaseObject
Saptarshi,

R_PreserveObject and R_ReleaseObject are, as far as I know, intended for the rare situations where you need to maintain a SEXP in C that is not pointed to by any R level symbols, and across e.g. .Calls.

If you are returning an object to R (v in this case) and intend to do .Call(cfunc2, v) later, there is no benefit at all to having called R_PreserveObject on it. The only case where your object would lose protection (v and all other R symbols being rm'd/going out of scope) would also cause you to lose your only reference to the SEXP, so by R_Preserve'ing all you will have done is create an unreachable protected pointer.

If you are keeping a static pointer to the SEXP down in C code that R can't see, then R_PreserveObject would be appropriate, but the situations where doing that is a good idea are rare (though they do exist).

HTH,
~G

On Thu, Mar 6, 2014 at 2:32 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote:
> Hello, This is a question that probably reveals my lack of understanding.
> [...]

-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
Re: [Rd] A question about multiple(?) out of order ReleaseObject
Hello,

However, I do need some sort of protection.

(pseudo code)

    SEXP a = Rf_allocVector(STRSXP,)
    protect a
    for i = 1 to length of vector
        SET_STRING_ELT(a, i, Rf_mkChar(...))
    end
    unprotect a
    return a

I _need_ that protect because in the for loop I also call some R functions and need the object 'a' to be protected.

However, as I pointed out,

1. I replaced protect by PreserveObject
2. I removed the unprotect

I can guarantee that some time later ReleaseObject will be called on 'a'.

So ultimately, whether this is good design or not, the question remains: given that 'a' is in the precious list, and 'a' is assigned to 'v', if ReleaseObject is called on 'a', will 'a' still be assigned to 'v' and therefore not get GC'd?

In rudimentary tests, it doesn't appear to cause seg faults. But is it safe?

Cheers
Thanks
Saptarshi

On Thu, Mar 6, 2014 at 2:48 PM, Gabriel Becker gmbec...@ucdavis.edu wrote:
> Saptarshi, R_PreserveObject and R_ReleaseObject are, as far as I know, intended for the rare situations where you need to maintain a SEXP in C that is not pointed to by any R level symbols, and across e.g. .Calls.
> [...]
Re: [Rd] A question about multiple(?) out of order ReleaseObject
Saptarshi,

a will be protected throughout the entirety of any R functions you call in the loop. Do you have evidence that this is not the case? The protect stack is last in, first out, so assuming balance, even if UNPROTECT() is called underneath the R functions you are calling, it won't touch a, only things that those functions put onto the stack after a.

There is still no reason to use PreserveObject/ReleaseObject.

HTH,
~G

On Thu, Mar 6, 2014 at 2:48 PM, Gabriel Becker gmbec...@ucdavis.edu wrote:

> Saptarshi,
>
> R_PreserveObject and R_ReleaseObject are, as far as I know, intended for the rare situations where you need to maintain a SEXP in C that is not pointed to by any R level symbols, and across e.g. .Calls.
>
> If you are returning an object to R (v in this case) and intend to do .Call(cfunc2, v) later, there is no benefit at all to having called R_PreserveObject on it. The only case where your object would lose protection (v and all other R symbols being rm'd/going out of scope) would also cause you to lose your only reference to the SEXP, so by R_Preserve'ing all you will have done is create an unreachable protected pointer.
>
> If you are keeping a static pointer to the SEXP down in C code that R can't see, then R_PreserveObject would be appropriate, but the situations where doing that is a good idea are rare (though they do exist).
>
> HTH,
> ~G

--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
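The last in, first out argument above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `toy_pstack`, `toy_protect`, `toy_unprotect` and the other `toy_*` names are hypothetical stand-ins invented for this illustration, not R's actual protect stack or API. It shows that a callee which balances its own protects never disturbs an entry the caller pushed earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a LIFO protection stack. NOT R's real implementation;
   every toy_* name here is hypothetical. */
#define TOY_STACK_MAX 64

typedef struct {
    const void *slots[TOY_STACK_MAX];
    size_t top;                 /* number of currently protected pointers */
} toy_pstack;

/* Push a pointer (plays the role of PROTECT). */
static void toy_protect(toy_pstack *s, const void *obj) {
    assert(s->top < TOY_STACK_MAX);
    s->slots[s->top++] = obj;
}

/* Pop the n most recently pushed pointers (plays the role of UNPROTECT). */
static void toy_unprotect(toy_pstack *s, size_t n) {
    assert(n <= s->top);
    s->top -= n;
}

static int toy_is_protected(const toy_pstack *s, const void *obj) {
    for (size_t i = 0; i < s->top; i++)
        if (s->slots[i] == obj) return 1;
    return 0;
}

/* Stand-in for a function called inside the loop: it protects two
   temporaries and unprotects exactly two, so it is balanced. */
static void toy_callee(toy_pstack *s) {
    int tmp1, tmp2;
    toy_protect(s, &tmp1);
    toy_protect(s, &tmp2);
    toy_unprotect(s, 2);        /* pops tmp2 and tmp1 only */
}

/* Returns 1 if 'a' stayed protected across every call and the stack
   is empty again once the caller unprotects 'a'. */
static int toy_caller_keeps_a(int iterations) {
    toy_pstack s = { .top = 0 };
    int a;                      /* stands in for the vector 'a' */
    int ok = 1;
    toy_protect(&s, &a);
    for (int i = 0; i < iterations; i++) {
        toy_callee(&s);
        ok = ok && toy_is_protected(&s, &a);
    }
    toy_unprotect(&s, 1);
    return ok && s.top == 0;
}
```

In this model the callee can only pop what it pushed on top of 'a', which is the property the reply relies on when it assumes balance.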
Re: [Rd] A question about multiple(?) out of order ReleaseObject
On Mar 6, 2014, at 5:32 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote:

> Hello,
>
> This is a question that probably reveals my lack of understanding. In a C function (call it cfunc), I created a SEXP, called S, and then called R_PreserveObject on S. I returned the SEXP to the calling R function (call it rfunc). Note, I didn't call R_ReleaseObject on S.
>
> v <- .Call(cfunc)
>
> So, are the following statements correct?
>
> 1. S is 'doubly' protected from the GC, both by being associated with 'v' and because it has been added to the precious list (via a call to R_PreserveObject without ReleaseObject being called).

yes

> 2. I have another C function called cfunc2. In cfunc2, I call R_ReleaseObject on S. S, however, is still protected from the GC, because it is associated with 'v'.

yes (assuming the binding to v still exists at that point). Note, however, that in such a case your R_PreserveObject() is pointless, since you don't need to protect it on exit (that's in fact the convention - return results are never protected).

> Are (1) and (2) correct?
>
> I have not used PROTECT/UNPROTECT, because if I return from cfunc without the equivalent number of unprotects, I get 'unbalanced stack' warnings. I'd rather not have to worry about that because I intend to balance it later.

Normally, you should not keep the result of a function protected, since it means you *have* to guarantee the unprotect at a later point. That is in general impossible to guarantee unless you have another object that is holding a reference that will be cleared by an explicitly registered finalizer. So unless that is the case, you are creating an explicit leak = bad.

If you don't have a stack-like design, you can always use an explicitly managed object for the lifetime (personally, I prefer that), since all chained objects are protected by design, or use REPROTECT.

Cheers,
Simon
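The two "yes" answers and the leak warning can be traced with a toy reachability model. This is a hedged sketch, not R's collector: `toy_obj` and the `toy_*` functions are hypothetical names, and the only assumption encoded is the one stated in the reply, namely that an object survives collection while it is bound to a symbol or sits on the precious list.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names are hypothetical and this is NOT R's GC. */
typedef struct {
    bool bound;       /* still assigned to an R symbol such as v   */
    bool preserved;   /* on the "precious list" and not yet released */
} toy_obj;

static void toy_preserve(toy_obj *o) { o->preserved = true; }
static void toy_release(toy_obj *o)  { o->preserved = false; }

/* The object survives if either source of protection is present. */
static bool toy_survives_gc(const toy_obj *o) {
    return o->bound || o->preserved;
}

/* Leak: nothing in R can reach it, yet it can never be collected. */
static bool toy_is_leaked(const toy_obj *o) {
    return !o->bound && o->preserved;
}

/* Walks through statements (1) and (2) from the question. */
static bool toy_walkthrough(void) {
    toy_obj s = { .bound = false, .preserved = false };
    toy_preserve(&s);                          /* cfunc preserves S        */
    s.bound = true;                            /* v <- .Call(cfunc)        */
    bool doubly = s.bound && s.preserved;      /* statement (1)            */
    toy_release(&s);                           /* cfunc2 releases S        */
    bool still_alive = toy_survives_gc(&s);    /* statement (2)            */
    s.bound = false;                           /* rm(v)                    */
    bool collectable = !toy_survives_gc(&s);   /* nothing holds it anymore */
    return doubly && still_alive && collectable;
}

/* Same start, but the release never happens before v goes away. */
static bool toy_leak_without_release(void) {
    toy_obj s = { .bound = false, .preserved = false };
    toy_preserve(&s);
    s.bound = true;
    s.bound = false;                           /* rm(v), no release        */
    return toy_is_leaked(&s);
}
```

The second function is the case the reply calls an explicit leak: the preserve outlives every reference that could be used to undo it.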
Re: [Rd] A question about multiple(?) out of order ReleaseObject
Hello Simon,

Thanks much for the replies. It makes sense now and I'm on the right track. I'll explain why this might happen.

I've written a package called rterra [1] which essentially allows the R user to write extensions in Lua. The extensions are JIT compiled via LuaJIT. For deterministic performance the user can write their extension in Terra.

To make memory management easier for the extension writer, I wanted automatic memory management. Consider the following code sample, which calls an extension written in Lua to compute a sum by traversing some JSON.

1. In R, the extension is called as

    l <- terra(computeCountsByDayAndCountry, values, c("2013-01-01", "2014-12-31"))

2. In Lua, the code looks like

    function computeCountsByDayAndCountry(jsonstr, dateRange)
      local x, y = R.Robj(jsonstr), R.Robj(dateRange)
      local dstart, dend = ffi.string(y[0]), ffi.string(y[1])
      local f = R.Robj{type='str', length = #x}
      R.autoProtect(f)  -- line 'A'
      for index = 0, #x-1 do
        local ok, jc = pcall(cjson.decode, ffi.string(x[index][0]))
        if ok then
          f[index] = cjson.encode(_computeSearchCountsCountry(jc, dstart, dend))
        else
          f[index] = cjson.encode({})
        end
      end
      return f
    end

When this returns, the SEXP contained in 'f' is associated with 'l' (in step 1).

LuaJIT has garbage collection. As described at http://luajit.org/ext_ffi_api.html, when 'f' (line 'A' above) gets garbage collected, a finalizer is run.

The call to autoProtect

1. calls R_PreserveObject on f.sexp
2. sets the finalizer to call R_ReleaseObject.

When the user calls an extension written in Lua again, 'f' will get garbage collected by Lua (assuming it has no references in Lua).

And yes, there will be a memory leak *iff* the user *never* calls an extension written in Lua again; otherwise no.

And of course, the user can choose not to call autoProtect and manage protection themselves, i.e. using R.protect and R.unprotect.
Thanks for the insight.

[1] http://people.mozilla.org/~sguha/blog/2013/08/01/rterra_first_post.html

Cheers
Saptarshi
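The preserve-then-release-in-a-finalizer lifecycle described in this message can be sketched as a toy model. Everything here is an assumption made for illustration: `toy_runtime`, `toy_handle`, `toy_auto_protect` and `toy_finalize` are hypothetical names, not rterra's or R's API. The model only counts how many preserved objects are still waiting for a finalizer to run.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; hypothetical names, not rterra's or R's API. */
typedef struct {
    int preserved;              /* objects preserved and not yet released */
} toy_runtime;

/* Handle owned by a foreign garbage collector (Lua's, in the thread). */
typedef struct {
    toy_runtime *rt;
    bool released;
} toy_handle;

/* Plays the role of autoProtect: preserve now, release in the finalizer. */
static toy_handle toy_auto_protect(toy_runtime *rt) {
    rt->preserved++;
    toy_handle h = { .rt = rt, .released = false };
    return h;
}

/* Finalizer run by the foreign collector; safe to call more than once. */
static void toy_finalize(toy_handle *h) {
    if (!h->released) {
        h->rt->preserved--;
        h->released = true;
    }
}

/* Creates 'created' handles, finalizes the first 'finalized' of them,
   and reports how many preserved objects are still outstanding. */
static int toy_outstanding_after(int created, int finalized) {
    assert(created >= 0 && created <= 16);
    assert(finalized >= 0 && finalized <= created);
    toy_runtime rt = { .preserved = 0 };
    toy_handle handles[16];
    for (int i = 0; i < created; i++)
        handles[i] = toy_auto_protect(&rt);
    for (int i = 0; i < finalized; i++) {
        toy_finalize(&handles[i]);
        toy_finalize(&handles[i]);   /* second call is a no-op */
    }
    return rt.preserved;
}
```

Any handle whose finalizer never runs leaves its object preserved, which matches the leak condition the message names: the foreign collector is never triggered again.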