Re: [Rd] importing explicitly declared missing values in read.spss (foreign)
First of all, apologies if you feel misquoted; I was only trying to keep things clear. I have now installed and tried the new version of the package, and it works perfectly. It does exactly what it should do. I tested it on some huge SPSS sample files that contain many variables with several types of missingness, and all missing values were correctly converted to R NA values. I find this a very big improvement, and it makes the transition from SPSS to R even easier. Thank you very much!

Prof Brian Ripley wrote:
> I've put up an experimental version at
> http://www.stats.ox.ac.uk/pub/R/foreign_0.8-28.1.tar.gz
> See the new 'use.missings' argument. It does what I think should happen in your example and the other one I tried, but more experience would be helpful.
>
> On Mon, 4 Aug 2008, Jeroen Ooms wrote:
>
> Please don't silently excise context -- see the posting guide for the rights of posters to be quoted fairly (and your usage of my posting fails to be fair).
>
>> Prof Brian Ripley wrote:
>>> From the messages you get I do not believe this is a recent version of read.spss (message 2 no longer appears)...
>>
>> I am sorry, you are right here; I was using an outdated version of foreign. I have updated my packages. My current version is now R version 2.7.1 (2008-06-23) with foreign_0.8-28. I have experimented with importing some SPSS data files, mostly from the sample data files that are included with SPSS. Most of these files do not generate any warnings, so I am not sure this is related to the missingness. However, the problem of read.spss() not returning any information on missingness persists in all of these data files.
>>
>> Prof Brian Ripley wrote:
>>> All that is 'harmfull' is that you are not told that value labels NA and NAP were to be regarded as 'missing' in SPSS. We've no idea whether it would be a more or less egregious choice to map them to R's NA, and certainly are not in a position to assert 'far less harmfull' in general.
>>
>> Of course the 'least harmfull' behavior of the function completely depends on the data and the user's intentions. I was explicitly suggesting making the mapping of missing values to NAs optional, to give users who consider this appropriate the option to replace these missings. I do not claim this to be the best default behavior, just a very useful feature.
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
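To make the effect discussed in this thread concrete, here is a minimal base-R sketch of the idea: values that a file declares as user-missing are mapped to NA. It is an illustration only, not the code in the foreign package. The helper name recode_missings and the codes 9998 and 9999 are made up for the example.

```r
# Hypothetical helper (not part of foreign): replace declared
# missing-value codes with NA, leaving all other values untouched.
recode_missings <- function(x, missing_codes) {
  x[x %in% missing_codes] <- NA
  x
}

# Made-up variable in which 9998 and 9999 stand for "declared missing".
income <- c(1200, 9999, 850, 9998, 2300)

recoded <- recode_missings(income, missing_codes = c(9998, 9999))
print(recoded)
```

With the experimental package, the corresponding import would presumably be a call along the lines of read.spss(file, use.missings = TRUE); whether that is appropriate depends on the data, as the thread points out.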
[Rd] Random number generation
Hi All,

I have a few queries regarding random number generation in R. Following help(Random.User), I defined my own functions user_unif_rand and user_norm_rand (uniform and normal distribution). What I found is that even when I call rexp, rpois, rgeom and other distributions, they are routed via user_unif_rand.

1. Does this mean that for all types of distribution R generates a uniform variate and transforms it to the requested type? Surprisingly, even rnorm, which I hoped would be routed via user_norm_rand, was calling user_unif_rand:

rnorm(1)
inside user_unif_rand//printf and inside user_unif_rand
[1] 0.5973648

Please help me understand this.

2. Our goal is to link with a vectorized random number generator library for our multi-core architecture. Is it enough to define the user_unif_rand function alone, and will it take care of all distributions?

Thanks in anticipation,
R. Subramanian
Re: [Rd] Suggestion: 20% speed up of which() with two-character mod
>>>>> "HenrikB" == Henrik Bengtsson [EMAIL PROTECTED]
>>>>>     on Mon, 4 Aug 2008 21:14:12 -0700 writes:

    HenrikB> Hi,
    HenrikB> I just want to follow up on this very simple
    HenrikB> fix/correction/speedup/cleanup of the base::which() function. Here is
    HenrikB> a diff:

    HenrikB> diff src/library/base/R/which.R which.R
    HenrikB> 21c21
    HenrikB> <     wh <- seq_along(x)[ll <- x & !is.na(x)]
    HenrikB> ---
    HenrikB> >     wh <- seq_along(x)[x & !is.na(x)]
    HenrikB> 25c25
    HenrikB> <         names(wh) <- names(x)[ll]
    HenrikB> ---
    HenrikB> >         names(wh) <- names(x)[wh]

    HenrikB> FYI, the 'll' variable is not used elsewhere. I've been going through
    HenrikB> these modifications several times and I cannot see any side effects.

    HenrikB> Could someone of R core please commit this?

I had added your proposition to my version of R-devel in order to commit it, and had wanted to do my own performance tests under different scenarios, but I had forgotten / postponed it.
{I have more such things, notably the help.request() from Kate Mullen -- with quite a few of my own changes, not quite finished ... that will have to wait until after useR!2008 ..}

In fact, it seems pretty obvious that the version with [wh] instead of [ll] should be faster in most cases, and never slower, and so I do commit it now.

Thank you, Henrik, for the reminder.
Martin

    HenrikB> BTW, when one reports diffs, do you prefer to get them with or without
    HenrikB> context information, e.g. -C 3?

{My exact preference would depend on the size / style of the patch itself. It does not really matter, and as a general rule, I'd personally prefer '-u' (unified diffs which include context)}

    HenrikB> /Henrik

    HenrikB> On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry [EMAIL PROTECTED] wrote:
    HenrikB> > On Thu, 10 Jul 2008, Henrik Bengtsson wrote:
    HenrikB> >
    HenrikB> >> Hi,
    HenrikB> >>
    HenrikB> >> by replacing 'll' with 'wh' in the source code for base::which() one
    HenrikB> >> gets ~20% speed up for *named logical vectors*.
    HenrikB> >
    HenrikB> > The amount of speedup depends on how sparse the TRUE values are.
    HenrikB> >
    HenrikB> > When the proportion of TRUEs gets small the speedup is more than twofold
    HenrikB> > on my macbook. For high proportions of TRUE, the speedup is more like the
    HenrikB> > 20% you cite.
    HenrikB> >
    HenrikB> > HTH,
    HenrikB> >
    HenrikB> > Chuck
    HenrikB> >
    HenrikB> >> CURRENT CODE:
    HenrikB> >>
    HenrikB> >> which <- function(x, arr.ind = FALSE) {
    HenrikB> >>   if(!is.logical(x))
    HenrikB> >>     stop("argument to 'which' is not logical")
    HenrikB> >>   wh <- seq_along(x)[ll <- x & !is.na(x)]
    HenrikB> >>   m <- length(wh)
    HenrikB> >>   dl <- dim(x)
    HenrikB> >>   if (is.null(dl) || !arr.ind) {
    HenrikB> >>     names(wh) <- names(x)[ll]
    HenrikB> >>   }
    HenrikB> >>   ...
    HenrikB> >>   wh;
    HenrikB> >> }
    HenrikB> >>
    HenrikB> >> SUGGESTED CODE: (Remove 'll' and use 'wh')
    HenrikB> >>
    HenrikB> >> which2 <- function(x, arr.ind = FALSE) {
    HenrikB> >>   if(!is.logical(x))
    HenrikB> >>     stop("argument to 'which' is not logical")
    HenrikB> >>   wh <- seq_along(x)[x & !is.na(x)]
    HenrikB> >>   m <- length(wh)
    HenrikB> >>   dl <- dim(x)
    HenrikB> >>   if (is.null(dl) || !arr.ind) {
    HenrikB> >>     names(wh) <- names(x)[wh]
    HenrikB> >>   }
    HenrikB> >>   ...
    HenrikB> >>   wh;
    HenrikB> >> }
    HenrikB> >>
    HenrikB> >> That's all.
    HenrikB> >>
    HenrikB> >> BENCHMARKING:
    HenrikB> >>
    HenrikB> >> # To measure both in same environment
    HenrikB> >> which1 <- base::which;
    HenrikB> >> environment(which1) <- globalenv(); # Needed?
    HenrikB> >>
    HenrikB> >> N <- 1e6;
    HenrikB> >> set.seed(0xbeef);
    HenrikB> >> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
    HenrikB> >> names(x) <- seq_along(x);
    HenrikB> >> B <- 10;
    HenrikB> >> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
    HenrikB> >> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
    HenrikB> >> stopifnot(identical(idxs1, idxs2));
    HenrikB> >> print(t1/t2);
    HenrikB> >> # Fair benchmarking
    HenrikB> >> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
    HenrikB> >> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
    HenrikB> >> print(t1/t2);
    HenrikB> >> ##     user   system  elapsed
    HenrikB> >> ## 1.283186 1.052632 1.25
    HenrikB> >>
    HenrikB> >> You get similar results if you put the for loop outside the system.time()
    HenrikB> >> call (and sum up the timings).
    HenrikB> >>
    HenrikB> >> Cheers
    HenrikB> >>
    HenrikB> >> Henrik
    HenrikB> >
    HenrikB> > Charles C. Berry (858) 534-2098
    HenrikB> > Dept of Family/Preventive Medicine
    HenrikB> > E mailto:[EMAIL PROTECTED] UC San Diego
    HenrikB> > http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
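The equivalence the patch relies on can be checked on a tiny example. The sketch below is an illustration only, not the committed code: it compares subsetting names(x) by the logical mask (called ll in the thread) with subsetting by the integer positions wh, including an NA element. The explanation in the comments for why the integer version tends to be faster is an assumption consistent with Chuck's observation about sparse TRUEs, not a measured claim.

```r
# A small named logical vector with an NA, as which() must handle.
x <- c(a = TRUE, b = NA, c = FALSE, d = TRUE)

ll <- x & !is.na(x)      # logical mask over all of x (NA treated as FALSE)
wh <- seq_along(x)[ll]   # integer positions of the TRUE elements

# Both subsetting styles select the same names. The mask has length(x)
# elements, while wh has only as many elements as there are TRUEs, which
# is presumably why the gain grows when TRUE values are sparse.
names_via_mask  <- names(x)[ll]
names_via_index <- names(x)[wh]

stopifnot(identical(names_via_mask, names_via_index))
print(names_via_index)
```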
Re: [Rd] Random number generation
Please don't cross-post. This reply is going to R-devel only.

On 8/5/2008 8:47 AM, subramanian R wrote:
> Hi All,
> I have a few queries regarding Random Number generation in R.
> according to the help(Random.User) i defined my own functions for

That's help(Random.user)...

> user_unif_rand and user_norm_rand (uniform and normal distribution)
> But what i figured out was even when i call rexp,rpois,rgeom and other distributions they were routed via user_unif_rand.
>
> 1. Does this mean that for all types of distribution it generates an Uniform distribution and transforms to the requested type?
> Also surprisingly even rnorm which i hoped to route via user_norm_rand was calling user_unif_rand...
>
> rnorm(1)
> inside user_unif_rand//printf and inside user_unif_rand
> [1] 0.5973648
>
> Please help me out understanding this...

I think you didn't do things properly, but you didn't show us what you did. When I run the sample code in help(Random.user), adding an Rprintf() call to the user_norm_rand function, I see it being called.

Duncan Murdoch

> 2. Our goal is to link with an vectorized Random number generator library for our multi-core architecture. So is it enough if we define user_unif_rand function alone and will it take care of all distributions?
>
> Thanks in Anticipation,
> R. Subramanian
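One reason a uniform generator can end up serving other distributions is that non-uniform variates can be built from uniform ones. The sketch below shows the general inverse-transform technique in plain R; it is an illustration of the principle, not a statement of the exact algorithm R uses internally for each distribution. As far as help(Random.user) describes it, user_norm_rand is only consulted once the normal generator has been switched with RNGkind(normal.kind = "user-supplied"); RNGkind() with no arguments reports the current selection.

```r
set.seed(42)        # make the uniform stream reproducible
u <- runif(1000)    # draws from the currently selected uniform generator

# Inverse-transform sampling: push the uniforms through a quantile function.
x_exp  <- qexp(u, rate = 2)   # exponential variates built from the uniforms
x_norm <- qnorm(u)            # standard normal variates by inversion

# Report which uniform and normal generators are currently selected.
print(RNGkind())

# Rough sanity check: the sample mean of x_exp should be close to 1 / rate.
print(mean(x_exp))
```

If the normal kind reported here is an inversion-based one, draws from rnorm() consume uniform variates, which would be consistent with the printf output seen in the question.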
[Rd] literate programming
I'm working on the next iteration of coxme (rather slowly during the summer). This is the most subtle code I've done in S, both mathematically and technically, and seems a perfect vehicle for the literate programming paradigm of Knuth. The Sweave project is pointed at S output, however, not source code. I would appreciate any pointers to a noweb-type client that is R-aware. Other suggestions are welcome as well.

At the end of the day I'd like to have a good user guide, technical reference, and solid enough code documentation that others can begin to participate as well. (Retirement in 10 years -- I don't expect to maintain this forever!)

Terry Therneau
[EMAIL PROTECTED]
[Rd] patchDVI update: SyncTex support
A while ago I wrote a package called patchDVI, that allowed reverse-search from the YAP .dvi previewer (or others) to jump directly to the .Rnw source of an Sweave document.

Recently support for reverse search has been integrated into pdflatex (through SyncTex). There aren't many PDF previewers on Windows that support this (only an unreleased version of SumatraPDF, as far as I know), but I believe Mac OSX previewers have supported it for some time, and there may be others on Linux or Windows too.

I'd appreciate anyone who is using one of those previewers and who is interested in this to test the new code. It's available on http://www.stats.uwo.ca/faculty/murdoch/software/ (near the bottom of the page).

Including the concordance info into a .pdf needs a patch to Sweave, and to Sweave.sty: Sweave should wrap the concordance in \Sconcordance{}, instead of \special{}. Sweave.sty should have this macro added:

\newcommand{\Sconcordance}[1]{%
  \ifx\pdfoutput\undefined%
    \csname newcount\endcsname\pdfoutput\fi%
  \ifcase\pdfoutput\special{#1}%
  \else\immediate\pdfobj{#1}\fi}

Improvements to this macro would also be appreciated.

Duncan Murdoch
Re: [Rd] literate programming
Hi Terry,

You can do this with Sweave (and something smart like emacs with ESS installed as your editor), but you have to work at it a little. The key is the fact that a couple of releases ago they added options for keep.source and expand. For example, you do the following:

First, describe the various steps in the algorithm. (Unless you are defining functions to use later, you probably do not want to evaluate these.)

<<extractParameters,eval=FALSE>>=
# code here to get the parameters
@

<<selectTrainingSet,eval=FALSE>>=
# code here to split data into training and test sets
@

<<trainModel,eval=FALSE>>=
# code here to fit a model to training data
@

<<testModel,eval=FALSE>>=
# code here to see how well the model works
@

Then you can put the pieces together, doing something like

<<runSplits,keep.source=TRUE,expand=FALSE>>=
for (i in 1:numberOfSplits) {
<<extractParameters>>
<<selectTrainingSet>>
<<trainModel>>
<<testModel>>
}
@

The expand=FALSE makes sure that the final report does not re-expand the lines of code in the displayed output, which allows you to focus on the structure of the algorithm.

There are still two weaknesses compared to Knuth's original idea:

[1] You cannot describe the overall algorithm first but wait until later to define the pieces. (Actually, I could be wrong about this; it just occurred to me that you might be able to manage this with yet another clever use of eval=FALSE, but I haven't tried that.)

[2] The names that you assign to the code chunks do not appear in the report automatically, so you have to write text in front of them to make them show up. Without these, the references in the final piece do not necessarily make sense to the reader trying to follow the action.

Best,
Kevin
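The chunk-reuse mechanism described in this thread can be tried out without a full report. The sketch below is a hedged illustration: it writes a tiny, made-up .Rnw file to a temporary location and extracts its code with utils::Stangle(). The chunk names setup and run and the file contents are invented for the example; the output and quiet arguments are passed through to the tangle driver as I understand its interface.

```r
# Write a minimal Sweave file in which one chunk reuses another by name.
rnw <- tempfile(fileext = ".Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<setup,eval=FALSE>>=",
  "x <- 1:10",
  "@",
  "<<run>>=",
  "<<setup>>",
  "mean(x)",
  "@",
  "\\end{document}"
), rnw)

# Tangle: extract the R code from the document into a plain .R file.
out <- tempfile(fileext = ".R")
utils::Stangle(rnw, output = out, quiet = TRUE)

# Inspect the extracted code.
tangled <- readLines(out)
cat(tangled, sep = "\n")
```

Running Sweave() on the same file would produce the LaTeX report instead; the keep.source and expand options mentioned above affect how the chunks are displayed there.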
Re: [Rd] Suggestion: 20% speed up of which() with two-character mod
Hi,

thanks for this. I'll use unified diff next time, i.e.

diff -u current.R new.R

/Henrik
[Rd] Adding .PDF files to a package
Dear all,

new as I am to developing packages for R-Project, I apologize in advance for questions that are too obvious.

I am trying to 'add' a PDF document containing some detailed information to a package. The way I understand the Rexts.pdf document, I should add my .PDF document to the /inst/doc/ folder, and links to the files should be built automatically. However, after building it on MacOSX (10.4) using 'R CMD build' and checking it using 'R CMD check', no links seem to be present. All checks are OK, but no vignette or anything similar appears in the help files of the package.

Clearly, I'm missing something. I would really appreciate some pointers on how to integrate an 'additional' PDF file into my package.

Thanks in advance,
Rense Nieuwenhuis
Re: [Rd] Adding .PDF files to a package
Rense Nieuwenhuis wrote:
> Dear all,
>
> new as I am to developing packages for R-Project, I apologize on beforehand for questions that are too obvious.
>
> I am trying to 'add' a PDF document containing some detailed information to a package. The way I understand the Rexts.pdf document, I should add my .PDF document to the /inst/doc/ folder, and links to the files should be build automatically. However, after building it on MacOSX (10.4) using 'R CMD build' and checking it using 'R CMD check', no links seems to be present. All checks are OK, but no vignette or whatever appears in the help-files of the package.
>
> Clearly, I'm missing something. I would really appreciate some pointers on how to integrate an 'additional' pdf file to my package.
>
> Thanks in advance,
> Rense Nieuwenhuis

Hi Rense

This is probably not the prettiest solution, and I'd be curious about better ones from people on this list, but here's one possibility:

1.) Put your PDF file foo.pdf into the inst/pdfs folder of your package.

2.) Create a file foo.Rnw with following content in inst/doc
--
%\VignetteIndexEntry{The Foo Bar}
\documentclass{article}
\begin{document}
\end{document}
--

3.) Create a Makefile in inst/doc, with contents like:
---
all: foo bar

bar: bar.tex
	pdflatex bar
	pdflatex bar

foo: foo.tex
	cp -p ../pdfs/foo.pdf .
---

Bw Wolfgang

Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
Re: [Rd] Adding .PDF files to a package
Wolfgang Huber wrote:
> Hi Rense
>
> This is probably not the prettiest solution, and I'd be curious about better ones from people on this list, but here's one possibility:
>
> 1.) Put your PDF file foo.pdf into the inst/pdfs folder of your package.
>
> 2.) Create a file foo.Rnw with following content in inst/doc
> --
> %\VignetteIndexEntry{The Foo Bar}
> \documentclass{article}
> \begin{document}
> \end{document}
> --
>
> 3.) Create a Makefile in inst/doc, with contents like:
> ---
> all: foo bar
>
> bar: bar.tex
> 	pdflatex bar
> 	pdflatex bar
>
> foo: foo.tex
> 	cp -p ../pdfs/foo.pdf .
> ---

I'd like to add that doing so will frustrate those users that have come to expect that a vignette is reproducible and can be reproduced by the user through running Sweave on the source file. It will depend on your particular context how to best provide for that (e.g. by telling people how to build your PDF using other tools, or by explicitly advertising that this is not a reproducible document.)

browseVignettes() and vignette() provide nice standardized ways of finding vignettes, and they are easily found in the index page of the package manual pages; perhaps a similarly standardized way of accessing such additional PDFs etc. without the above subversion of vignette infrastructure would be the best solution.

Best wishes
Wolfgang
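Independently of the vignette index, files placed under inst/doc of a source package are copied to the doc directory of the installed package, so they can at least be located programmatically. The sketch below is a minimal illustration under that assumption; it inspects the utils package only because that package is always installed, and it makes no claim about which files are found there.

```r
# Locate the doc/ directory of an installed package.
# "utils" is a stand-in; substitute the name of your own package.
doc_dir <- system.file("doc", package = "utils")

# List any PDF files shipped in that directory (may be empty).
pdfs <- list.files(doc_dir, pattern = "\\.pdf$", full.names = TRUE)
print(basename(pdfs))
```

For a hypothetical package mypkg shipping inst/doc/foo.pdf, system.file("doc", "foo.pdf", package = "mypkg") would return the installed path, which a help page could then mention.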
Re: [Rd] literate programming
G'day Terry,

On Tue, 5 Aug 2008 09:38:23 -0500 (CDT) Terry Therneau [EMAIL PROTECTED] wrote:
> I'm working on the next iteration of coxme. (Rather slowly during the summer). This is the most subtle code I've done in S, both mathematically and technically, and seems a perfect vehicle for the literate programming paradym of Knuth. The Sweave project is pointed at S output however, not source code. I would appreciate any pointers to an noweb type client that was R-aware.

I would suggest you look at relax:
http://www.wiwi.uni-bielefeld.de/~wolf/software/relax/relax.html

Cheers,
Berwin

=== Full address =
Berwin A Turlach                             Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability         +65 6515 6650 (self)
Faculty of Science                           FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: [EMAIL PROTECTED]
Singapore 117546                             http://www.stat.nus.edu.sg/~statba