[R] too large for hashing
Hello, I'm doing some analysis on a rather large data set. In this case, some simple commands are failing. For example, this one: x$eventtype - factor(x$eventtype) Error in unique.default(x) : length 1093574297 is too large for hashing ...I think this is a bug, because hashing should not be required for the factor function. Am I right? The whole column does not need to be hashed, only the unique keys. Sure, there is the potential to overflow the key register, but this error should be thrown only if that occurs, no? Cordially, Adam D. I. Kramer, Ph.D. Data Scientist, Facebook, Inc. akra...@fb.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] too large for hashing
Thanks for your response, Duncan. x$eventtype is a character vector (because the same hashing error occurred when I tried to read.table() in the first place specifying colClasses = c(..., factor, ...). x really is that long: dim(x) [1] 1093574297 12 ...the x$eventtype field has three unique values. (I'm currently using a workaround of making a numeric column based on a string of ifelse() and then setting class() - factor and then setting the labels manually.) --Adam On Thu, 5 Apr 2012, Duncan Murdoch wrote: On 05/04/2012 2:03 PM, Adam D. I. Kramer wrote: Hello, I'm doing some analysis on a rather large data set. In this case, some simple commands are failing. For example, this one: x$eventtype- factor(x$eventtype) Error in unique.default(x) : length 1093574297 is too large for hashing ...I think this is a bug, because hashing should not be required for the factor function. Am I right? The whole column does not need to be hashed, only the unique keys. Sure, there is the potential to overflow the key register, but this error should be thrown only if that occurs, no? It looks as though the error is coming when unique() tries to determine the unique levels in the argument, but really there's no way to answer your question without more information. What type of object is x$eventtype? It is really 1093574297 elements long? How many unique values does it have? Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R's unfortunate treatment of X11 failures
Hi, I use R in a terminal environment on a linux box, using X11 for graphics, often tunnelled to the terminal I'm using. When an internet connection dies (say, if the ssh connection dies when forwarding), R usually gives a scary warning like Error: X11 fatal IO error: please save work and shut down R. That's all well and good. Then, if I ignore it and try to plot() something, R dies quite ungracefully, producing (e.g.) something like this: R: ../../src/xcb_io.c:385: _XAllocID: Assertion `ret != inval_id' failed. Aborted ...and returning me to the shell. If I have not saved, I have lost my work. Clearly, this is my fault--I ignored the warning. However, I just had the above happen to me without my being warned...I lost some data, but not a lot. I am working on creating a reproducible case currently. That said, in the meantime, I would suggest a broader fix. I'm sure we can agree that R aborting due to failed assertions like this are pretty unfortunate. In the case of X11 failures like this, however, there is an alternative: Print an error message and take a preventative action. Error: X11 fatal IO error: X11 device disabled. ...and then call dev.off(). Indeed, if I just run dev.off() when I see this error, nothing about R seems corrupt at all. I can start a new X11 window with, X11(localhost:10) and things (such as plotting) function fine. Cordially, Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R-SIG-Mac] How to interrupt an R process that hangs
+1--this is the single most-annoying issue with R that I know of. My usual solution, after accomplishing nothing as R spins idly for a couple hours, is to kill the process and lose any un-saved work. save.history() is my friend, but is a big delay when you work with big data sets as I do, so I don't run it after every command. I have cc'd r-help here, however, because I experience this problem with non-OSX R as well...when I run it in Linux or from the OSX command-line (I compile R for Darwin without aqua/R-framework), the same thing happens. Is there some way around this? Is this a known problem? Google searching suggests no solution, timeline, or anything, but the problem has been annoying users for at least twelve years: http://tolstoy.newcastle.edu.au/R/help/9704/0151.html Cordially, Adam On Mon, 15 Mar 2010, Matthew Keller wrote: HI all, Apologies for this question. I'm sure it's been asked many times, but despite 20 minutes of looking, I can't find the answer. I never use the GUI, I use emacs, but my postdoc does, so I don't know what to tell her about the following: Occasionally she'll mess up in her code and cause R to hang indefinitely (e.g., R is trying to do something that will take days). In these situations, is there an option other than killing R (and the work you've done on your script to that point)? Thank you, Matthew Keller -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com ___ R-SIG-Mac mailing list r-sig-...@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-mac __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] parallel computing--help
Hi Sayan, Just stumbled upon this email, three months old, since I was having the same problem, and noticed you hadn't been answered. I was able to solve this problem by noting that the snow library was not installed on the nodes--it has to be installed everywhere. So, you need to ssh to the ip to another computer on the network , then be sure the same version of R and the snow package are installed on that computer, and THEN it should work. Good luck! --Adam Hey R Users I am trying to to implement a linux snow sockets parallelism in R, I am pretty new to parallel computing Here is the problem I am facing Whenever I am doing newSOCKnode(localhost) Its working fine But whenever I do newSOCKnode(ip to another computer on the network ) It throws up the following error Fatal error: cannot open file '/home/t136/R/i486-pc-linux-gnu-library/2.9/snow/RSOCKnode.R': No such file or directory R is stuck I have to force quit it with CTRL C . Now t136 is the machine I am working on and manually I have checked that there exists a file RSOCKnode.R on the specified directory SSH is working fine as I have checked system(ssh -l me 'ip to another computer on the network' ls) Works fine and I am getting the list of files on the other computer on my machine Please help! Sayan Dasgupta __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Command-line arguments and --interactive
On Tue, 10 Nov 2009, Duncan Murdoch wrote: --interactive tells R that there is a human producing the input stream, so it can ask questions and expect them to be answered. In your experiments with it, your input stream was the pipe holding the output of echo, and R got confused because that pipe wouldn't answer its question. I see. That makes sense, and thank you. Your problem is that you want an input stream that starts out from your fixed code and then switches to your shell's stdin. Correct. Or, you might consider it an ability to add a command or two to the effective end of .Rprofile. I think you can do that on some systems by saving your input to a file then concatenating it to stdin, e.g. something like this: echo '10*5; scan()' test.R cat test.R - | R --interactive I don't know if there's a way to do this in one line, and I'd expect some oddities. Creating a new file progammatically is difficult. Another solution, suggested by peterdc in the #R irc channel, was to run R itself and then use the system() command to complete the prior-to-launching-R task...that is the solution I'm going with, but thanks very much for your help! --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hand-crafting an .RData file
Thanks as always for a very helpful response. I'm now loading a few million rows in only a few seconds. Cordially, Adam Kramer On Mon, 9 Nov 2009, Prof Brian Ripley wrote: The R 'save' format (as used for the saved workspace .RData) is described in the 'R Internals' manual (section 1.8). It is intended for R objects, and you would first have to create one[*] of those in your other application. That seems a lot of work. The normal way to transfer numeric data between applications is to write a binary file: R can read such files with readBin(), and it also has wrappers/C-code to read a number of commmon binary data formats (e.g. those from SPSS). With character data there are more issues (and more formats, see also readChar()), but load() is not particularly fast for those. Ultimately the R functions pay a performance price for their flexibility so hand-crafted C code to read the format can be worthwhile: but see the comments below about whether I/O speed is that important. [*] the 'save' format is a serialization of a single R object, even if you save many objects, since the object(s) are combined into a pairlist. On Sun, 8 Nov 2009, Adam D. I. Kramer wrote: Hello, I frequently have to export a large quantity of data from some source (for example, a database, or a hand-written perl script) and then read it into R. This occasionally takes a lot of time; I'm usually using read.table(filename,comment.char=,quote=) to read the data once it is written to disk. Specifying colClasses and nrows will usually help. To read from a database, packages such as RODBC use binary data transfer: with suitable tuning this can be fast. However, I *know* that the program that generates the data is more or less just calling printf in a for loop to create the csv or tab-delimited file, writing, then having R parse it, which is pretty inefficient. Instead, I am interested in figuring out how to write the data in .RData format so that I can load() it instead of read.table() it. Without more details it is hard to say if it is inefficient. read.table() can read data pretty fast (millions of items per second) if used following the hints in the 'R Data' manual. See e.g. https://stat.ethz.ch/pipermail/r-devel/2004-December/031733.html Almost anything non-trivial one might do with such data is much slower. The trend is to write richer (and slower to read) data formats. Trolling the internet, however, has not suggested anything about the specification for an .RData file. Could somebody link me to a specification or some information that would instruct me on how to construct a .RData file (either compressed or uncompressed)? Also, I am open to other suggestions of how to get load()-like efficiency in some other way. Many thanks, Adam D. I. Kramer -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Command-line arguments and --interactive
Hello, I am interested in passing a command or two to R on the command line. The desired behavior is for R to run these commands first, and then begin an interactive session. For example: $ R -e 'foo - read.csv(/tmp/foo.csv)' ...which would launch R and execute that command, so when I return from getting coffee, foo would be loaded. I know that if I put this command into my .Rprofile, it will execute as desired, but I'm looking for something a bit more modular. I have tried the following: $ echo '10*5' | R ...fails unless I specify --save --no-save or --vanilla. I choose the latter; R prints 50 and quits...but I do not want it to quit! $ echo '10*5' | R --interactive ...fails in an interesting way: I get an infinite loop of Save workspace image? [y/n/c]: Save workspace image? [y/n/c]:, etc. I have to kill R in another window. This also happens with $ echo '10*5; scan()' | R --interactive ...even though I'd expect scan() to prompt for input. If I specify $ echo '10*5; scan()' | R --interactive --vanilla ...then R just prints 50 and quits. Is this a bug with interactive? The description of Force an interactive session is not informative enough for me to have any further guess as to what to do here, but my expectation is that it would make R not auto-quit. ...perhaps I don't know how to use --interactive properly? Could somebody point me on the right track? Googling for R --interactive is nigh untu useless. :-\ This behavior is present in R 2.9.2 and 2.10.0. Many thanks! Cordially, Adam Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Hand-crafting an .RData file
Hello, I frequently have to export a large quantity of data from some source (for example, a database, or a hand-written perl script) and then read it into R. This occasionally takes a lot of time; I'm usually using read.table(filename,comment.char=,quote=) to read the data once it is written to disk. However, I *know* that the program that generates the data is more or less just calling printf in a for loop to create the csv or tab-delimited file, writing, then having R parse it, which is pretty inefficient. Instead, I am interested in figuring out how to write the data in .RData format so that I can load() it instead of read.table() it. Trolling the internet, however, has not suggested anything about the specification for an .RData file. Could somebody link me to a specification or some information that would instruct me on how to construct a .RData file (either compressed or uncompressed)? Also, I am open to other suggestions of how to get load()-like efficiency in some other way. Many thanks, Adam D. I. Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Comparing loadings (next to each other)
Dear colleagues, I've been running some principal components analyses, which generate tables of loadings that I'm interested in looking at. print(f1$rot$load,cutoff=.4) is what I use, and it gives me what I want. However, I'm now interested in comparing these loadings across a few data sets. In other words, I would like R to match the loadings on rownames() and display them next to each other, e.g.: Loadings: PC 1PC 2PC 1PC 2PC 1PC 2 col10.400.450.90 col2 0.80 0.55 col3 0.77 0.70 0.42 ...I could certainly just cbind them together, but then I can't class them as loadings: class(x) - loadings Error in class(x) - loadings : cannot coerce type 'closure' to vector of type 'character' ...and I'm very interested in using the cutoff feature of print.loadings. Any suggestions? I could go in and alter print.loadings myself, but if there's an easier way let me know. Many thanks, -- Adam D. I. Kramer a...@uoregon.edu Ph.D. Candidate, Social and Personality Psycholgoy University of Oregon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sem and nlm and ols instead of ml
Dear colleagues, Has anybody any experience using the sem package to fit structural equation models using a fitting function other than ML? I have heard tell that OLS may provide better estimates when using standardized matrices generated from small sample sizes, so I was interested in comparing the two for a few models. However, ML appears to be hard-coded into the source for sem...but maybe there is some way around this. So, if anybody has done this, has a hint as to how to do this, or would be able to say that this is perhaps way too much trouble to try, I would appreciate advice on the topic. -- Adam D. I. Kramer Ph.D. Candidate, Social and Personality Psychology University of Oregon a...@uoregon.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Why na.rm=FALSE is the default
Dear Colleagues, I've been searching for a post or article or something which explains why having na.rm=FALSE or na.action=na.fail as the default is a better choice than TRUE or na.omit. I understand the basic argument: it does not make sense to average a nonexistance into an aggregate, and removing them implicitly leads to accidental pairwise deletion in some cases, and sum(x) / length(x) mean(x) (which many would find disturbing)...I'm just looking for a source to cite on this issue to support mimicking R's behavior in a database system's aggregating functions (sum, avg, var, etc.). Cordially, Adam Kramer Ph.D. Candidate, Social Psychology University of Oregon adik at uoregon dot edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Aggregating multiple columns
Dear colleagues, Consider the following data frame: x - data.frame(y=rnorm(100),order=rep(1:10,10),subject=rep(1:10,each=10)) ...it is my goal to aggregate x to compute a linear effect of order for each subject. So, ideally, result would be a vector containing a single number for each subject, representing the linear relationship between y and order. I first tried this: result - aggregate(x[1:2,],list(subject=x$subject), function (z) { lm(y ~ order, data=z)$coefficients[2] } ) ...because lm(y ~ order, data=x, subset=x$subject==1)$coefficients[2] would give me the correct term for subject 1 (i.e., that is the number I am actually looking for). However, when used on data frames, aggregate() aggregates every COLUMN in x _separately_ using FUN...while lm needs both columns *together.* ...I then turned to tapply, but that is useful only on atomic objects, and not data frames. I have two solutions, which I find inelegant and slow: 1) result - sapply(levels(factor(x$subject)), function(z) { lm(y ~ order, data=x, subset=subject==z)$coefficients[2]} ) ...this gets the job done, but is very slow. 2) result - c(); for (z in 1:nlevels(x$s2)) { result[z] - lm(y ~ order, data=x, subset=x$s2==levels(x$s2)[z])$coefficients[2] }; result - unlist(result); ...also does the job, but is also very slow. Is there a better solution? I miss the speed of tapply and aggregate; the example has only 100 rows and 10 subjects, but the actual data has many more of each. Cordially, Adam D. I. Kramer Ph.D. Candidate, Social and Personality Psychology University of Oregon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] read.exp?
Dear Colleagues, The foreign library, ever-exceptionally useful, does not seem able to deal with this ancient SPSS ascii portable file I have with extension '.exp'. If anybody has familiarity with this filetype and could point me towards an R function that would allow me to get the data from this file, I would be much obliged! Or, even knowledge of the specification for these filetypes would be helpful, and I'll try my own hand at writing the import function. Cordially, Adam Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Issue with step(): Fails to look for object$model
Hi, I'm playing around with stepwise regression, using the step function, and believe I have found a bug (or at least, a strong case for improvement): ex - data.frame(y=rnorm(100),x=rnorm(100)) l - lm(y ~ x, data=ex) step(l) [output is correct] rm(ex) step(l) Start: AIC=11.79 y ~ x Df Sum of Sq RSS AIC - x 1 0.120 108.221 9.900 none 108.100 11.789 Error in inherits(x, data.frame) : object ex not found ...ex is not found, so step fails. However, all of the necessary data to run the step function is present in l$model. I would also argue that step() *should* use l$model if at all possible, as it seems reasonable to expect that ex may undergo changes. Further, step() does not appear to test any other columns present in ex (even if direction=both is specified), unless they are specified in scope. If I am misunderstanding step() or if there is a good reason why it operates this way, please let me know! --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Issue with step(): Fails to look for object$model
A fine answer, and indeed I am aware of the option to leave the model empty. I like the option of consistently not being protected from myself. That said, some other R functions (predict comes to mind) use $model if it exists (which it usually does, though needn't) but allow new (or other) data to be provided. So while I can't argue against trusting $call, I think there's a decent argument that the data held within the model is the most trustworthy and appropriate to use (though an explicitly named other data frame should surely be useful as well). In the meantime, I'll just modify l$call, and I thank you for that suggestion. --Adam On Sun, 15 Feb 2009, bill.venab...@csiro.au wrote: Without arguing you case I would point out that the data need not be there in l$model: l - lm(y ~ x, data=ex, model = FALSE) l Call: lm(formula = y ~ x, data = ex, model = FALSE) Coefficients: (Intercept)x 0.1310 -0.1736 l$model NULL Model objects are often huge simply because by default they squirrel away a copy of the original data inside themselves. One way to avoid memory problems can be only to keep this backup copy within the fitted model object only when you really need to do so, which is not all that often, in fact. I can see why step() (and allies) do not work the way you think they should. These functions use the call component of the fitted model object and modify that. And if the call component says your data are in the data frame 'ex', they take you seriously. They your word for it. We're not dealing with Microsoft software here, you know. Bill Venables http://www.cmis.csiro.au/bill.venables/ -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Adam D. I. Kramer Sent: Sunday, 15 February 2009 12:44 PM To: r-help@r-project.org Subject: [R] Issue with step(): Fails to look for object$model Hi, I'm playing around with stepwise regression, using the step function, and believe I have found a bug (or at least, a strong case for improvement): ex - data.frame(y=rnorm(100),x=rnorm(100)) l - lm(y ~ x, data=ex) step(l) [output is correct] rm(ex) step(l) Start: AIC=11.79 y ~ x Df Sum of Sq RSS AIC - x 1 0.120 108.221 9.900 none 108.100 11.789 Error in inherits(x, data.frame) : object ex not found ...ex is not found, so step fails. However, all of the necessary data to run the step function is present in l$model. I would also argue that step() *should* use l$model if at all possible, as it seems reasonable to expect that ex may undergo changes. Further, step() does not appear to test any other columns present in ex (even if direction=both is specified), unless they are specified in scope. If I am misunderstanding step() or if there is a good reason why it operates this way, please let me know! --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tunnelling X for R graphics
Thanks very much for the reassurance. Really, I can just open a new X11 device on the same display, since the display (localhost:10) is effectively reconnected when I ssh in again. I'll reply again to this post if I find other parts of R working poorly after the disconnection. --Adam On Tue, 3 Feb 2009, Prof Brian Ripley wrote: To answer your basic question, you do need to shut down everything involivng X, that is X11() devices and the X11 dataeditor. If you do that (and graphics.off() will suffice for the first), you should be able to re-open an X11 device on another display (which is what presumably a new VNC connection gives you). The warning comes from any X erorr, and it is not possible to know how serious it is without external information. On Mon, 2 Feb 2009, Adam D. I. Kramer wrote: On Tue, 3 Feb 2009, Patrick Connolly wrote: The problem, and maybe I'm just whining here, is that because the data sets are large this takes several minutes where I'm basically just sitting around. This happens once every other day as the VPN software I'm using times out after about 24 hours and thus the ssh session dies. Is it possible to do anything about the VPN software? I use tightVNC to do something similar and it doesn't time out after 24 hours. Even closing the desktop machine down altogether does not lose the ssh connexion. Restarting the desktop a week later will still find the X session without loss. The VPN software is managed and maintained by the company I'm doing statistical computing work for...out of my control. Your comments about TightVNC are pretty impressive, though--I'm not really sure how that would work...though if you set your ssh connection to not push any data towards your computer, I gather the server would have no reason to believe you were unresponsive? In any case, this sadly doesn't help me, but many thanks! For now, I'm just trying my hardest to remeber to dev.off() when I'm done using graphics. --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tunnelling X for R graphics
Indeed, I am running R in screen. That is the context in which this error occurs. The problem is that screen passes $DISPLAY as the $DISPLAY for the actual terminal. So when the ssh session dies, the X11 connection is broken. The REST of R works fine...which is why I use screen in the first place. My data is not lost, etc., however it tells me I need to save and quit immediately. That is my concern. --Adam On Sat, 31 Jan 2009, Dylan Beaudette wrote: Try starting your R session after starting a 'screen' session. Like this: $ screen $ R # do stuff, when taking a break do CTRL-A D to disconnect # use as normal See the man page for screen, it is basically a terminal multiplexer that can gracefully accommodate connection failures. If you get disconnected, re-connect, and then re-attach the screen process: $ screen -r and you should be ok. Cheers, Dylan On 1/31/09, Adam D. I. Kramer a...@ilovebacon.org wrote: Dear colleagues, I run R on a few different machines, and view graphs and the like by tunnelling X through SSH to my local machine. This is useful for me because my local machine can't easily handle some of the data sets I work with. However, when an ssh connection dies, the tunnelled X session also dies, which breaks R's device connection, generating this error: Error: X11 fatal IO error: please save work and shut down R ...that's kinda scary, so I quit(save=yes) and then run R again. The problem, and maybe I'm just whining here, is that because the data sets are large this takes several minutes where I'm basically just sitting around. This happens once every other day as the VPN software I'm using times out after about 24 hours and thus the ssh session dies. I can't really guess at why a broken X session would corrupt a running session of R so severely that it would need to be completely restarted. Can anyone explain this to me? Or perhaps (hopefully) someone has enough knowledge of the X11 device to be able to tell me that I can ignore this message, and just use dev.off() and then X11(localhost:10) to open a new working X11 connection? Cordially, Adam Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems in Recommending R
On Tue, 3 Feb 2009, Rolf Turner wrote: I think the R website is just fine as it is. Effort should be put into content and not into cosmetics. Those (potential) users who would be likely to be influenced by such trivialities as the appearance of the web page are unlikely to be the sort of people who would use R anyway. I respectfully disagree. In my repeated experience, I have seen colleagues in industry and university simply write R off as too difficult or not worth the effort based on purely cosmetic grounds, and then at my urging and after some instruction embrace R as being a fantastic piece of software. The reality of the situation is that before you read a book, you only have its cover to judge. Suggesting that people should read every book regardless of the cover does not make sense for people who have other things to do. In the ecological context of open-source software, the cover or cosmetics of a software program, its documentation, and its support structure are actually quite correlated with overall ease of use, and if functionality is modeled as the factorial interaction of information produced with the amount of time it takes to produce the information, then functionality correlates with ease of use, and so the appearance of the webpage is not a triviality. Cordially, -- Adam D. I. Kramer Ph.D. Student, Social Psychology University of Oregon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Power analysis for MANOVA?
Hi Rick, I understand the authors' point and also agree that post-hoc power analysis is basically not telling me anything more than the p-value and initial statistic for the test I am interested in computing power for. Beta is a simple function of alpha, p, and the statistic. My intention is, as I mentioned in my response to Stephan Kolassa, to transform my p-value and statistic into a form of effect size--sample size necessary to attain significance at alpha=.05. This will communicate no more information, it is just a mathematical re-representation of my data in a way I believe my readers will find more informative and useful. In other words, there is no more information *encoded*, but there is more information *communicated,* just like for any effect size measure. If you have any suggestions on a more reliable effect size for MANOVA which is *also* commonly known in the social psychology community (e.g., a correlation or Cohen's d analogue), I'm interested--but the multivariate nature of the beast makes these more or less impossible to translate. The poster I was asking for is now printed, and we reported the multivariate R-squared using the techniques in Cohen (1988), though I'm expecting to spend a lot of time explaining what that means to people in a multivariate context, rather than describing the results of the study. Cordially, Adam D. I. Kramer On Sun, 1 Feb 2009, Rick Bilonick wrote: On Wed, 2009-01-28 at 21:21 +0100, Stephan Kolassa wrote: Hi Adam, first: I really don't know much about MANOVA, so I sadly can't help you without learning about it an Pillai's V... which I would be glad to do, but I really don't have the time right now. Sorry! Second: you seem to be doing a kind of post-hoc power analysis, my result isn't significant, perhaps that's due to low power? Let's look at the power of my experiment! My impression is that post-hoc power analysis and its interpretation is, shall we say, not entirely accepted within the statistical community, see: Hoenig, J. M., Heisey, D. M. (2001, February). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55 (1), 1-6 And this: http://staff.pubhealth.ku.dk/~bxc/SDC-courses/power.pdf However, I am sure that lots of people can discuss this more competently than me... Best wishes Stephan The point of the article was that doing a so-called retrospective power analysis leads to logical contradictions with respect to the confidence intervals and p-values from the analysis of the data. In other words, DON'T DO IT! All the information is contained in the confidence intervals which are based on the observed data - an after the fact power analysis cannot provide any insight - it's not data analysis. Rick B. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tunnelling X for R graphics
On Tue, 3 Feb 2009, Patrick Connolly wrote: The problem, and maybe I'm just whining here, is that because the data sets are large this takes several minutes where I'm basically just sitting around. This happens once every other day as the VPN software I'm using times out after about 24 hours and thus the ssh session dies. Is it possible to do anything about the VPN software? I use tightVNC to do something similar and it doesn't time out after 24 hours. Even closing the desktop machine down altogether does not lose the ssh connexion. Restarting the desktop a week later will still find the X session without loss. The VPN software is managed and maintained by the company I'm doing statistical computing work for...out of my control. Your comments about TightVNC are pretty impressive, though--I'm not really sure how that would work...though if you set your ssh connection to not push any data towards your computer, I gather the server would have no reason to believe you were unresponsive? In any case, this sadly doesn't help me, but many thanks! For now, I'm just trying my hardest to remeber to dev.off() when I'm done using graphics. --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Boxplots by variable
Does boxplot(as.data.frame(final)) do what you want? --Adam On Mon, 2 Feb 2009, Vemuri, Aparna wrote: Dear R users, I have a matrix final which looks like this: final oSO4 oNO3 mSO4 mNO3 [1,] 3.3728 0.2110 1.9517421 1.01883602 [2,] 0.8249 0.0697 1.5970292 0.11368781 [3,] 0.2636 0.1004 0.6012445 0.24356332 [4,] 8.0072 0.3443 6.1016998 3.63207149 [5,] 13.5079 0.6593 12.4011068 1.55323386 [6,] 6.1293 0.1989 5.7620926 0.12884845 [7,] 0.6004 0.0661 0.7375408 0.17218600 [8,] 0.6912 0.1672 1.1563314 0.13469750 [9,] 1.0478 0.1504 1.5637809 0.99000758 [10,] 0.4825 0.1160 0.2297545 0.08121805 I would like to create boxplots for this matrix with data binned by oNO3, oSO4, mNO3 and mSO4, all in the same plot. I tried boxplot(final), boxplot(final[,1]) etc. But all those commands create individual plots and not what I am trying to achieve. I was wondering if there is an R equivalent of the matlab command hold on or if there is a simpler way around this. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Tunnelling X for R graphics
Dear colleagues, I run R on a few different machines, and view graphs and the like by tunnelling X through SSH to my local machine. This is useful for me because my local machine can't easily handle some of the data sets I work with. However, when an ssh connection dies, the tunnelled X session also dies, which breaks R's device connection, generating this error: Error: X11 fatal IO error: please save work and shut down R ...that's kinda scary, so I quit(save=yes) and then run R again. The problem, and maybe I'm just whining here, is that because the data sets are large this takes several minutes where I'm basically just sitting around. This happens once every other day as the VPN software I'm using times out after about 24 hours and thus the ssh session dies. I can't really guess at why a broken X session would corrupt a running session of R so severely that it would need to be completely restarted. Can anyone explain this to me? Or perhaps (hopefully) someone has enough knowledge of the X11 device to be able to tell me that I can ignore this message, and just use dev.off() and then X11(localhost:10) to open a new working X11 connection? Cordially, Adam Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] FW: can't get package boot to load
If you check ?library, you will see that the package argument only takes one packages. So your code was like saying library(package=RODBC, help=boot) which does not make sense...but which would indeed load RODBC and not boot. That is why your problem occurred and why your solution worked. --Adam On Sat, 31 Jan 2009, Gary Smith wrote: Changing library(RODBC,boot) to library(RODBC) library(boot) seems to have solved the problem. _ _ From: Gary Smith [mailto:gary.smit...@comcast.net] Sent: Saturday, January 31, 2009 12:55 PM To: 'r-help@R-project.org' Subject: can't get package boot to load Hi, I am new to R and I'm totally confused by this problem. I'm trying to load data and run a simple correlation using corr() in the boot package. Yesterday my code worked. Today it can't find the function corr(). I've tried searching the web and haven't found the root of my problem. Apologies if I've missed the obvious. My code is simple: install.packages('RODBC') install.packages('boot') library(RODBC,boot) setwd(C:/Documents and Settings/Gary Smith/My Documents/Blog/) data = odbcConnectExcel(LafferCurve2.xls) mydata = sqlFetch(data,Data) n=23 m=73 gnp=mydata[[3]][n:m] incTaxFirst=mydata[[7]][n:m] corr(cbind(gnp,incTax)) Here are the console messages: install.packages('RODBC') Warning: package 'RODBC' is in use and will not be installed install.packages('boot') trying URL 'http://cran.stat.ucla.edu/bin/windows/contrib/2.8/boot_1.2-35.zip' Content type 'application/zip' length 779420 bytes (761 Kb) opened URL downloaded 761 Kb package 'boot' successfully unpacked and MD5 sums checked The downloaded packages are in C:\Documents and Settings\Gary Smith\Local Settings\Temp\RtmpRT1RUh\downloaded_packages updating HTML package descriptions library(RODBC,boot) setwd(C:/Documents and Settings/Gary Smith/My Documents/Blog/) data = odbcConnectExcel(LafferCurve2.xls) mydata = sqlFetch(data,Data) n=23 m=73 gnp=mydata[[3]][n:m] incTaxFirst=mydata[[7]][n:m] corr(cbind(gnp,incTax)) Error: could not find function corr Using search(), I see: [1] .GlobalEnvpackage:RODBC [3] package:stats package:graphics [5] package:grDevices package:utils [7] package:datasets package:methods [9] Autoloads package:base Typing install.packages('boot') at the prompt gets me the same result. Thanks for any help. Gary ___ Accept no assertions without evidence. --Goompah wisdom from Omega by Jack McDevitt-- Nature is complete because it does not serve itself. --Tao De Jing by Lao Tze-- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Power analysis for MANOVA?
Thanks for the response, Stephan. Really, I am trying to say, My result is insignificant, my effect sizes are tiny, you may want to consider the possibility that there really are no meaningful differences. Computing post-hoc power makes a bit stronger of a claim in this setting. My real goal in this case was to put a single line on a poster that says, Significance using our estimates would require N observations, which is larger than the population. I am trying to solve for N. N in this case is a sort of effect size. In this case, it is indeed a simple transformation of Pillai's V and the p-value for the study, and I do not intend to suggest that it is anything more than that. However, I believe that the latter effect size makes a much more compelling case, given that a lot of people (such as yourself) don't have much experience with Pillai's V. --Adam On Wed, 28 Jan 2009, Stephan Kolassa wrote: Hi Adam, first: I really don't know much about MANOVA, so I sadly can't help you without learning about it an Pillai's V... which I would be glad to do, but I really don't have the time right now. Sorry! Second: you seem to be doing a kind of post-hoc power analysis, my result isn't significant, perhaps that's due to low power? Let's look at the power of my experiment! My impression is that post-hoc power analysis and its interpretation is, shall we say, not entirely accepted within the statistical community, see: Hoenig, J. M., Heisey, D. M. (2001, February). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55 (1), 1-6 And this: http://staff.pubhealth.ku.dk/~bxc/SDC-courses/power.pdf However, I am sure that lots of people can discuss this more competently than me... Best wishes Stephan Adam D. I. Kramer schrieb: On Mon, 26 Jan 2009, Stephan Kolassa wrote: My (and, judging from previous traffic on R-help about power analyses, also some other people's) preferred approach is to simply simulate an effect size you would like to detect a couple of thousand times, run your proposed analysis and look how often you get significance. In your simple case, this should be quite easy. I actually don't have much experience running monte-carlo designs like this...so while I'd certainly prefer a bootstrapping method like this one, simulating the effect size given my constraints isn't something I've done before. The MANOVA procedure takes 5 dependent variables, and determines what combination of the variables best discriminates the two levels of my independent variable...then the discrimination rate is represented in the statistic (Pillai's V=.00019), which is then tested (F[5,18653] = 0.71). So coming up with a set of constraints that would produce V=.00019 given my data set doesn't quite sound trivial...so I'll go for the par library reference mentioned earlier before I try this. That said, if anyone can refer me to a tool that will help me out (or an instruction manual for RNG), I'd also be much obliged. Many thanks, Adam HTH, Stephan Adam D. I. Kramer schrieb: Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. What I currently see is this, from help.search(power): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] need help combining two datasets
You probably want the merge function. ?merge --Adam On Wed, 28 Jan 2009, Somani, Dinesh K wrote: Hi I am a new R user. I have two CSV files, one with daily stock returns using method A {date, stock, returnA, some uninteresting columns}, and another with method B {date, stock, returnB, more columns}. Both have different sets of stocks. I want to combine the two into a single data table, so that I can run some analyses for the overlapping date ranges and stocks. I know how to do this using a database but is there an equivalent way to perform a similar kind of join in R? Data size is small - just a few years worth of daily data. Would appreciate your help. Thanks a lot Dinesh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Power analysis for MANOVA?
Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. What I currently see is this, from help.search(power): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Power analysis for MANOVA?
On Mon, 26 Jan 2009, Stephan Kolassa wrote: My (and, judging from previous traffic on R-help about power analyses, also some other people's) preferred approach is to simply simulate an effect size you would like to detect a couple of thousand times, run your proposed analysis and look how often you get significance. In your simple case, this should be quite easy. I actually don't have much experience running monte-carlo designs like this...so while I'd certainly prefer a bootstrapping method like this one, simulating the effect size given my constraints isn't something I've done before. The MANOVA procedure takes 5 dependent variables, and determines what combination of the variables best discriminates the two levels of my independent variable...then the discrimination rate is represented in the statistic (Pillai's V=.00019), which is then tested (F[5,18653] = 0.71). So coming up with a set of constraints that would produce V=.00019 given my data set doesn't quite sound trivial...so I'll go for the par library reference mentioned earlier before I try this. That said, if anyone can refer me to a tool that will help me out (or an instruction manual for RNG), I'd also be much obliged. Many thanks, Adam HTH, Stephan Adam D. I. Kramer schrieb: Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. What I currently see is this, from help.search(power): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Power analysis for MANOVA?
On Mon, 26 Jan 2009, Charles C. Berry wrote: If you know what a 'general linear hypothesis test' is see http://cran.r-project.org/src/contrib/Archive/hpower/hpower_0.1-0.tar.gz I do, and am quite interested, however this package will not install on R 2.8.1: First, it said that there was no maintainer in the description, so I added one (figuring that the 1991 date of the package was to blame), however it still will not compile: parmesan:tmp$ sudo R CMD INSTALL hpower/ * Installing to library '/usr/local/lib/R/library' * Installing *source* package 'hpower' ... ** R ** preparing package for lazy loading Error in parse(n = -1, file = file) : unexpected '{' at 5: ## 6: pfnc_function(q,df1,df2,lm,iprec=c(6)) { Calls: Anonymous - code2LazyLoadDB - sys.source - parse Execution halted ERROR: lazy loading failed for package 'hpower' ** Removing '/usr/local/lib/R/library/hpower' parmesan:tmp$ ...any tips? --Adam HTH, Chuck On Mon, 26 Jan 2009, Adam D. I. Kramer wrote: On Mon, 26 Jan 2009, Stephan Kolassa wrote: My (and, judging from previous traffic on R-help about power analyses, also some other people's) preferred approach is to simply simulate an effect size you would like to detect a couple of thousand times, run your proposed analysis and look how often you get significance. In your simple case, this should be quite easy. I actually don't have much experience running monte-carlo designs like this...so while I'd certainly prefer a bootstrapping method like this one, simulating the effect size given my constraints isn't something I've done before. The MANOVA procedure takes 5 dependent variables, and determines what combination of the variables best discriminates the two levels of my independent variable...then the discrimination rate is represented in the statistic (Pillai's V=.00019), which is then tested (F[5,18653] = 0.71). So coming up with a set of constraints that would produce V=.00019 given my data set doesn't quite sound trivial...so I'll go for the par library reference mentioned earlier before I try this. That said, if anyone can refer me to a tool that will help me out (or an instruction manual for RNG), I'd also be much obliged. Many thanks, Adam HTH, Stephan Adam D. I. Kramer schrieb: Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. What I currently see is this, from help.search(power): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Print a list in columns
Hi Borja, write.table doesn't work, because it's not clear how to print it in two columns, so we can't help you yet either. Do you want to match the first item in v1 with the first item in v2, and have a bunch of NAs hanging off the end of the shorter column in the output? Or align the last items? Or do you want to just drop the extra items in one column? Etc. On Sat, 20 Dec 2008, Borja Soto Varela wrote: Dear R-Users I have a list with two vectors of doubles tha have different lengths. I want to export it to a file and I also want to print it in two columns. I try with write.table but it need vectors of the same length. Does anyone know how to do it? Thanks Borja [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Tab completion / X11 fatal IO error
Dear colleagues, I have two questions about R that I cannot answer via google searching: 1) Is there any way to speed up tab completion? This is near-instant in bash, and occasionally takes upwards of 10 seconds in R. When I hit tab, I've usually typed dataframe$ab or lst$ab or something, and length(colnames(dataframe)) or length(names(lst)) are rarely very long. 2) I often run R from the linux command-line in a screen session on a remote computer. As such, to see graphics, I tunnel X through ssh, then run screen, then run X11(localhost:10) which opens an X window on my screen, just as desired. However, if I forget to dev.off() before logging out of ssh (or before the ssh connection drops on its own, which it does with some frequency, hence running 'screen'), I come back to an error like this one: Error: X11 fatal IO error: please save work and shut down R ...and then I shut down R (saving my data, which takes a few minutes) and then load R again (loading my data, which once again takes several minutes), which is a pretty big time sink. Is this error message quite serious? I know and do not mind that the X connection has died, and I can't imagine that this would really cause too many troubles, but I'm a bit afraid to ignore a message that says to save and shut down. version _ platform x86_64-unknown-linux-gnu arch x86_64 os linux-gnu system x86_64, linux-gnu status major 2 minor 8.0 year 2008 month 10 day20 svn rev46754 language R version.string R version 2.8.0 (2008-10-20) Thanks in advance, Adam Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Difficulty understanding sem errors / failed confirmatory factor analysis
Hello, I'm trying to fit a pretty simple confirmatory factor analysis using the sem package. There's a CFA example in the examples, which is helpful, but the output for my (failing) model is hard to understand. I'd be interested in any other ways to do a CFA in R, if this proves troublesome. The CFA is replicating a 5 uncorrelated-factor structure (for those interested, it is a structure of word usage patterns in weblogs) in a special population. The model looks like model.txt (attached as many people hate long emails); the correlation matrix cors.txt as well. I'm setting no overlap between factors, no correlation between factors, and estimating a separate variance for each observed variable (which should be everything on the right-hand side of the - arrows), but setting the factor variances equal to 1...pretty standard. I've ensured that everything is typed correctly to the best I am able. The problem: library(sem) model.kr - specify.model(file=model.txt) # printing it checks out ok correl - read.csv(cors.csv, header=TRUE) # printing it checks out ok kr.sem - sem(ram=model.kr,S=correl,N=3034) ...about 10 seconds pass... Warning message: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars, : Could not compute QR decomposition of Hessian. Optimization probably did not converge. (running qr on correl works fine; randomly-generated correl matrices fail in the same way; I do not know how to further troubleshoot this) ...and then the model itself (which is produced, as the above was just a warning): summary(kr.sem) Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) : arguments imply differing number of rows: 47, 0 ...both of these error messages are beyond my ability to troubleshoot. Any help would be greatly appreciated. Because I am unsure what exactly the problem with this analysis is, I can't create a simpler example for testing purposes...but I think my model and correlation matrix are fairly simple. unlist(R.Version()) platform arch x86_64-unknown-linux-gnu x86_64 os system linux-gnux86_64, linux-gnu status major 2 minor year 7.2 2008 monthday 08 25 svn rev language 46428R version.string R version 2.7.2 (2008-08-25) ...sem installed via install.packages(sem) which I assume is current. Cordially, Adam Kramer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Difficulty understanding sem errors / failed confirmatory factor analysis
No new info, but the model and correlation table are pasted at the end of this message. --Adam On Thu, 18 Sep 2008, Adam D. I. Kramer wrote: Hello, I'm trying to fit a pretty simple confirmatory factor analysis using the sem package. There's a CFA example in the examples, which is helpful, but the output for my (failing) model is hard to understand. I'd be interested in any other ways to do a CFA in R, if this proves troublesome. The CFA is replicating a 5 uncorrelated-factor structure (for those interested, it is a structure of word usage patterns in weblogs) in a special population. The model looks like model.txt (attached as many people hate long emails); the correlation matrix cors.txt as well. I'm setting no overlap between factors, no correlation between factors, and estimating a separate variance for each observed variable (which should be everything on the right-hand side of the - arrows), but setting the factor variances equal to 1...pretty standard. I've ensured that everything is typed correctly to the best I am able. The problem: library(sem) model.kr - specify.model(file=model.txt) # printing it checks out ok correl - read.csv(cors.csv, header=TRUE) # printing it checks out ok kr.sem - sem(ram=model.kr,S=correl,N=3034) ...about 10 seconds pass... Warning message: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars, : Could not compute QR decomposition of Hessian. Optimization probably did not converge. (running qr on correl works fine; randomly-generated correl matrices fail in the same way; I do not know how to further troubleshoot this) ...and then the model itself (which is produced, as the above was just a warning): summary(kr.sem) Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) : arguments imply differing number of rows: 47, 0 ...both of these error messages are beyond my ability to troubleshoot. Any help would be greatly appreciated. Because I am unsure what exactly the problem with this analysis is, I can't create a simpler example for testing purposes...but I think my model and correlation matrix are fairly simple. unlist(R.Version()) platform arch x86_64-unknown-linux-gnu x86_64 os system linux-gnux86_64, linux-gnu status major 2 minor year 7.2 2008 monthday 08 25 svn rev language 46428R version.string R version 2.7.2 (2008-08-25) ...sem installed via install.packages(sem) which I assume is current. Cordially, Adam Kramer model.kr - specify.model() Melancholy - Affect, mel.aff, NA Melancholy - Negemo, mel.neg, NA Melancholy - Sad, mel.sad, NA Melancholy - Physcal, mel.phys, NA Melancholy - Body, mel.phys, NA Melancholy - Eating, mel.eat, NA Melancholy - Groom, NA, 1 Social - numlines, soc.nrow, NA Social - Leisure, soc.leis, NA Social - Home, soc.home, NA Social - Sports, soc.sports, NA Social - TV, soc.tv, NA Social - Music, soc.mus, NA Social - Money, NA, 1 Rant - Swear, rant.swear, NA Rant - Sexual, rant.sex, NA Rant - Anger, rant.anger, NA Rant - I, NA, 1 Metaphysical - Metaph, met.met, NA Metaphysical - Relig, met.relig, NA Metaphysical - Death, NA, 1 Work - Occup, work.occ, NA Work - School, work.school, NA Work - Job, NA, 1 Affect - Affect,Affect.var,NA Negemo - Negemo,Negemo.var,NA Sad - Sad,Sad.var,NA Physcal - Physcal,Physcal.var,NA Body - Body,Body.var,NA Eating - Eating,Eating.var,NA Groom - Groom,Groom.var,NA numlines - numlines,numlines.var,NA Leisure - Leisure,Leisure.var,NA Home - Home,Home.var,NA Sports - Sports,Sports.var,NA TV - TV,TV.var,NA Music - Music,Music.var,NA Money - Money,Money.var,NA Swear - Swear,Swear.var,NA Sexual - Sexual,Sexual.var,NA Anger - Anger,Anger.var,NA I - I,I.var,NA Metaph - Metaph,Metaph.var,NA Relig - Relig,Relig.var,NA Death - Death,Death.var,NA Occup - Occup,Occup.var,NA School - School,School.var,NA Job - Job,Job.var,NA Melancholy - Melancholy, NA, 1 Social - Social, NA, 1 Rant - Rant, NA, 1 Work - Work, NA, 1 Metaphysical - Metaphysical, NA, 1 correl - matrix(0,nrow=24,ncol=24) correl[lower.tri(correl,diag=TRUE)] - c(1, 0.940530496413442, 0.765560263936915, 0.705665939921134, 0.659038546655712, 0.282665099938120, 0.234892888297051, 0.0554321360979252, 0.671137592040541, 0.54382418910777, 0.463922205203901, 0.353418190097785, 0.414864918025334, 0.436177075274485, 0.401019650838241, 0.370116202378091, 0.777297151879925, 0.676782523496444
Re: [R] Difficulty understanding sem errors / failed confirmatory factor analysis
Dear John, On Thu, 18 Sep 2008, John Fox wrote: I'm trying to fit a pretty simple confirmatory factor analysis using the sem package. There's a CFA example in the examples, which is helpful, but the output for my (failing) model is hard to understand. I'd be interested in any other ways to do a CFA in R, if this proves troublesome. The CFA is replicating a 5 uncorrelated-factor structure (for those interested, it is a structure of word usage patterns in weblogs) in a special population. The model looks like model.txt (attached as many people hate long emails); the correlation matrix cors.txt as well. As far as I can see, the attachments aren't there. If you like, you can send them to me privately. Without the input covariance matrix and your model, it's very hard to tell what the source of the problem is, but one guess (assuming that you've specified the model correctly) is that the assumption of uncorrelated factors is too far off. Also see below. I have pasted the matrix into another email; apologies for failing to attach them acceptably before. I also augmented the model to allow the factors to correlate, by adding these lines to the model: Melancholy - Social, Soc.Mel, NA Melancholy - Rant, Rant.Mel, NA Melancholy - Work, Work.Mel, NA Melancholy - Metaphysical, Meta.Mel, NA Social - Rant, Soc.Rant, NA Social - Work, Soc.Work, NA Social - Metaphysical, Soc.Meta, NA Rant - Work, Rant.Work, NA Rant - Metaphysical, Rant.Meta, NA Work - Metaphysical, Work.Meta, NA ...and obtain the same errors. I'm setting no overlap between factors, no correlation between factors, and estimating a separate variance for each observed variable (which should be everything on the right-hand side of the - arrows), but setting the factor variances equal to 1...pretty standard. I've ensured that everything is typed correctly to the best I am able. The problem: library(sem) model.kr - specify.model(file=model.txt) # printing it checks out ok correl - read.csv(cors.csv, header=TRUE) # printing it checks out ok kr.sem - sem(ram=model.kr,S=correl,N=3034) ...about 10 seconds pass... Warning message: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars, : Could not compute QR decomposition of Hessian. Optimization probably did not converge. (running qr on correl works fine; randomly-generated correl matrices fail in the same way; I do not know how to further troubleshoot this) Doing a QR decomposition on the correlation matrix of the data is essentially irrelevant. The issue is the Hessian. (The scaled inverse Hessian is the covariance matrix of the parameter estimates, not of the data.) That you observe similar problems for randomly generated covariance matrices may or may not be troublesome, depending upon how you generated them. df - as.data.frame(matrix(rnorm(3034*24),nrow=3034,ncol=24)) df.cor - cor(df) rownames(df.cor) - colnames(df.cor) - colnames(correl) sem.df - sem(model.kr, df.cor, 3034) ...which now does not throw errors with the new model, even though that syntax was copied from my .Rhistory. I think I may have gotten unlucky with random data the first time. Thanks for the info on what the error message means, though--I was largely in the dark on that. ...and then the model itself (which is produced, as the above was just a warning): summary(kr.sem) Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) : arguments imply differing number of rows: 47, 0 If the Hessian isn't positive-definite, it won't be possible to get estimated coefficient standard errors. I suspect that this is the source of this error message. If so, it would be better for summary.sem() to provide a more informative error message. This makes sense. It may also be useful for the sem() function to throw an error rather than a warning if the Hessian matrix cannot be decomposed, perhaps? How often is an SEM model without estimated coefficient standard errors desirable? Thanks again for the assistance. I think the trouble may now be in my correlation matrix; I will play around with my model and see whether something else is more reasonable. --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Difficulty understanding sem errors / failed confirmatory factor analysis
Hi John, Thanks very much for your response. It does appear now that my correlation matrix and not your software is the problem. I apologize for wasting your time! I do think that a more informative error message may have prompted me to consider this possibility more fully. --Adam On Thu, 18 Sep 2008, John Fox wrote: Dear Adam, (1) Note that your input correlation matrix appears to be numerically singular: solve(R) Error in solve.default(R) : system is computationally singular: reciprocal condition number = 2.38183e-17 det(R) [1] -1.753523e-25 qr(R)$rank [1] 23 (2) In addition, you have specified what are essentially redundant constraints on the model, fixing *both* the loading for a reference indicator for each factor to 1 *and* fixing the factor variances to 1. One would normally do one or the other. I think that (1) rather than (2) is the essential source of the problem, and fixing (2) doesn't make the problem go away. Because sem() can't compute the covariance matrix of the estimated parameters, this component is missing from the returned object, causing the cryptic error message in summary.sem(). I'll provide a more informative error or warning. I hope this helps, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adam D. I. Kramer Sent: September-18-08 1:37 PM To: r-help@r-project.org Subject: Re: [R] Difficulty understanding sem errors / failed confirmatory factor analysis No new info, but the model and correlation table are pasted at the end of this message. --Adam On Thu, 18 Sep 2008, Adam D. I. Kramer wrote: Hello, I'm trying to fit a pretty simple confirmatory factor analysis using the sem package. There's a CFA example in the examples, which is helpful, but the output for my (failing) model is hard to understand. I'd be interested in any other ways to do a CFA in R, if this proves troublesome. The CFA is replicating a 5 uncorrelated-factor structure (for those interested, it is a structure of word usage patterns in weblogs) in a special population. The model looks like model.txt (attached as many people hate long emails); the correlation matrix cors.txt as well. I'm setting no overlap between factors, no correlation between factors, and estimating a separate variance for each observed variable (which should be everything on the right-hand side of the - arrows), but setting the factor variances equal to 1...pretty standard. I've ensured that everything is typed correctly to the best I am able. The problem: library(sem) model.kr - specify.model(file=model.txt) # printing it checks out ok correl - read.csv(cors.csv, header=TRUE) # printing it checks out ok kr.sem - sem(ram=model.kr,S=correl,N=3034) ...about 10 seconds pass... Warning message: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars, : Could not compute QR decomposition of Hessian. Optimization probably did not converge. (running qr on correl works fine; randomly-generated correl matrices fail in the same way; I do not know how to further troubleshoot this) ...and then the model itself (which is produced, as the above was just a warning): summary(kr.sem) Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) : arguments imply differing number of rows: 47, 0 ...both of these error messages are beyond my ability to troubleshoot. Any help would be greatly appreciated. Because I am unsure what exactly the problem with this analysis is, I can't create a simpler example for testing purposes...but I think my model and correlation matrix are fairly simple. unlist(R.Version()) platform arch x86_64-unknown-linux-gnu x86_64 os system linux-gnux86_64, linux-gnu status major 2 minor year 7.2 2008 monthday 08 25 svn rev language 46428R version.string R version 2.7.2 (2008-08-25) ...sem installed via install.packages(sem) which I assume is current. Cordially, Adam Kramer model.kr - specify.model() Melancholy - Affect, mel.aff, NA Melancholy - Negemo, mel.neg, NA Melancholy - Sad, mel.sad, NA Melancholy - Physcal, mel.phys, NA Melancholy - Body, mel.phys, NA Melancholy - Eating, mel.eat, NA Melancholy - Groom, NA, 1 Social - numlines, soc.nrow, NA
[R] Trouble with difftime()
Hello again, I'm interested in manipulating some date objects, and have been playing with the difftime function. I'm not sure that this is the desired behavior: pa$days.before.apr30 - difftime(may01,as.POSIXct(pa$training.max,origin=as.POSIXct(1969-12-31 16:00:00,tz=PDT)),units=days) units(pa$days.before.apr30) [1] days units(pa$days.before.apr30) [1] days range(pa$days.before.apr30,na.rm=TRUE) Time differences in secs [1] 1817 15723146 ...the oddity being that the range is displayed in seconds, while the unit attached is days. The ?difftime info says that, If the units are changed, the numerical value is scaled accordingly. This appears to be true, as the numbers reported by range() do indeed correspond to the desired output * 60*60*24 (days converted to seconds). Also, plotting a histogram via hist(pa$days.before.apr30) ...produces a histogram of seconds, not days. Using as.numeric(pa$days.before.apr30) or as.numeric(pa$days.before.apr30,units=days) to convert the difftime object into a number both as expected, so I just used that in order to be working (reliably) with days, but this seems like it might be a bug, so I thought I would bring it to your attention. Also, I spent a bit of time freaking out about the histogram being on a scale of 1e7, so it might be wise to print a warning if a unit conversion occurs implicitly even if this behavior is as intended. --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble with difftime()
I apologize, my original google searches were insufficient to turn up this archived message: http://tolstoy.newcastle.edu.au/R/help/06/04/24642.html ...in which my question is answered. Yes, this is the expected behavior, because range() is designed to operate on difftime objects with any unit...and it does so by converting to the theoretically-smallest unit. My (and that poster's) confusion is likely due to the fact that, for cases where seconds is not the highest common factor, the conversion is surprising. I think part of my surprise is also due to the fact that using the lowest common multiple doesn't necessarily seem appropriate; why not, for instance, convert everything to days, to weeks, or to the smallest or largest unit passed to the summary function? Also, please note here that I am not intending to whine or complain about the functionality here, I'm just trying to understand it--much better to convert all vectors to the same unit (ANY unit!). Having a better idea of why choices like this have been made in R's user interface help me to better guess what's up, to be able to solve my problems faster, and on my own. --Adam On Thu, 18 Sep 2008, Adam D. I. Kramer wrote: Hello again, I'm interested in manipulating some date objects, and have been playing with the difftime function. I'm not sure that this is the desired behavior: pa$days.before.apr30 - difftime(may01,as.POSIXct(pa$training.max,origin=as.POSIXct(1969-12-31 16:00:00,tz=PDT)),units=days) units(pa$days.before.apr30) [1] days units(pa$days.before.apr30) [1] days range(pa$days.before.apr30,na.rm=TRUE) Time differences in secs [1] 1817 15723146 ...the oddity being that the range is displayed in seconds, while the unit attached is days. The ?difftime info says that, If the units are changed, the numerical value is scaled accordingly. This appears to be true, as the numbers reported by range() do indeed correspond to the desired output * 60*60*24 (days converted to seconds). Also, plotting a histogram via hist(pa$days.before.apr30) ...produces a histogram of seconds, not days. Using as.numeric(pa$days.before.apr30) or as.numeric(pa$days.before.apr30,units=days) to convert the difftime object into a number both as expected, so I just used that in order to be working (reliably) with days, but this seems like it might be a bug, so I thought I would bring it to your attention. Also, I spent a bit of time freaking out about the histogram being on a scale of 1e7, so it might be wise to print a warning if a unit conversion occurs implicitly even if this behavior is as intended. --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] moving from aov() to lmer()
On Sat, 13 Sep 2008, roberto toro wrote: Hello, I've used this command to analyse changes in brain volume: mod1-aov(Volume~Sex*Lobe*Tissue+Error(Subject/(Lobe*Tissue)),data.vslt) I'm comparing males/females. For every subject I have 8 volume measurements (4 different brain lobes and 2 different tissues (grey/white matter)). As aov() provides only type I anovas, I would like to use lmer() with type II, however, I have struggled to find the right syntaxis. How should I write the model I use with aov() using lmer()?? Specifying Subject as a random effect is straightforward mod2-lmer(Volume~Sex*Lobe*Tissue+(1|Subject),data.vslt) but I can't figure out the /(Lobe*Tissue) part... You're trying to model a separate effect of lobe, of tissue, and of the interaction between lobe and tissue for each subject, so you want mod2-lmer(Volume~Sex*Lobe*Tissue+(Lobe*Tissue|Subject),data.vslt) ...the resulting fixed effect for Lobe, Tissue, and L:T in the summary() then corresponds to the within-subjects effect aggregated (but not exactly AVERAGED) across subjects. So, it's not exactly providing you a Type II ANOVA...it's doing a mixed-effects model (or HLM, if you prefer), which as you've written it is a Type III analysis (though once again, not an ANOVA in the classical sense). To get something more akin to type II using the lmer function (and I trust someone will pipe up if there is a better way), you could first fit mod2.additive-lmer(Volume~Sex*Lobe+Tissue+(Lobe+Tissue|Subject),data.vslt) ...and interpret the coefficients and effects provided by it, then fit the crossed model to get the coefficients and effects for the higher-order terms. I hope this made sense and that I have understood you correctly. --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fetching a range of columns
Hi Jason, data[] is a data frame, remember--you need to specify rows AND columns. So, data[,c(2,12,17)] is what you should be doing in the first place, and data[,842:2411] in the second place. Not sure if the help you needed was using the comma, or the : syntax, or if you're trying to read only certain columns during the read.csv process (which I don't think that's possible). --Adam On Sun, 14 Sep 2008, Jason Thibodeau wrote: Hello, I realize that using: x[x 3 x 5] I can fetch all elements between 3 and 5. However I read in from a CSV file, and I would like to fetch all columns from within a range ( 842-2411). In teh past, I have done this to fetch just select few columns: data - read.csv(filein, header=TRUE, nrows=320, skip=nskip) data_filter - data[c(2,12,17)] write.table(data_filter, fileout, append = TRUE, sep= ,, row.names= FALSE, col.names = FALSE) nskip - nskip+320 This time, however, instead of grabbing columns 2, 12, 17, I woudl like all columns in the range of 842-2411. I can't seem to do this correctly. Could somebody please provide some insight? Thanks in advance. -- Jason Thibodeau [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] k-sample Kolmogorov-Smirnov test?
Maybe you should look a little harder. help.search(Kolmogorov) PLEASE do read the posting guide http://www.R-project.org/posting-guide.html --Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Power PC with a linux distribution and R
To Stephen's credit, Apple will no longer support PowerPC chips (such as his G4) in the next operating system (Snow Leopard), so some sort of switchover will be necessary in order to maintain SOME sort of state-of-the-art software packaging for PPC-based Mac users in the near future. Also, it is important to note that Leopard on a G4 iBook would probably run disgustingly (read: unusably) slow unless the memory is upgraded: It shipped with 256MB of RAM (and that money would be better saved for a new computer), and Leopard (on my Macbook Pro) is currently using about 700MB for the operating system. Switching to linux and X-windows allows for an old system to be, in a word, functional. While I love MacOSX and use it on the computers of mine which can run it usefully (G5 tower, PowerBook G4 w/2GB of ram), I've been much happier with Linux on my older macs. The machine on my desk is a G4 iMac with 256MB of RAM which has no trouble fitting mixed models in R while I browse the web on Firefox 3 while the computer itself serves 80% of the Psychology Department's web surveys. Under Leopard, these programs would not even run simultaneously, let alone usably. Further, while some amount of hand-compiling is necessary (Debian Linux provides almost all of the software I would need), it's also quite helpful--using Simon Urbanek's R optimization flags for G5 and G4 (see https://stat.ethz.ch/pipermail/r-sig-mac/2005-February/001641.html ) and my own version of ATLAS, the performance of R on that machine is comparable to the unoptimized internal-BLAS no-effort-necessary .pkg from CRAN running on a G4 with 2GB of RAM and 1.5 times the processor speed under Leopard. Sure, it took a whole night to compile/optimize ATLAS and most of the next day to compile R with those flags, but to a grad student like myself, that's vastly superior to waiting twice as long for my analyses to run...or to the impossibly high cost of a new computer. --Adam On Fri, 12 Sep 2008, Doran, Harold wrote: Are you aware that a BSD unix OS runs on the mac already? -Original Message- From: [EMAIL PROTECTED] on behalf of Dr Eberhard W Lisse Sent: Fri 9/12/2008 4:38 PM To: stephen sefick Cc: R-help Mailing List; Dr Eberhard W Lisse Subject: Re: [R] Power PC with a linux distribution and R Yes, I'd install the OS X Leopard (10.5.4) distribution on it :-)-O el On 12 Sep 2008, at 22:30 , stephen sefick wrote: This is an operating system question, but it is with the intent of using R on that operating system. I have an ibook G4 Power PC that I am going to install linux on. Is there a better, worse, or perhaps easier (I am a linux newby migrating from mac) distribution that I should look at. I appreciate your help. I didn't post this in the sig-mac because I don't know if it fits there better than anywhere else. thanks -- Stephen Sefick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] match and incomparables
I can replicate this and also do not understand it. match(1:3,1:3,incomparables=5) [1] NA 2 3 match(1:3,1:3,incomparables=4) [1] 1 2 3 match(1:3,1:3,incomparables=3) [1] 1 2 3 match(1:3,1:3,incomparables=2) [1] 1 2 3 match(1:3,1:3,incomparables=1) [1] NA 2 3 ...every other integer value for incomparables produces 1 2 and 3 for output. I'm using R 2.7.2, self-compiled, under linux. --Adam On Fri, 12 Sep 2008, McGehee, Robert wrote: Hello, I was playing around with the newly implemented 'incomparables' argument in 'match' and realized the argument does not behave anything like I expected. Can someone explain what is going on here? Sorry if I'm misreading the documentation. match(1:3, 1:3, incomparables=1) [1] NA 2 3 # This seems right, the 1 in 'x' is 'incomparable' match(1:3, 1:3, incomparables=2) [1] 1 2 3 # Shouldn't this be 1 NA 3? Why isn't the 2 incomparable? match(1:3, 1:3, incomparables=5) [1] NA 2 3 # Why isn't the 5 ignored? Note from ?match: incomparables: a vector of values that cannot be matched. Any value in x matching a value in this vector is assigned the nomatch value. For historical reasons, FALSE is equivalent to NULL. Thanks in advance! Robert platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 7.2 year 2008 month 08 day25 svn rev46428 language R version.string R version 2.7.2 (2008-08-25) Robert McGehee, CFA Geode Capital Management, LLC One Post Office Square, 28th Floor | Boston, MA | 02109 Tel: 617/392-8396Fax:617/476-6389 mailto:[EMAIL PROTECTED] This e-mail, and any attachments hereto, are intended fo...{{dropped:12}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Power PC with a linux distribution and R
Hi El, As does my PowerBook G4 with 2GB of RAM. The iBook shipped with 256MB, however, which Leopard does not run well on. My point was more that Linux was superior to MacOS in the case of very low memory...not that the chip was somehow the problem. ...but now I am solidly off-topic! Cheers, Adam On Sat, 13 Sep 2008, Dr Eberhard W Lisse wrote: Adam, I (or rather the kids and the wife) got two iMinis and one iBook with G4. They all run 10.5.4 happily on their 1GB of RAM... greetings, el On 12 Sep 2008, at 23:43 , Adam D. I. Kramer wrote: To Stephen's credit, Apple will no longer support PowerPC chips (such as his G4) in the next operating system (Snow Leopard), so some sort of switchover will be necessary in order to maintain SOME sort of state-of-the-art software packaging for PPC-based Mac users in the near future. Also, it is important to note that Leopard on a G4 iBook would probably run disgustingly (read: unusably) slow unless the memory is upgraded: It shipped with 256MB of RAM (and that money would be better saved for a new computer), and Leopard (on my Macbook Pro) is currently using about 700MB for the operating system. Switching to linux and X-windows allows for an old system to be, in a word, functional. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] request: most repeated component of a list
That is indeed different from what I thought the first time. x - sapply(1:length(l), function(x) { sum(sapply(l, function(y) { if ( nrow(l[[x]]) != nrow(y) | ncol(l[[x]]) != ncol(y) ) FALSE else sum(y != l[[x]]) == 0 })) } ) names(x) - names(l) Then, x has the same names as l, and x[i] is the number of matches that l[[i]] has...so you want the index or indices of max(x). --Adam On Thu, 11 Sep 2008, Muhammad Azam wrote: May be i could not explain properly. Actually there are components of list i.e. [[1]] to [[500]]. Each component containing r-rows (may be different for each [[ k ]] and c-columns same for all). I have to compare all the [[ k ]] components of that list and found the one appearing maximum no of times. e.g. from three components [[1]] to [[3]] given below. The most repeated is [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]43400 [3,]43400 [4,]43000Please help to find it. Thanks and best regards Muhammad Azam - Original Message From: jim holtman [EMAIL PROTECTED] To: Muhammad Azam [EMAIL PROTECTED] Cc: R-help request [EMAIL PROTECTED]; R Help r-help@r-project.org Sent: Wednesday, September 10, 2008 5:59:28 PM Subject: Re: [R] request: most repeated component of a list If want you want is the summary from all of them, then 'rbind' the data together into one matrix and analyze it: totalMat - do.call(rbind, listOfMatrices) On Wed, Sep 10, 2008 at 11:49 AM, Muhammad Azam [EMAIL PROTECTED] wrote: Dear R community I have stored the results of arrays in a list consist of J-components (say 200 components). Each component containing same no of columns but may be different no of rows. e.g [[1]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]43400 [3,]43400 [4,]43000 [[2]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]43400 [3,]43400 [4,]43000 [[3]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]44100 [3,]44100 [4,]44000 [5,]44000 For 200 components i want to make a frequency table. How can i make a frequency table of these components or the most repeated component out of all? Any help in this regard will be appreciated. Muhammad Azam [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] request: most repeated component of a list
Did you try my function? I don't see how it doesn't fit the need of this reexplanation. For this set, x would end up equal to c(2,2,1), indicating that array 1 appears twice in the set and array 2 appears twice in the set. So, array 1 and array 2 appear maximally in the set. So, look at a[[1]] or a[[2]] and that is your answer. --Adam On Thu, 11 Sep 2008, Muhammad Azam wrote: Thanks for the effort but still we are far from the desired result. May be this example will help you to understand the situation. Example a1=c(1:12); a1=array(a1,dim=c(3,4)); a2=c(1:12); a2=array(a2,dim=c(3,4)); a3=c(1:16) a3=array(a3,dim=c(4,4)); a=list(a1,a2,a3); a [[1]] [,1] [,2] [,3] [,4] [1,]147 10 [2,]258 11 [3,]369 12 [[2]] [,1] [,2] [,3] [,4] [1,]147 10 [2,]258 11 [3,]369 12 [[3]] [,1] [,2] [,3] [,4] [1,]159 13 [2,]26 10 14 [3,]37 11 15 [4,]48 12 16 Here [[1]] and [[2]] are same out of three (internal values wise). The whole array [[1]] or [[2]] is in majority. So i want to get the whole array or component of list which is in majority. The result should be like this [,1] [,2] [,3] [,4] [1,]147 10 [2,]258 11 [3,]369 12 Hope it is much more clear as before. best regards Muhammad Azam - Original Message From: Adam D. I. Kramer [EMAIL PROTECTED] To: Muhammad Azam [EMAIL PROTECTED] Cc: R Help r-help@r-project.org Sent: Thursday, September 11, 2008 9:53:40 AM Subject: Re: [R] request: most repeated component of a list That is indeed different from what I thought the first time. x - sapply(1:length(l), function(x) { sum(sapply(l, function(y) { if ( nrow(l[[x]]) != nrow(y) | ncol(l[[x]]) != ncol(y) ) FALSE else sum(y != l[[x]]) == 0 })) } ) names(x) - names(l) Then, x has the same names as l, and x[i] is the number of matches that l[[i]] has...so you want the index or indices of max(x). --Adam On Thu, 11 Sep 2008, Muhammad Azam wrote: May be i could not explain properly. Actually there are components of list i.e. [[1]] to [[500]]. Each component containing r-rows (may be different for each [[ k ]] and c-columns same for all). I have to compare all the [[ k ]] components of that list and found the one appearing maximum no of times. e.g. from three components [[1]] to [[3]] given below. The most repeated is [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]43400 [3,]43400 [4,]43000Please help to find it. Thanks and best regards Muhammad Azam - Original Message From: jim holtman [EMAIL PROTECTED] To: Muhammad Azam [EMAIL PROTECTED] Cc: R-help request [EMAIL PROTECTED]; R Help r-help@r-project.org Sent: Wednesday, September 10, 2008 5:59:28 PM Subject: Re: [R] request: most repeated component of a list If want you want is the summary from all of them, then 'rbind' the data together into one matrix and analyze it: totalMat - do.call(rbind, listOfMatrices) On Wed, Sep 10, 2008 at 11:49 AM, Muhammad Azam [EMAIL PROTECTED] wrote: Dear R community I have stored the results of arrays in a list consist of J-components (say 200 components). Each component containing same no of columns but may be different no of rows. e.g [[1]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]43400 [3,]43400 [4,]43000 [[2]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]43400 [3,]43400 [4,]43000 [[3]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]44100 [3,]44100 [4,]44000 [5,]44000 For 200 components i want to make a frequency table. How can i make a frequency table of these components or the most repeated component out of all? Any help in this regard will be appreciated. Muhammad Azam [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo
Re: [R] Calculate mean/var by ID
aggregate(value,list(ID=ID),mean) aggregate(value,list(ID=ID),var) --Adam On Thu, 11 Sep 2008, liujb wrote: Hello, I have a data set that looks like this. IDvalue 111 5 111 6 111 2 178 7 178 3 138 3 138 8 138 7 138 6 . . . I'd like to calculate the mean and var for each object identified by the ID. I can in theory just loop through the whole thing..., but is there a easier way/command which let me calculate the mean/var by ID? Thanks, Julia -- View this message in context: http://www.nabble.com/Calculate-mean-var-by-ID-tp19440461p19440461.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to calcaulate matrices for two subsets
Hi Bill, Tell me more about the Obs object. The subset of an lm should be a vector telling the lm which observations to use...if Obs[197,396] is a single number, only one observation will be used, and chances are your model is not what you intended. Also, predict(result1,newdata=Obs[397,339]) needs that cell in Obs to have a column called SP500 index, and predict.lm (check the man page for it) will only return one number, the prediction for the given new data. If you want it to have 3 numbers, maybe Obs[397:399] is what you want? The colon means 397 through 399, inclusive while the comma means row 397, column 399. If this is your mistake, you actually want subset=197:396, data=Obs in your call to lm. Hope this helped. --Adam On Thu, 11 Sep 2008, Xianchun Liao wrote: I am an R beginner and trying to run a market model using event study in R framework. First, I run a market model, that is lm(stock security~SP500 index, subset=Obs[197, 396]) -result1 Then I get predict results for a new dataset using predict (result1, newdata=Obs[397,399]) -pred1 Pred1 should have three numbers. Now I need to calculate abnormal return by the formula stock security [397,399] -pred1 But it does not work after trying many times. So, how to write a R code to implement the formula and get right results. Thanks, Bill [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to load functions in R
Source(file.path) executes the file at file.path in order, just as if you had typed it in. So, the source file should in fact name each function in turn: f1 - function(x) { ... } f2 - function(x) { ... } ...etc. So a good way to debug is to just copy and paste lines from your source file into the R command line, and see if they behave as expected. --Adam On Thu, 11 Sep 2008, Adaikalavan Ramasamy wrote: Strange. source() should read all the function in that file unless there was a syntax error or something else preventing the other function from being parsed correctly. Could you send us a simplified example that reproduces this problem? Thanks. Regards, Adai [EMAIL PROTECTED] wrote: Hello, It seems that all methods work. Source() however loads only the last function. with save(a,b,file=path) i can save more than 1 function. Thanks a lot, Mihai -Ursprüngliche Nachricht- Von: Yihui Xie [mailto:[EMAIL PROTECTED] Gesendet: Donnerstag, 11. September 2008 16:48 An: [EMAIL PROTECTED] Cc: Mirauta, Mihai; r-help@r-project.org Betreff: Re: [R] How to load functions in R We may just read them in the R console instead of an external editor, and fix() or edit() them when we need to make any modifications. A trivial advantage of saving them as an image file in Windows is that you can double-click the file and R will be started with these objects loaded automatically. Anyway, to save the functions as ASCII files or even write a package are also good solutions :-) Regards, Yihui On Thu, Sep 11, 2008 at 10:34 PM, Adaikalavan Ramasamy [EMAIL PROTECTED] wrote: I would recommend saving the functions into a separate file and then using source() as bartjoosen suggested. I do not recommend using save() here because the output is non-readable (even when using ascii=TRUE option). Which means that you have to load() it, then copy-and-paste into an editor before making changes and then running it again in R and then save() again. Another better option is to consider making your own package. It may sound complicated but once you mastered it, it makes your functions more portable and encourages you to document it. Further, the function package.skeleton() simplifies much of it. Regards, Adai Yihui Xie wrote: Hi, you may save your functions somewhere on your disk using save() and load them next time when you want to use them. See ?save and ?load Yihui On Thu, Sep 11, 2008 at 9:30 PM, [EMAIL PROTECTED] wrote: Hello, I am trying to use self created functions in other scripts than the one where they are stored. For the moment I am using the following structure of commands to do that: 1. Load the text file with the functions in the current script: x=parse(path) 2. transform the tex in a function: f1=eval(x[1]), f2=eval(x[2]) if more than one function is stored in the text file 3. use the functions as normal Is there another possibility to do the same? Thank you, Mihai Mirauta [[alternative HTML version deleted]] -- Yihui Xie [EMAIL PROTECTED] Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086 Mobile: +86-15810805877 Homepage: http://www.yihui.name School of Statistics, Room 1037, Mingde Main Building, Renmin University of China, Beijing, 100872, China __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] About Plot.new
You can't add=TRUE unless the graph exists in the first place. So, drop that if you're creating the graph. Or if that's there because you want to put a boxplot on top of a preexisting graph, make sure you have created the preexisting graph already. --Adam On Thu, 11 Sep 2008, cathelf wrote: Hi, sorry for bothering your guys. I will trying to make some nice graph using boxplot. when I check the help file of boxplot, there is a sample code as: boxplot(len ~ dose, data = ToothGrowth, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2, subset = supp == OJ, col = orange) legend(2, 9, c(Ascorbic acid, Orange juice), fill = c(yellow, orange)) But when I run it, it shows the following error: Error in xypolygon(xx, yy, lty = blank, col = boxfill[i]) : plot.new has not been called yet what does it mean? If I first run plot.new(), then running the above code, only the x-axis and y-axis is on the graph, no boxplot inside. Can anyone tell me how to call plot.new or at least how to run the above code correctly? Thank you very much! -- View this message in context: http://www.nabble.com/About-%22Plot.new%22-tp19446258p19446258.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mixed model MANCOVA
Hi Erika, I have not tried this before, and I hope that somebody will correct me if I'm wrong, but the glmer function in the lme4 library appears to do what you want. From examples(lmer): lmer (gm1 - glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), family = binomial, data = cbpp)) ...I guess that this will do what you want it to because it has multiple variables on the LHS and both continuous and categorical variables on the RHS, along with an explicit grouping structure. In your case, you probably want to leave the family= argument out, as noted in ?glmer, If 'family' is missing then a linear mixed model is fit; otherwise a generalized linear mixed model is fit. ...MANCOVA tend to be generalized linear models. Once again, though, I have not used this system personally, haven't seen your data, and don't know what output to expect. Hopefully somebody else can confirm or deny this solution's efficacy. --Adam On Mon, 8 Sep 2008, Erika Crispo wrote: Hello, I need to perform a mixed-model (with nesting) MANCOVA, using Type III sums of squares. I know how to perform each of these types of tests individually, but I am not sure if performing a mixed-model MANCOVA is possible. Please let me know. Erika Erika Crispo, PhD candidate McGill University, Department of Biology http://www.biology.mcgill.ca/grad/erika/index.htm [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Trouble compiling R with self-compiled LAPACK/ATLAS under Linux
Hello, I just had a lot of trouble compiling a version of R which uses a recently-compiled version of ATLAS and LAPACK. ATLAS and LAPACK compiled correctly and installed fine and passed all of their checks (and yes, I did compile them all with -fPIC as is required for R). This was my initial configure line: ./configure --with-x --enable-threads=posix --with-blas=-L/usr/local/lib -lptf77blas -lptcblas -latlas -lpthread --with-lapack=-L/usr/local/lib -llapack -lptf77blas -lptcblas -latlas -lpthread --prefix=/usr/local ...this configure line was chosen by examining R-admin.pdf and ./configure --help. It configured fine, no problem. The trouble was when I attempt to make the package: ...a lot of stuff compiles correctly... gcc -std=gnu99 -shared -L/usr/local/lib64 -o grDevices.so chull.o devNull.o devPicTeX.o devPS.o devQuartz.o init.o ../../../../library/grDevices/libs/grDevices.so is unchanged make[5]: Leaving directory `/home/akramer/R-2.7.2/src/library/grDevices/src' make[4]: Leaving directory `/home/akramer/R-2.7.2/src/library/grDevices/src' Warning in solve.default(rgb) : unable to load shared library '/home/akramer/R-2.7.2/modules//lapack.so': /home/akramer/R-2.7.2/modules//lapack.so: undefined symbol: cblas_izamax Error in solve.default(rgb) : lapack routines cannot be loaded Error: unable to load R code in package 'grDevices' Execution halted make[3]: *** [all] Error 1 make[3]: Leaving directory `/home/akramer/R-2.7.2/src/library/grDevices' make[2]: *** [R] Error 1 make[2]: Leaving directory `/home/akramer/R-2.7.2/src/library' make[1]: *** [R] Error 1 make[1]: Leaving directory `/home/akramer/R-2.7.2/src' make: *** [R] Error 1 ...So, I looked into the troublesome library: $ nm /home/akramer/R-2.7.2/modules//lapack.so | grep cblas_izamax U cblas_izamax ...the function is not there. However, referring to my configure line, the function is indeed in the libraries I passed to R: [EMAIL PROTECTED] ~/R-2.7.2] nm /usr/local/lib/libptcblas.a | grep cblas_izamax T cblas_izamax ...so, it appears to me that there is a problem in R building lapack.so. So, I looked back into my config.log file, and found this line: configure:37951: checking for zgeev_ configure:38015: gcc -std=gnu99 -o conftest -O3 -mtune=opteron -I/usr/local/include -L/usr/local/lib64 conftest.c -L/usr/local/lib -lptf77blas -lptcblas -latlas -lpthread -lgfortran -lm -ldl -lm 5 ...which then failed. The key to the above line is that -llapack was *not included,* even though zgeev_ is a lapack function! Then, I added -llapack to my --with-blas line, and R compiled correctly, tested fine, and runs nicely. This information is provided because I feel like I have either done something wrong, I am misunderstanding the process of building R, or there is a bug in the configuration for people using --with-blas. Cordially, Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] request: most repeated component of a list
You might try apply(t(sapply(l,apply,2,sum)),2,sum) --Adam On Wed, 10 Sep 2008, Muhammad Azam wrote: Dear R community I have stored the results of arrays in a list consist of J-components (say 200 components). Each component containing same no of columns but may be different no of rows. e.g [[1]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]43400 [3,]43400 [4,]43000 [[2]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]44300 [3,]44300 [4,]44000 [5,]44000 [[3]] [,1] [,2] [,3] [,4] [,5] [1,]40000 [2,]44100 [3,]44100 [4,]44000 For 200 components i want to make a frequency table. How can i make a frequency table of these components or the most repeated component out of all? Any help in this regard will be appreciated. Muhammad Azam [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] comparing a list and vector and returnig the listname
Maybe, for vector v and list l match(v,names(l)) ? --Adam On Wed, 10 Sep 2008, Rajasekaramya wrote: hi, I have list of length 5453 and vector of length 14318.I need to compare the vector with the list and return the list name if matched.I am thinking of using an lapply but how to retrive the listname is wat i am puzzled abt. kindly let me know how to go abt it. Ramya -- View this message in context: http://www.nabble.com/comparing-a-list-and-vector-and-returnig-the-listname-tp19415038p19415038.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mixed model MANCOVA
Hi Erika, As mentioned, I haven't run the model before and I don't have access to your data set, so you might want to post your reply to the list as well (cc'd again). As another guessing-without-trying-anything, I'd first make sure that in fact pop and family are factor variables with the same length as a,b,c,treat,centroidsize. Also having not used glmer before, I'm not sure how to get the p values...since estimates and std. errors and t-values are reported, the df's are likely known and so they probably exist in the model object you created. Of course, your highest t-value is 1.92, so none of your fixed effects would be significant at the .05 level (the two-tailed z-score cutoff is 1.93, which is the limit for t). --Adam On Wed, 10 Sep 2008, Erika Crispo wrote: Thanks! I am still having some problems. I have tried the following: model=glmer(cbind(a,b,c)~pop*treat+centroidsize+(1|pop/family)) Error: Matrices must have same number of columns in rbind2(..1, r) In addition: Warning messages: 1: In family:pop : numerical expression has 104 elements: only the first used 2: In family:pop : numerical expression has 104 elements: only the first used I don't get the error messages if I exclude the nesting (i.e. exclude pop on the RHS). But even then, I don't know how to interpret the output. How can I get P values for pop and treat? I've attached my data file. summary(model) Linear mixed model fit by REML Formula: cbind(a, b, c) ~ pop * treat + centroidsize + (1 | family) AICBIC logLik deviance REMLdev -685.3 -656.2 353.6 -822.8 -707.3 Random effects: Groups NameVariance Std.Dev. family (Intercept) 9.8877e-13 9.9437e-07 Residual 2.3502e-05 4.8478e-03 Number of obs: 104, groups: family, 28 Fixed effects: Estimate Std. Error t value (Intercept) 4.518e-03 4.329e-03 1.0438 popkah -2.338e-03 1.902e-03 -1.2297 popkant-2.328e-03 1.881e-03 -1.2380 poprwe -3.728e-03 1.941e-03 -1.9204 treatn -8.703e-04 1.957e-03 -0.4448 centroidsize-1.886e-06 2.464e-06 -0.7656 popkah:treatn 3.440e-03 2.695e-03 1.2765 popkant:treatn 1.198e-03 2.699e-03 0.4439 poprwe:treatn 4.662e-03 2.746e-03 1.6976 Correlation of Fixed Effects: (Intr) popkah popknt poprwe treatn cntdsz ppkh:t ppknt: popkah -0.228 popkant -0.335 0.507 poprwe -0.193 0.490 0.492 treatn -0.092 0.485 0.476 0.479 centroidsize -0.951 0.009 0.119 -0.023 -0.128 popkah:trtn 0.121 -0.705 -0.352 -0.346 -0.719 0.036 popknt:trtn 0.095 -0.352 -0.680 -0.347 -0.721 0.063 0.520 poprwe:trtn 0.117 -0.346 -0.346 -0.707 -0.706 0.037 0.510 0.511 Erika Crispo, PhD candidate McGill University, Department of Biology http://www.biology.mcgill.ca/grad/erika/index.htm - Original Message - From: Adam D. I. Kramer [EMAIL PROTECTED] To: Erika Crispo [EMAIL PROTECTED] Cc: r-help@r-project.org Sent: Wednesday, September 10, 2008 5:47 PM Subject: Re: [R] mixed model MANCOVA Hi Erika, I have not tried this before, and I hope that somebody will correct me if I'm wrong, but the glmer function in the lme4 library appears to do what you want. From examples(lmer): lmer (gm1 - glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), family = binomial, data = cbpp)) ...I guess that this will do what you want it to because it has multiple variables on the LHS and both continuous and categorical variables on the RHS, along with an explicit grouping structure. In your case, you probably want to leave the family= argument out, as noted in ?glmer, If 'family' is missing then a linear mixed model is fit; otherwise a generalized linear mixed model is fit. ...MANCOVA tend to be generalized linear models. Once again, though, I have not used this system personally, haven't seen your data, and don't know what output to expect. Hopefully somebody else can confirm or deny this solution's efficacy. --Adam On Mon, 8 Sep 2008, Erika Crispo wrote: Hello, I need to perform a mixed-model (with nesting) MANCOVA, using Type III sums of squares. I know how to perform each of these types of tests individually, but I am not sure if performing a mixed-model MANCOVA is possible. Please let me know. Erika Erika Crispo, PhD candidate McGill University, Department of Biology http://www.biology.mcgill.ca/grad/erika/index.htm [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. No virus found in this incoming message. Checked by AVG - http
Re: [R] Splitting Data Frame into Two Based on Source Array
data_main[ match(src,data_main$V1), ] and the compliment of src (call it srcc) data_main[ match(srcc,data_main$V1), ] ...this only works so long as there is only one occurrance of each item in V1 in V1. --Adam On Tue, 9 Sep 2008, Gundala Viswanath wrote: Dear all, Suppose I have this data frame: data_main V1 V2 foo13.1 bar 12.0 qux 10.4 cho 20.33 pox 8.21 And I want to split the data into two parts first part are the one contain in the source array: src [1] bar pox and the other one the complement. In the end we hope to get this two dataframes: data_child1 V1 V2 bar 13.1 pox 8.21 and data_child2_complement foo 13.1 qux 10.4 cho 20.33 Is there a compact way to do it in R? - Gundala Viswanath Jakarta - Indonesia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] isolate elements in vector that match one of many possible values
Check out ?match, ?%in% x - c(1,2,3,4) y - c(1,2,4) match(y,x) [1] 1 2 4 --Adam On Mon, 8 Sep 2008, Andrew Barr wrote: Hi all, I want to get the index numbers of all elements of a vector which match any of a long series of possible values. Say x - c(1,2,3,4) and I want to know which values are equal to 1, 2 or 4. I could do which(x == 1 | x==2 | x==4) [1] 1 2 4 This gets really ugly though, when the list of values of interest is really long. Is there a nicer way to do this? Something akin to the MySQL construction in(), as in #MySQL script example Select * from table where parameter in(x,y,z); Thanks! -- W. Andrew Barr Biological Anthropology University of Texas at Austin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] printing all rows
options(max.print) $max.print [1] 9 options(max.print=10) options(max.print) $max.print [1] 1e+05 ...so check what your max.print is, and figure out whether you need to set it to nrow, ncol, or nrow*ncol of your data frame...then do so...though of course, this is a global variable, so everything you print from then on will just keep printing and printing. Really, though, you might get more utility out of write.table and then using a word processor to read the data in your table. --Adam On Tue, 9 Sep 2008, ANJAN PURKAYASTHA wrote: Hi, my data table has 38939 rows. R prints the first 1 columns and then prints an error message:[ reached getOption(max.print) -- omitted 27821 rows ]]. is it possible to set the maxprint parameter so that R prints all the rows? tia, anjan -- = anjan purkayastha, phd bioinformatics analyst whitehead institute for biomedical research nine cambridge center cambridge, ma 02142 purkayas [at] wi [dot] mit [dot] edu 703.740.6939 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating table of averages
Maybe something like this: by(df[,c(77,81,86,90,94,98,101,106)],df$category,apply,2,mean) ...which would then need to be reformatted into a data frame (there is probably an easy way to do this which I don't know). aggregate seems like a more reasonable choice, but the function for aggregate must return scalars, not rows...tapply doesn't take data.frame inputs. Maybe someone else has a suggestion? --Adam On Tue, 9 Sep 2008, Lawrence Hanser wrote: Dear Colleagues, I have a dataframe with variables: [1] ID category a11a12 a13a21 [7] a22a23a31a32 b11b12 [13] b13b21b31b32 b33b41 [19] b42c11c12c21 c22c23 [25] c31c32c33d11 d12d13 [31] d14d21d22d23 d24d25 [37] d31d32d33e11 e12e13 [43] e21e22e23e31 e32e33 [49] f11f12f13f14 f21f22 [55] f23f24g11g12 g13g14 [61] g21g22g23g24 g31g32 [67] g33g41g42g43 h11h12 [73] h13h21h22h23 C1.Employ SC11.Ops [79] SC12.Unit SC13.Nonadvers C2.Enterprise SC21.Structure SC22.Gov SC23.Culture [85] SC24.Stratcomm C3.Manage SC31.Resource SC32.Change SC33.Continue C4.Stratthink [91] SC41.VisionSC42.Decision SC43.Adapt C5.Lead SC51.Develop SC52.Care [97] SC53.Diversity C6.Foster SC61.Teams SC62.Negotiate C7.Embody SC71.Ethical [103] SC72.Follower SC73.Warrior SC74.Develop C8.Comm C81.Speak C82.Listen [109] OverallImp The variable category has four values: Regular, CCM, CFM, and Other I'd like to create a table like this to feed into barplot2: row.name C1.Employ C2.Enterprise C3.Manage C4.Stratthink C5.Lead C6.Foster C7.Embody C8.Comm Regular 3.68 4.27 3.22 etc.. CCM 4.32 4.56 etc. CFM etc. Other etc. So far, I have been able to get this far: mean(subset(impchiefs08,category==Regular,select=c(C1.Employ,C2.Enterprise,C3.Manage,C4.Stratthink,C5.Lead,C6.Foster,C7.Embody,C8.Comm ))) C1.Employ C2.Enterprise C3.Manage C4.Stratthink C5.Lead C6.Foster C7.Embody C8.Comm 3.60 3.85 4.48 4.346667 4.608889 4.44 4.60 4.49 But I am stumped as to how to get what I want. Thanks in advance. Larry [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Test for equality of complicatedly related average correlations
Hi Ralph, My approach provides the same answer by asking a different question. Effectively, my approach tests whether the difference between timepoints is larger for X than for Y (and also gives you the main effects of whether X has a higher mean than Y and whether scores increase or decrease over time, which may or may not be of interest but as control variables serve the same purpose as using correlations rather than difference scores). One objection to this mode of comparing test-retest reliabilities is that one test may be more variable over time (at the item level) than the other, but that the overall means don't differ...but the correlational approch you asked about also suffers from this problem. --Adam On Sun, 7 Sep 2008, Ralph79 wrote: I have to get a bit more familiar with the model you propose in order to understand if it applies to my problem as well. My question is not really does time show a different effect but which one of two measures is more reliable: My respondents have completed exactly the same questionnaire twice (t=1 and t=2). The questionnaire consisted of two ways of measuring attribute importance, and the better method of measuring these importances is the one that gives the same importances for each respondent in t=1 and t=2. In other words: I want to examine test-retest reliability of the two measures. Naturally, if X(t=1,t=2)-correlation is higher for a specific respondent than the Y(t=1,t=2)-corralation, than for this respondent the method that yields the X-importances is more reliable. All I want to do is to see if this holds for the whole sample as well... Anyway, thank you again, I will think of your approach. Ralph Adam D. I. Kramer-3 wrote: Hi Ralph, I had the same problem you do a few months ago, and realized that the question I had (does time show a different effect for X than Y) was not best modeled as differences between correlations across individuals, but as whether time interacts with condition. I answered this question with library(nlme) lme(obs ~ cond*time, random=~cond*time|subj) ...where obs is the responses on the X or Y variable, cond is a factor of either X or Y, and subj is your subject variable. This fits a heirarchical linear model to the data. The relationship between X and time is sig. diff. from the relationship between Y and time if the cond:time fixed effect is true. This approach makes better use of your data, because when you correlate the observations, you're effectively losing variability (because correlations are doubly standardized) as well as degrees of freedom (you have 9 df within each individual, but each correlation is only one number). --Adam On Sat, 6 Sep 2008, Ralph79 wrote: Dear R-Users, I am currently looking for a way to test the equality of two correlations that are related in a very special way. Let me describe the situation with an example. - There are 100 respondents, and there are 2 points in time, t=1 and t=2. - For each of the respondents and at each of the time points, I have information on 10 X-variables and on 10 Y-variables. - Based on this information, I calculate two correlations for each respondent: cor(X[t=1],X[t=2]) and cor(Y[t=1],Y[t=2]), with X and Y being the vectors of the corresponding 10 variables. - Now I get the average correlations over the whole sample using Fishers Z-transformation, i.e. I have mean(cor(X[t=1],X[t=2])) and mean(cor(X[t=1],X[t=2])) and want to know if the mean correlations are significantly different! I haven't found any test that deals with exactly my situation. Therefore, I simply apply a paired t-test based on the individual z-correlations. From my point of view this should be ok, because of the z's normality. However, I am unsure if there is a better way to test the hypothesis that I am interested in? I'd be grateful for any comment or hint. Thank you very much, Ralph - Ralph Wirth University Erlangen-Nuremberg, Chair of Statistics GfK Group, Department of Methods and Product Development -- View this message in context: http://www.nabble.com/Test-for-equality-of-complicatedly-related-average-correlations-tp19346312p19346312.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Ralph Wirth University Erlangen-Nuremberg, Chair of Statistics GfK Group, Department of Methods and Product Development -- View this message in context: http://www.nabble.com/Test-for-equality
Re: [R] ON MAC, how to copy a plot on to Word document?
For what it's worth, copy/paste between R and Word 2008 works perfectly. It doesn't work at all for Word 2004...so that's at least ONE improvement (the only one I've seen) in versions of Word. --Adam On Sun, 7 Sep 2008, Stefan Evert wrote: On 7 Sep 2008, at 16:57, John Kane wrote: I think you need to save the plot and import it into Word. AFAIK you can only copy and paste a plot in Windows. In the R.app GUI, you can click the plot window and then select Copy from the Edit menu (or press Command-C). You can now past the plot into most Mac applications (e.g. Apple's own iWork suit), but it doesn't work in Word 2004 -- so it's really a problem with Word. Have a look at ?png (There are other formats available) Or, much better, a PDF file by selecting Save As ... from the File menu (Shift-Command-S). You should then be able to drag and drop the PDF file into Word. HTH, Stefan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Averaging 'blocks' of data
Hi Steve, You probably want to check out ?by or ?aggregate, maybe using (rownames(df) %/% 60) : (colnames(df) %/% 60) as your index variable. --Adam On Sun, 7 Sep 2008, Steve Murray wrote: Dear all, I have a large dataset which I hope to reduce in size, to make it more useable. I hope to do this by taking an average of each 60 x 60 blockof values and forming a new data frame out of the averaged values. How would I go about taking averages of 60 x 60 'blocks' in R, and cycling through the whole dataset, recording each calculated value in a new table/data frame? Many thanks for any advice offered. Steve __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to draw a vertical line from points to x-axis
I think you want the ?lines function. To connect a point (x,y) to the x-axis, lines(x=c(x,x),y=c(y,0)) ...draws a line from that point to the x-axis. You may also want to specify pch=c(?,),type=b where ? is the original point type (which you don't want to run over) and is the pch for theline on the axis. --Adam On Sun, 7 Sep 2008, Anny Huang wrote: Hello, I want to know how to draw a line connecting each point to the x-axis perpendicularly (i.e. a vertical line). abline(v=...) seems not to work for my purpose, because it runs over the data point. Can anyone help? Thanks. Anny [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nested ANOVA with fixed and random variables missing data
On Mon, 8 Sep 2008, [EMAIL PROTECTED] wrote: I have a nested ANOVA, with a fixed factor tmt nested within site (random). There are missing values in the data set. aeucs, tmt and site have been defined as objects I have tried: model1=lme(aeucs~tmt,random=~1|tmt/site) I think you want lme(aeucs~tmt,random=~tmt|site) ...as tmt is a fixed factor, you don't want to use it for grouping. Instead, you want to estimate a separate tmt effect for each site...so estimate (~) a fixed tmt effect (tmt) within (|) sites (site) = ~tmt|site. --Adam I get the following error message Error in na.fail.default(list(aeucs = c(0.8, 1, 1, 1, 1, 1, 0.7, : missing values in object I need to know where I am going wrong. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting max row from data matrix
Hi Srini, This may be as simple as tapply(weight,fruit,max) or t(that) if you want it as you specified. --Adam On Sun, 7 Sep 2008, Srinivas Iyyer wrote: dear group, i have a data matrix with some replicate items with different values. I want to extract the row with max value. for example: x fruit weight 1 apple1.3 2 apple1.5 3 apple1.6 4 orange1.4 5 orange1.6 x is a data frame. I want to extract unique items from fruits that has max weight. that is: 3 apple1.6 5 orange1.6 I want to be able to use apply functions. Could some one lend some help please. Thanks srini __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to preserve date format while aggregating
Hi Erich, Since min() is defined for numbers and not dates, the problem is in the min() function. min() is converting from date format to number format. Your best bet is to make this conversion explicit...such that it is reversable. So, convert the date into UTC, then UTC to seconds since epoch, then take the minimum, then convert back to UTC time. This sounds like a pain...but that's basically what a version of min() designed to work with dates would do. The reason this is a pain is basically due to timezones: Consider a comparison between x = 3:54 PM September 8 in California (right now where I am) and y = 12:54 AM September 9 in Zurich (right now where you are). Is it earlier here than there? Yes, because it's Sept 8 to your Sept 9. Is it earlier there than here? Yes, because your day started 56 minutes ago, mine over 15 hours ago. Is it the same time here than there? Yes, because our UTC times are equal. So it's not clear what min should return, so min is not defined for dates. However, min is defined for numbers, and dates can be converted to numbers...but what those numbers actually mean is not necessarily clear. --Adam On Mon, 8 Sep 2008, Erich Studerus wrote: Hi I have a dataframe in which some subjects appear in more than one row. I want to extract the subject-rows which have the minimum date per subject. I tried the following aggregate function. attach(dataframe.xy) aggregate(Date,list(SubjectID),min) Unfortunately, the format of the Date-column changes to numeric, when I'm applying this function. How can I preserve the date format? Thanks Erich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Saving functions
Hi Robin, source(filename.R) will open filename.R from the working directory and behave as if you had typed in its contents, line by line. You can include a full path if you like, or use file.choose() for a file not in your working directory. --Adam On Mon, 8 Sep 2008, Williams, Robin wrote: Hi, Appologies for the simple nature of this question, I am unable to find the answer in manuals (EG and introduciton to R). I have written a function in a text editor and saved it with an .R extension. It is saved in my working directory. How can I run it, do I need to use source? If so, how do I supply the arguments to the function? Or does it need to be saved in a particular directory? Do I need a different file extension? Many thanks for any help. Robin Williams Met Office summer intern - Health Forecasting [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about multiple regression
Hi Dimitri, On Mon, 8 Sep 2008, Dimitri Liakhovitski wrote: Dear R-list, maybe some of you could point me in the right direction: Are you aware of any FREE Fortran or Java libraries/actual pieces of code that are VERY efficient (time-wise) in running the regular linear least-squares multiple regression? You almost certainly want the LAPACK fortran libraries, avail at http://www.netlib.org/lapack/ ...the function of interest to you is probably called dgels: http://www.netlib.org/lapack/explore-html/dgels.f.html ...of course, this runs faster if you have a fast BLAS library installed. These exist in many forms, and may already be installed on your system. --Adam More specifically, I have to run small regression models (between 1 and 15 predictors) on samples of up to N=700 but thousands and thousands of them. I am designing a simulation in R and running those regressions and R itself is way too slow. So, I am thinking of compiling the regression run itself in Fortran and Java and then calling it from R. Thank you very much for any advice! Dimitri Liakhovitski MarketTools, Inc. [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Test for equality of complicatedly related average correlations
Hi Ralph, I had the same problem you do a few months ago, and realized that the question I had (does time show a different effect for X than Y) was not best modeled as differences between correlations across individuals, but as whether time interacts with condition. I answered this question with library(nlme) lme(obs ~ cond*time, random=~cond*time|subj) ...where obs is the responses on the X or Y variable, cond is a factor of either X or Y, and subj is your subject variable. This fits a heirarchical linear model to the data. The relationship between X and time is sig. diff. from the relationship between Y and time if the cond:time fixed effect is true. This approach makes better use of your data, because when you correlate the observations, you're effectively losing variability (because correlations are doubly standardized) as well as degrees of freedom (you have 9 df within each individual, but each correlation is only one number). --Adam On Sat, 6 Sep 2008, Ralph79 wrote: Dear R-Users, I am currently looking for a way to test the equality of two correlations that are related in a very special way. Let me describe the situation with an example. - There are 100 respondents, and there are 2 points in time, t=1 and t=2. - For each of the respondents and at each of the time points, I have information on 10 X-variables and on 10 Y-variables. - Based on this information, I calculate two correlations for each respondent: cor(X[t=1],X[t=2]) and cor(Y[t=1],Y[t=2]), with X and Y being the vectors of the corresponding 10 variables. - Now I get the average correlations over the whole sample using Fishers Z-transformation, i.e. I have mean(cor(X[t=1],X[t=2])) and mean(cor(X[t=1],X[t=2])) and want to know if the mean correlations are significantly different! I haven't found any test that deals with exactly my situation. Therefore, I simply apply a paired t-test based on the individual z-correlations. From my point of view this should be ok, because of the z's normality. However, I am unsure if there is a better way to test the hypothesis that I am interested in? I'd be grateful for any comment or hint. Thank you very much, Ralph - Ralph Wirth University Erlangen-Nuremberg, Chair of Statistics GfK Group, Department of Methods and Product Development -- View this message in context: http://www.nabble.com/Test-for-equality-of-complicatedly-related-average-correlations-tp19346312p19346312.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lme: Testing variance components
Hello, I've been making a decent amount of use of the lme function recently, and I am aware that I can extract the variance and correlation components with the VarCorr(model) function. However, I am interested in testing these components as well...is there a module or function available for testing these? I understand there's some debate as to which test is best for a given situation (e.g., Wald vs. likelihood ratio), as well as some good arguments that it is perhaps ill-advised to test (variance is something to be predicted; if there is literally no variance, then the estimate perfectly predicts the outcome and the scientific question is basically answered; no statistics necessary), but there are some circumstances in which a nonsignificant variance component might be useful: For example, when deciding (in an HLM) whether it is necessary or at all useful to include a grouping factor. Thanks in advance. Adam D. I. Kramer Ph.D. Candidate, Social and Personality Psychology University of Oregon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Coercing by/tapply to data.frame for more than two indices?
Dear Colleagues, Apologies for a long email to ask what I feel may be a very simple question; I figure it's better to overspecify my situation. I was asked a question, recently, by a colleague in my department about pre-aggregating variables, i.e., computing the mean of defined subsets of a data frame. Naturally, I thought of the 'by' and 'tapply' functions, as they have always been the solution for me. However, my colleague had three indices, and as such needs to pay attention to the indices of the output...this is to say, the create an array function of tapply doesn't quite work because an array is not quite what we want. Consider this data set: df - data.frame(var1= factor(rep(rep(1:5,25*5),10)), var2= factor(rep(rep(1:5,each=25*5),10), trial= rep(rep(1:25,25),10), id= factor(rep(1:10,each=5*5*25)), score= rnorm(n=5*5*25*10) ) ...this is to say, each of 10 ids has scores for 5 different levels of var1 and 5 different levels of var2...across 25 trials. Basically, a three-way crossed repeated measures design...where tapply does what I want for a two-way design, it does not quite suit my purposes for a 3-way or n-way for n 2. The goal is to predict score from var1 and var2. The straightforward guess of what to do would be to simply have the AOV function aggregate across trials: aov(score ~ var1*var2 + Error(id/(var1*var2)), data=df) (or lm with defined contrasts) ...however, there are missing data on some trials for some people, which makes this design unbalanced (i.e., it introduces a correlation between var1 and var2). Because my colleague knows (from a theoretical standpoint) that he wants to analyze the mean, his ANOVA on the aggregated trial means WOULD be balanced, which is to say, the analysis he wants to run would produce different output from the above. So, what he needs is a data frame with four variables instead of five: var1, var2, id, and mscore (mean score), which has been averaged across trials. Clearly (to me, it seems), the way to do this is with tapply: x - tapply(df$score, list(df$var1,df$var2,df$id), mean, na.rm=TRUE) ...which returns a var1*var2 matrix for each ID, when what I want is a observation-per-row data frame. So, my question: How do I end up with what I'm looking for? My current process involves setting df2 - data.frame(mscore=c(x), ...) where ... is a bunch of factor(rep) columns that would specify the var1 var2 and id levels. My problem with this approach is that it seems like a hack; it is not a general solution because I must use knowledge of the process by which x was generated in order to get it right, and there's a decent amount of room for unnoticed error on my part. I suppose what I'm looking for is either a way to take by or tapply and have it return a set of index variable columns based on the list of indices I provide to it...or a way to collapse an n-way table into a single data frame with index variables. Any suggestions? Cordially, Adam D. I. Kramer Ph.D. Candidate, Social Psychology University of Oregon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coercing by/tapply to data.frame for more than two indices?
Thanks very much...it is exactly what I needed, and I'm a bit embarassed that I couldn't find it on my own. One might consider adding aggregate to the See also: lines of by and tapply. That would have prevented me from needing to email the list (which I may have accidentally done twice; I apologize for that). --Adam On Sat, 3 May 2008, jim holtman wrote: ?aggregate aggregate(df$score, list(df$var1, df$var2, df$id), mean, na.rm=TRUE) Group.1 Group.2 Group.3 x 1 1 1 1 0.1053576980 2 2 1 1 0.1514888520 3 3 1 1 0.1270477403 4 4 1 1 -0.0193129404 5 5 1 1 0.2574346931 6 1 2 1 0.0185013523 7 2 2 1 -0.0886420632 8 3 2 1 -0.1304342272 9 4 2 1 -0.0972963702 105 2 1 -0.1463502593 On Fri, May 2, 2008 at 6:43 PM, Adam D. I. Kramer [EMAIL PROTECTED] wrote: Dear Colleagues, Apologies for a long email to ask what I feel may be a very simple question; I figure it's better to overspecify my situation. I was asked a question, recently, by a colleague in my department about pre-aggregating variables, i.e., computing the mean of defined subsets of a data frame. Naturally, I thought of the 'by' and 'tapply' functions, as they have always been the solution for me. However, my colleague had three indices, and as such needs to pay attention to the indices of the output...this is to say, the create an array function of tapply doesn't quite work because an array is not quite what we want. Consider this data set: df - data.frame(var1= factor(rep(rep(1:5,25*5),10)), var2= factor(rep(rep(1:5,each=25*5),10), trial= rep(rep(1:25,25),10), id= factor(rep(1:10,each=5*5*25)), score= rnorm(n=5*5*25*10) ) ...this is to say, each of 10 ids has scores for 5 different levels of var1 and 5 different levels of var2...across 25 trials. Basically, a three-way crossed repeated measures design...where tapply does what I want for a two-way design, it does not quite suit my purposes for a 3-way or n-way for n 2. The goal is to predict score from var1 and var2. The straightforward guess of what to do would be to simply have the AOV function aggregate across trials: aov(score ~ var1*var2 + Error(id/(var1*var2)), data=df) (or lm with defined contrasts) ...however, there are missing data on some trials for some people, which makes this design unbalanced (i.e., it introduces a correlation between var1 and var2). Because my colleague knows (from a theoretical standpoint) that he wants to analyze the mean, his ANOVA on the aggregated trial means WOULD be balanced, which is to say, the analysis he wants to run would produce different output from the above. So, what he needs is a data frame with four variables instead of five: var1, var2, id, and mscore (mean score), which has been averaged across trials. Clearly (to me, it seems), the way to do this is with tapply: x - tapply(df$score, list(df$var1,df$var2,df$id), mean, na.rm=TRUE) ...which returns a var1*var2 matrix for each ID, when what I want is a observation-per-row data frame. So, my question: How do I end up with what I'm looking for? My current process involves setting df2 - data.frame(mscore=c(x), ...) where ... is a bunch of factor(rep) columns that would specify the var1 var2 and id levels. My problem with this approach is that it seems like a hack; it is not a general solution because I must use knowledge of the process by which x was generated in order to get it right, and there's a decent amount of room for unnoticed error on my part. I suppose what I'm looking for is either a way to take by or tapply and have it return a set of index variable columns based on the list of indices I provide to it...or a way to collapse an n-way table into a single data frame with index variables. Any suggestions? Cordially, Adam D. I. Kramer Ph.D. Candidate, Social Psychology University of Oregon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Coercing by/tapply to data.frame for more than two indices?
Dear Colleagues, Apologies for a long email to ask what I feel may be a very simple question; I figure it's better to overspecify my situation. I was asked a question, recently, by a colleague in my department about pre-aggregating variables, i.e., computing the mean of defined subsets of a data frame. Naturally, I thought of the 'by' and 'tapply' functions, as they have always been the solution for me. However, my colleague had three indices, and as such needs to pay attention to the indices of the output...this is to say, the create an array function of tapply doesn't quite work because an array is not quite what we want. Consider this data set: df - data.frame(var1= factor(rep(rep(1:5,25*5),10)), var2= factor(rep(rep(1:5,each=25*5),10), trial= rep(rep(1:25,25),10), id= factor(rep(1:10,each=5*5*25)), score= rnorm(n=5*5*25*10) ) ...this is to say, each of 10 ids has scores for 5 different levels of var1 and 5 different levels of var2...across 25 trials. Basically, a three-way crossed repeated measures design...where tapply does what I want for a two-way design, it does not quite suit my purposes for a 3-way or n-way for n 2. The goal is to predict score from var1 and var2. The straightforward guess of what to do would be to simply have the AOV function aggregate across trials: aov(score ~ var1*var2 + Error(id/(var1*var2)), data=df) (or lm with defined contrasts) ...however, there are missing data on some trials for some people, which makes this design unbalanced (i.e., it introduces a correlation between var1 and var2). Because my colleague knows (from a theoretical standpoint) that he wants to analyze the mean, his ANOVA on the aggregated trial means WOULD be balanced, which is to say, the analysis he wants to run would produce different output from the above. So, what he needs is a data frame with four variables instead of five: var1, var2, id, and mscore (mean score), which has been averaged across trials. Clearly (to me, it seems), the way to do this is with tapply: x - tapply(df$score, list(df$var1,df$var2,df$id), mean, na.rm=TRUE) ...which returns a var1*var2 matrix for each ID, when what I want is a observation-per-row data frame. So, my question: How do I end up with what I'm looking for? My current process involves setting df2 - data.frame(mscore=c(x), ...) where ... is a bunch of factor(rep) columns that would specify the var1 var2 and id levels. My problem with this approach is that it seems like a hack; it is not a general solution because I must use knowledge of the process by which x was generated in order to get it right, and there's a decent amount of room for unnoticed error on my part. I suppose what I'm looking for is either a way to take by or tapply and have it return a set of index variable columns based on the list of indices I provide to it...or a way to collapse an n-way table into a single data frame with index variables. Any suggestions? Cordially, Adam D. I. Kramer Ph.D. Candidate, Social Psychology University of Oregon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Omnibus main effects in summary.lme?
Hello, I've been running some HLMs using the lme function quite happily; it does what I want and I'm pretty sure I understand it. The issue is that I'm currently trying to estimate a model with a 14-level nusiance factor as an independent variable...which makes the output quite ugly. All I'm really interested in is the question of whether these factor as a whole (and its interactions with other factors) are significant. The summary.aov function provides this sort of aggregation for lm objects, but does not run on lme objects. I've also tried estimating the full model and restricted model, leaving out a main effect or interaction term and then using anova.lme to compare the models, but these models appear to be being fit differently. Say I have l2, and then l3 - update(l2, .~.-useful:nusience) anova.lme(l3,l2) ...to see whether the interaction term is significant, produces the error, Fitted objects with different fixed effects. REML comparisons are not meaningful. Upon examination using summary(l3), it seems that the fixed factors are indeed different. So, my question is this: How do I estimate omnibus main effects for multi-level factors and multi-level factor interactions in lme models? Many thanks, Adam D. I. Kramer Ph.D. Student, Social and Personality Psychology University of Oregon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.