[R] [R-pkgs] caret version 4.06 released
Version 4.06 of the caret package was sent to CRAN. caret can be used to tune the parameters of predictive models using resampling, to estimate variable importance and to visualize the results. There are also various modeling and helper functions that can be useful for training models. caret has wrappers to over 50 different models for classification and regression. See the package vignettes or the paper at http://www.jstatsoft.org/v28/i05 for more details.

Significant internal changes were made to how the models are fit in train(). Now, the function used to compute the models is passed in as a parameter (defaulting to lapply). In this way, users can use their own parallel processing software without new versions of caret. Examples using MPI and NWS are given in ?train.

The package now contains a function (splsda) that extends the spls function to classification (in the same manner that caret's plsda function extends plsr). Also, a bug was fixed where the MSE (instead of the RMSE) was reported for random forest OOB resampling. There are more examples in ?train.

Changes to confusionMatrix, sensitivity, specificity and the predictive value functions:
- each was made more generic with default and table methods
- confusionMatrix extractor functions for matrices and tables were added
- the pos/neg predicted value computations were changed to incorporate prevalence
- prevalence was added as an option to several functions
- detection rate and prevalence statistics were added to confusionMatrix
- the examples were expanded in the help files

This version of caret breaks compatibility with caretLSF and caretNWS. However, these packages are no longer needed (see above) and will be deprecated. They will work with versions of caret up to 3.51 and will not be developed going forward. However, they can still be found at https://r-forge.r-project.org/projects/caret/

Send questions, comments etc to max.k...@pfizer.com.
Max ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] can't load rJava in R 2.8.1 on Windows XP
Duncan Murdoch (murdoch at stats.uwo.ca) writes:

I don't know what's going wrong on your system. I added a browser() call to the .onLoad function in R/windows/FirstLib.R on my system, and I see it successfully gets JAVA_HOME from the registry. It gets a number of other files, then adds these paths to my PATH variable. I've used strsplit() to separate them for viewing.
[14] C:\\Program Files\\Java\\jre1.6.0_07\\bin\\client
[15] C:\\Program Files\\Java\\jre1.6.0_07/bin
[16] C:\\Program Files\\Java\\jre1.6.0_07/bin/client
[17] C:\\Program Files\\Java\\jre1.6.0_07/jre/bin/client
I believe LoadLibrary needs paths to be specified with backslashes, so you might be able to fix things on your system by changing the file.path calls in that function to use fsep="\\" instead of the default "/".

Thanks for your help. I think I tracked it down. It has nothing to do with rJava, but rather with Sys.getenv(). It looks like this function truncates around 1024 characters, and my path is very long due to Visual Studio + Delphi + SQL Server. See the printout below. Note that the last entry should read \\Delphi, and that more entries follow in my system path. This also explains why only some people have the problem. No workaround found yet. I keep this message here for other people who have the problem, but possibly this is more for R-devel. To be continued.

Dieter

p <- Sys.getenv("PATH")
nchar(p)
PATH
1019
strsplit(p, ";")$PATH[-(1:27)]
[1] "C:\\Program Files\\Microsoft SQL Server\\100\\Tools\\Binn\\VSShell\\Common7\\IDE\\"
[2] "C:\\Program Files\\MiKTeX 2.7\\miktex\\bin"
[3] "C:\\Users\\Dieter\\Documents\\Delp
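For readers following along, the diagnostic Dieter ran can be sketched with a made-up PATH string (the path below is a toy example, not his actual PATH):

```r
# Split a Windows PATH-style string on ";" and inspect it, as in the
# printout above.
p <- "C:\\R\\bin;C:\\Program Files\\MiKTeX 2.7\\miktex\\bin;C:\\Windows"
nchar(p)                          # total length of the raw string
entries <- strsplit(p, ";")[[1]]  # one element per PATH entry
length(entries)                   # 3 entries in this toy example
```

On an affected system, comparing nchar(Sys.getenv("PATH")) against the length of the PATH shown by the shell would reveal the truncation Dieter describes.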
[R] How to analyse and model 2 time series, when one series needs to be differenced?
Hello. How can I analyse the cross-correlation between two time series with ccf if one of the time series needs to be differenced so that it is stationary? The two series then differ in length, so ccf may not produce the correct cross-correlation. Another problem: how can I model the two time series as a VARI process with the dse package? That is, how can I handle the fact that one series has to be differenced and the other does not? I hope you can give me some hints. Regards, Andreas.
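The thread carries no reply, but one common approach (my sketch, not from the original poster) is to difference the nonstationary series and trim the other so the two align before calling ccf:

```r
# Toy data: x is stationary, y is integrated of order 1.
set.seed(1)
x <- rnorm(200)
y <- cumsum(rnorm(200))   # needs one difference to be stationary
dy <- diff(y)             # length 199
x2 <- x[-1]               # drop the first point of x so lengths match
cc <- ccf(x2, dy, plot = FALSE)
```

Whether to drop the first or the last observation of x depends on the lag alignment you want; differencing shifts the time index of dy by one step.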
[R] HMISC package: wtd.table()
Hi useRs, developeRs, I got stuck within a function of the Hmisc package. Sounds easy, hope it is: I have 2 items (FamTyp.kurz, HGEW) of the same length, no missings.

length(FamTyp.kurz); summary(FamTyp.kurz)
[1] 14883
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 10.00   20.00   21.00   21.66   23.00   31.00
length(HGEW); summary(HGEW)
[1] 14883
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 104.5   409.6   489.4   549.8   623.3  3880.0

Now I simply want to compute a table of unweighted and weighted values. But the weights do not seem to be accepted:

print("unweighted"); print(table(FamTyp.kurz))
[1] "unweighted"
FamTyp.kurz
  10   11   20   21   22   23   30   31
1755  683 3322 1683 2428 1440 1748 1824
print("weighted"); print(wtd.table(FamTyp.kurz,weigths=HGEW,normwt=FALSE,na.rm=TRUE))
[1] "weighted"
Error in wtd.table(FamTyp.kurz, weigths = HGEW, normwt = FALSE, na.rm = TRUE) :
  unused arguments (weigths = c(495.55949, 495.55949, 678.16378, 678.16378, .

any ideas??? thanx in advance, Norbert
[R] heatmap with levelplot
Hi there, I'd like to create a heatmap from my matrix with a) a defined color range (let's say from yellow to red) and b) striking colors above and below a certain threshold (above = green, below = blue). Example matrix (there should be a few outliers generated...) plus a simple levelplot without the outliers marked:

library(lattice)
my.mat <- matrix(rnorm(800), nrow = 40)
threshold <- c(-1, 1)   # should be used for the extreme colors
colorFun <- colorRampPalette(c("yellow", "red"))
levelplot(my.mat, col.regions = colorFun(50))

I don't know how to handle the extreme values... Can anybody help? Antje
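No reply is recorded in this digest, but one way to get the effect (my sketch, not from the thread) is levelplot's at argument: supply explicit break points so that everything below the lower threshold falls into one extreme bin and everything above the upper threshold into another, then prepend/append the striking colors:

```r
library(lattice)
set.seed(42)
my.mat   <- matrix(rnorm(800), nrow = 40)
threshold <- c(-1, 1)
colorFun  <- colorRampPalette(c("yellow", "red"))
# 51 break points span [-1, 1]; two outer breaks catch the outliers
brks <- c(min(my.mat) - 1,
          seq(threshold[1], threshold[2], length.out = 51),
          max(my.mat) + 1)
cols <- c("blue", colorFun(50), "green")   # one color per interval
p1 <- levelplot(my.mat, at = brks, col.regions = cols)
```

levelplot needs length(col.regions) >= length(at) - 1, which holds here (52 colors for 53 breaks).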
Re: [R] HMISC package: wtd.table()
oops, that really was easy:
(wtd.table(FamTyp.kurz, HGEW, normwt=FALSE, na.rm=TRUE))
instead of
(wtd.table(FamTyp.kurz, weigths=HGEW, normwt=FALSE, na.rm=TRUE))
(the second argument of wtd.table is already the weights vector, and the misspelled "weigths" did not match any argument name, hence the "unused arguments" error). sorry for that question ...

Am 26.01.2009, 10:30 Uhr, schrieb Norbert NEUWIRTH norbert.s.neuwi...@univie.ac.at: [...]

--
Mag. Norbert Neuwirth
Österreichisches Institut für Familienforschung (ÖIF) - Universität Wien
Austrian Institute for Family Studies - University of Vienna
http://www.oif.ac.at
e-mail: norbert.neuwi...@oif.ac.at
tel: +43-1-4277-489-11 fax: +43-1-4277-9-489
address: A-1010 Wien, Grillparzerstraße 7/9
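For reference, a weighted table can also be built in base R without Hmisc; here toy vectors stand in for FamTyp.kurz and HGEW, which are not available outside the original session:

```r
f <- c(10, 10, 20, 21, 21)        # category codes (stand-in for FamTyp.kurz)
w <- c(1.5, 2.0, 1.0, 0.5, 0.5)   # weights (stand-in for HGEW)
unweighted <- table(f)
weighted   <- tapply(w, f, sum)   # sum of weights per category
weighted
#  10  20  21
# 3.5 1.0 1.0
```

tapply sums the weights within each level of f, which is exactly what wtd.table(f, w, normwt=FALSE) computes for the unnormalized case.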
[R] Help with clustering
I am going to try out a tentative clustering of some feature vectors. The range of values spanned by the three items making up the feature vector is quite different:
Item-1 goes roughly from 70 to 525 (integer numbers only)
Item-2 is between 0 and 1 (all real numbers between 0 and 1)
Item-3 goes from 1 to 10 (integer numbers only)
In order to spread out Item-2 even further I might try to replace Item-2 with Log10(Item-2). My concern is that, regardless of the distance measure used, the item whose order of magnitude is the highest may carry the highest weight in the process of calculating the similarity matrix, thereby fading out the influence of the items with smaller variation in the resulting clusters. Should I normalize all feature vector elements to 1 in advance of generating the similarity matrix? Thank you so much. Maura
Re: [R] infer haplotypes phasing trios tdthap
tdthap wasn't intended to solve that problem and it has been removed from my own web site since I no longer consider it important enough to support. DC

-----Original Message-----
From: Tiago R Magalhães [mailto:tiag...@gmail.com]
Sent: 22 January 2009 11:10
To: r-help@R-project.org
Subject: infer haplotypes phasing trios tdthap

Dear R mailing list, I have a dataset with genotypes from trios and I would like to infer haplotypes for each mother, father and child. The package that I could find that can do this is tdthap. But when the mother is homozygous (e.g., 2/2) the haplotype is called as not possible to infer (0); I would prefer for it to call the genotype (2). From what I understand it is doing what I would like for the father (example below). Can anyone provide me with some information about this tdthap behaviour? And is there any other package that would do this? (Searched for it, couldn't find it.) Thank you very much, Tiago Magalhães

example (ped file with pedigrees)
9 100 102 101 1 2 1 1 2 1 2 2 1 2
9 101 0 0 2 1 1 1 2 1 2 2 2 2
9 102 0 0 1 1 2 1 2 1 2 2 1 1

data out: hap.transmit(example)
  ped  id father mother
    9 100    102    101
  f.tr.1 f.tr.2 f.tr.3 f.tr.4
       1      0      2      1
  m.tr.1 m.tr.2 m.tr.3 m.tr.4
       0      0      0      0
[R] Meaning of Inner Product (%*%) Between Slot and Vector
Dear all, I have the following object and vector:

print(alpha)
Slot "ra":
 [1] 0.994704478 0.002647761 0.000882587 0.000882587 0.000882587 0.989459074
 [7] 0.005270463 0.002635231 0.002635231 0.994717023 0.005282977 1.0
[13] 1.0
Slot "ja":
 [1] 1 5 2 3 4 2 1 3 4 3 3 4 5
Slot "ia":
 [1] 1 6 10 12 13 14
Slot "dimension":
[1] 5 5
print(p)
[1] 0.4 0.2 0.2 0.2 0.2

Now what I don't understand is that performing the inner product gives this:

print(alpha %*% p)
          [,1]
[1,] 0.3989409
[2,] 0.2010541
[3,] 0.2000000
[4,] 0.2000000
[5,] 0.2000000

My questions are: 1. How does %*% work in the above example? 2. Is there a more understandable (naive) way to implement such a product in this context? - Gundala Viswanath, Jakarta - Indonesia
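The slots suggest alpha is a compressed sparse row (CSR) matrix, the layout used for example by SparseM's matrix.csr class: ra holds the nonzero values, ja their column indices, and ia the position in ra where each row starts (length nrow + 1). A naive base-R sketch of the same matrix-vector product, using the values printed above:

```r
ra <- c(0.994704478, 0.002647761, 0.000882587, 0.000882587, 0.000882587,
        0.989459074, 0.005270463, 0.002635231, 0.002635231, 0.994717023,
        0.005282977, 1.0, 1.0)
ja <- c(1, 5, 2, 3, 4, 2, 1, 3, 4, 3, 3, 4, 5)
ia <- c(1, 6, 10, 12, 13, 14)
p  <- c(0.4, 0.2, 0.2, 0.2, 0.2)

# Row i of the matrix lives at positions ia[i] .. ia[i+1]-1 of ra,
# in the columns listed at the same positions of ja.
csr_matvec <- function(ra, ja, ia, p) {
  n <- length(ia) - 1              # number of rows
  out <- numeric(n)
  for (i in seq_len(n)) {
    idx <- ia[i]:(ia[i + 1] - 1)   # positions of row i's nonzeros
    out[i] <- sum(ra[idx] * p[ja[idx]])
  }
  out
}
round(csr_matvec(ra, ja, ia, p), 7)
# [1] 0.3989409 0.2010541 0.2000000 0.2000000 0.2000000
```

This reproduces the result of alpha %*% p: each output element is the dot product of one sparse row with p.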
[R] how to modify an R built-in function?
Hello R experts! Last week I ran into a lot of problems trying to fit an ARIMA model to a time series. The problem is that the internal process of the arima function calls the function optim to estimate the model parameters. So far so good... but my data presents a problem with the default method "BFGS" of the optim function; the output error looks like this:

Error en optim(init[mask], armafn, method = "BFGS", hessian = TRUE, control = optim.control, :
  non-finite finite-difference value [7]

I've searched through the R forums for an answer and the only thing that looks like it might help is a suggestion to modify the R arima function in a way that allows one to select the optimization method for the optim function. The post is available here: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/138255.html The problem is that I'm not familiar with the procedure that the author suggests, i.e., I don't know how to modify the R function through an R script. Any help will be very appreciated!!! regards!!! Diego.
Re: [R] Help with clustering
Generally, how to scale different variables when aggregating them in a dissimilarity measure is strongly dependent on the subject matter, on what the aim of the clustering is and on your cluster concept. This cannot be answered properly on such a mailing list. A standard transformation before computing dissimilarities would be to scale all variables to variance 1 by dividing by their standard deviations. This gives, in some well-defined sense, all variables the same weight (which may be somewhat affected by outliers, heavy tails and skewness; note, however, that normalising to the same range shares the same problems, more severely). Regards, Christian

On Mon, 26 Jan 2009, mau...@alice.it wrote: [...]
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
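The standardisation Christian describes is one line in base R. A sketch on toy data whose three columns mimic the ranges Maura mentioned:

```r
set.seed(7)
X <- cbind(item1 = sample(70:525, 30, replace = TRUE),  # large range
           item2 = runif(30),                           # range (0, 1)
           item3 = sample(1:10, 30, replace = TRUE))    # small range
Xs <- scale(X)        # each column centred and divided by its sd
d  <- dist(Xs)        # Euclidean dissimilarities on equal-weight scales
hc <- hclust(d)       # e.g. feed into hierarchical clustering
apply(Xs, 2, sd)      # every column now has sd 1
```

After scale(), item1 no longer dominates the distances simply because its numeric range is hundreds of times larger.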
Re: [R] XML package help
Thanks! Works like a charm. -Aaron

From: Duncan Temple Lang [dun...@wald.ucdavis.edu]
Sent: Friday, January 23, 2009 6:48 PM
To: Skewes,Aaron
Cc: r-help@r-project.org
Subject: Re: [R] XML package help

Skewes,Aaron wrote:

Please consider this:

<Manifest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- eName : name of the element. eValue : value of the element. -->
  <OutputFilePath>./XYZ</OutputFilePath>
  <FilesList>
    <File>
      <FileTypeId>10</FileTypeId>
      <FilePath>./XYZ/</FilePath>
      <PatientCharacteristics eName="one" eValue="1"/>
      <PatientCharacteristics eName="two" eValue="2"/>
      <PatientCharacteristics eName="three" eValue="3"/>
    </File>
  </FilesList>
</Manifest>

I am attempting to use the XML package and xpathSApply() to extract, say, the eValue attribute for eName=='one' for all File nodes that have FileTypeId==10. I try the following, among several things:

getNodeSet(doc, "//File[FileTypeId/text()='10']/PatientCharacteristics[@eName='one']/@eValue")

should do it. You need to compare the text() of the FileTypeId node. And the / after the PatientCharacteristics and before the [] will cause trouble. HTH, D.

doc <- xmlInternalTreeParse(Manifest)
Root <- xmlRoot(doc)
xpathSApply(Root, "//File[FileTypeId=10]/PatientCharacteristics/[@eName='one']", xmlAttrs)

and it does not work. Might somebody help me with the syntax here? Thanks a lot!! Aaron
[R] Mode (statistics) in R?
Hopefully this is a pretty simple question: Is there a function in R that calculates the mode of a sample? That is, I would like to be able to determine the value that occurs most frequently in a data set. I tried the default R mode function, but it appears to report a storage type or something else. I tried RSeek and some R documentation that I downloaded, but nothing seems to mention calculating the mode. Thanks again.
[R] glm StepAIC with all interactions and update to remove a term vs. glm specifying all but a few terms and stepAIC
Problem: I am sorting through the model selection process for the first time and want to make sure that I have used glm, stepAIC, and update correctly. Something is strange because I get a different result between:
1) a glm of 12 predictor variables followed by a stepAIC where all interactions are considered, and then an update to remove one specific interaction.
vs.
2) entering all the terms individually in a glm (except the one that I removed with update, and 4 others like it which did not make it into the final model anyway), and then running stepAIC.
Question: Why do these processes not yield the same model?
Here are all the details if helpful: I start with 12 potential predictor variables, 7 primary terms and 5 additional ones that are I(primary_terms^2). I run a glm for these 12 and then do stepAIC (BIC actually) in both directions. The scope argument is scope=list(upper=~.^2, lower=NULL). This means there are 78 predictor terms considered: the 12 primary terms and 66 pairwise interactions, 78 = n(n+1)/2 with n = 12. I see this with trace=T also. Here is the code used:

glm1 <- glm(formula = PRESENCE == 1 ~ SNOW + I(SNOW^2) + POP_DEN + ROAD_DE + ADJELEV + I(ADJELEV^2) + TRI + I(TRI^2) + EDGE + I(EDGE^2) + TREECOV + I(TREECOV^2), family = binomial, data = wolv)
summary(glm1)
library(MASS)
stepglm2 <- stepAIC(glm1, scope=list(upper=~.^2, lower=NULL), trace=T, k=log(4828), direction="both")
summary(stepglm2)
extractAIC(stepglm2, k=log(4828))

This results in a 15 term model with a BIC of 3758.659.
Coefficients: Estimate Std.
Error z value Pr(>|z|)
(Intercept)           -4.983e+01  9.263e+00  -5.379  7.50e-08 ***
SNOW                   6.085e-02  8.641e-03   7.041  1.90e-12 ***
ROAD_DE               -5.637e-01  1.192e-01  -4.730  2.24e-06 ***
ADJELEV                2.880e-02  7.457e-03   3.863  0.000112 ***
I(ADJELEV^2)          -4.038e-06  1.487e-06  -2.715  0.006618 **
TRI                    5.675e-02  1.081e-02   5.248  1.54e-07 ***
I(TRI^2)              -1.713e-03  4.243e-04  -4.036  5.43e-05 ***
EDGE                   6.418e-03  1.697e-03   3.782  0.000156 ***
TREECOV                1.680e-01  2.929e-02   5.735  9.76e-09 ***
SNOW:ADJELEV          -4.313e-05  6.935e-06  -6.219  5.00e-10 ***
ADJELEV:TREECOV       -6.628e-05  1.161e-05  -5.711  1.13e-08 ***
SNOW:I(ADJELEV^2)      7.437e-09  1.384e-09   5.373  7.74e-08 ***
TRI:I(TRI^2)           1.321e-06  3.419e-07   3.863  0.000112 ***
I(ADJELEV^2):I(TRI^2) -2.127e-10  5.745e-11  -3.702  0.000214 ***
ADJELEV:I(TRI^2)       1.029e-06  3.004e-07   3.424  0.000617 ***
SNOW:TRI               1.057e-05  3.372e-06   3.135  0.001721 **

The final model included the TRI:I(TRI^2) term, which is effectively a cubic function. So this was removed, because cubics were not considered for all variables. I used update to remove TRI:I(TRI^2). Code:

stepglm3 <- update(stepglm2, ~ . - TRI:I(TRI^2), trace=T)
summary(stepglm3)
extractAIC(stepglm3, k=log(4828))

This results in a 14 term model with a BIC of 3770.172. The BIC is a little higher, but the cubic term improved fit and is no longer in, so this is expected.
Coefficients: Estimate Std.
Error z value Pr(>|z|)
(Intercept)           -5.329e+01  9.267e+00  -5.750  8.92e-09 ***
SNOW                   6.241e-02  8.695e-03   7.178  7.06e-13 ***
ROAD_DE               -5.756e-01  1.184e-01  -4.863  1.16e-06 ***
ADJELEV                3.233e-02  7.452e-03   4.338  1.44e-05 ***
I(ADJELEV^2)          -4.724e-06  1.487e-06  -3.177  0.001489 **
TRI                    1.834e-02  5.402e-03   3.395  0.000687 ***
I(TRI^2)              -1.122e-03  3.920e-04  -2.863  0.004190 **
EDGE                   6.344e-03  1.690e-03   3.754  0.000174 ***
TREECOV                1.745e-01  2.923e-02   5.969  2.39e-09 ***
SNOW:ADJELEV          -4.444e-05  6.984e-06  -6.363  1.98e-10 ***
ADJELEV:TREECOV       -6.885e-05  1.160e-05  -5.937  2.90e-09 ***
SNOW:I(ADJELEV^2)      7.681e-09  1.395e-09   5.506  3.67e-08 ***
I(ADJELEV^2):I(TRI^2) -1.839e-10  5.692e-11  -3.232  0.001231 **
ADJELEV:I(TRI^2)       8.860e-07  2.974e-07   2.979  0.002892 **
SNOW:TRI               1.219e-05  3.260e-06   3.740  0.000184 ***

This all seems to be as it should. I then decided to try and confirm this result by running a glm without any of the 5 potential cubic terms (note: TRI:I(TRI^2) was the only one that made it into the final model, but there were 5 potential ones). After entering the 73 potential terms (12 primary variables and now 66 minus 5 interactions = 73 total), the glm and stepAIC produce a completely different final model. It has 8 variables that were not in the model that was chosen with the scope statement and manually removing TRI:I(TRI^2), and it is missing 7 variables that were in the model chosen with the scope statement. It has 8 variables that were in both. Code and result:

glmalt1b <- glm(formula = PRESENCE == 1 ~
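The digest carries no answer, but one relevant general point (my note, not from the thread): stepAIC performs a greedy search, so starting from a different model, or offering a different candidate set, can end at a different local optimum even when the candidate sets overlap heavily. A toy sketch of the two set-ups on made-up data (the wolv data set is not available here):

```r
library(MASS)  # for stepAIC
set.seed(1)
d <- data.frame(y = rbinom(300, 1, 0.5),
                a = rnorm(300), b = rnorm(300), c = rnorm(300))
# Set-up 1: start from main effects, let stepAIC search up to all
# pairwise interactions (BIC penalty via k = log(n)).
fit1 <- glm(y ~ a + b + c, family = binomial, data = d)
sel1 <- stepAIC(fit1, scope = list(upper = ~ .^2, lower = ~ 1),
                direction = "both", k = log(nrow(d)), trace = FALSE)
# Set-up 2: start from the full interaction model and search downwards.
fit2 <- glm(y ~ (a + b + c)^2, family = binomial, data = d)
sel2 <- stepAIC(fit2, direction = "both", k = log(nrow(d)), trace = FALSE)
# The two greedy searches need not agree:
formula(sel1); formula(sel2)
```

Each step only ever adds or drops the single term that most improves the criterion from the current model, so the path, and therefore the endpoint, depends on where the search starts and which terms are in scope.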
Re: [R] how to modify an R built-in function?
Type the name of the function in an R session. The source code will result - have fun.

On Mon, Jan 26, 2009 at 7:36 AM, diego Diego dhab...@gmail.com wrote: [...]

--
Stephen Sefick
Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
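The copy-and-modify pattern the reply hints at can be sketched as follows (my illustration, not from the original answer): assign the function to a new name, then change the copy; here formals() swaps a default argument on a toy function, and the same idea applies to a private copy of stats::arima, e.g. edited with fix().

```r
# Typing  arima  at the prompt prints its source, as the reply suggests.
my_arima <- stats::arima   # a private copy one could then edit

# The general pattern on a toy function: change a default argument
# of the copy without touching the original.
f <- function(x, method = "BFGS") method
g <- f                           # copy; f itself is untouched
formals(g)$method <- "Nelder-Mead"
f(1)   # "BFGS"
g(1)   # "Nelder-Mead"
```

For the poster's actual problem the edit would replace the hard-coded method = "BFGS" inside the copy's calls to optim.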
Re: [R] Mode (statistics) in R?
Hello, You can try ?table. Best regards, Carlos J. Gil Bellosta http://www.datanaytics.com

On Mon, 2009-01-26 at 05:28 -0800, Jason Rupert wrote: [...]
Re: [R] Mode (statistics) in R?
Here's a rather convoluted way of finding the mode (or, at least, the first mode):

x = round(rnorm(100, sd=5))
my_mode = as.numeric(names(table(x))[which.max(table(x))])

On Mon, Jan 26, 2009 at 9:28 AM, Jason Rupert jasonkrup...@yahoo.com wrote: [...]

--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University
www.thatmike.com
Looking to arrange a meeting? Check my public calendar: http://www.thatmike.com/mikes-public-calendar
~ Certainty is folly... I think. ~
Re: [R] Mode (statistics) in R?
Thanks. I ended up breaking it up into two steps:

table_data <- table(data)
subset(table_data, table_data == max(table_data))

Thanks again.

--- On Mon, 1/26/09, Mike Lawrence m...@thatmike.com wrote: [...]
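The approaches in this thread can be wrapped into a small helper (my sketch, base R only). It returns every value tied for the highest frequency, since a sample can have more than one mode:

```r
Mode <- function(x) {
  tab <- table(x)
  # names(tab) are character, so convert back for numeric input
  as.numeric(names(tab)[tab == max(tab)])
}
Mode(c(1, 2, 2, 3, 3, 5))   # -> 2 3 (two tied modes)
```

For character or factor data, drop the as.numeric() and return the names directly.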
Re: [R] how to modify an R built-in function?
Unless there is some real reason you need an ARIMA model, perhaps you could just try an AR model instead. ?ar

On Mon, Jan 26, 2009 at 7:36 AM, diego Diego dhab...@gmail.com wrote: [...]
Re: [R] Mode (statistics) in R?
on 01/26/2009 07:28 AM Jason Rupert wrote: [...]

It depends upon the type of data you are dealing with. If it is discrete, you can use table() to calculate frequencies and then take the max:

set.seed(1)
tl <- table(sample(letters, 100, replace = TRUE))
tl
a b c d e f g h i j k l m n o p q r s t u v w x y z
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1
tl[which.max(tl)]
u
8

Alternatively, if the data is continuous, then you will need to look at some form of density estimation. There have been various discussions over the years on how to go about doing this, but a simplistic approach would be:

set.seed(1)
x <- rnorm(100)
dx <- density(x)
dx$x[which.max(dx$y)]
[1] 0.3294585
# Review plot
plot(dx)
abline(v = dx$x[which.max(dx$y)])

See ?table, ?which.max and ?density. HTH, Marc Schwartz
Re: [R] text vector clustering
Dear Srinivas,

You can try using trigrams, a special case of N-grams, often used in Natural Language Processing.

> I am interested in grouping/clustering these names as those which are similar letter to letter. Are there any text clustering algorithms in R which can group names of similar type into segments of exactly matching, 90% matching, 80% matching, etc.?

As an example:

    # Suppose we have a list of locations
    # (here a matrix; the second column is only used to create the sample)

    # Locations with errors
    Poblacion_dist <- matrix(
      c("MADRIZ", 0.3, "BARÇELONA", 0.25, "BILAO", 0.135, "SEVILA", 0.1,
        "VALENÇIA", 0.1, "CORUNA", 0.025, "ALACANTE", 0.025, "VALLADOLI", 0.025,
        "SANTIAGO", 0.01, "SAN SEBASTIAN", 0.01, "CADIZ", 0.01, "ZARAGOZA", 0.01),
      ncol = 2, byrow = TRUE)

    # True locations
    Poblacion <- matrix(
      c("MADRID", 0.3, "BARCELONA", 0.25, "BILBAO", 0.135, "SEVILLA", 0.1,
        "VALENCIA", 0.1, "CORUÑA", 0.025, "ALICANTE", 0.025, "VALLADOLID", 0.025,
        "SANTIAGO", 0.01, "SAN_SEBASTIAN", 0.01, "CADIZ", 0.01, "ZARAGOZA", 0.01),
      ncol = 2, byrow = TRUE)

    muestrear <- function(que, cuantas_veces) {
      sample(que[, 1], prob = as.numeric(que[, 2]), cuantas_veces)
    }

    # Now we have a vector with 20 locations
    Provincias <- c(replicate(10, c(muestrear(Poblacion, 1),
                                    muestrear(Poblacion_dist, 1))))

    # Next we process each location as a set of trigrams
    word2trigram <- function(word) {
      trigramatrix <- matrix(c(seq(1, nchar(word) - 2),
                               seq(1, nchar(word) - 2) + 2),
                             ncol = 2, byrow = FALSE)
      trigram <- c()
      for (i in 1:nrow(trigramatrix)) {
        trigram <- append(trigram,
                          substr(word, trigramatrix[i, 1], trigramatrix[i, 2]))
      }
      return(trigram)
    }

    Prov2trigram <- lapply(Provincias, word2trigram)

    # Every trigram in the sample
    Trigrams <- levels(factor(unlist(Prov2trigram)))

    # Count how many times each trigram appears in each location
    ocrrnc.mtrx <- matrix(rep(0, length(Trigrams) * length(Prov2trigram)),
                          ncol = length(Prov2trigram))
    for (i in 1:ncol(ocrrnc.mtrx)) {
      ocrrnc.mtrx[, i] <- as.integer(table(append(Prov2trigram[[i]], Trigrams)) - 1)
    }

    # Calculate cosine similarity (often used in NLP)
    matrizCos <- function(X) {
      X <- t(X)
      nterm <- nrow(X)
      modulo <- c()
      cosen <- matrix(rep(0, nterm * nterm), ncol = nterm)
      for (i in 1:nterm) {
        Vec <- X[i, ]
        modulo[i] <- sqrt(Vec %*% Vec)
        cosen[, i] <- X %*% Vec
      }
      cosen <- (cosen / modulo) / matrix(rep(modulo, nterm), ncol = nterm, byrow = TRUE)
      cosen[is.nan(cosen)] <- 0
      return(cosen)
    }

    rslt.dst.mat <- matrizCos(ocrrnc.mtrx)

    # And get the clusters
    attr(rslt.dst.mat, "dimnames") <- list(Provincias, Provincias)
    plot(hclust(as.dist(1 - rslt.dst.mat), method = "median"))

I hope this helps, Eduardo San Miguel Martin
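As a hedged alternative to hand-rolled trigram cosine similarity: base R (in versions newer than this post) ships adist() for generalized Levenshtein edit distances, which can feed hclust() directly. The place names below are taken from the example above:

```r
nombres <- c("MADRIZ", "MADRID", "BARCELONA", "BARÇELONA", "BILBAO", "BILAO")

d <- adist(nombres)                  # pairwise edit-distance matrix
dimnames(d) <- list(nombres, nombres)

# Cluster on the edit distances; similar spellings merge early
plot(hclust(as.dist(d), method = "average"))
```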
Re: [R] heatmap with levelplot
I played around a little bit and got the following solution, which works for now, though it seems too complicated to me. If anybody knows another solution - please let me know!!!

    library(lattice)
    my.mat <- matrix(rnorm(800), nrow = 40)
    colorFun <- colorRampPalette(c("yellow", "red"))
    b <- boxplot(my.mat, plot = FALSE)
    thr <- c(b$stats[1], b$stats[5])
    col.bins <- 100
    step <- abs(thr[2] - thr[1]) / 50
    limit <- ifelse(min(my.mat) < thr[1] - step, min(my.mat) - step, min(my.mat))
    lp <- rev(seq(thr[1] - step, limit - step, -step))
    mp <- seq(thr[1], thr[2], step)
    limit <- ifelse(max(my.mat) > thr[2] + step, max(my.mat) + step, max(my.mat))
    up <- seq(thr[2] + step, limit + step, step)
    my.at <- c(lp, mp, up)
    my.col.regions <- c(rep("green", length(lp)),
                        colorFun(length(mp)),
                        rep("blue", length(up)))
    levelplot(my.mat, at = my.at, col.regions = my.col.regions)

Antje schrieb: Hi there, I'd like to create a heatmap from my matrix with a) a defined color range (let's say from yellow to red) b) using striking colors above and below a certain threshold (above = green, below = blue). Example matrix (there should be a few outliers generated...) plus a simple levelplot without outliers marked:

    library(lattice)
    my.mat <- matrix(rnorm(800), nrow = 40)
    threshold <- c(-1, 1)  # should be used for the extreme colors
    colorFun <- colorRampPalette(c("yellow", "red"))
    levelplot(my.mat, col.regions = colorFun(50))

I don't know how to handle the extreme values... Can anybody help? Antje
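If the thresholds are known in advance (as in the original question, where threshold <- c(-1, 1)), the binning can be set up more directly; a sketch along the same lines, with one blue bin below and one green bin above:

```r
library(lattice)

my.mat <- matrix(rnorm(800), nrow = 40)
thr <- c(-1, 1)
colorFun <- colorRampPalette(c("yellow", "red"))

# 53 break points -> 52 intervals: 1 below, 50 in the ramp, 1 above
my.at <- c(min(my.mat) - 0.01,
           seq(thr[1], thr[2], length.out = 51),
           max(my.mat) + 0.01)
my.col.regions <- c("blue", colorFun(50), "green")

levelplot(my.mat, at = my.at, col.regions = my.col.regions)
```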
Re: [R] Mode (statistics) in R?
Hello, I think this will work:

    names(sort(-table(x)))[1]

Regards, Patricia García

From: c...@datanalytics.com To: jasonkrup...@yahoo.com Date: Mon, 26 Jan 2009 18:34:00 +0500 CC: r-help@r-project.org Subject: Re: [R] Mode (statistics) in R?

Hello, You can try ?table. Best regards, Carlos J. Gil Bellosta http://www.datanaytics.com

On Mon, 2009-01-26 at 05:28 -0800, Jason Rupert wrote: Hopefully this is a pretty simple question: Is there a function in R that calculates the mode of a sample? [...]
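One caveat worth noting about the one-liner: it returns a name of the table, so a character string; numeric input needs converting back:

```r
x <- c(4, 4, 7, 9)

names(sort(-table(x)))[1]              # "4" (a character string)
as.numeric(names(sort(-table(x)))[1])  # 4
```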
Re: [R] Plotting graph for Missing values
> I added the patientinformation1 variable and then I gave the command for tapply, but it's giving me the following error:
> Error in tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0)) : arguments must have same length

It seems you added patientinformation1 but still use pat1 in the tapply call. Bart

-- View this message in context: http://www.nabble.com/Plotting-graph-for-Missing-values-tp21659322p21666790.html Sent from the R help mailing list archive at Nabble.com.
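A minimal illustration of the constraint, with made-up data: tapply()'s first two arguments need one entry per observation, so they must be the same length:

```r
dos  <- seq(as.Date("2006-05-01"), by = "1 day", length.out = 10)
pat1 <- rbinom(length(dos), 1, 0.5)

# Same length: works
tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0))

# Mismatched lengths fail with the error from the thread:
# tapply(pat1[1:5], format(dos, "%Y%m"), function(x) sum(x == 0))
# Error: arguments must have same length
```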
[R] Getting data from a PDF-file into R
Hello. I have around 200 PDF documents containing data I want organized in R as a data frame. The PDF documents look like this: http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver.jpeg or like this: http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver%2B2.jpeg I want to pull out the data in the coloured boxes so it becomes organized like this (just in R instead of Excel): http://www.nabble.com/file/p21667074/PRRS-billede%2Bexcel.jpeg The 0's and 1's represent when PRRS-neg occurs, represented by a 0 in the columns PRRS-VAC and PRRS-DK on a particular date. Likewise, PRRS-pos VAC (or Vac) is represented by a 1 in the column PRRS-VAC, and PRRS-pos DK (or DK) by a 1 in the column PRRS-DK. Similarly, for sanVAC there should be a 1 in the column VACsan, and for sanDK a 1 in the column DKsan. The first date for each CHR-nr should be either the earliest date in the red box (as in the first picture) or the date preceded by the word "før" (as in the second picture). All 200 PDF documents look like the ones in the pictures, each representing a different CHR-nr. Hope you can help me. -- View this message in context: http://www.nabble.com/Getting-data-from-a-PDF-file-into-R-tp21667074p21667074.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] RCurl unable to download a particular web page -- what is so special about this web page?
Hi, I ran your getURL example and had the same problem with downloading the file:

    library(RCurl)
    toString(getURL("http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"))
    [1] ""

Interestingly, however, if you manually save the page to your desktop, getURL works fine on it:

    library(RCurl)
    toString(getURL('file:PFO-SBS001//Redirected//tonyb//Desktop//webpage.html'))
    [1] "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<html>\n<head>\n" [etc...]

Very strange indeed. I use RCurl for web crawling every now and again, so I would be interested in knowing why this happens too :-) Tony Breyal

On 26 Jan, 13:58, clair.crossup...@googlemail.com wrote: Dear R-help, There seems to be a web page I am unable to download using RCurl. I don't understand why it won't download:

    library(RCurl)
    my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07pro..."
    getURL(my.url)
    [1] ""

Other web pages are fine to download, but this is the first time I have been unable to download a web page using the very nice RCurl package. While I can download the web page using RDCOMClient, I would like to understand why it doesn't work as above, please?

    library(RDCOMClient)
    my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07pro..."
    ie <- COMCreate("InternetExplorer.Application")
    txt <- list()
    ie$Navigate(my.url)
    NULL
    while (ie[["Busy"]]) Sys.sleep(1)
    txt[[my.url]] <- ie[["document"]][["body"]][["innerText"]]
    txt
    $`http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2`
    [1] "Skip to article Try Electronic Edition Log ..."

Many thanks for your time, C.C

Windows Vista, running with administrator privileges.

    sessionInfo()
    R version 2.8.1 (2008-12-22)
    i386-pc-mingw32
    locale:
    LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
    other attached packages:
    [1] RDCOMClient_0.92-0 RCurl_0.94-0
    loaded via a namespace (and not attached):
    [1] tools_2.8.1
Re: [R] Plotting graph for Missing values
Hi Jim,

r-help-boun...@r-project.org wrote on 26.01.2009 15:44:32:

> From your original posting: "I tried the code which you provided. In place of dos in the command pat1 <- rbinom(length(dos), 1, .5) # generate some data, I added the patientinformation1 variable and then I gave the command for tapply, but it's giving me the following error: Error in tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0)) : arguments must have same length"
> I would say that pat1 and dos were not of the same length. Check your code and objects to verify this; that is what the error message is saying. You said you added the patientinformation1 variable, but it does not seem to appear in the error message.

You are really patient. I presume Shreyasee does not know much about data structures and function use in R. It would probably help a lot if s/he looked into some basic documents like the R Intro. If I understand correctly, what was done is

    pat1 <- rbinom(length(patientinformation1), 1, .5)

which does not make much sense, as it generates artificial data as well; and most probably there is a version of dos in memory which was constructed during the testing of your code and which has length 335. This could result in the mentioned error:

    Error in tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0)) :
      arguments must have same length

Then note:

    ds <- read.csv(file = "D:/Shreyasee laptop data/ASC Dataset/Subset of the ASC Dataset.csv", header = TRUE)
    attach(ds)
    str(dos)

If str(ds) were issued, it could reveal what kind of data s/he has. Also, format(dos, ...) will not work, as dos is a factor, not a Date:

> str(dos) -- I am getting the following message: Factor w/ 12 levels "-00-00","6-Aug",..: 6 6 6 6 6 6 6 6 6 6 ...

If dos were a Date,

    aggregate(ds[, -1], list(format(ds$dos, "%Y%m")), function(x) sum(x == 0))
       Group.1 pat1 pat2
    1   200605   12   16
    2   200606   20   18
    3   200607   12   13
    4   200608   18   15
    5   200609   18   11
    6   200610   17   15
    7   200611   19   17
    8   200612   14   15
    9   200701   14   18
    10  200702   13   13
    11  200703   16   19

could do the trick if the patientinformation variables had the same structure as you anticipate, which is not true:

    for (i in 1:length(dos)) for (j in 1:length(patientinformation1)) if (dos[i] == "May-06" && patientinformation1[j] == "") a <- j + 1

Well, if Shreyasee manages to redefine dos to Date mode (which will not be straightforward if dos has an awkward structure), then something like

    aggregate(ds[, -1], list(format(ds$dos, "%Y%m")), function(x) sum(x == ""))

could do the trick. Regards, Petr

On Sun, Jan 25, 2009 at 11:48 PM, Shreyasee shreyasee.prad...@gmail.com wrote: Hi Jim, I ran the following code:

    ds <- read.csv(file = "D:/Shreyasee laptop data/ASC Dataset/Subset of the ASC Dataset.csv", header = TRUE)
    attach(ds)
    str(dos)

I am getting the following message: Factor w/ 12 levels "-00-00","6-Aug",..: 6 6 6 6 6 6 6 6 6 6 ... Thanks, Shreyasee

On Mon, Jan 26, 2009 at 12:20 PM, jim holtman jholt...@gmail.com wrote: Do str(dos); str(patientinformation1). They must be the same length for the command to work: there must be a one-to-one match of the data.

On Sun, Jan 25, 2009 at 10:23 PM, Shreyasee wrote: Hi Jim, I tried the code which you provided. In place of dos in the command pat1 <- rbinom(length(dos), 1, .5) # generate some data, I added the patientinformation1 variable and then gave the command for tapply, but it's giving me the error "arguments must have same length". Thanks, Shreyasee

On Mon, Jan 26, 2009 at 10:50 AM, jim holtman wrote: You can save the output of the tapply and then replicate it for each of the variables. The data can be used to plot the graphs.

On Sun, Jan 25, 2009 at 9:38 PM, Shreyasee wrote: Hi Jim, I need to calculate the missing values in the variable patientinformation1 for the period of May 2006 to March 2007 and then plot a graph of the percentage of missing values over these months. This has to be done for each variable. The code which you provided calculates the missing values for the months variable, am I right? I need to calculate for all the variables for each month. Thanks, Shreyasee

On Mon, Jan 26, 2009 at 10:29 AM, jim holtman wrote: Here is an example of how you might approach it:

    dos <- seq(as.Date('2006-05-01'), as.Date('2007-03-31'), by = '1 day')
    pat1 <- rbinom(length(dos), 1, .5)  # generate some data
    # partition by month and then list out the number of zero values (missing)
    tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0))
    200605 200606 200607 200608 200609 200610 200611 200612 200701 200702
[R] Help with sas.get
Dear R-users, I am seeking advice on the sas.get function from the Hmisc package. I have tried to import some of our SAS files using the syntax presented in the function's help example, but the import always fails. The function does not seem to recognize our SAS files and complains about the lack of format library files (I am not SAS proficient, but I guess that is what formats.sas7bcat is, isn't it?). Currently, my working directory contains various .sas7bdat files but no .sas7bcat files. Is the existence of format files assumed by the function? I would really appreciate it if a user experienced with this function could provide some guidance. Thank you.

Working environment: R 2.8.1 is installed on Linux machines with the most recent version of the Hmisc package; SAS 9 runs on a Solaris-based system.

### Code

    mypath <- "/home/sbihorel/my_documents/Testing_env/SAS_dataset_R_import"
    mydf <- sas.get(library = mypath, member = "test")

### Error message

    Error in sas.get(library = mypath, member = "test") :
      SAS output files not found
    In addition: Warning message:
    In sas.get(library = mypath, member = "test") :
      /home/sbihorel/my_documents/Testing_env/SAS_dataset_R_import/formats.sc? or formats.sas7bcat not found. Formatting ignored.
    Execution halted

-- Sebastien Bihorel
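A hedged sketch of what may help: sas.get() shells out to a SAS executable, and the format catalog (formats.sas7bcat) is only needed for value labels. The argument names below (formats, sasprog) are taken from the Hmisc documentation as I understand it, and the SAS path is an assumption to adapt:

```r
library(Hmisc)

mypath <- "/home/sbihorel/my_documents/Testing_env/SAS_dataset_R_import"

# Skip the format catalog entirely and point sas.get at the SAS binary;
# without a reachable SAS executable the call cannot work at all.
mydf <- sas.get(library = mypath, member = "test",
                formats = FALSE,                # no formats.sas7bcat needed
                sasprog = "/usr/local/bin/sas") # assumption: SAS location
```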
Re: [R] Getting data from a PDF-file into R
joe1985 wrote: Hello. I have around 200 PDF documents containing data I want organized in R as a data frame. [...]

Not on the basis of .jpeg files, I think. We'd need some indication of what the PDF looks like inside. There's a tool called pdftotext, which might do something for you, IF you can figure out reliably where your data begin and end.

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918 (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907
Re: [R] Getting data from a PDF-file into R
On Mon, Jan 26, 2009 at 9:40 AM, Peter Dalgaard p.dalga...@biostat.ku.dk wrote:

> joe1985 wrote: Hello. I have around 200 PDF documents containing data I want organized in R as a data frame. [...]
> Not on the basis of .jpeg files, I think. We'd need some indication of what the PDF looks like inside. There's a tool called pdftotext, which might do something for you, IF you can figure out reliably where your data begin and end.

An alternative is to outsource the problem. You can get very reasonable data-entry quotes from sites like http://www.elance.com/, and depending on how much you value your time this might end up being a much cheaper option than figuring out how to do it programmatically (but not as intellectually satisfying).

Hadley -- http://had.co.nz/
[R] PCALG Package
Hi all, Can anyone help me set up this package so I can use it? I am getting errors with the Rgraphviz package and have tried a number of ways to get it to work. Any help will be greatly appreciated! I am sorta new to R but have been actively trying to get into using it as my main analysis software. Thanks, Brock
Re: [R] list.files changed in 2.7.0
Hmm. I get exactly the same files and directories with "C:" and "C:/", except for the double slashes now. Previously the two calls to list.files gave exactly the same results. My current directory (getwd()) is not "C:". I'm puzzled by your output. -- David

-----Original Message----- From: henrik.bengts...@gmail.com [mailto:henrik.bengts...@gmail.com] On Behalf Of Henrik Bengtsson Sent: Friday, January 23, 2009 8:36 PM To: David Reiner dav...@rhotrading.com Cc: r-help@r-project.org Subject: Re: [R] list.files changed in 2.7.0

And I'm not sure that list.files("C:", full.names=TRUE) returns correct pathnames, because it lists the files in the current directory (of C:), not the root of C:. There is a difference between "C:" and "C:/", and you should get:

    list.files("C:", full.names = TRUE)
    [1] "C:aFile.txt"
    [2] "C:anotherFile.txt"
    list.files("C:/", full.names = TRUE)
    [1] "C:/Documents and Settings"
    [2] "C:/Program Files"

Now we get:

    list.files("C:", full.names = TRUE)
    [1] "C:/aFile.txt"
    [2] "C:/anotherFile.txt"
    list.files("C:/", full.names = TRUE)
    [1] "C://Documents and Settings"
    [2] "C://Program Files"

This causes

    pathnames <- list.files("C:", full.names = TRUE); file.exists(pathnames)

to return all FALSE (not expected), whereas

    pathnames <- list.files("C:"); file.exists(pathnames)

returns all TRUE (expected). So, that extra slash seems to be the cause. My $.02 /Henrik

On Fri, Jan 23, 2009 at 2:42 PM, dav...@rhotrading.com wrote: I just noticed a change in the behavior of list.files from 2.6.1pat to 2.7.0 (I noticed it in 2.8.1 and traced back). Previously, if the directory ended with a slash and full.names=TRUE, the names returned had a single slash at the end of the directory, but now there are two. I noticed this since I was getting a list of certain files and then grepping in the list for a full name formed with a single slash. (The double slash would be OK if I were opening the file, since the OS treats a double slash in a path the same as a single slash.) I searched through the release notes, etc., and couldn't find this announced. Try:

    list.files("C:", full.names = TRUE)
    list.files("C:/", full.names = TRUE)

Is there any chance that this could be put back to the single-slash behavior? (This was on Windows XP.) Thanks, David L. Reiner
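Until or unless the behavior changes, a portable workaround is to normalize the doubled separator before comparing or grepping; a small sketch (the drive path is the one from the thread):

```r
pathnames <- list.files("C:/", full.names = TRUE)

# Collapse "C://..." back to "C:/..." so the names match paths
# built with a single slash
pathnames <- gsub("//", "/", pathnames, fixed = TRUE)

all(file.exists(pathnames))
```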
Re: [R] Getting data from a PDF-file into R
You can convert the PDF to text, then manipulate the output to read only the data. Linux has the pdftotext utility; you can download the xpdf package, which contains it. Best

On 1/26/09, joe1985 johan...@dsr.life.ku.dk wrote: Hello. I have around 200 PDF documents containing data I want organized in R as a data frame. [...] Hope you can help me. -- View this message in context: http://www.nabble.com/Getting-data-from-a-PDF-file-into-R-tp21667074p21667074.html Sent from the R help mailing list archive at Nabble.com.

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
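A sketch of driving that conversion step from R (assumes pdftotext is installed and on the PATH; the directory and file names are hypothetical):

```r
pdfs <- list.files("pdf_dir", pattern = "\\.pdf$", full.names = TRUE)

# Convert each PDF to a .txt file next to it, preserving the page layout
for (f in pdfs) system(paste("pdftotext -layout", shQuote(f)))

# Read one converted file; extracting the coloured-box values still has
# to be done by locating the relevant lines, e.g. with grep()
txt <- readLines(sub("\\.pdf$", ".txt", pdfs[1]))
```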
[R] RExcel foreground and background server
Dear all, I have a question regarding the background and foreground servers in RExcel: can somebody explain the main difference between them? As far as I understood from the RExcel web page, both of them need rights to access the Windows registry. The only difference that I can see so far is that the installation of the background server needs the R(D)COM package, whereas the foreground server installation requires the rcom package. Furthermore, does anyone know how Excel processes and R processes communicate? Is it possible for more than one Excel process to communicate with more than one R process? Thank you in advance! Best regards, Irina Ursachi.
Re: [R] RCurl unable to download a particular web page -- what is so special about this web page?
clair.crossup...@googlemail.com wrote: Dear R-help, There seems to be a web page I am unable to download using RCurl. I don't understand why it won't download:

    library(RCurl)
    my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
    getURL(my.url)
    [1] ""

I like the irony that RCurl seems to have difficulties downloading an article about R. Good thing it is just a matter of additional arguments to getURL(), or it would be bad news. The followlocation parameter defaults to FALSE, so

    getURL(my.url, followlocation = TRUE)

gets what you want. The way I found this is

    getURL(my.url, verbose = TRUE)

and taking a look at the information being sent from R and received by R from the server. This gives

    * About to connect() to www.nytimes.com port 80 (#0)
    *   Trying 199.239.136.200... * connected
    * Connected to www.nytimes.com (199.239.136.200) port 80 (#0)
    > GET /2009/01/07/technology/business-computing/07program.html?_r=2 HTTP/1.1
    > Host: www.nytimes.com
    > Accept: */*
    < HTTP/1.1 301 Moved Permanently
    < Server: Sun-ONE-Web-Server/6.1
    < Date: Mon, 26 Jan 2009 16:10:51 GMT
    < Content-length: 0
    < Content-type: text/html
    < Location: http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2009/01/07/technology/business-computing/07program.htmlOQ=_rQ3D3op=42fceb38q2fq5duarq5d3-z8q26--q24jq5djccq7bq5dcmq5dc1q5dq24...@-f-q2anq5dry8h@a88q3dz-dbyq...@q2aq5dc1bq26-q2aq26q5bddfq24df

And the 301 is the critical thing here. D.

> Other web pages are ok to download but this is the first time I have been unable to download a web page using the very nice RCurl package. While I can download the web page using RDCOMClient, I would like to understand why it doesn't work as above, please? [...] Many thanks for your time, C.C
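To make the fix concrete, a minimal sketch (requires an internet connection, and the page may of course have moved since the thread was written):

```r
library(RCurl)

my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"

txt.plain  <- getURL(my.url)                         # stops at the 301; empty body
txt.follow <- getURL(my.url, followlocation = TRUE)  # follows the redirect

nchar(txt.plain); nchar(txt.follow)
```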
[R] ANOVA with subsampling question
Hi all, I am trying to analyze an experiment I ran, but am not sure how to code it in R. I germinated seeds in petri dishes at 3 different temperatures (call them low, med, and high) and 2 different light levels (light and dark). For each seed I recorded the time to germination (not counting those that didn't germinate, because I will analyze those in a separate ANOVA). Each temperature/light treatment has 5 petri dishes with 10 seeds per dish, for a total of 30 dishes and 300 seeds. The replicate is the petri dish, but I want to treat the seeds as subsamples in the error structure. Any help would be appreciated. I have looked at code for mixed models, but want to make sure that I am on the right track. Thanks a lot, Wade
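A hedged sketch of the mixed-model coding with lme4 (the column names and the simulated data are assumptions): the dish is the experimental unit, so it enters as a random effect, and the seeds within a dish are the subsamples absorbed by the residual error:

```r
library(lme4)

# Simulate the design: 10 seeds x 5 dishes x 3 temps x 2 light levels
set.seed(1)
germ <- expand.grid(seed  = 1:10, rep = 1:5,
                    temp  = c("low", "med", "high"),
                    light = c("light", "dark"))
germ$dish <- with(germ, interaction(temp, light, rep))  # 30 unique dishes
germ$time <- rnorm(nrow(germ), mean = 7, sd = 1)        # days to germination

# Fixed temp x light effects; random intercept per dish
fit <- lmer(time ~ temp * light + (1 | dish), data = germ)
summary(fit)
anova(fit)
```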
Re: [R] HMISC package: wtd.table()
Norbert NEUWIRTH wrote: oops, that really was easy:
(wtd.table(FamTyp.kurz,HGEW,normwt=FALSE,na.rm=TRUE))
instead of
(wtd.table(FamTyp.kurz,weigths=HGEW,normwt=FALSE,na.rm=TRUE))

That is one solution. The other is to spell 'weights' correctly :-) Frank

sorry for that question ... Am 26.01.2009, 10:30 Uhr, schrieb Norbert NEUWIRTH norbert.s.neuwi...@univie.ac.at: Hi useRs, developeRs, I got stuck within a function of the Hmisc package. Sounds easy, hope it is: I got 2 items (FamTyp.kurz, HGEW) of the same length, no missings.

> length(FamTyp.kurz); summary(FamTyp.kurz)
[1] 14883
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.00   20.00   21.00   21.66   23.00   31.00
> length(HGEW); summary(HGEW)
[1] 14883
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  104.5   409.6   489.4   549.8   623.3  3880.0

Now I simply want to compute a table of unweighted and weighted values. But, ... the weights do not seem to be accepted:

> print("unweighted"); print(table(FamTyp.kurz))
[1] "unweighted"
FamTyp.kurz
  10   11   20   21   22   23   30   31
1755  683 3322 1683 2428 1440 1748 1824
> print("weighted"); print(wtd.table(FamTyp.kurz,weigths=HGEW,normwt=FALSE,na.rm=TRUE))
[1] "weighted"
Error in wtd.table(FamTyp.kurz, weigths = HGEW, normwt = FALSE, na.rm = TRUE) :
  unused arguments (weigths = c(495.55949, 495.55949, 678.16378, 678.16378, ...

any ideas ??? thanx in advance, Norbert -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] commercially supported version of R for 64-bit Windows?
That's correct: REvolution Computing (whom I work for) is in the process of porting R and packages to 64-bit Windows. The development process has been underway for several months and is near completion. There will be a beta test in February and we expect to release in March. When the beta program is launched it will be announced at http://blog.revolution-computing.com , but if anyone is interested in getting involved sooner please let me know. # David Smith -- David M Smith da...@revolution-computing.com Director of Community, REvolution Computing www.revolution-computing.com Tel: +1 (206) 577-4778 x3203 (Seattle, USA) On Sun, Jan 25, 2009 at 10:47 AM, Dirk Eddelbuettel e...@debian.org wrote: On 25 January 2009 at 04:39, new ruser wrote: | Can anyone please refer me to all firms that offer and/or are developing a | commercially supported version of R for 64-bit Windows? - Thanks Try contacting Revolution-Computing.com --- to the best of my knowledge they expect to have such a product forthcoming in 2009. 64-bit versions have of course been available on Linux / Unix for over a decade, so you could use that now. Works great for me on Debian and Ubuntu. Dirk -- Three out of two people have difficulties with fractions.
[R] reshape problem: id and variable names not being recognized
Hi everyone. Long time listener, first-time caller here. I have a data set that's been melted with the excellent reshape package, but I can't seem to cast it the way I need to. Here's the melted data's structure:

> str(mdat)
'data.frame':   6978 obs. of  4 variables:
 $ VehType : Factor w/ 2 levels "Car","Truck": 1 1 2 1 1 2 1 1 1 1 ...
 $ Year    : Factor w/ 6 levels "2003","2004",..: 5 1 5 6 6 2 2 3 2 5 ...
 $ variable: Factor w/ 1 level "mpg": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : num  22.4 21.5 22.6 22.4 25 ...

For the purpose of testing, I have stripped out all the variables except for mpg. Casting it without specifying any ids or variables works fine:

> cast(mdat,,mean)
   VehType Year      mpg
1      Car 2003 22.03623
2      Car 2004 21.94160
3      Car 2005 21.77286
4      Car 2006 21.49105
5      Car 2007 21.38180
6      Car 2008 21.56873
7    Truck 2003 16.91461
8    Truck 2004 16.88771
9    Truck 2005 17.19801
10   Truck 2006 17.48225
11   Truck 2007 17.40694
12   Truck 2008 17.74042

I should then be able to make a crosstab of the means by writing a formula, right? It fails, though:

> cast(mdat, VehType ~ Year | mpg, mean)
Error: Casting formula contains variables not found in molten data: mpg

When I make the same table by using "variable" instead of the name of my variable, it works:

> cast(mdat, VehType ~ Year | variable, mean)
$mpg
  VehType     2003     2004     2005     2006     2007     2008
1     Car 22.03623 21.94160 21.77286 21.49105 21.38180 21.56873
2   Truck 16.91461 16.88771 17.19801 17.48225 17.40694 17.74042

Why can't it find the mpg variable when I call it explicitly? Thanks, Matt Frost
Re: [R] RCurl unable to download a particular web page -- what is so special about this web page?
Duncan Temple Lang wrote: clair.crossup...@googlemail.com wrote: Dear R-help, There seems to be a web page I am unable to download using RCurl. I don't understand why it won't download:

library(RCurl)
my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
getURL(my.url)
[1] ""

I like the irony that RCurl seems to have difficulties downloading an article about R. Good thing it is just a matter of additional arguments to getURL() or it would be bad news. Don't forget the irony that https is supported in url() and download.file() on Windows but not UNIX... http://tolstoy.newcastle.edu.au/R/e2/devel/07/01/1634.html Jeff

The followlocation parameter defaults to FALSE, so

getURL(my.url, followlocation = TRUE)

gets what you want. The way I found this is

getURL(my.url, verbose = TRUE)

and take a look at the information being sent from R and received by R from the server. This gives

* About to connect() to www.nytimes.com port 80 (#0)
*   Trying 199.239.136.200... * connected
* Connected to www.nytimes.com (199.239.136.200) port 80 (#0)
> GET /2009/01/07/technology/business-computing/07program.html?_r=2 HTTP/1.1
> Host: www.nytimes.com
> Accept: */*
< HTTP/1.1 301 Moved Permanently
< Server: Sun-ONE-Web-Server/6.1
< Date: Mon, 26 Jan 2009 16:10:51 GMT
< Content-length: 0
< Content-type: text/html
< Location: http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2009/01/07/technology/business-computing/07program.htmlOQ=_rQ3D3op=42fceb38q2fq5duarq5d3-z8q26--q24jq5djccq7bq5dcmq5dc1q5dq24...@-f-q2anq5dry8h@a88q3dz-dbyq...@q2aq5dc1bq26-q2aq26q5bddfq24df

And the 301 is the critical thing here. D.

Other web pages are ok to download but this is the first time I have been unable to download a web page using the very nice RCurl package. While I can download the webpage using the RDCOMClient, I would like to understand why it doesn't work as above please?
library(RDCOMClient)
my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
ie <- COMCreate("InternetExplorer.Application")
txt <- list()
ie$Navigate(my.url)
NULL
while(ie[["Busy"]]) Sys.sleep(1)
txt[[my.url]] <- ie[["document"]][["body"]][["innerText"]]
txt
$`http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2`
[1] "Skip to article Try Electronic Edition Log ..."

Many thanks for your time, C.C

Windows Vista, running with administrator privileges.

> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32
locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RDCOMClient_0.92-0 RCurl_0.94-0
loaded via a namespace (and not attached):
[1] tools_2.8.1
Re: [R] reshape problem: id and variable names not being recognized
Look at your 'str(mdat)' and you will see that there is not a variable called 'mpg'; it is one of the levels of 'variable'. On Mon, Jan 26, 2009 at 11:38 AM, MW Frost mwfr...@gmail.com wrote: Hi everyone. Long time listener, first-time caller here. I have a data set that's been melted with the excellent reshape package, but I can't seem to cast it the way I need to. ... Why can't it find the mpg variable when I call it explicitly? Thanks, Matt Frost
-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Tinn-R
Hi Everyone, I was hoping someone could help me with the settings for Tinn-R. I see in the screen shots that it has syntax help, or something similar (tips on functions, etc.). I cannot seem to get this to turn on in the program, and I am wondering if I have to set up a few options. I quickly read through the help and could not figure it out. Many thanks! - Brock P.S. It appears as if Tinn-R is widely used, but would you recommend something different? I am new to R and programming, but have learned (somewhat) using VBA editors and have grown to love the intelligent typing that goes along with them.
[R] Goodness of fit for gamma distributions
I'm looking for goodness-of-fit tests for gamma distributions with large data sizes. I have a matrix with around 10,000 data values in it and I have fitted a gamma distribution over a histogram of the data. The problem is testing how well that distribution fits. Chi-squared seems to be used more for discrete distributions, and for Kolmogorov-Smirnov large sample sizes seem to make it hard to evaluate the D statistic. Also I haven't found a Q-Q plot for gamma, although I think this might be an appropriate test. In summary: -is there a gamma goodness-of-fit test that doesn't depend on the sample size? -is there a way of using qqplot for gamma distributions, and if so how would you calculate it from a matrix of data values? regards, Dann -- View this message in context: http://www.nabble.com/Goodness-of-fit-for-gamma-distributions-tp21668711p21668711.html Sent from the R help mailing list archive at Nabble.com.
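A gamma Q-Q plot can be built by hand against fitted quantiles. A minimal sketch (not from the original thread), assuming the values are in a matrix `m` (hypothetical name) of positive numbers, and using fitdistr() from MASS to estimate the shape and rate:

```r
# Hypothetical sketch: Q-Q plot of data against a fitted gamma distribution.
library(MASS)

x <- sort(as.vector(m))                  # flatten the matrix and sort
fit <- fitdistr(x, "gamma")              # ML estimates of shape and rate
p <- ppoints(length(x))                  # plotting positions in (0, 1)
q.theor <- qgamma(p, shape = fit$estimate["shape"],
                     rate  = fit$estimate["rate"])

plot(q.theor, x, xlab = "Theoretical gamma quantiles",
     ylab = "Sample quantiles", main = "Gamma Q-Q plot")
abline(0, 1, lty = 2)                    # reference line y = x
```

Points falling near the reference line indicate an adequate gamma fit; systematic curvature in the tails is exactly what large-sample formal tests tend to flag even when it is practically unimportant.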
Re: [R] HMISC package: wtd.table()
Frank E Harrell Jr f.harrell at vanderbilt.edu writes: (wtd.table(FamTyp.kurz,weigths=HGEW,normwt=FALSE,na.rm=TRUE)) That is one solution. The other is to spell 'weights' correctly Have pity with us German speakers. It was such a paing to learn th that we cannot resist to apply it whenever pothible. Dieter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] error management
Hello R experts! I'm running a FOR loop in which at every step an arima model is generated. The problem is that some series produce numeric problems with optim. My question is whether there is a way of telling R that at every critical error of optim it should jump to the next series instead of stopping the calculations. Or better yet, tell it to run another arima fit but with a different optimization algorithm. Thanks!!!
Re: [R] error management
?try On Mon, Jan 26, 2009 at 1:05 PM, diego Diego dhab...@gmail.com wrote: Hello R experts! I'm running a FOR loop in which at every step an arima model is generated. The problem is some series produces numeric problems with optim. ...
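Following up on ?try, here is one way the loop could be arranged; a sketch under stated assumptions (the list `series.list` and the ARIMA order are hypothetical, not from the original post). On failure it retries once with the conditional-sum-of-squares method, which avoids the full ML optimisation, before skipping the series:

```r
# Hypothetical sketch: fit arima() to each series, skip fits that error out.
fits <- vector("list", length(series.list))   # series.list assumed to exist

for (i in seq_along(series.list)) {
  fit <- try(arima(series.list[[i]], order = c(1, 0, 1)), silent = TRUE)
  if (inherits(fit, "try-error")) {
    # second attempt with a different fitting method before giving up
    fit <- try(arima(series.list[[i]], order = c(1, 0, 1), method = "CSS"),
               silent = TRUE)
  }
  if (!inherits(fit, "try-error")) fits[[i]] <- fit
}
```

Series whose entry in `fits` is still NULL afterwards are the ones that failed both attempts, so they can be inspected separately.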
[R] problem with table()
Hey everyone, I am looking for the easiest way to get this table

# Table2
Year  / 2000 / 2002 / 2004
Julia / 3    / 4    / 1
Peter / 1    / 2    / 4
...   / ...  / ...  / ...

out of this one?

# Table1
name  / year / cases
Julia / 2000 / 1
Julia / 2000 / 2
Julia / 2002 / 4
Peter / 2000 / 1
Julia / 2004 / 1
Peter / 2004 / 2
Peter / 2002 / 2
Peter / 2004 / 2
...   / ...  / ...

Code for table1:
name <- c('Julia','Julia','Julia','Peter','Julia','Peter','Peter','Peter')
year <- c(2000,2000,2002,2000,2004,2004,2002,2004)
cases <- c(1,2,4,1,1,2,2,2)
table1 <- data.frame(name,year,cases)

Thanks! Dominik
Re: [R] problem with table()
on 01/26/2009 12:23 PM Dominik Hattrup wrote: Hey everyone, I am looking for the easiest way to get this table (Table2) out of this one (Table1)? ...

table() generates frequencies from individual values, not from already tabulated data. In this case, you can use xtabs():

> xtabs(cases ~ name + year, data = table1)
       year
name    2000 2002 2004
  Julia    3    4    1
  Peter    1    2    4

See ?xtabs

An alternative would be to use tapply():

> with(table1, tapply(cases, list(name = name, year = year), sum))
       year
name    2000 2002 2004
  Julia    3    4    1
  Peter    1    2    4

See ?tapply HTH, Marc Schwartz
[R] Large regular expressions
Given a vector of reference strings Ref and a vector of test strings Test, I would like to find elements of Test which do not contain elements of Ref as \b-delimited substrings. This can be done straightforwardly for length(Ref) < 6000 or so (R 2.8.1 Windows) by constructing a pattern like \b(a|b|c)\b, but not for larger Refs (see below). The easy workaround for this is to split Ref into smaller subsets and test each subset separately. Is there a better solution, e.g. along the lines of fgrep? My real data have length(Ref) == 60000 or more. -s

- Example
Test <- as.character(floor(runif(2000,1,2)))  # Real data is short phrases
testing <- function(n) {
  Ref <- as.character(1:n)  # Real data is sentences
  Pat <- paste('\\b(', paste(Ref, collapse='|'), ')\\b', sep='')
  grep(Pat, Test)
}

testing(2000) => no problem. However, testing(10000) gives an error message (invalid regular expression) and a warning (memory exhausted), and testing(100000) crashes R (Process R exited abnormally with code 5). Using grep(..., perl=TRUE) as suggested in the man page also fails with testing(10000), though it gives a more helpful error message (regular expression is too large) without crashing the process.
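The split-into-subsets workaround described above can be written generically. A sketch (the chunk size of 2000 is an assumption, tuned to whatever the regex engine tolerates), OR-ing the matches from each block so that no single pattern grows too large:

```r
# Sketch of the chunking workaround: match Ref in blocks of 'chunk' strings
# so that each regular expression stays within the engine's limits.
match.none <- function(Test, Ref, chunk = 2000) {
  hit <- logical(length(Test))          # which Test elements match some Ref
  for (i in seq(1, length(Ref), by = chunk)) {
    block <- Ref[i:min(i + chunk - 1, length(Ref))]
    Pat <- paste('\\b(', paste(block, collapse = '|'), ')\\b', sep = '')
    hit[grep(Pat, Test)] <- TRUE        # accumulate matches across blocks
  }
  Test[!hit]                            # elements containing no Ref string
}
```

This is linear in the number of chunks rather than building one enormous alternation, at the cost of scanning Test once per chunk.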
[R] name scoping within dataframe index
Every time I have to prefix a dataframe column inside the indexing brackets with the dataframe name, e.g. df[df$colname==value,] -- I am wondering, why isn't there an R scoping rule that search starts with the dataframe names, as if we'd said with(df, df[colname==value,]) -- wouldn't that be a reasonable default to prepend to the name search path? Cheers, Alexy __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem with table()
Marc Schwartz schrieb: table() generates frequencies from individual values, not from already tabulated data. In this case, you can use xtabs() ... An alternative would be to use tapply() ...

Thank You for the quick answer. Works perfect! Dominik
Re: [R] name scoping within dataframe index
On 1/26/2009 1:46 PM, Alexy Khrabrov wrote: Every time I have to prefix a dataframe column inside the indexing brackets with the dataframe name, e.g. df[df$colname==value,] -- I am wondering, why isn't there an R scoping rule that search starts with the dataframe names, as if we'd said with(df, df[colname==value,]) -- wouldn't that be a reasonable default to prepend to the name search path? If you did that, it would be quite difficult to get at a colname variable that *isn't* the column of df. It would be something like df[get(colname, parent.frame()) == value,] So just use subset(), or with(), or type the extra 3 chars. Duncan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Large regular expressions
I am using

> R.version.string  # Vista
[1] "R version 2.8.1 Patched (2008-12-26 r47350)"

and it also caused R to actually crash for me. On Mon, Jan 26, 2009 at 1:38 PM, Stavros Macrakis macra...@alum.mit.edu wrote: Given a vector of reference strings Ref and a vector of test strings Test, I would like to find elements of Test which do not contain elements of Ref as \b-delimited substrings. ...
Re: [R] name scoping within dataframe index
Try: subset(df, colname == value) On Mon, Jan 26, 2009 at 1:46 PM, Alexy Khrabrov delivera...@gmail.com wrote: Every time I have to prefix a dataframe column inside the indexing brackets with the dataframe name, e.g. df[df$colname==value,] -- I am wondering, why isn't there an R scoping rule that search starts with the dataframe names ... Cheers, Alexy
Re: [R] Problem with colormodel in pdf driver
Greg Snow wrote: You may want to consider a dotchart instead of a barplot. Then you can distinguish between groups by using symbols, grouping, and labels rather than depending on colors/shades of grey. Thanks Greg. The only problem is that I was trying to illustrate the use of barplot() ... I guess for now I can always use the pdf() driver with the default RGB colormodel and then use command line tools (e.g. ImageMagick) to convert the resulting graphs to grayscale... Thanks all for the help. Luis -- Luis Torgo FEP/LIAAD - INESC Porto, LA Phone : (+351) 22 339 20 93 University of Porto Fax : (+351) 22 339 20 99 R. de Ceuta, 118, 6o email : lto...@liaad.up.pt 4050-190 PORTO - PORTUGAL WWW : http://www.liaad.up.pt/~ltorgo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
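As an alternative to post-converting the PDFs with ImageMagick, the shades can be set on the bars themselves; a minimal sketch with made-up data (the heights and labels are hypothetical, not from the thread), using gray.colors() so the figure is grayscale no matter which color model the device uses:

```r
# Hypothetical sketch: a barplot whose groups stay distinguishable in
# grayscale, with the shades chosen explicitly via gray.colors().
heights <- matrix(c(3, 5, 2, 6, 4, 7), nrow = 2)   # made-up example data
barplot(heights, beside = TRUE,
        col = gray.colors(2),                      # two shades of grey
        names.arg = c("A", "B", "C"),
        legend.text = c("group 1", "group 2"))
```

Since the grays are baked into the plot, the same code produces a publication-ready figure on pdf(), postscript(), or a screen device alike.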
Re: [R] name scoping within dataframe index
On 1/26/2009 1:46 PM, Alexy Khrabrov wrote: Every time I have to prefix a dataframe column inside the indexing brackets with the dataframe name ... If you did that, it would be quite difficult to get at a colname variable that *isn't* the column of df. It would be something like df[get(colname, parent.frame()) == value,]

Actually, what I propose is a special search rule which simply looks at the enclosing dataframe.name[...] outside the brackets and looks up the columns first. It would break legacy code which used column names identical to variables in this context, but there are probably other ideas to enhance R readability which would break legacy code. Perhaps when the next major overhaul occurs, this is something folks can voice opinions about. I find the need for inner prefixing quite unnatural, FWIW. Cheers, Alexy
Re: [R] name scoping within dataframe index
On 1/26/2009 2:01 PM, Alexy Khrabrov wrote: Actually, what I propose is a special search rule which simply looks at the enclosing dataframe.name[...] outside the brackets and looks up the columns first.

Yes, I understood that, and I explained why it would be a bad idea. Duncan Murdoch
Re: [R] Tinn-R
Hi, if you are on linux Emacs + ESS is quite popular too. Cheers Anna

Anna Freni Sterrantino Ph.D Student Department of Statistics University of Bologna, Italy via Belle Arti 41, 40124 BO.

From: Tibert, Brock btib...@bentley.edu To: r-help@r-project.org Sent: Monday, 26 January 2009, 18:01:25 Subject: [R] Tinn-R Hi Everyone, I was hoping someone could help me with the settings for Tinn-R. ...
Re: [R] Tinn-R
First thing you need to do is save the file with an .r ending. Tinn-R will then come up with syntax highlighting (as far as I remember). Options > Main > Application leads you to the R configuration... HTH Thomas

Tibert, Brock schrieb: Hi Everyone, I was hoping someone could help me with the settings for Tinn-R. I see in the screen shots that it has syntax help, or something similar (tips on what functions, etc). ...
Re: [R] name scoping within dataframe index
On Jan 26, 2009, at 2:12 PM, Duncan Murdoch wrote: df[get(colname, parent.frame()) == value,] Actually, what I propose is a special search rule which simply looks at the enclosing dataframe.name[...] outside the brackets and looks up the columns first. Yes, I understood that, and I explained why it would be a bad idea.

Well this is the case in all programming languages with scoping where inner-scope variables override the outer ones. Usually it's solved with prefixing with the outer scope, outerscope.name or outerscope::name or so. So it only underscores the need to improve scoping access in R. Dataframe column names belong to the dataframe object and the natural thing would be to enable easy access to naming; you'd need to apply an extra effort to access an overridden unrelated external variable. Again, just an analogy from other programming languages. Cheers, Alexy
[R] Spectral analysis with mtm-svd Multi-Taper Method Combined with Singular Value Decomposition
Hi list, Does anyone know if there is a library in R that implements the MTM-SVD method for spectral analysis? Thanks - Yasir H. Kaheil Columbia University -- View this message in context: http://www.nabble.com/Spectral-analysis-with-mtm-svd-Multi-Taper-Method-Combined-with-Singular-Value-Decomposition-tp21671934p21671934.html Sent from the R help mailing list archive at Nabble.com.
[R] randomSurvivalForest plotting
I would like to plot a subset of the variables with the highest variable importance measures (say the top 20) instead of plotting all of the variables included in the analysis (~75). I tried the arguments that restrict the number of variables displayed in the plot in randomForest, as follows: plot(rsfCauc.out, sort = TRUE, n.var = min(30, nrow(rsfCauc.out$importance)), type = TRUE, class = NULL, scale = TRUE, main = deparse(substitute(rsfCauc.out))) However, that code only resulted in the plot with all 75 variables. Help would be much appreciated. Many thanks! -- View this message in context: http://www.nabble.com/randomSurvivalForest-plotting-tp21672013p21672013.html Sent from the R help mailing list archive at Nabble.com.
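In case plot.rsf (from randomSurvivalForest) ignores randomForest-style arguments like n.var, one workaround is to extract the importance values and plot only the top 20 yourself. A minimal sketch with simulated values; the extraction step (a plain named vector standing in for something like rsfCauc.out$importance) is an assumption you would adapt to the actual structure of your fitted object:

```r
# Simulated stand-in for the ~75 importance values of a fitted model;
# replace this with the importance component of your own rsf object.
set.seed(1)
imp <- setNames(runif(75), paste0("var", 1:75))

top <- sort(imp, decreasing = TRUE)[1:20]  # the 20 largest importances
dotchart(rev(top), xlab = "Variable importance",
         main = "Top 20 variables by importance")
```

dotchart() is used here because it handles many labeled values more readably than a bar plot, but any plotting function taking a named vector would do.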
Re: [R] name scoping within dataframe index
On 1/26/2009 2:20 PM, Alexy Khrabrov wrote: On Jan 26, 2009, at 2:12 PM, Duncan Murdoch wrote: df[get(colname, parent.frame()) == value,] Actually, what I propose is a special search rule which simply looks at the enclosing dataframe.name[...] outside the brackets and looks up the columns first. Yes, I understood that, and I explained why it would be a bad idea. Well, this is the case in all programming languages with scoping where inner-scope variables override the outer ones. Usually it is solved by prefixing with the outer scope: outerscope.name or outerscope::name or so. So it only underscores the need to improve scoping access in R. Data frame column names belong to the data frame object, and the natural thing would be to enable easy access to them; you'd need to apply extra effort to access an overridden, unrelated external variable. Again, just an analogy from other programming languages. The issue is that in most cases the outer scope would be unnamed: it's the one that currently doesn't need a prefix. So if we have a prefix meaning this scope, why wouldn't that evaluate to df in that context? I guess we need a prefix meaning the caller's scope, but that's just going to lead to confusion: is it the caller of the function that is trying to index df, or the function trying to do the indexing? So we'd need a prefix specific to indexing, and that's just too ugly for words. As I said, use subset() or with(). For subset selection, subset() works very nicely. (I don't like the way it does column selection, but that's a different argument.) Duncan Murdoch
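Duncan's suggestion can be seen in a small self-contained sketch (the variable names below are made up for illustration): subset() and with() already give data-frame columns priority over same-named outer variables, which is the column-first lookup being asked for.

```r
df <- data.frame(x = 1:5, y = letters[1:5])
x <- 100  # an unrelated variable with the same name in the outer scope

# subset() evaluates its condition with df's columns first:
subset(df, x > 3)   # rows 4 and 5; the global x is shadowed

# with() does the same for arbitrary expressions:
with(df, x + 1)     # 2 3 4 5 6

# Plain indexing, by contrast, sees the global x (100 > 3 is TRUE,
# recycled), so it returns every row:
df[x > 3, ]
```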
Re: [R] HMISC package: wtd.table()
Dieter Menne wrote: Frank E Harrell Jr f.harrell at vanderbilt.edu writes: (wtd.table(FamTyp.kurz,weigths=HGEW,normwt=FALSE,na.rm=TRUE)) That is one solution. The other is to spell 'weights' correctly Have pity with us German speakers. It was such a paing to learn th that we cannot resist to apply it whenever pothible. Dieter Good point Dieter. I'm still learning english myself. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Problem with colormodel in pdf driver
On Mon, 26 Jan 2009, Luis Torgo wrote: Greg Snow wrote: You may want to consider a dotchart instead of a barplot. Then you can distinguish between groups by using symbols, grouping, and labels rather than depending on colors/shades of grey. Thanks Greg. The only problem is that I was trying to illustrate the use of barplot() ... I guess for now I can always use the pdf() driver with the default RGB colormodel and then use command line tools (e.g. ImageMagick) to convert the resulting graphs to grayscale... You won't be able to convert PDF to PDF with ImageMagick (possible with a helper). Thanks all for the help. Or update your R, as the posting guide suggested. It works in R-patched and R-devel. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] [stats-rosuda-devel] Problem with JGR. Was: Re: Using help()
On Jan 25, 2009, at 9:35 , Michael Kubovy wrote: Dear Friends, Thanks to Rolf Turner, Brian Ripley and Patrick Burns for their answers. They don't quite resolve the problem, which I now realize is due to non-standard behavior of JGR, at least on my machine (I verified that the Mac GUI works entirely as expected): My installation Running the JGR GUI: sessionInfo() R version 2.8.1 (2008-12-22) i386-apple-darwin8.11.1 locale: C/C/en_US/C/C/C attached base packages: [1] grid stats graphics grDevices utils datasets methods [8] base other attached packages: [1] JGR_1.6-2 iplots_1.1-2 JavaGD_0.5-2 rJava_0.6-1 [5] MASS_7.2-45 lattice_0.17-20 loaded via a namespace (and not attached): [1] tools_2.8.1 What happens with ? and ?? ** ? is interpreted by JGR (re-mapped to an internal call to help followed by help.search if no topics were found) instead of R. So JGR is smarter than R used to be, but that has changed in R 2.8. Unfortunately R has currently no publicly available API to support what the Mac GUI does, because it uses a nasty trick by modifying R's sources to hook inside R. I'm working on fixing this for R 2.9.0-to-be, but currently JGR is out of luck and has to rely on its own attempts to parse the command line, so the results will vary until then. If I type ?normal I get the long list, not No documentation found. When I type ?plot I get the help page for plot {JM}, and not plot.default {graphics}; when I type ?dnorm I get a rather long list of help pages. If I type ??normal I get ?normal.htm .com.symantec.APSock .com.symantec.aptmp .DM_1039:1232634821l:DlnIrq .DM_11869:1232818209l:m4AGyL .DM_13345:1232655220l:C1js39 .DM_14309:1232822090l:e6wvqw .DM_15688:1232659145l:ffZvPg .DM_16640:1232825979l:n5TrAz .DM_18040:1232662823l:Gb81yX … Another JGR problem ** Help pages for newly installed packages are accessible only after JGR is restarted. I can see what could cause that, but in theory that should affect all html-based systems if R really doesn't update the links. 
I didn't actually look, but it's possible that JGR just needs to call make.packages.html() -- in effect, try calling that function, and if that solves your problem, that's what it is ... Cheers, S Thanks, MK On Jan 24, 2009, at 8:54 PM, Rolf Turner wrote: On 25/01/2009, at 2:33 PM, Michael Kubovy wrote: … (1) If I type ?normal because I forgot the name dnorm() I get a long list of relevant pages. Getting to the right page is laborious. (2) If I remember dnorm() and want to be reminded of the call, I also get a list of pages. … … If you type ``?normal'' you get a ``No documentation found'' message. If you type ``??normal'' you indeed get a long list of pages, some of which might be relevant. (If you want help on ``dnorm'' then the relevant page is stats::Normal. And then typing ``?Normal'' gets you what you want. Which is somewhat on the obscure side of obvious, IMHO.) If you type ``?dnorm'' then you get exactly what you want immediately. Exactly? Well, there's also info on pnorm, qnorm, and rnorm, but I expect you can live with that. … Rolf Turner ___ stats-rosuda-devel mailing list stats-rosuda-de...@listserv.uni-augsburg.de http://mailman.rz.uni-augsburg.de/mailman/listinfo/stats-rosuda-devel
Re: [R] HMISC package: wtd.table()
Frank E Harrell Jr wrote: I'm still learning english myself. Including capitalization rules? -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907
[R] Power analysis for MANOVA?
Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. What I currently see is this, from help.search("power"): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer
Re: [R] Power analysis for MANOVA?
http://www.amazon.com/Statistical-Power-Analysis-Behavioral-Sciences/dp/0805802835 Cohen's book was in fact the basis for the pwr package at CRAN. And it does have a MANOVA power analysis, which was left out of the pwr package. On Mon, Jan 26, 2009 at 4:12 PM, Adam D. I. Kramer a...@ilovebacon.org wrote: Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. What I currently see is this, from help.search("power"): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer -- Due to the recession, requests for instant gratification will be deferred until arrears in scheduled gratification have been satisfied.
Re: [R] Power analysis for MANOVA?
Hi Adam, My (and, judging from previous traffic on R-help about power analyses, also some other people's) preferred approach is to simply simulate an effect size you would like to detect a couple of thousand times, run your proposed analysis and look how often you get significance. In your simple case, this should be quite easy. HTH, Stephan Adam D. I. Kramer schrieb: Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. What I currently see is this, from help.search("power"): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer
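Stephan's recipe can be sketched directly with manova(): pick group sizes and mean shifts you care about, generate data under that effect, and count rejections. The group sizes and the shift vector below are placeholders, not values from the original question:

```r
# Simulation-based power for a one-factor MANOVA with 5 DVs and two
# unbalanced groups. n1, n2, and the mean-shift vector `delta` are
# hypothetical; substitute the effect you actually want to detect.
set.seed(42)
n1 <- 60; n2 <- 40
delta <- c(0.3, 0.2, 0, 0, 0.1)

one_run <- function() {
  y <- rbind(matrix(rnorm(n1 * 5), n1, 5),
             matrix(rnorm(n2 * 5), n2, 5) +
               matrix(delta, n2, 5, byrow = TRUE))
  g <- factor(rep(1:2, c(n1, n2)))
  # p-value of Pillai's trace for the group effect:
  summary(manova(y ~ g), test = "Pillai")$stats[1, "Pr(>F)"]
}

pvals <- replicate(2000, one_run())
mean(pvals < 0.05)  # estimated power at alpha = .05
```

Increasing the number of replicates tightens the power estimate; 2000 runs keep its Monte-Carlo standard error around one percentage point.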
[R] Sweave'ing Danish characters
Hi, I am writing an Sweave document and am using 'xtable' to make frequency tables of diagnoses of people undergoing cholecystectomy. Some of these diagnoses contain Danish characters (æ, ø, and å), and these characters are all garbled in the LaTeX document after I run Sweave. The odd thing is, everything looks absolutely right in the R console, and if I enter the same Danish characters in a new variable, the new variable produces no problems?! Therefore, I cannot offer a reproducible example, but I am hoping nonetheless that someone can point me towards a solution. To illustrate: library(xtable) library(Hmisc) rm(list=ls()) load("u:/kirurgi/cholecystit/Chol_oprenset.Rdata") test2 <- chol$nydiag[3] # This 3rd observation contains a diagnosis with Danish characters ("Kræft i fordøjelsessystemet", meaning gastrointestinal cancer). print(xtable(table(test2))) % latex table generated in R 2.8.1 by xtable 1.5-4 package % Mon Jan 26 23:31:37 2009 \begin{table}[ht] \begin{center} \begin{tabular}{rr} \hline & test2 \\ \hline Kræft i fordøjelsessystemet & 1 \\ # It looks right here, but in the .tex file the Danish characters come out garbled \hline \end{tabular} \end{center} \end{table} print(xtable(table("Kræft i fordøjelsessystemet"))) # This, on the other hand, works like a charm. % latex table generated in R 2.8.1 by xtable 1.5-4 package % Mon Jan 26 23:36:53 2009 \begin{table}[ht] \begin{center} \begin{tabular}{rr} \hline & V1 \\ \hline Kræft i fordøjelsessystemet & 1 \\ # See, no problems here! \hline \end{tabular} \end{center} \end{table} I am using Windows Vista 64-bit and MiKTeX 2.7. Best regards, Peter. 
sessionInfo() R version 2.8.1 (2008-12-22) i386-pc-mingw32 locale: LC_COLLATE=Danish_Denmark.1252;LC_CTYPE=Danish_Denmark.1252;LC_MONETARY=Danish_Denmark.1252;LC_NUMERIC=C;LC_TIME=Danish_Denmark.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Hmisc_3.4-4 foreign_0.8-30 xtable_1.5-4 loaded via a namespace (and not attached): [1] cluster_1.11.12 grid_2.8.1 lattice_0.17-20 tools_2.8.1
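One thing worth checking (an educated guess, not a confirmed diagnosis): with a Danish_Denmark.1252 locale, the .tex file Sweave writes is most likely latin1/cp1252-encoded, and LaTeX will garble æ, ø, and å unless the preamble declares that input encoding. A minimal preamble sketch:

```latex
% Declare the encoding the Sweave output is actually written in
% (latin1 matches a Danish_Denmark.1252 locale on Windows):
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
```

If the file is actually UTF-8, use [utf8] instead of [latin1]; the symptom of a mismatch is exactly the kind of garbling described above.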
[R] suppressing time shift in plot of POSIXct object?
Friends, I have a POSIXct vector located in the EST timezone. When I plot against it here in PST, the time axis is shifted 3 hours back in time. IOW, plot adjusts for the time zone difference. Now that's really great, if that's what one wants. However, I want the time axis to use the actual times in the object (without any shift). For example: n <- 360 y <- rnorm(n) t <- seq(from = as.POSIXct("2009-01-26 12:00:00", tz = "EST"), by = 60, length.out = n) head(t) # [1] "2009-01-26 12:00:00 EST" "2009-01-26 12:01:00 EST" "2009-01-26 12:02:00 EST" # [4] "2009-01-26 12:03:00 EST" "2009-01-26 12:04:00 EST" "2009-01-26 12:05:00 EST" Sys.timezone() # [1] "PST" # But doing: plot(y ~ t, type = "l") results in a plot starting at 09:00 (here in California). I've poked around in the help, etc., but haven't found any way to force use of the timezone in t. What am I missing? TIA, Jim Porzak TGN.com San Francisco, CA http://www.linkedin.com/in/jimporzak use R! Group SF: http://ia.meetup.com/67/
Re: [R] Power analysis for MANOVA?
On Mon, 26 Jan 2009, Stephan Kolassa wrote: My (and, judging from previous traffic on R-help about power analyses, also some other people's) preferred approach is to simply simulate an effect size you would like to detect a couple of thousand times, run your proposed analysis and look how often you get significance. In your simple case, this should be quite easy. I actually don't have much experience running monte-carlo designs like this... so while I'd certainly prefer a simulation method like this one, simulating the effect size given my constraints isn't something I've done before. The MANOVA procedure takes 5 dependent variables and determines what combination of the variables best discriminates the two levels of my independent variable... then the discrimination rate is represented in the statistic (Pillai's V = .00019), which is then tested (F[5,18653] = 0.71). So coming up with a set of constraints that would produce V = .00019 given my data set doesn't quite sound trivial... so I'll go for the pwr library reference mentioned earlier before I try this. That said, if anyone can refer me to a tool that will help me out (or an instruction manual for RNG), I'd also be much obliged. Many thanks, Adam HTH, Stephan Adam D. I. Kramer schrieb: Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. 
What I currently see is this, from help.search("power"): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer
Re: [R] Problem with colormodel in pdf driver
Prof Brian Ripley wrote: On Mon, 26 Jan 2009, Luis Torgo wrote: Greg Snow wrote: You may want to consider a dotchart instead of a barplot. Then you can distinguish between groups by using symbols, grouping, and labels rather than depending on colors/shades of grey. Thanks Greg. The only problem is that I was trying to illustrate the use of barplot() ... I guess for now I can always use the pdf() driver with the default RGB colormodel and then use command line tools (e.g. ImageMagick) to convert the resulting graphs to grayscale... You won't be able to convert PDF to PDF with ImageMagick (possible with a helper). Well, actually I can, and it worked perfectly. I just did: $ mogrify *.pdf -type Grayscale and all my PDFs got changed from RGB to Grayscale. Maybe it is a problem with ImageMagick versions; mine is: $ mogrify -version Version: ImageMagick 6.3.7 08/21/08 Q16 http://www.imagemagick.org Copyright: Copyright (C) 1999-2008 ImageMagick Studio LLC Thanks all for the help. Or update your R, as the posting guide suggested. It works in R-patched and R-devel. That's good news. I'll give it a try, thanks. -- Luis Torgo FEP/LIAAD - INESC Porto, LA Phone : (+351) 22 339 20 93 University of Porto Fax : (+351) 22 339 20 99 R. de Ceuta, 118, 6o email : lto...@liaad.up.pt 4050-190 PORTO - PORTUGAL WWW : http://www.liaad.up.pt/~ltorgo
Re: [R] Power analysis for MANOVA?
If you know what a 'general linear hypothesis test' is, see http://cran.r-project.org/src/contrib/Archive/hpower/hpower_0.1-0.tar.gz HTH, Chuck On Mon, 26 Jan 2009, Adam D. I. Kramer wrote: On Mon, 26 Jan 2009, Stephan Kolassa wrote: My (and, judging from previous traffic on R-help about power analyses, also some other people's) preferred approach is to simply simulate an effect size you would like to detect a couple of thousand times, run your proposed analysis and look how often you get significance. In your simple case, this should be quite easy. I actually don't have much experience running monte-carlo designs like this...so while I'd certainly prefer a simulation method like this one, simulating the effect size given my constraints isn't something I've done before. The MANOVA procedure takes 5 dependent variables, and determines what combination of the variables best discriminates the two levels of my independent variable...then the discrimination rate is represented in the statistic (Pillai's V=.00019), which is then tested (F[5,18653] = 0.71). So coming up with a set of constraints that would produce V=.00019 given my data set doesn't quite sound trivial...so I'll go for the pwr library reference mentioned earlier before I try this. That said, if anyone can refer me to a tool that will help me out (or an instruction manual for RNG), I'd also be much obliged. Many thanks, Adam HTH, Stephan Adam D. I. Kramer schrieb: Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. 
What I currently see is this, from help.search("power"): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
[R] Error in segmented() output from segmented package
Hi - I'm getting the following error message when trying to use the segmented function to look for breakpoints in my data. Error in segmented.glm(glm, seg.Z = ~segmentdist, psi = 2, control = seg.control(display = F), : (Some) estimated psi out of its range Here are some real data and the models I'm calling which give the error above. segmentdist [1] 0.00 8.547576 12.700485 13.291767 15.701552 17.567891 18.936836 19.846242 20.325434 20.397607 20.066126 17.976218 16.772871 16.513030 16.434075 [16] 16.508426 16.717404 17.049235 17.501350 18.077070 dal [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 lm <- lm(segmentdist ~ dal, data = df) lm(formula = segmentdist ~ dal, data = df) Coefficients: (Intercept) dal 13.77564 -0.06682 seg <- segmented(lm, seg.Z = ~segmentdist, psi = 2, control = seg.control(display = F), model.frame = T) The range of the data I'm looking for breaks in is min = 0, max = 44.5, so I don't understand how my psi = 2 could be out of range. Thanks for your help, Tim -- View this message in context: http://www.nabble.com/Error-in-segmented%28%29-output-from-segmented-package-tp21674240p21674240.html Sent from the R help mailing list archive at Nabble.com.
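A guess at the cause, not a confirmed diagnosis: segmented() places breakpoints in a covariate, so seg.Z would normally name dal (the right-hand-side variable) rather than the response segmentdist, and psi must then lie inside that covariate's range (0 to 9.5 here). A sketch under that assumption, using the data from the post:

```r
library(segmented)

# Data as posted
segmentdist <- c(0.00, 8.547576, 12.700485, 13.291767, 15.701552,
                 17.567891, 18.936836, 19.846242, 20.325434, 20.397607,
                 20.066126, 17.976218, 16.772871, 16.513030, 16.434075,
                 16.508426, 16.717404, 17.049235, 17.501350, 18.077070)
dal <- seq(0, 9.5, by = 0.5)
df  <- data.frame(segmentdist, dal)

fit <- lm(segmentdist ~ dal, data = df)
# Segment over the covariate dal; psi = 2 is then a starting
# breakpoint inside dal's range:
seg <- segmented(fit, seg.Z = ~dal, psi = 2)
```

If segmented() still complains, trying a different starting psi may help; naming the fit something other than lm also avoids shadowing the base function.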
[R] why two diff. se in nlsList?
Hi list, In the object returned by summary.nlsList, what's the difference between coefficients and parameters? They have the same Estimate, but different Std. Error (and therefore t value), yet the same p values. R 2.8.0 on WinXP with nlme_3.1-89. Thanks, ...Tao library(nlme) fm1 <- nlsList(uptake ~ SSasympOff(conc, Asym, lrc, c0), data = CO2, start = c(Asym = 30, lrc = -4.5, c0 = 52)) summary(fm1)$para[,,1] Estimate Std. Error t value Pr(>|t|) Qn1 38.13977 0.9911148 38.48169 1.991990e-06 Qn2 42.87169 1.0932089 39.21638 2.583953e-06 Qn3 44.22800 1.0241029 43.18706 1.809264e-07 Qc1 36.42874 1.1941594 30.50576 1.140085e-05 Qc3 40.68373 1.2480923 32.59673 1.424635e-04 Qc2 39.81950 1.0167249 39.16447 2.692304e-06 Mn3 28.48286 1.0624246 26.80930 1.066434e-06 Mn2 32.12827 1.0174826 31.57624 3.488786e-06 Mn1 34.08482 1.3400596 25.43530 4.199333e-06 Mc2 13.55519 1.0506404 12.90184 4.385886e-06 Mc3 18.53506 0.8363371 22.16219 1.461563e-06 Mc1 21.78723 1.4113318 15.43735 5.756870e-06 summary(fm1)$coef[,,1] Estimate Std. Error t value Pr(>|t|) Qn1 38.13977 0.9163882 41.61967 1.991990e-06 Qn2 42.87169 1.0994599 38.99341 2.583953e-06 Qn3 44.22800 0.5829894 75.86415 1.809264e-07 Qc1 36.42874 1.3556273 26.87224 1.140085e-05 Qc3 40.68373 2.8632576 14.20890 1.424635e-04 Qc2 39.81950 1.0317496 38.59415 2.692304e-06 Mn3 28.48286 0.5852408 48.66861 1.066434e-06 Mn2 32.12827 0.8883225 36.16735 3.488786e-06 Mn1 34.08482 0.9872439 34.52522 4.199333e-06 Mc2 13.55519 0.3969189 34.15104 4.385886e-06 Mc3 18.53506 0.4121147 44.97549 1.461563e-06 Mc1 21.78723 0.6830001 31.89930 5.756870e-06
[R] Error in Surv(time, status) : Time variable is not numeric
Dear all, I want to analyze two-level survival data using a shared frailty model, for which I want to use the R package 'frailtypack' proposed by Rondeau et al. The dataset was built using SAS software. I also tried to change the format using SPSS and Excel. My (reduced) dataset has the following column names: ID entrytime status family var1 I used the following command: frailtyPenal(Surv(time, status) ~ var1 + cluster(family), Frailty = TRUE, n.knots = 8, kappa1 = 1500, cross.validation = FALSE) And got this error: Error in Surv(time, status) : Time variable is not numeric In addition: Warning message: In is.na(time) : is.na() applied to non-(list or vector) of type 'closure' I think R transforms the data when importing into R, so that the observations are not numeric anymore. Does anyone know how to handle this problem? Thanks, Marie -- View this message in context: http://www.nabble.com/Error-in-Surv%28time%2C-status%29-%3A-Time-variable-is-not-numeric-tp21674025p21674025.html Sent from the R help mailing list archive at Nabble.com.
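Two things to check, both guesses from the information given: (1) the formula says Surv(time, status) while the listed columns are entrytime, status, family, var1, so the name time may be resolving to the base function stats::time, a 'closure', which matches the warning; (2) a time column imported from SAS/SPSS/Excel often arrives as a factor, e.g. because of decimal commas. A sketch of the factor repair with made-up data:

```r
# Hypothetical reconstruction: a time column that was imported as a
# factor with decimal commas.
d <- data.frame(entrytime = factor(c("1,5", "2,0", "3,5")),
                status = c(1, 0, 1))

# as.numeric() on a factor returns the level codes, so convert via
# character, replacing decimal commas first:
d$entrytime <- as.numeric(sub(",", ".", as.character(d$entrytime)))
stopifnot(is.numeric(d$entrytime))

# Then refer to the column by its real name in the model formula,
# e.g. Surv(entrytime, status) rather than Surv(time, status).
```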
Re: [R] Power analysis for MANOVA?
On Mon, 26 Jan 2009, Charles C. Berry wrote: If you know what a 'general linear hypothesis test' is see http://cran.r-project.org/src/contrib/Archive/hpower/hpower_0.1-0.tar.gz I do, and am quite interested; however, this package will not install on R 2.8.1. First, it said that there was no maintainer in the description, so I added one (figuring that the 1991 date of the package was to blame), but it still will not compile: parmesan:tmp$ sudo R CMD INSTALL hpower/ * Installing to library '/usr/local/lib/R/library' * Installing *source* package 'hpower' ... ** R ** preparing package for lazy loading Error in parse(n = -1, file = file) : unexpected '{' at 5: ## 6: pfnc_function(q,df1,df2,lm,iprec=c(6)) { Calls: Anonymous -> code2LazyLoadDB -> sys.source -> parse Execution halted ERROR: lazy loading failed for package 'hpower' ** Removing '/usr/local/lib/R/library/hpower' parmesan:tmp$ ... any tips? --Adam HTH, Chuck On Mon, 26 Jan 2009, Adam D. I. Kramer wrote: On Mon, 26 Jan 2009, Stephan Kolassa wrote: My (and, judging from previous traffic on R-help about power analyses, also some other people's) preferred approach is to simply simulate an effect size you would like to detect a couple of thousand times, run your proposed analysis and look how often you get significance. In your simple case, this should be quite easy. I actually don't have much experience running monte-carlo designs like this...so while I'd certainly prefer a simulation method like this one, simulating the effect size given my constraints isn't something I've done before. The MANOVA procedure takes 5 dependent variables, and determines what combination of the variables best discriminates the two levels of my independent variable...then the discrimination rate is represented in the statistic (Pillai's V=.00019), which is then tested (F[5,18653] = 0.71). 
So coming up with a set of constraints that would produce V=.00019 given my data set doesn't quite sound trivial...so I'll go for the pwr library reference mentioned earlier before I try this. That said, if anyone can refer me to a tool that will help me out (or an instruction manual for RNG), I'd also be much obliged. Many thanks, Adam HTH, Stephan Adam D. I. Kramer schrieb: Hello, I have searched and failed for a program or script or method to conduct a power analysis for a MANOVA. My interest is a fairly simple case of 5 dependent variables and a single two-level categorical predictor (though the categories aren't balanced). If anybody happens to know of a script that will do this in R, I'd love to know of it! Otherwise, I'll see about writing one myself. What I currently see is this, from help.search("power"): stats::power.anova.test Power calculations for balanced one-way analysis of variance tests stats::power.prop.test Power calculations two sample test for proportions stats::power.t.test Power calculations for one and two sample t tests Any references on power in MANOVA would also be helpful, though of course I will do my own lit search for them myself. Cordially, Adam D. I. Kramer Charles C. 
Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
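Stephan's simulate-and-count approach can be sketched roughly as below. The group sizes, the per-variable mean shift `delta`, and `nsim` are illustrative assumptions, not Adam's actual design; the idea is just to draw data at a postulated effect, run `manova()`, and estimate power as the share of significant Pillai tests.

```r
# Hedged sketch of simulation-based MANOVA power (all sizes illustrative).
library(MASS)  # mvrnorm() for multivariate normal draws

power_sim <- function(n1 = 100, n2 = 100, delta = 0.3, p = 5,
                      nsim = 200, alpha = 0.05) {
  hits <- replicate(nsim, {
    # two groups, identity covariance, group 2 shifted by delta on each DV
    y <- rbind(mvrnorm(n1, mu = rep(0, p),     Sigma = diag(p)),
               mvrnorm(n2, mu = rep(delta, p), Sigma = diag(p)))
    g <- factor(rep(1:2, times = c(n1, n2)))
    pval <- summary(manova(y ~ g), test = "Pillai")$stats[1, "Pr(>F)"]
    pval < alpha
  })
  mean(hits)  # proportion of significant runs = estimated power
}

set.seed(1)
power_sim(n1 = 50, n2 = 50, delta = 0.5, p = 5, nsim = 100)
```

Replacing the identity covariance with an estimate from pilot data would bring the simulation closer to a real design.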
Re: [R] suppressing time shift in plot of POSIXct object?
Try:

Sys.setenv(TZ = "EST")
plot(y ~ t, type = "l")

You can save TZ before you set it and then restore it.

On Mon, Jan 26, 2009 at 5:47 PM, Jim Porzak jpor...@gmail.com wrote:

> Friends,
>
> I have a POSIXct vector located in the EST timezone. When I plot against
> it here in PST, the time axis is shifted 3 hours back in time. IOW, plot
> adjusts for the time zone difference. Now that's really great, if that's
> what one wants. However, I want the time axis to use the actual times in
> the object (without any shift). For example:
>
> n <- 360
> y <- rnorm(n)
> t <- seq(from = as.POSIXct("2009-01-26 12:00:00", tz = "EST"),
>          by = 60, length.out = n)
> head(t)
> # [1] "2009-01-26 12:00:00 EST" "2009-01-26 12:01:00 EST" "2009-01-26 12:02:00 EST"
> # [4] "2009-01-26 12:03:00 EST" "2009-01-26 12:04:00 EST" "2009-01-26 12:05:00 EST"
> Sys.timezone()
> # [1] "PST"
>
> But doing:
>
> plot(y ~ t, type = "l")
>
> results in the plot starting at 09:00 (here in California). I've poked
> around in help, etc., but haven't found any way to force use of the
> timezone in t. What am I missing?
>
> TIA,
> Jim Porzak
> TGN.com
> San Francisco, CA
> http://www.linkedin.com/in/jimporzak
> use R! Group SF: http://ia.meetup.com/67/

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
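The suggestion above can be sketched in full, with the previous TZ saved and restored afterwards:

```r
# Hedged sketch: set TZ to the data's zone for plotting, then restore it.
old_tz <- Sys.getenv("TZ", unset = NA)
Sys.setenv(TZ = "EST")

t <- seq(from = as.POSIXct("2009-01-26 12:00:00", tz = "EST"),
         by = 60, length.out = 360)
y <- rnorm(360)
plot(y ~ t, type = "l")   # axis is now labelled in EST, not the local zone

# restore the previous setting (unset it if it was never set)
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```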
[R] working with tables -- was Re: Mode (statistics) in R?
Ok, so I'm slowly figuring out what a factor is, and was able to follow the related thread about finding a mode by using constructs like

my_mode <- as.numeric(names(table(x))[which.max(table(x))])

Now, suppose I want to keep looking for other modes? For example,

Rgames> sample(seq(1, 10), 50, replace = TRUE) -> bag
Rgames> bag
 [1]  2  8  8 10  7  3  2  9  8  3  8  9  6  6 10 10  7  1
[19]  9  5  4  3  3  5 10  3  6  3  2  8  4  2  1 10  6  2
[37]  6  6  9  8  6  8  8  4  3  6  3  9  5  1
Rgames> names(which.max(table(bag)))
[1] "3"

I can then do

Rgames> bag2 <- bag[bag != 3]

and repeat the which.max stuff. I came up with the following command to find the actual magnitude of the mode:

Rgames> table(bag) -> tbag
Rgames> tbag
bag
 1  2  3  4  5  6  7  8  9 10
 3  5  8  3  3  8  2  8  5  5
Rgames> tbag[dimnames(tbag)$bag == 3] -> bagmode
Rgames> bagmode
3
8

Related to this, since bag2 is now bereft of threes,

Rgames> table(bag2)
bag2
 1  2  4  5  6  7  8  9 10
 3  5  3  3  8  2  8  5  5

I was able to make the same table with

Rgames> tbag[c(dimnames(tbag)$bag) != 3] -> newtable
Rgames> newtable
bag
 1  2  4  5  6  7  8  9 10
 3  5  3  3  8  2  8  5  5

Is there a cleaner syntax to do these things? Thanks for your help -- and feel free to point me to the Inferno or other paper on the philosophy and use of factors and tables.

Carl
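A few tidier idioms for the same tasks, sketched on a fresh random draw (not Carl's data): sorting the table shows all candidate modes at once, and indexing by `names()` drops a level without touching `dimnames()` directly.

```r
# Hedged sketch of cleaner syntax for mode-hunting in a table.
set.seed(42)                          # illustrative data, not Carl's draw
bag  <- sample(1:10, 50, replace = TRUE)
tbag <- table(bag)

sort(tbag, decreasing = TRUE)         # all counts at once, mode(s) first

mode_val  <- as.numeric(names(tbag)[which.max(tbag)])  # the mode itself
mode_size <- max(tbag)                                 # its magnitude

# drop the mode's level by name instead of via dimnames()
newtable <- tbag[names(tbag) != as.character(mode_val)]
```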
Re: [R] The Quality & Accuracy of R
It would be possible to develop tools to generate code coverage statistics quantifying the percentage of the code that the tests exercise.

On Fri, Jan 23, 2009 at 10:04 AM, Muenchen, Robert A (Bob) muenc...@utk.edu wrote:

> Hi All,
>
> We have all had to face skeptical colleagues asking if software made by
> volunteers could match the quality and accuracy of commercially written
> software. Thanks to the prompting of a recent R-help thread, I read "R:
> Regulatory Compliance and Validation Issues, A Guidance Document for the
> Use of R in Regulated Clinical Trial Environments"
> (http://www.r-project.org/doc/R-FDA.pdf). This is an important document,
> of interest to the general R community. The question of R's accuracy is
> such a frequent one that it would be beneficial to increase the
> visibility of the non-clinical information it contains. A document aimed
> at a general audience, entitled something like "R: Controlling Quality
> and Assuring Accuracy", could be compiled from these sections:
>
> 1. What is R? (section 4)
> 2. The R Foundation for Statistical Computing (section 3)
> 3. The Scope of this Guidance Document (section 2)
> 4. Software Development Life Cycle (section 6)
>
> Marc Schwartz, Frank Harrell, Anthony Rossini, Ian Francis and others
> did such a great job that very few words would need to change. The only
> addition I suggest is to mention how well R did in Keeling & Pavur's "A
> comparative study of the reliability of nine statistical software
> packages", Computational Statistics & Data Analysis, Vol. 51 (2007),
> pp. 3811-3831.
>
> Given the importance of this issue, I would like to see such a document
> added to the PDF manuals in R's Help. The document mentions (Sect. 6.3)
> that a set of validation tests, data and known results are available.
> It would be useful to have an option to run that test suite in every R
> installation, providing clear progress: "Validating accuracy of
> t-tests... Validating accuracy of linear regression..." Whether or not
> people choose to run the tests, they would at least know that such
> tests are available.
>
> Back in my mainframe installation days, this step was part of many
> software installations, and it certainly gave the impression that those
> were the companies that took accuracy seriously. Of course the other
> companies probably just ran their validation suite before shipping, but
> seeing it happen had a tremendous impact. I don't know how much this
> would add to the download, but if it were too much, perhaps it could be
> implemented as a separate download.
>
> I hope these suggestions can help mitigate the concerns so many non-R
> users have.
>
> Cheers,
> Bob
>
> =
> Bob Muenchen (pronounced Min'-chen), Manager, Research Computing Support
> U of TN Office of Information Technology
> Stokely Management Center, Suite 200
> 916 Volunteer Blvd., Knoxville, TN 37996-0520
> Voice: (865) 974-5230  FAX: (865) 974-4810
> Email: muenc...@utk.edu
> Web: http://oit.utk.edu/research http://oit.utk.edu/scc
> Map to Office: http://www.utk.edu/maps
> Newsletter: http://listserv.utk.edu/archives/rcnews.html
>             http://listserv.utk.edu/archives/statnews.html
> =
Re: [R] The Quality & Accuracy of R
That's a great idea. I know of no commercial vendors who provide such detailed info.

Bob

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Monday, January 26, 2009 7:52 PM
To: Muenchen, Robert A (Bob)
Cc: R-help@r-project.org
Subject: Re: [R] The Quality & Accuracy of R

> It would be possible to develop tools to generate code coverage
> statistics quantifying the percentage of the code that the tests
> exercise.
[R] Pausing processing into an interactive session
Hi all,

As a possibly silly request: is it possible to interactively pause an R calculation and do a browser(), say, without browser or other debug handlers being explicitly included in the code?

Imagine the following situation: you write up a big calculation for R to compute. We are talking hours here, or worse. A few hours into the calculation, you decide that you want to check on how it's going. Unfortunately, you didn't foresee the output you really want to check on. Oops.

What would seem ideal is something like this: as well as Ctrl-C, which would terminate the current computation, we really want some key combo that would pause the computation, perhaps at the next 'reasonable spot'. (Not Ctrl-Z either, as it doesn't let you look at what's going on in the program.) Then you could examine variables, for example, maybe even tweak them manually, and press the key to resume the calculation.

Is this already possible somehow? Can it be made possible? Or would there not be any point?

Thanks,
Zhou
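One partial answer can be sketched with a condition handler: wrapping the computation in `tryCatch()` with an `interrupt` handler makes Ctrl-C drop into `browser()` for inspection instead of killing the work. The function and sizes below are illustrative, and note the caveat in the comments: the interrupted loop is unwound, so this is inspect-and-salvage, not true pause-and-resume.

```r
# Hedged sketch: catch Ctrl-C with an interrupt handler.
# Caveat: the interrupted computation is unwound and cannot simply
# resume; true pause-and-resume needs explicit checkpointing.
heavy <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

res <- tryCatch(
  heavy(1e6),
  interrupt = function(cond) {
    message("Interrupted; entering browser(). Type 'c' or 'Q' to leave.")
    browser()
    NA_real_  # value 'res' gets if you quit the browser
  }
)
```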
[R] WhisperStation R
What do you think of this: http://www.microway.com/whisperstation/whisperstation-r.html

I'm considering ditching my Windows Vista 2 GB RAM computer for a WhisperStation R running Debian 64-bit Linux with 32 GB RAM, and setting the whole thing up for R and WinBUGS. I put in a price request, but I know nothing about Linux, or WhisperStation R for that matter, and am really curious what you think.
Re: [R] Running R under Sun Grid Engine with OpenMPI tight integration
Hi -

I saw your posting on the R-help mailing list. Were you ever able to get this working? Did you end up switching to use the rsge library?

I'm trying to do the same, and not having very much luck getting it going.

Thanks!
Peter Waltman
[R] Problem with loading RMySQL under sge/qsub
Hi -

I'm trying to set up a parallelized batch job that runs under Rmpi and is managed by SGE, using qsub, but it reports that it can't load RMySQL because it can't find the libmysqlclient.so.15 file. Note that when I run R interactively and manually load the RMySQL library, it works without a hitch; however, when I have qsub launch R, it reports the following error:

Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared library '/home/install/usr/apps/R-2.8.0/lib64/RMySQL/libs/RMySQL.so':
  libmysqlclient.so.15: cannot open shared object file: No such file or directory

On the web, I found this posting to this list: http://tolstoy.newcastle.edu.au/R/e2/help/07/03/12876.html, which recommends setting the LD_LIBRARY_PATH env var to the location of the libmysqlclient.so.15 file. I've set that in my .bashrc, and I use the '-V' switch to qsub to make sure I'm exporting my environment variables to the job, but I still get the error. I've also double-checked the qsub job's status with qstat -j jobid, and LD_LIBRARY_PATH is set to what I've set it to.

Since it only happens under qsub, I think it's got to be something with either how I'm calling qsub or how SGE is configured, but I can't figure out what the problem is. Can anyone suggest a workaround, or make a suggestion? I'm really stuck here.

Thanks!
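A small diagnostic can narrow this down. The sketch below (package name from the post; everything else illustrative) is meant to run at the top of the R script the qsub job launches, to confirm what the process, and hence `dyn.load()`, actually sees. Since the dynamic loader reads LD_LIBRARY_PATH at process start, exporting it inside the job script itself, before R is launched, is generally more reliable than relying on `qsub -V`; setting it from within a running R session will not help.

```r
# Hedged diagnostic sketch: print the env var and try the load, without
# letting a failure kill the batch job.
cat("LD_LIBRARY_PATH =", Sys.getenv("LD_LIBRARY_PATH"), "\n")
ok <- tryCatch({ library(RMySQL); TRUE },
               error = function(e) { message(conditionMessage(e)); FALSE })
cat("RMySQL loaded:", ok, "\n")
```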
Re: [R] Power analysis for MANOVA?
On Mon, 26 Jan 2009, Adam D. I. Kramer wrote:

> On Mon, 26 Jan 2009, Charles C. Berry wrote:
>> If you know what a 'general linear hypothesis test' is see
>> http://cran.r-project.org/src/contrib/Archive/hpower/hpower_0.1-0.tar.gz
>
> I do, and am quite interested; however, this package will not install on
> R 2.8.1. First, it said that there was no maintainer in the description,
> so I added one (figuring that the 1991 date of the package was to blame),
> but it still will not compile:
>
> parmesan:tmp$ sudo R CMD INSTALL hpower/
> * Installing to library '/usr/local/lib/R/library'
> * Installing *source* package 'hpower' ...
> ** R
> ** preparing package for lazy loading
> Error in parse(n = -1, file = file) : unexpected '{' at
> 5: ##
> 6: pfnc_function(q,df1,df2,lm,iprec=c(6)) {
          ^

AHA! That underscore is the old 'assignment' operator, now no longer allowed. Do a global replace of '_' with ' <- ' in the R/*.R files and it should install.

HTH, Chuck

> Calls: Anonymous -> code2LazyLoadDB -> sys.source -> parse
> Execution halted
> ERROR: lazy loading failed for package 'hpower'
> ** Removing '/usr/local/lib/R/library/hpower'
> parmesan:tmp$
>
> ...any tips?
>
> --Adam

Charles C. Berry  (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
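Chuck's global replace can be sketched in R itself (a sed one-liner would do equally well). The directory layout is an assumption: it rewrites every `.R` file under the package's `R/` directory. Note the blind substitution also hits underscores inside names and strings, so the diff should be reviewed before reinstalling.

```r
# Hedged sketch of the suggested fix: replace the pre-1.x underscore
# assignment operator with ' <- ' in every R source file of the
# unpacked package (path is illustrative).
fix_underscores <- function(dir) {
  for (f in list.files(dir, pattern = "\\.R$", full.names = TRUE)) {
    txt <- readLines(f, warn = FALSE)
    writeLines(gsub("_", " <- ", txt, fixed = TRUE), f)
  }
}
# usage: fix_underscores("hpower/R")
```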
Re: [R] WhisperStation R
Any idea why DDR2 667 MHz RAM isn't used instead of DDR? I thought that DDR 400 MHz was almost finished in production...

On Jan 27, 1:01 pm, zerfetzen zerfet...@yahoo.com wrote:

> What do you think of this:
> http://www.microway.com/whisperstation/whisperstation-r.html
> I'm considering ditching my Windows Vista 2 GB RAM computer for
> WhisperStation R using Debian 64-bit Linux with 32 GB RAM and setting
> the whole thing up for R and WinBUGS. I put in a price request, but I
> know nothing about Linux, or WhisperStation R for that matter, and am
> really curious what you think?
Re: [R] Running R under Sun Grid Engine with OpenMPI tight integration
On Tue, Jan 27, 2009 at 2:30 AM, Peter Waltman peter.walt...@gmail.com wrote:

> Hi - I saw your posting on the R-help mailing list. Were you ever able
> to get this working? Did you end up switching to use the rsge library?

Yes, that is exactly what I did: I am using rsge or, in most cases (which is sufficient for me), simply starting several instances of R and running the whole simulation as an array job. But I would still like to know how I can use Rmpi and snow on the Sun Grid Engine.

Please keep me posted,
Rainer

> I'm trying to do the same, and not having very much luck getting it
> going. Thanks! Peter Waltman

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Faculty of Science
Natural Sciences Building
Private Bag X1
University of Stellenbosch
Matieland 7602
South Africa
[R] Control of Quartz Window Location
If I use

plot(1:10)
quartz()
plot(1:10)

I get the second graph window almost on top of the first graph window. How can I control the location of the quartz window?

Larry Weldon
Simon Fraser University
wel...@sfu.ca
www.stat.sfu.ca/~weldon
[R] How do you specify font family in png output; png cross-platform issues
For teaching purposes, I prepared a little R program. I want to give this to students who can run it, dump out many formats, and then compare their use in LaTeX documents. I do not have too much trouble with xfig or postscript format, but I've really run into a roadblock where png files are concerned.

My original problem was that the png device does not accept a family option. How can I have png output with the Times family to compare with the postscript or pdf output?

While searching for information on this, I discovered there have been a lot of R changes in png support. If I give this script to people with Mac or Windows, what are the chances that it will work? If I'm reading the png help page correctly, there are different types available, Xlib and cairo, but I don't understand what all that means for sending a program like this across systems. (I fear the worst, but ask hoping for the best.)

As far as I understand it, the paper = "special" option is needed so that the eps or pdf output will fit into a document without creating really huge margins around the graph. Correct?
x <- rnorm(333)
y <- rnorm(333)
plot(x, y, xlab = "Input Variable", ylab = "Output Variable")

xfig(file = "testplot.fig", horizontal = FALSE, height = 6, width = 6,
     family = "Times")
plot(x, y, xlab = "Input Variable", ylab = "Output Variable")
dev.off()

postscript(file = "testplot-1.eps", horizontal = FALSE, height = 6, width = 6,
           family = "Times", onefile = FALSE, paper = "special")
plot(x, y, xlab = "Input Variable", ylab = "Output Variable")
dev.off()

postscript(file = "testplot-2.eps", horizontal = FALSE, height = 4, width = 4,
           family = "Times", onefile = FALSE, paper = "special")
plot(x, y, xlab = "Input Variable", ylab = "Output Variable")
dev.off()

pdf(file = "testplot-1.pdf", height = 6, width = 6, family = "Times",
    onefile = FALSE, paper = "special")
plot(x, y, xlab = "Input Variable", ylab = "Output Variable")
dev.off()

png(file = "testplot-1.png", height = 350, width = 550, type = "Xlib")
plot(x, y, xlab = "Input Variable", ylab = "Output Variable")
dev.off()

png(file = "testplot-2.png", height = 350, width = 550, type = "cairo")
plot(x, y, xlab = "Input Variable", ylab = "Output Variable")
dev.off()

Can I bother you about one last png issue? While searching r-help, I see posts about the difference in png output between type Xlib and cairo. For reasons I do not understand, ordinary viewers like GQview or Firefox make cairo-produced png files look blurry (in the words of posts on r-help). The png output from type = "Xlib" is not blurry. This raises another level of confusion about this exercise I'm devising. Does R for Windows, as provided on the CRAN system, use Xlib for png?

pj
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
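On the family question, one hedged workaround: even where `png()` itself takes no family argument, `par(family = "serif")` asks the open device for a Times-like face for subsequent plotting. Support varies by device type and platform, so treat this as an experiment, not a guarantee; the file name below is illustrative.

```r
# Hedged sketch: request a serif (Times-like) family on a png device
# via par(), guarded so it is a no-op where png is unsupported.
if (capabilities("png")) {
  png("testplot-serif.png", height = 350, width = 550)
  par(family = "serif")
  plot(rnorm(333), rnorm(333),
       xlab = "Input Variable", ylab = "Output Variable")
  dev.off()
}
```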
Re: [R] Getting data from a PDF-file into R
Peter Dalgaard wrote:

> joe1985 wrote:
>> Hello
>>
>> I have around 200 PDF documents containing data I want organized in R
>> as a dataframe. The PDF documents look like this:
>> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver.jpeg
>> or like this:
>> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver%2B2.jpeg
>>
>> I want to pull out the data in the coloured boxes so it becomes
>> organized like this (just in R instead of Excel):
>> http://www.nabble.com/file/p21667074/PRRS-billede%2Bexcel.jpeg
>>
>> The 0's and 1's record when each status occurs on a particular date:
>> PRRS-neg is represented by a 0 in the columns PRRS-VAC and PRRS-DK;
>> PRRS-pos VAC (or Vac) by a 1 in the column PRRS-VAC; and PRRS-pos DK
>> (or DK) by a 1 in the column PRRS-DK. Likewise, with sanVAC there
>> should be a 1 in the column VACsan, and with sanDK a 1 in the column
>> DKsan. The first date for each CHR-nr should either be the earliest
>> date in the red box (as in the first picture), or the date with the
>> word 'før' before it (as in the second picture). All 200 PDF documents
>> look like the ones in the pictures, each representing a different
>> CHR-nr.
>>
>> Hope you can help me
>
> Not on the basis of .jpeg files, I think. We'd need some indication of
> what the PDF looks like inside. There's a tool called pdftotext, which
> might do something for you, IF you can figure out reliably where your
> data begin and end.
>
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
Thank you for your quick response.

Here they are as text files:
http://www.nabble.com/file/p21680833/Foersom%2B-%2B688.txt Foersom+-+688.txt
http://www.nabble.com/file/p21680833/M%25C3%2598LLEVANG%2B602%2B.txt M%C3%98LLEVANG+602+.txt
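Peter's pdftotext route can be driven from R with a small wrapper. This is a hedged sketch: it assumes `pdftotext` (from poppler/xpdf) is installed and on the PATH, and the file name in the usage comment is illustrative; `-layout` keeps the columns aligned so the coloured-box values stay in recognisable positions for later parsing.

```r
# Hedged sketch: shell out to pdftotext, then read the text back into R.
pdf_to_lines <- function(pdf) {
  txt <- tempfile(fileext = ".txt")
  system2("pdftotext", c("-layout", shQuote(pdf), shQuote(txt)))
  readLines(txt, warn = FALSE)
}
# usage (illustrative file name):
# lines <- pdf_to_lines("Foersom-688.pdf")
```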