Re: [R] Random Forest
Ruben, Maybe your binary response is a numeric vector - try converting it into a factor with two levels. You probably want classification rather than regression (regression is for a numeric, continuous dependent variable)! Arne

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ruben Feldman
Sent: Monday, April 23, 2007 10:28 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Random Forest

Hi R-wizards, I ran a random forest on a dataset where the response variable had two possible values. It returned a warning telling me that it did regression and asking if that was really what I wanted. Does anybody know what the algorithm does when it runs a regression? (If the random forest is used for regression, how does that work?) Thanks for your time! Ruben

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
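As a hedged sketch of Arne's suggestion (the variable names are invented, and randomForest itself is not loaded here), converting a numeric 0/1 response into a two-level factor looks like this:

```r
# A binary response stored as numbers makes randomForest() run in
# regression mode; a factor makes it run in classification mode.
y_num <- c(0, 1, 1, 0, 1)

# Convert to a factor with explicit levels and labels before fitting:
y_fac <- factor(y_num, levels = c(0, 1), labels = c("neg", "pos"))

is.factor(y_fac)   # TRUE
levels(y_fac)      # "neg" "pos"
# randomForest(x, y_fac) would then perform classification (not run here).
```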
Re: [R] splitting very long character string
Hello, thanks a lot for your help on splitting the string to get a numeric vector. I'm now writing the string to a tempfile and reading it back in via scan - this is fast enough for me:

library(XML);
...
tmp = xmlElementsByTagName(root, 'tofDataSample', recursive=T);
tmp = xmlValue(tmp[[1]]);
cat(paste('splitting', nchar(tmp), 'character string ...\n'));
tmp.file = tempfile();
sink(tmp.file);
cat(tmp);
sink();
tmp = scan(tmp.file);
unlink(tmp.file);
cat(paste('splitting done,', length(tmp), 'elements\n'));

thanks again and kind regards, Arne

-----Original Message-----
From: john seers (IFR) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 01, 2006 17:01
To: Muller, Arne PH/FR; r-help@stat.math.ethz.ch
Subject: RE: [R] splitting very long character string

Hi Arne If you are reading in from files and they are just one number per line it would be more efficient to use scan directly. ?scan For example:

filen <- "C:/temp/tt.txt"
i <- scan(filen)
Read 5 items
i
[1] 12345 564376 5674 6356656 5666

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 01 November 2006 15:47
To: r-help@stat.math.ethz.ch
Subject: [R] splitting very long character string

Hello, I have a very long character string (500k characters) that needs to be split by '\n', resulting in an array of about 60k numbers. The help on strsplit says to use perl=TRUE to get better performance, but it still takes several minutes to split this string. The massive string is the return value of a call to xmlElementsByTagName from the XML library and looks like this:

12345
564376
5674
6356656
5666

I have to read about a hundred of these files and was wondering whether there's a more efficient way to turn this string into an array of numerics. Any ideas?
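For reference, newer versions of R also let you skip the temporary file entirely: scan() accepts a text argument (R >= 2.14), and strsplit() with fixed = TRUE avoids the regex engine. A small sketch (the input string here is a made-up stand-in for the XML payload):

```r
# A long newline-separated string of numbers, as returned by xmlValue():
s <- "12345\n564376\n5674\n6356656\n5666"

# Option 1: scan() straight from the string, no temp file needed:
v1 <- scan(text = s, quiet = TRUE)

# Option 2: strsplit() with fixed = TRUE, which skips the regex engine:
v2 <- as.numeric(strsplit(s, "\n", fixed = TRUE)[[1]])

identical(v1, v2)  # TRUE
```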
thanks a lot for your help and kind regards, Arne
[R] splitting very long character string
Hello, I have a very long character string (500k characters) that needs to be split by '\n', resulting in an array of about 60k numbers. The help on strsplit says to use perl=TRUE to get better performance, but it still takes several minutes to split this string. The massive string is the return value of a call to xmlElementsByTagName from the XML library and looks like this:

...
12345
564376
5674
6356656
5666
...

I have to read about a hundred of these files and was wondering whether there's a more efficient way to turn this string into an array of numerics. Any ideas? thanks a lot for your help and kind regards, Arne
[R] graphics and 'layout' question
Hello, I got stuck with a graphics question: I have 3 figures that I present on a single page (window) via 'layout'. The layout is

layout(matrix(c(1,1,2,3), 2, 2, byrow=TRUE));

so that the first plot spans both columns in row one. Now I'd like to magnify the first figure so that it takes 20% more vertical space (i.e. more space for the y-axis). How would I do this in R? thanks a lot for your help, Arne
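One reply-style sketch: layout() takes a heights argument, so giving row one 20% more vertical space is a matter of weighting the rows (the 1.2/1 ratio below is only illustrative):

```r
# Figure 1 spans both columns of row 1; figures 2 and 3 share row 2.
# heights = c(1.2, 1) gives the first row 20% more vertical space.
f <- tempfile(fileext = ".pdf")
pdf(f)  # any device works; pdf is used here so the example runs headless
layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE), heights = c(1.2, 1))
plot(1:10)         # figure 1 (taller)
plot(rnorm(10))    # figure 2
hist(rnorm(100))   # figure 3
dev.off()
```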
[R] randomForest question
Hello, I have a question regarding randomForest (from the package of the same name). I have 16 features (nominal), 159 positive and 318 negative cases that I'd like to classify (binary classification). Using the tuning from the e1071 package, it turns out that the best performance is reached when using all 16 features per tree (mtry=16). However, the documentation of randomForest suggests taking sqrt(#features), i.e. 4. How can I explain this difference? When using all features this is the same as a classical decision tree, with the difference that the tree is built and tested with different data sets, right? Example (I've tried different configurations, incl. changing ntree):

param <- try(tune(randomForest, class ~ ., data=d.all318, ranges=list(mtry=c(4, 8, 16), ntree=c(1000))))
summary(param)

Parameter tuning of `randomForest':
- sampling method: 10-fold cross validation
- best parameters:
  mtry ntree
    16  1000
- best performance: 0.1571809
- Detailed performance results:
  mtry ntree     error
1    4  1000 0.1928635
2    8  1000 0.1634752
3   16  1000 0.1571809

thanks a lot for your help, kind regards,
[R] data import problem
Dear All, I'm trying to read a text data file that contains several records separated by a blank line. Each record starts with a row that contains its ID and the number of rows for the record (two columns), then the data table itself, e.g.

123 5
89.1791 1.1024
90.5735 1.1024
92.5666 1.1024
95.0725 1.1024
101.2070 1.1024

321 3
60.1601 1.1024
64.8023 1.1024
70.0593 2.1502
...

I thought I could simply use something like this:

con <- file("test2.txt");
do {
  e <- read.table(con, nlines = 1);
  if ( length(e) == 2 ) {
    d <- read.table(con, nrows = e[1,2]);
    # process data frame d
  }
} while (length(e) == 2);

The problem is that read.table closes the connection object; I assumed that it would not close the connection, and would instead continue where it last stopped. Since the data is nearly a simple table, I thought read.table could work rather than using scan directly. Any suggestions to read this file efficiently are welcome (the file can contain several thousand records and each record can contain several thousand rows). thanks a lot for your help, +kind regards, Arne
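One base-R workaround, sketched under the blank-line-separated format described in the post (and using the text = argument of scan()/read.table() available in recent R), is to read all lines once with readLines() and slice the records in memory, which sidesteps the connection-closing issue entirely:

```r
# Two toy records: an "ID nrow" header line, then nrow data lines,
# with a blank line between records.
txt <- "123 2\n89.1791 1.1024\n90.5735 1.1024\n\n321 1\n60.1601 1.1024"

lines <- readLines(textConnection(txt))
lines <- lines[nzchar(trimws(lines))]   # drop the blank separators

records <- list()
i <- 1
while (i <= length(lines)) {
  hdr <- scan(text = lines[i], quiet = TRUE)  # c(ID, number of rows)
  n   <- hdr[2]
  records[[as.character(hdr[1])]] <-
    read.table(text = lines[(i + 1):(i + n)])
  i <- i + n + 1                              # jump to the next header
}

names(records)          # "123" "321"
nrow(records[["123"]])  # 2
```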
Re: [R] data import problem
Well, the data is generated by a perl script, and I could just configure the perl script so that there is one file per data table, but I thought it would probably be much more efficient to have all records in a single file rather than reading thousands of small files ... . kind regards, Arne

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Philipp Pagel
Sent: Wednesday, March 08, 2006 12:44
To: r-help@stat.math.ethz.ch
Subject: Re: [R] data import problem

On Wed, Mar 08, 2006 at 12:32:28PM +0100, [EMAIL PROTECTED] wrote: I'm trying to read a text data file that contains several records separated by a blank line. Each record starts with a row that contains its ID and the number of rows for the record (two columns), then the data table itself, e.g.

123 5
89.1791 1.1024
90.5735 1.1024
92.5666 1.1024
95.0725 1.1024
101.2070 1.1024

321 3
60.1601 1.1024
64.8023 1.1024
70.0593 2.1502

That sounds like a job for awk. I think it will be much easier to transform the data into a flat table using awk, python or perl and then just read the table with R. cu Philipp
--
Dr. Philipp Pagel                         Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics   Fax. +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany
and
Institute for Bioinformatics / MIPS       Tel. +49-89-3187 3675
GSF - National Research Center            Fax. +49-89-3187 3585
for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel
[R] calculating IC50
Hello, I was wondering if there is an R package to automatically calculate the IC50 value (the concentration of a substance that inhibits cell growth by 50%) for some measurements. kind regards, Arne
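There are dedicated dose-response packages, but as a hedged base-R sketch one can fit a log-logistic curve with nls() and read the IC50 off as a fitted parameter. The data below are simulated, and the three-parameter model (top plateau, IC50, Hill slope) is only one common choice:

```r
set.seed(1)
conc <- 10^seq(-2, 3, length.out = 12)     # concentrations, 0.01 to 1000
true_ic50 <- 10
resp <- 100 / (1 + (conc / true_ic50)) + rnorm(12, sd = 1)  # noisy response

# Three-parameter log-logistic dose-response model:
fit <- nls(resp ~ top / (1 + (conc / ic50)^hill),
           start = list(top = 90, ic50 = 5, hill = 1))

coef(fit)["ic50"]   # close to the true value of 10
```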
Re: [R] Dynamic Programming in R
Hello, I've implemented dynamic programming for aligning spectral data (usually 100 to 200 peaks in one spectrum, but some spectra contain 5k peaks) entirely in R. As François Pinard pointed out, the memory usage should be proportional to the n x n dynamic programming matrix, and I've not yet had any problems on my machine (R 2.2.0, win2k, 1GB mem, 2GHz Intel PV); CPU seems to be the more problematic issue. I guess it all depends on how much data you have. You could split the dynamic programming matrix into chunks and calculate them in parallel on different machines (but the implementation of finding the optimal trace will probably get a bit difficult). kind regards, Arne

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Arnab mukherji
Sent: Thursday, January 19, 2006 22:55
To: r-help@stat.math.ethz.ch
Subject: [R] Dynamic Programming in R

Hi R users, I am looking to numerically solve a dynamic program in the R environment. I was wondering if there were people out there who have had success using R for such applications. I'd rather continue in R than learn Matlab. A concern that has been cited, and that may discourage R use for solving dynamic programs, is its memory handling: a senior researcher had a lot of trouble with R because on any given run it would eat up all the computer's memory and start using the hard disk. Yet the memory needed was not substantial - saving the workspace, exiting and restarting would noticeably bring the program back to a much lower level of memory use, followed by a quick deterioration after a few thousand iterations. Is this a problem other people have come across? Perhaps it is a problem already fixed, since the researcher was working on this in 2002 (he claimed he had tried the windows, mac, and unix versions to check). Thanks. Arnab
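To make the memory/CPU trade-off concrete, here is a minimal n x m dynamic-programming alignment of two numeric peak lists - a toy edit-distance-style recurrence, not Arne's actual implementation:

```r
# Global alignment cost of two numeric peak lists; `gap` is the penalty
# for leaving a peak unmatched.  Memory is O(n * m) for the matrix D.
dp_align <- function(a, b, gap = 1) {
  n <- length(a); m <- length(b)
  D <- matrix(0, n + 1, m + 1)
  D[1, ] <- (0:m) * gap                 # align a prefix of b to nothing
  D[, 1] <- (0:n) * gap                 # align a prefix of a to nothing
  for (i in 1:n) {
    for (j in 1:m) {
      D[i + 1, j + 1] <- min(D[i, j] + abs(a[i] - b[j]),  # match peaks
                             D[i, j + 1] + gap,           # skip a[i]
                             D[i + 1, j] + gap)           # skip b[j]
    }
  }
  D[n + 1, m + 1]                       # total alignment cost
}

dp_align(c(1, 2, 3), c(1, 2, 3))   # 0 (identical spectra)
dp_align(c(1, 2), c(1, 2, 4))      # 1 (one unmatched peak)
```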
[R] RMySQL/DBI
Hello, does anybody run RMySQL/DBI successfully on SunOS 5.8 and MySQL 3.23.53? I get a segmentation fault when trying to call dbConnect. We'll soon switch to MySQL 4; however, I was wondering whether the very ancient mysql version really is the problem ...

RMySQL 0.5-5
DBI 0.1-9
R 2.2.0
SunOS 5.8

kind regards and thanks a lot for your help, Arne
[R] trellis: style of axis labels
Hello, is it possible to get xyplot from package lattice to acknowledge par(las=2)? In my trellis plot the x-axis labels are overlapping (they're factors with rather long level names), and I'd like to have them vertical. The trellis plot doesn't seem to read the 'par' settings, and trellis.par.set doesn't either :-( thanks for your help, +kind regards, Arne
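Lattice ignores par(); axis text is controlled through the scales argument instead. A sketch with invented data (rot = 90 is the lattice counterpart of par(las = 2) for vertical labels):

```r
library(lattice)  # ships with R as a recommended package

d <- data.frame(f = factor(rep(c("long_level_name_A", "long_level_name_B"),
                               each = 10)),
                y = rnorm(20))

# rot = 90 rotates the x-axis tick labels to vertical:
p <- xyplot(y ~ f, data = d, scales = list(x = list(rot = 90)))
class(p)  # "trellis"
```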
[R] data frames and factors
Hello, I have trained an svm on some training data and would like to use the svm model for predicting a binary outcome from new data. The input data frame contains several numeric and factor variables. Usually I construct the input matrix of the entities to be predicted with a perl script that writes it to a file (since the data comes from different sources and some text processing is needed). This file is then read via read.table within R. It is possible that I'd like to perform prediction on many new cases or on a single new case. There are now two problems:

1. If the constructed matrix for the cases to be predicted does not contain all the factor levels that were used to build the model (the factor levels found in the training set), the svm throws an error (Error in scale ...). I've tried to convert the columns to factors, but instead of getting the level labels I get the numeric codes:

tmp <- sapply(11:15, function(i) factor(new.dat[,i], levels=c('A','C','G','T')))
tmp
      [,1] [,2] [,3] [,4] [,5]
 [1,]    3    4    4    2    2
 [2,]    4    2    2    1    1
 [3,]    2    1    1    1    1
 [4,]    1    1    1    1    1
 [5,]    1    1    2    1    3
 [6,]    2    1    3    4    3
 [7,]    3    4    3    3    1
 [8,]    3    3    1    4    1
 [9,]    1    4    1    1    4
[10,]    1    1    4    4    4
new.dat[,14]
[1] C A A A A T G T A T

2. When reading a data frame with the variables and factors for a single new case (one row), read.table always treats the variables as strings (variables and factors), and worse - one of the factors contains a level named 'T' that is replaced by TRUE during read.table. I've tried as.is = T and F, and the result for the single-row data frame is the same (T is replaced by TRUE). I'm running R 2.1.0. Any suggestions how to read a data frame (with at least one row), treat factor columns as such, and adjust the factor levels before passing the data frame to predict.svm? thanks in advance, +kind regards, Arne
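Two base-R sketches for the problems above (column names are invented). sapply() simplifies a list of factors into a matrix of integer codes, whereas lapply() keeps them as factors with the full level set; and colClasses = "character" stops read.table() from turning a lone T into the logical TRUE:

```r
# Problem 1: force the full level set, and keep the columns as factors.
new.dat <- data.frame(pos1 = c("C", "A"), pos2 = c("T", "T"),
                      stringsAsFactors = FALSE)
new.dat[] <- lapply(new.dat, factor, levels = c("A", "C", "G", "T"))

levels(new.dat$pos1)        # "A" "C" "G" "T", even though only A/C occur
sapply(new.dat, is.factor)  # all TRUE

# Problem 2: stop read.table() from parsing "T" as the logical TRUE.
f <- tempfile()
writeLines("T G", f)
d <- read.table(f, colClasses = "character")
d$V1                        # "T" as a character string, not TRUE
```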
[R] basic anova and t-test question
Hello, I'm posting this to receive some comments/hints on a question that is statistical rather than R-technical ... . In an anova of an lme, the factor SSPos11 shows up non-significant, but in the t-tests of the summary, 2 of the 4 levels (one serves as the contrast) are significant. See below for some truncated output. I realize that the two tests are different (F-test/t-test), but I'm looking for the meaning. Maybe you have a scenario that explains how these differences can arise and how you'd go ahead and analyse it further. When I use SSPos11 as the only fixed effect, it is not significant in either the anova or the t-test, and a boxplot of the factor shows that the levels are all quite similar (similar variance and mean). Might the effect I observe be linked to an unbalanced design in the multifactorial model? thanks a lot for your help, +kind regards, Arne

anova(fit)
            numDF denDF  F-value p-value
(Intercept)     1   540 323.4442  <.0001
SSPos1          3   540  15.1206  <.0001
...
SSPos11         3   540   1.1902  0.3128
...
summary(fit)
Linear mixed-effects model fit by REML
 Data: d.orig
      AIC      BIC    logLik
 1007.066 1153.168 -469.5329
Random effects:
 Formula: ~1 | Method
        (Intercept)  Residual
StdDev:   0.4000478 0.4943817
Fixed effects: log(value + 7.5) ~ SSPos1 + SSPos2 + SSPos6 + SSPos7 + SSPos10 + SSPos11 + SSPos13 + SSPos14 + SSPos18 + SSPos19 + ...
                 Value  Std.Error  DF   t-value p-value
(Intercept)  2.8621811 0.23125065 540 12.376964  0.0000
SSPos1C     -0.1647937 0.06293993 540 -2.618269  0.0091
SSPos1G     -0.3448095 0.05922479 540 -5.822047  0.0000
SSPos1T      0.1083988 0.06087095 540  1.780797  0.0755
...
SSPos11C    -0.1540292 0.06171635 540 -2.495761  0.0129
SSPos11G    -0.1428980 0.05993122 540 -2.384368  0.0175
SSPos11T    -0.0039434 0.06133920 540 -0.064289  0.9488
...
[R] RandomForest question
Hello, I'm trying to find the optimal number of variables sampled per split (the mtry parameter) for a randomForest classification. The classification is binary and there are 32 explanatory variables (mostly factors with up to 4 levels each, but also some numeric variables) and 575 cases. I've seen that although there are only 32 explanatory variables, the best classification performance is reached when choosing mtry=80. How is it possible that more variables can be used than there are columns in the data frame? thanks for your help + kind regards, Arne
[R] p-values for classification
Dear All, I'm classifying some data with various methods (binary classification). I'm interpreting the results via a confusion matrix from which I calculate the sensitivity and the fdr. The classifiers are trained on 575 data points and my test set has 50 data points. I'd like to calculate p-values for obtaining at least the observed fdr and sensitivity for each classifier. I was thinking about shuffling/bootstrapping the labels of the test set, classifying them, and calculating the p-value from the resulting (supposedly normally distributed) random fdr and sensitivity values. The problem is that this is rather slow when running many rounds of shuffling/classification (and I'd like to do this for many classifiers and parameter combinations). In addition, classification of the 50 test data points with shuffled labels realistically produces only a very limited number of possible fdrs and sensitivities, and I'm wondering if I can really believe these values to be normal. Basically I'm looking for a way to calculate the p-values analytically. I'd be happy for any suggestions, web addresses or references. kind regards, Arne
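A hedged base-R sketch of the label-permutation idea, shown for sensitivity only with toy labels. The empirical p-value makes no normality assumption, which sidesteps the worry about the discrete fdr/sensitivity values:

```r
set.seed(42)
truth <- rep(c(0, 1), each = 25)      # 50 test-set labels
pred  <- truth
flip  <- sample(50, 8)
pred[flip] <- 1 - pred[flip]          # a classifier making a few errors

sens <- function(truth, pred) sum(pred == 1 & truth == 1) / sum(truth == 1)
obs  <- sens(truth, pred)

# Empirical null: shuffle the labels, keep the predictions fixed.
null <- replicate(2000, sens(sample(truth), pred))
p_value <- mean(null >= obs)          # one-sided, no normality assumed

p_value < 0.05   # the classifier beats shuffled labels
```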
[R] randomForest error
Hello, I'm using the randomForest package. One of my factors in the data set contains 41 levels (I can't code this as a numeric value - in terms of linear models this would be a random factor). The randomForest call comes back with an error telling me that the limit is 32 categories. Is there any reason for this particular limit? Maybe it's possible to recompile the module with a different cutoff? thanks a lot for your help, kind regards, Arne
[R] svm and scaling input
Dear All, I have a question about scaling the input variables for an analysis with svm (package e1071). Most of my variables are factors with 4 to 6 levels, but there are also some numeric variables. I'm not familiar with the math behind svms, so my assumptions may be completely wrong ... or obvious. Will the svm automatically expand the factors into a binary matrix? If I add numeric variables outside the range of 0 to 1, do I have to scale them to the 0 to 1 range? thanks a lot for help, +kind regards, Arne
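On the first question: one way to see the usual factor expansion is model.matrix(), which is what most R modelling functions do internally (whether e1071's svm() does exactly this is not confirmed here; the data are invented):

```r
d <- data.frame(f = factor(c("A", "B", "C", "A")),
                x = c(10, 20, 30, 40))

# Treatment coding: a k-level factor becomes k-1 binary columns.
mm <- model.matrix(~ f + x, data = d)
colnames(mm)   # "(Intercept)" "fB" "fC" "x"

# Numeric columns can be standardised separately with scale():
x_scaled <- scale(d$x)   # centred to mean 0, unit variance
```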
[R] bug in predict.lme?
Dear All, I've come across a problem in predict.lme. Assigning a model formula to a variable and then using this variable in lme (instead of typing the formula into the formula argument of lme) works as expected. However, when performing a predict on the fitted model I get an error message - predict.lme (but not predict.lm) seems to expect a 'properly' typed-in formula and cannot extract the formula from the variable. The code below demonstrates this. Is this a known or expected behaviour of predict.lme, or is this a bug? kind regards, Arne (R-2.1.0)

library(nlme)
...
mod <- distance ~ age + Sex # example from ?lme
mod
distance ~ age + Sex
fm2 <- lme(mod, data = Orthodont, random = ~ 1)
anova(fm2)
            numDF denDF  F-value p-value
(Intercept)     1    80 4123.156  <.0001
age             1    80  114.838  <.0001
Sex             1    25    9.292  0.0054
fm2
Linear mixed-effects model fit by REML
 Data: Orthodont
 Log-restricted-likelihood: -218.7563
 Fixed: mod
...
predict(fm2, Orthodont)
Error in mCall[["fixed"]][-2] : object is not subsettable
fm2 <- update(fm2, . ~ .) # this replaces mod by the contents of variable mod
fm2
Linear mixed-effects model fit by REML
 Data: Orthodont
 Log-restricted-likelihood: -218.7563
 Fixed: distance ~ age + Sex
...
predict(fm2, Orthodont)
     M01      M01      M01      M01 ...
25.39237 26.71274 28.03311 29.35348 21.61052 ...
fm2 <- lm(mod, data = Orthodont)
predict(fm2, Orthodont)
       1        2        3        4 ...
22.98819 24.30856 25.62894 26.94931 ...
[R] lm/lme cross-validation
Hello, is there a special package/method to cross-validate linear fixed effects and mixed effects models (from lme)? I've tried cv.glm on an lme (hoping that it might deal with any kind of linear model ...), but it raises an error:

Error in eval(expr, envir, enclos) : couldn't find function "lme.formula"

so I guess it's not dealing with an lme. I've realized that randomly removing some lines from the data frame used for lme strongly changes the estimates and reduces the correlation between fitted and actual values. Therefore I'd like to get a more realistic view of the prediction performance. Any ideas are welcome, +thanks, Arne
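A minimal hand-rolled k-fold cross-validation works for any model with a predict() method; here it is sketched with lm() on the built-in mtcars data (the fold count, seed and error bounds are arbitrary). For an lme one would swap in the lme() call and decide which prediction level to use:

```r
cv_rmse <- function(formula, data, k = 5) {
  set.seed(1)
  folds <- sample(rep(1:k, length.out = nrow(data)))  # random fold labels
  resp  <- all.vars(formula)[1]                       # response column name
  errs <- sapply(1:k, function(i) {
    fit  <- lm(formula, data = data[folds != i, ])    # train on k-1 folds
    pred <- predict(fit, newdata = data[folds == i, ])
    sqrt(mean((data[[resp]][folds == i] - pred)^2))   # held-out fold RMSE
  })
  mean(errs)
}

cv_rmse(mpg ~ wt, mtcars)  # around 3 mpg of out-of-sample error
```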
[R] error in plot.lmList
Hello, in R-2.1.0 I'm trying to produce trellis plots from an lmList object as described in the help for plot.lmList. I can generate the plots from the help, but on my own data plotting fails with an error message that I cannot interpret (please see below). Any hints are greatly appreciated. kind regards, Arne

dim(d)
[1] 575   4
d[1:3,]
  Level_of_Expression SSPos1 SSPos19 Method
1                11.9      G       A   bDNA
2                24.7      T       T   bDNA
3                 9.8      C       T   bDNA
fm <- lmList(Level_of_Expression ~ SSPos1 + SSPos19 | Method, data=d)
fm
Call:
  Model: Level_of_Expression ~ SSPos1 + SSPos19 | Method
   Data: d
Coefficients:
           (Intercept)   SSPos1C     SSPos1G   SSPos1T SSPos19C SSPos19G   SSPos19T
bDNA          25.75211 -6.379701  -9.193304 10.371056 24.32171 24.06107  9.7357724
Luciferase    23.79947  4.905679  -7.747861  8.112779 48.95151 48.15064 -0.2646783
RT-PCR        56.08985 -7.352206 -15.896556 -2.712313 19.91967 24.28425 -2.2317071
Western       14.03876  2.777038 -14.113157 -7.804959 24.62684 25.50382  8.3864782
Degrees of freedom: 575 total; 547 residual
Residual standard error: 25.39981
plot(fm, Level_of_Expression ~ fitted(.))
Error in plot.lmList(fm, Level_of_Expression ~ fitted(.)) : Object "cF" not found

what is object "cF" ...?
[R] casting lm.fit output to an lm object
Hello, is it possible to cast the output of lm.fit to an lm object? I have 10,000 linear models for a gene expression experiment, all of which have the same model matrix. Maybe calling lm.fit on a model matrix and a data vector is faster than lm. I'd like to use each fit for an anova, as well as to compare different models via anova. kind regards, Arne
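One base-R shortcut when every response shares the same design: lm() accepts a matrix on the left-hand side and fits all columns in one call, reusing a single decomposition of the model matrix. A sketch with simulated expression values:

```r
set.seed(1)
design <- data.frame(group = gl(2, 5))    # 10 arrays, 2 groups
Y <- matrix(rnorm(10 * 100), nrow = 10)   # 100 "genes" as columns

# One shared design serves all 100 responses:
fit <- lm(Y ~ group, data = design)
class(fit)       # "mlm" "lm"
dim(coef(fit))   # 2 x 100: intercept and group effect per gene
```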
RE: [R] Re: The hidden costs of GPL software?
[...] I am a biologist coming to R via Bioconductor. I have no background in computer sciences and only basic undergraduate-level training in statistics. I have used R with great pleasure and great pains. The most difficult thing is to know which functions to use - sometimes I know that a suitable function is most likely available, but there's really no easy way to find it (yes, even going to the archives and reading the help files). I feel that more examples in the help files would definitely be a good way to fully understand the potential of the functions. I know how difficult this is to do and how much of a time sink it must be.

Yes, I often have the same problem when it comes to programming in R (data manipulation, formatting etc ...). When thinking about a solution, I often come up with something slow and complicated. A posting to this list usually reveals a very simple solution thanks to a function that I didn't find when exploring help, help.search and the archives (and thanks to those who give me the hint ;-). However, I don't know how to improve this, i.e. how to implement a more sophisticated help.search. Maybe the keywords in the help files or some kind of free-text mining would help - well, maybe this is a bit over the top. On the other hand, when it comes to the statistics (I'm not a statistician) and its minimal formatting of data etc., I think that developing an understanding of the stats itself is the main problem, and a GUI doesn't help very much with this. Once the basic understanding is there (which one needs anyway, even with a GUI), the rest is not too difficult. In addition, I usually need to script the calculations for many different datasets, and again most GUIs are bad at repeating tasks systematically. I've spent quite some time learning R (and I haven't stopped yet ;-), but it's definitely worth it. As a scientist I appreciate it, and since it is a tool that I use often, I would not exchange the command line for any GUI.
This list and the many books and manuals (mentioned in the other postings here) do a pretty good job of teaching R! kind regards, Arne [...]
[R] printing to stderr
Hello, is it possible to configure the print function to print to stderr? kind regards, Arne
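For reference, print() itself always writes to stdout, but base R has several ways to write to stderr directly:

```r
message("progress note")                   # message() always goes to stderr
write("another line", stderr())            # write() with an explicit connection
cat("value:", 42, "\n", file = stderr())   # cat() works the same way
```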
[R] Boxplot, space to axis
Hello, I've created a boxplot with 84 boxes. So far everything is as I expect, but there is quite some space between the first box and axis 2, and between the last box and axis 4. Since 84 boxes get very slim anyway, I'd like to distribute as much of the horizontal space as possible over the x-axis. Maybe I've forgotten about a graphics parameter? Thanks for your help, Arne
RE: [R] Boxplot, space to axis
Hello Deepayan, thanks for your suggestion; xaxs='i' works, but it leaves no space at all. I thought this might be configurable by a real value. kind regards, Arne

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Deepayan Sarkar
Sent: 30 September 2004 17:12
To: [EMAIL PROTECTED]
Cc: Muller, Arne PH/FR
Subject: Re: [R] Boxplot, space to axis

On Thursday 30 September 2004 09:41, [EMAIL PROTECTED] wrote: Hello, I've created a boxplot with 84 boxes. So far everything is as I expect, but there is quite some space between the first box and axis 2, and between the last box and axis 4. Since 84 boxes get very slim anyway, I'd like to distribute as much of the horizontal space as possible over the x-axis. Maybe I've forgotten about a graphics parameter?

Perhaps par(xaxs = "i") ? Deepayan
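A compromise sketch: keep the default axis style but set xlim yourself, so the padding is whatever margin you choose (the half-box margin of 0.5 below is arbitrary):

```r
f <- tempfile(fileext = ".pdf")
pdf(f)  # headless device for the example
x <- lapply(1:84, function(i) rnorm(20, mean = i / 10))  # 84 toy groups

# The boxes sit at positions 1..84; a tight window with a
# half-box margin on each side:
boxplot(x, xlim = c(0.5, 84 + 0.5))
dev.off()
```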
[R] strange tickmarks placing in image
Hello, I have a problem aligning tickmarks to an image. I've created a correlation matrix for 84 datasets. I'm visualizing the matrix as an image with colour coding according to the correlation coefficient. The 84 datasets are distributed over three factors, but the design is unbalanced, so the tickmarks and axis labels are not evenly distributed. A regular grid via the 'grid' function aligns perfectly with the image cells, but the tickmarks via axis are slightly shifted and not aligned perfectly with the image cells. The offset is even stronger for the y-axis. The thing is that I don't want 84 labels at the axis; it's enough to have one label per factor-level combination, which results in 28 labels. Maybe you have an idea how to set up the command to align the tick marks properly. thanks for your help, kind regards, Arne

Here are my commands:

library(marrayPlots) # for the colors
col <- maPalette(low='white', high='darkred', k=50)
par(ps=8, cex=1, mar=c(1,5,5,1)) # space needed for labels @ axis 1 and 3
# x and y range from 1 to 84, x is the correlation matrix (dim = 84x84)
image(1:84, 1:84, x, col=col, xaxt='n', yaxt='n', xlab='', ylab='')
# set up the axes, 28 labels, distributed un-evenly over the image axis
axis(3, i, labels=names(l), las=2, tick=T)
axis(2, i, labels=names(l), las=2, tick=T)
grid(84, col='black', lty='solid') # grids each of the 84 cells

# this is where the labels come from; the numbers indicate the replicates
# per factor-level combination
l
    NEW:4:0   NEW:4:100   NEW:4:250   NEW:4:500  NEW:4:1000    NEW:24:0
          3           3           3           3           3           3
 NEW:24:100  NEW:24:250  NEW:24:500 NEW:24:1000     OLD:4:0   OLD:4:100
          3           3           3           3           4           3
  OLD:4:250   OLD:4:500  OLD:4:1000    OLD:24:0  OLD:24:100  OLD:24:250
          2           3           3           4           3           2
 OLD:24:500 OLD:24:1000     PRG:4:0   PRG:4:100   PRG:4:250  PRG:4:1000
          3           3           3           3           3           3
   PRG:24:0  PRG:24:100  PRG:24:250 PRG:24:1000
          3           3           3           3

# these are the positions along the axis for the tick marks,
# replicates from 1 to 3 (replicates of one factor-level combination), 4 to 6
# ...
i
 [1]  3  6  9 12 15 18 21 24 27 30 34 37 39 42 45 49 52 54 57 60 63 66 69 72 75
[26] 78 81 84
[R] binning a vector
Hello, I was wondering whether there's a function in R that takes two vectors (of the same length) as input and computes mean values for bins (intervals), or even a sliding window, over these vectors. I have several x/y data sets (input/response) that I'd like to plot together. Say the x-data for one data set goes from -5 to 14 with 12,000 values; then I'd like to bin the x-vector in steps of +1 and calculate and plot the mean of the x-values and the y-values within each bin. I was browsing the R docs but couldn't find anything appropriate. thanks for hints + kind regards, Arne
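cut() plus tapply() does exactly this binning in base R (toy data below; a sliding window would instead need something like filter() or a loop):

```r
x <- c(-4.9, -4.2, 0.1, 0.4, 13.7)   # toy input values
y <- x^2                              # toy responses

# Bin x into unit intervals from -5 to 14, then average per bin:
bins   <- cut(x, breaks = seq(-5, 14, by = 1), include.lowest = TRUE)
x_mean <- tapply(x, bins, mean)
y_mean <- tapply(y, bins, mean)

sum(!is.na(y_mean))   # 3 occupied bins out of 19
# plot(x_mean, y_mean) would then draw one point per occupied bin.
```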
[R] unbalanced design for anova with low number of replicates
Hello, I'm wondering what's the best way to analyse an unbalanced design with a low number of replicates. I'm not a statistician, and I'm looking for some direction with this problem. I've a 2-factor design: factor batch with 3 levels, and factor dose within each batch with 5 levels. Dose level 1 in batch one is replicated 4 times, level 3 is replicated only 2 times. All other levels are replicated 3 times, except for batch level 3, for which dose 4 is missing. I've realised that the order of the factors is critical for the outcome of the anova (using lm and anova). I guess the impact wouldn't be strong if there was a reasonably large number of replicates within each cell (even though not balanced). However, since I've only 0 to 4 replicates I'm worried that the standard anova may not be the way to go. Are there special packages for unbalanced designs like this? kind regards, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com
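The order dependence comes from R's sequential (Type I) sums of squares: in an unbalanced layout the factors are no longer orthogonal, so each term is adjusted only for the terms listed before it. A toy example showing the effect (the data are invented for illustration):

```r
## tiny unbalanced two-factor layout, values invented
d <- data.frame(A = factor(c(1, 1, 1, 2, 2, 2, 2)),
                B = factor(c(1, 1, 2, 1, 2, 2, 2)),
                y = c(1.0, 1.2, 2.9, 1.1, 3.2, 3.0, 3.1))
ssA.first <- anova(lm(y ~ A + B, data = d))["A", "Sum Sq"]  # A unadjusted
ssA.last  <- anova(lm(y ~ B + A, data = d))["A", "Sum Sq"]  # A adjusted for B
## with balanced data the two would be identical; here they differ
```

Here y is driven mainly by B, and because A and B are associated in the unbalanced layout, A entered first soaks up variation that really belongs to B.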
RE: [R] Perl--R interface
Hi, look at http://www.omegahat.org/RSPerl/index.html. regards, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of XIAO LIU Sent: 23 June 2004 17:11 To: [EMAIL PROTECTED] Subject: [R] Perl--R interface R users: My R is 1.8.1 in Linux. How can I call R in Perl process? And call Perl from R? Thanks Xiao
[R] help with memory greedy storage
Hello, I've a problem with a self-written routine taking a lot of memory (1.2Gb). Maybe you can suggest some enhancements; I'm pretty sure that my implementation is not optimal ... I'm creating many linear models and store coefficients, anova p-values ... all I need in different lists which are then finally returned in a list (list of lists). The input is a matrix with 100,000 rows and 84 columns. The routine probeDf below creates a data frame that assigns the 84 columns to the different factors, not just for one row but for several rows, depending on what emat[which(rows == g),] returns, and a new factor ('probe') is generated. This results in a 1344 by 6 data frame. Example data frame returned by probeDf:

       Value batch time  dose array probe
1   2.317804   NEW  24h 000mM     1     1
2   2.495390   NEW  24h 000mM     2     1
3   2.412247   NEW  24h 000mM     3     1
...
144 8.851469   OLD  04h 100mM    60     2
145 8.801430   PRG  24h 000mM    61     2
146 8.308224   PRG  24h 000mM    62     2
...

This data frame is not the problem, since it gets generated on-the-fly per gene and is discarded afterwards (just that it takes some time to generate it). Here comes the problematic routine:

### emat: matrix, model: formula for lm, contr: optional contrasts
probe.fit <- function(emat, facts, model, contr=NULL) {
    rows  <- rownames(emat)
    genes <- unique(rows)
    l     <- length(genes)
    ### generate proper labels (names) for the anova p-values
    difflabels <- attr(terms(model), "term.labels")
    aov    <- list() # anova p-values for factors + interactions
    coef   <- list() # lm coefficients
    coefp  <- list() # p-values for coefficients
    rsq    <- list() # R-squared of fit
    fitted <- list() # fitted values
    value  <- list() # orig. values (used with fitted to get residuals)
    for ( g in genes ) { # loop over 12,000 genes
        ### g is the name that identifies 14 to 16 rows in emat
        ### d is the data frame for the lm
        d <- probeDf(emat[which(rows == g),], facts)
        fit     <- lm(model, data = d, contrasts=contr)
        fit.sum <- summary(fit)
        aov[[g]] <- as.vector(na.omit(anova(fit)$'Pr(>F)'))
        names(aov[[g]]) <- difflabels
        coef[[g]]   <- coef(fit)[-1]
        coefp[[g]]  <- coef(fit.sum)[-1, 'Pr(>|t|)']
        rsq[[g]]    <- fit.sum$'r.squared'
        value[[g]]  <- d$Value
        fitted[[g]] <- fitted(fit)
    }
    list(aov=aov, coefs=coef, coefp=coefp, rsq=rsq, fitted=fitted, values=value)
}

### create a data frame from a matrix (usually 16 rows and 84 columns)
### and a list of factors. Basically this repeats the factors 16 times
### (for each row in the matrix). This results in a data frame with 84*16
### rows and as many columns as there are factors + 2 (probe factor + value
### to be modeled later)
probeDf <- function(emat, facts) {
    df    <- NULL
    n     <- 1
    nsamp <- ncol(emat)
    for ( i in 1:nrow(emat) ) {
        values <- c(t(emat[i,]))
        df.new <- data.frame(Value = values, facts, probe = rep(n, nsamp))
        n <- n + 1
        if ( !is.null(df) ) {
            df <- rbind(df, df.new)
        } else {
            df <- df.new
        }
    }
    df$probe <- as.factor(df$probe)
    df
}

If I remove coef, coefp, value and fitted from the loop in probe.fit the memory usage is moderate. The problem is that each of the 12,000 genes contributes 148 coefficients (the model contains quite a few factors) and p-values; the fitted and value vectors are 1300 elements long. I couldn't find a more compact form of storage that is still easy to explore afterwards. Suggestions on how to get this done more efficiently (in terms of memory) are gratefully received. kind regards, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com
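One memory saver, since every gene contributes the same number of coefficients and fitted values: preallocate matrices instead of growing per-gene lists. 12,000 x 148 doubles is only about 14 Mb, with none of the per-element list overhead, and a matrix is just as easy to explore by row afterwards. A minimal sketch with shrunken dimensions; the rnorm()/runif() fills stand in for the real per-gene fit:

```r
ngenes <- 50; ncoef <- 6; nfit <- 10       # illustrative sizes only
coefs  <- matrix(NA_real_, ngenes, ncoef)  # one row per gene
fits   <- matrix(NA_real_, ngenes, nfit)
rsq    <- numeric(ngenes)
for (g in seq_len(ngenes)) {
    ## ... d <- probeDf(...); fit <- lm(...) would go here ...
    coefs[g, ] <- rnorm(ncoef)  # stand-in for coef(fit)[-1]
    fits[g, ]  <- rnorm(nfit)   # stand-in for fitted(fit)
    rsq[g]     <- runif(1)      # stand-in for fit.sum$r.squared
}
```

Row names can carry the gene identifiers, so coefs["mygene", ] replaces coef[["mygene"]].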
[R] storage of lm objects in a database
Hello, I'd like to use DBI to store lm objects in a database. I have to analyze many linear models and I cannot keep them all in a single R session (not enough memory). Also it'd be nice to have them persistent. Maybe it's possible to create a compact binary representation of the object (the kind of format created by save), so that one doesn't need to write a conversion routine for these objects (or maybe there's already a conversion available for lm?). I assume that the data do not need to be analyzed with any other software than R. I'm happy for any suggestions and links to get some more info on this. kind regards, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com
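One way, assuming a reasonably recent R: serialize() turns any object into a raw vector (the same binary format save() uses), which can be stored as a BLOB column via DBI and restored with unserialize(). A sketch on a built-in dataset; the database insert itself is elided:

```r
fit <- lm(dist ~ speed, data = cars)      # example model
bin <- serialize(fit, connection = NULL)  # raw vector, ready for a BLOB column
## ... DBI insert of `bin` would go here ...
fit2 <- unserialize(bin)                  # round trip back to an lm object
```

Note that an lm object drags its terms and environment along and can be large; if refitting is never needed, storing just coef(fit), vcov(fit) and the anova table is far more compact.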
[R] R versus SAS: lm performance
Hello, A colleague of mine has compared the runtime of a linear model + anova in SAS and S+. He got the same results, but SAS took a bit more than a minute whereas S+ took 17 minutes. I've tried it in R (1.9.0) and it took 15 min. Neither machine ran out of memory, and I assume that all machines have similar hardware, but the S+ and SAS machines are on Windows whereas the R machine is Redhat Linux 7.2. My question is if I'm doing something wrong (technically) calling the lm routine, or (if not), how I can optimize the call to lm or even use an alternative to lm. I'd like to run about 12,000 of these models in R (for a gene expression experiment - one model per gene), which would take far too long. I've run the following code in R (and S+):

options(contrasts=c('contr.helmert', 'contr.poly'))

The 1st column is the value to be modeled, and the others are factors.

names(df.gene1data) <- c('Va', 'Ba', 'Ti', 'Do', 'Ar', 'Pr')
> df[c(1:2,1343:1344),]
        Va    Do  Ti  Ba Ar Pr
1 2.317804 000mM 24h NEW  1  1
2 2.495390 000mM 24h NEW  2  1
8315 2.979641 025mM 04h PRG 8316
8415 4.505787 000mM 04h PRG 8416

this is a dataframe with 1344 rows.
> x <- Sys.time(); wlm <- lm(Va ~ Ba+Ti+Do+Pr+Ba:Ti+Ba:Do+Ba:Pr+Ti:Do+Ti:Pr+Do:Pr+Ba:Ti:Do+Ba:Ti:Pr+Ba:Do:Pr+Ti:Do:Pr+Ba:Ti:Do:Pr+(Ba:Ti:Do)/Ar, data=df, singular=T); difftime(Sys.time(), x)
Time difference of 15.3 mins

> anova(wlm)
Analysis of Variance Table

Response: Va
              Df Sum Sq Mean Sq   F value    Pr(>F)
Ba             2    0.1     0.1    0.4262  0.653133
Ti             1    2.6     2.6   16.5055 5.306e-05 ***
Do             4    6.8     1.7   10.5468 2.431e-08 ***
Pr            15 5007.4   333.8 2081.8439 < 2.2e-16 ***
Ba:Ti          2    3.2     1.6    9.8510 5.904e-05 ***
Ba:Do          7    2.8     0.4    2.5054  0.014943 *
Ba:Pr         30   80.6     2.7   16.7585 < 2.2e-16 ***
Ti:Do          4    8.7     2.2   13.5982 9.537e-11 ***
Ti:Pr         15    2.4     0.2    1.0017  0.450876
Do:Pr         60   10.2     0.2    1.0594  0.358551
Ba:Ti:Do       7    1.4     0.2    1.2064  0.296415
Ba:Ti:Pr      30    5.6     0.2    1.1563  0.259184
Ba:Do:Pr     105   14.2     0.1    0.8445  0.862262
Ti:Do:Pr      60   14.8     0.2    1.5367  0.006713 **
Ba:Ti:Do:Pr  105   15.8     0.2    0.9382  0.653134
Ba:Ti:Do:Ar   56   26.4     0.5    2.9434 2.904e-11 ***
Residuals    840  134.7     0.2

The corresponding SAS program from my colleague is:

proc glm data = <the name of the data set>;
class B T D A P;
model V = B T D P B*T B*D B*P T*D T*P D*P B*T*D B*T*P B*D*P T*D*P B*T*D*P A(B*T*D);
run;

Note, V = Va, B = Ba, T = Ti, D = Do, P = Pr, A = Ar of the R-example. kind regards + thanks a lot for your help, Arne
RE: [R] R versus SAS: lm performance
Hello, thanks for your reply. I've now done the profiling, and I interpret that most of the time is spent in the fortran routine(s):

Each sample represents 0.02 seconds.
Total run time: 920.21999453 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

   %      total       %      self
 total   seconds    self   seconds   name
100.00    920.22    0.02      0.16   lm
 99.96    919.88    0.10      0.88   lm.fit
 99.74    917.84   99.74    917.84   .Fortran
  0.07      0.66    0.02      0.14   storage.mode<-
  0.06      0.52    0.00      0.00   eval
  0.06      0.52    0.04      0.34   as.double
  0.02      0.22    0.02      0.22   colnames<-
  0.02      0.20    0.02      0.20   structure
  0.02      0.18    0.02      0.18   model.matrix.default
  0.02      0.18    0.02      0.18   as.double.default
  0.02      0.18    0.00      0.00   model.matrix
  0.01      0.08    0.01      0.08   list

   %       self       %     total
  self   seconds   total   seconds   name
 99.74    917.84   99.74    917.84   .Fortran
  0.10      0.88   99.96    919.88   lm.fit
  0.04      0.34    0.06      0.52   as.double
  0.02      0.22    0.02      0.22   colnames<-
  0.02      0.20    0.02      0.20   structure
  0.02      0.18    0.02      0.18   as.double.default
  0.02      0.18    0.02      0.18   model.matrix.default
  0.02      0.16  100.00    920.22   lm
  0.02      0.14    0.07      0.66   storage.mode<-
  0.01      0.08    0.01      0.08   list

I guess this actually means I cannot do anything about it ... other than maybe splitting the problem into different (independent parts - which I actually may be able to). Regarding the usage of lm.fit instead of lm, this might be a good idea, since I am using the same model.matrix for all fits! However, I'd need to recreate an lm object from the output, because I'd like to run the anova function on this. I'll first do some profiling on lm versus lm.fit for the 12,000 models ... kind regards + thanks again for your help, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com

-Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: 11 May 2004 09:08 To: Muller, Arne PH/FR Cc: [EMAIL PROTECTED] Subject: Re: [R] R versus SAS: lm performance

The way to time things in R is system.time().
Without knowing much more about your problem we can only guess where R is spending the time. But you can find out by profiling -- see `Writing R Extensions'. If you want multiple fits with the same design matrix (do you?) you could look at the code of lm and call lm.fit repeatedly yourself.

On Mon, 10 May 2004 [EMAIL PROTECTED] wrote: [...]
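Following up on the lm.fit suggestion: when every gene shares the same design, the model matrix can be built once and passed to lm.fit() directly; lm.fit even accepts a matrix response, so many genes can be fitted with a single QR factorization. A sketch with invented data (a one-factor design stands in for the real model):

```r
set.seed(1)
g <- gl(2, 10)                         # illustrative one-factor design
X <- model.matrix(~ g)                 # built once, reused for every fit
Y <- matrix(rnorm(20 * 5), nrow = 20)  # 5 "genes" as response columns
fit  <- lm.fit(X, Y)                   # one QR factorization for all columns
full <- lm(Y[, 1] ~ g)                 # lm agrees, gene by gene
```

lm.fit returns the bare fit (coefficients, residuals, effects, qr) without the formula bookkeeping, which is exactly where it saves time; the effects/df components are what anova() would otherwise compute from.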
RE: [R] R versus SAS: lm performance
Thanks All for your help. There seems to be a lot I can try to speed up the fits. However, I'd like to go for a much simpler model, which I think is justified by the experiment itself; e.g. I may think about removing the nesting (Ba:Ti:Do)/Ar. The model matrix has 1344 rows and 2970 columns, and the rank of the matrix is 504. Therefore I think I should reformulate the model. I was just struck by the massive difference in performance when my colleague told me about the difference between SAS and S+. kind regards, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com

-Original Message- From: Liaw, Andy [mailto:[EMAIL PROTECTED] Sent: 11 May 2004 14:20 To: Muller, Arne PH/FR; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: RE: [R] R versus SAS: lm performance

I tried the following on an Opteron 248, R-1.9.0 w/Goto's BLAS:

y <- matrix(rnorm(14000*1344), 1344)
x <- matrix(runif(1344*503), 1344)
system.time(fit <- lm(y~x))
[1] 106.00  55.60 265.32   0.00   0.00

The resulting fit object is over 600MB. (The coefficient component is a 504 x 14000 matrix.) If I'm not mistaken, SAS sweeps on the extended cross product matrix to fit regression models. That, I believe, is usually faster than doing QR decomposition on the model matrix itself, but there are trade-offs. You could try what Prof. Bates suggested. Andy

From: [EMAIL PROTECTED]
Hello, thanks for your reply. I've now done the profiling, and I interpret that most of the time is spent in the fortran routine(s): Each sample represents 0.02 seconds. Total run time: 920.21999453 seconds. Total seconds: time spent in function and callees. Self seconds: time spent in function alone.
[R] strange result with contrasts
Hello, I'm trying to reproduce some SAS results with R (after I got suspicious of the result in R). I struggle with the contrasts in a linear model. I've got three factors

d$dose  <- as.factor(d$dose)  # 5 levels
d$time  <- as.factor(d$time)  # 2 levels
d$batch <- as.factor(d$batch) # 3 levels

the data frame d contains 82 rows. There are 2 to 4 replicates of each dose within each time point and each batch. There's one dose completely missing from one batch. I then generate Dunnett contrasts using the multcomp library:

contrasts(d$dose)  <- contr.Dunnett(levels(d$dose), 1)
contrasts(d$time)  <- contr.Dunnett(levels(d$time), 1)
contrasts(d$batch) <- contr.Dunnett(levels(d$batch), 1)

For the moment I'm just looking at the dose effects of the complete model:

> summary(lm(value ~ dose * time * batch, data = d))$coefficients[1:5,]
                   Estimate Std. Error     t value      Pr(>|t|)
(Intercept)      6.80211741 0.01505426 451.8399839 1.962247e-101
dose010mM-000mM -0.03454211 0.04113846  -0.8396549  4.046723e-01
dose025mM-000mM -0.01972550 0.04288981  -0.4599111  6.473607e-01
dose050mM-000mM -0.12015983 0.05356935  -2.2430704  2.886726e-02  <- significant
dose100mM-000mM  0.01252061 0.04113846   0.3043529  7.619872e-01

A colleague of mine has run the same data through a SAS program (listed below)

proc glm data = dftest;
class dose time batch;
model value = dose|time|batch;
means dose / dunnett ('000mM');
lsmeans dose /pdiff singular=1;
run;

giving the following p-values:

                Pr(>|t|)
dose010mM-000mM   0.4047
dose025mM-000mM   0.6474
dose050mM-000mM   0.5745  <---
dose100mM-000mM   0.7620

The p-values are the same except for the one indicated. A stripchart of the data in R shows that dose050mM-000mM should not be significant (it doesn't look different from e.g. dose025mM-000mM). Do you have any suggestions what I'm doing wrong here (assuming that I believe the SAS result)? Any hints what I can do to further analyse this problem? Many thanks for your help, +regards, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com
RE: [R] Storing p-values from a glm
Hi, for example one could do it this way:

v <- summary(fit)$coefficients[,4]

the coefficients component is a matrix, and with the 4 you refer to the p-value column (at least in lm - I don't know if summary(glm) produces slightly different output). To skip the intercept (1st row):

v <- summary(glmfit)$coefficients[-1,4]

hope this helps, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Roy Sanderson Sent: 06 April 2004 14:36 To: [EMAIL PROTECTED] Subject: [R] Storing p-values from a glm

Hello, I need to store the P-statistics from a number of glm objects. Whilst it's easy to display these on screen via the summary() function, I'm not clear on how to extract the P-values and store them in a vector. Many thanks Roy -- Roy Sanderson Centre for Life Sciences Modelling Porter Building University of Newcastle Newcastle upon Tyne NE1 7RU United Kingdom Tel: +44 191 222 7789 [EMAIL PROTECTED] http://www.ncl.ac.uk/clsm
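Selecting the column by name rather than by position sidesteps the lm/glm difference: the label is "Pr(>|t|)" for lm (and quasi/gaussian glms), but "Pr(>|z|)" for a binomial or poisson glm, where the dispersion is fixed. A sketch on a built-in dataset:

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial)  # example logistic fit
p <- summary(fit)$coefficients[, "Pr(>|z|)"]  # for lm it would be "Pr(>|t|)"
p <- p[-1]                                    # drop the intercept row
```

The result is a named vector, so the p-values stay attached to their coefficient names when collected across models.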
[R] number point under-flow
Hello, I've come across the following situation in R-1.8.1 (compiled + running under RedHat 7.1):

> phyper(24, 514, 5961-514, 53, lower.tail=T)
[1] 1
> phyper(24, 514, 5961-514, 53, lower.tail=F)
[1] -1.037310e-11

I'd expect the latter to be 0 or some very small positive number. Is this a numeric under-flow in the calculation? Do you think I'm safe if I just set the result to 0 in these cases? kind regards, Arne
RE: [R] number point under-flow
Hi, yes, I did compile it with gcc 2.96 ... . Do you have an estimate of how bad this error is, e.g. how much it affects the calculations in R? kind regards, Arne

-Original Message- From: Roger D. Peng [mailto:[EMAIL PROTECTED] Sent: 04 February 2004 14:49 To: Muller, Arne PH/FR Cc: [EMAIL PROTECTED] Subject: Re: [R] number point under-flow

Did you compile with gcc-2.96? I think there were some problems with the floating point arithmetic with that compiler (at least for the earlier versions released by Red Hat). -roger

[EMAIL PROTECTED] wrote: [...]
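If the error is only round-off at the 1e-11 level, clamping at zero is safe for a probability; the true upper-tail value here is far below that (on a current R it comes out as a tiny positive number). A sketch:

```r
p <- phyper(24, 514, 5961 - 514, 53, lower.tail = FALSE)
p <- max(0, p)  # clamp round-off below zero; a true probability is >= 0
```

If such tiny tail probabilities actually matter downstream, working with log.p = TRUE keeps them representable instead of vanishing into round-off.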
[R] Cochran-Mantel-Haenszel problem
Hello, I've tried to analyze some data with a CMH test. My 3-dimensional contingency tables are 2x2xN where N is usually between 10 and 100. The problem is that there may be 2 strata with opposite counts (the 2x2 contingency tables for these are reversed), producing opposite odds ratios that cancel out in the overall statistic. These opposite counts are very important for my analysis, since they account for a dramatic difference. Could you recommend alternative tests that take such opposite counts into account? Would you suggest a different strategy to analyze such data? thanks a lot for your suggestions, Arne
[R] multidimensional Fisher or Chi square test
Hello, Is there a test for independence available based on a multidimensional contingency table? I've about 300 processes, and for each of them I get numbers for failures and successes. I've two or more conditions under which I test these processes. If I had just one process to test I could just perform a fisher or chi-square test on a 2x2 contingency table, like this:

for one process:
        conditionA conditionB
ok              20          6
failed         190        156

From the table I can figure out if the outcome (ok/failed) is bound to one of the conditions for a process. However, I'd like to know how different the 2 conditions are from each other considering all 300 processes, and I consider the processes to be an additional dimension. My H0 is that both conditions are overall (considering all processes) the same. Could you give me a hint what kind of test or package I should look into? kind regards + thanks for your help, Arne
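Base R's mantelhaen.test() does this for a 2x2xK table (the Cochran-Mantel-Haenszel test): it tests conditional independence of outcome and condition given the process, and for 2x2xK also estimates a common odds ratio. A sketch with invented counts for two processes (a real table would be 2x2x300):

```r
tab <- array(c(20, 190, 6, 156,    # process 1: ok/failed under A, then under B
               18, 200, 9, 140),   # process 2 (invented counts)
             dim = c(2, 2, 2),
             dimnames = list(outcome   = c("ok", "failed"),
                             condition = c("A", "B"),
                             process   = c("p1", "p2")))
res <- mantelhaen.test(tab)  # CMH test + common odds-ratio estimate
```

One caveat: the test pools evidence across strata, so processes with effects in opposite directions can cancel; a log-linear model (loglin, or glm with family = poisson) lets the condition-outcome association vary by process instead.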
RE: [R] significance in difference of proportions
Hello, thanks for the replies on this subject. I'm using fisher.test to test if the proportions in my 2 samples are different (see Ted's example below). The assumption was that the two samples are from the same population and that they may contain a different number of positives (due to different treatment). I may be able to figure out the true probability to get a positive, since for some of my experiments I know the entire population. E.g. the samples (111 items, and 10 items) come from a population of 10,000 items, and I know that there are 200 positives in the population. Is it possible to use the fisher test for testing equality of proportions and to include the known probability to find a positive - would that make sense at all? If the two samples come from the same population the probability to find a positive shouldn't influence the test for difference of proportions, should it? At some point I'd like to extend the statistics so that the two samples can come from 2 different populations (with known probability for the positives). I'm happy to receive suggestions and comments on this. thanks a lot again for your help, Arne

On 27-Nov-03 [EMAIL PROTECTED] wrote: I've 2 samples A (111 items) and B (10 items) drawn from the same unknown population. Within A I find 9 positives and in B 0 positives. I'd like to know if the 2 samples A and B are different, i.e. is there a way to find out whether the number of positives is significantly different in A and B?

Pretty obviously not, just from looking at the numbers: 9 out of 111 -> p = P(positive) approx = 1/10; P(0 out of 10 when p = 1/10) is not unlikely (in fact = 0.35). However, a Fisher exact test will give you a respectable P-value:

library(ctest)
?fisher.test
fisher.test(matrix(c(102,9,10,0),nrow=2))
[...]
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval: 0.00 6.088391

fisher.test(matrix(c(102,9,9,1),nrow=2))
p-value = 0.5926
fisher.test(matrix(c(102,9,8,2),nrow=2))
p-value = 0.2257
fisher.test(matrix(c(102,9,7,3),nrow=2))
p-value = 0.0605
fisher.test(matrix(c(102,9,6,4),nrow=2))
p-value = 0.01202

So there's a 95% CI (0, 6.1) for the odds ratio which, for identical probabilities of +, is 1.0, hence well within the CI. And, keeping the numbers for the larger sample fixed for simplicity, you have to go quite a way with the smaller one to get a result significant at 5%:

(102,9):(7,3) -> P = 0.06
(102,9):(6,4) -> P = 0.01

and, to have 80% power (0.8 probability of this event), the probability of + in the second sample would have to be as high as 0.41. Conclusion: your second sample size is quite inadequate except to detect rather large differences between the true proportions in the two cases! Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 167 1972 Date: 27-Nov-03 Time: 17:43:00 -- XFMail --
[R] significance in difference of proportions
Hello, I'm looking for some guidance with the following problem: I've 2 samples A (111 items) and B (10 items) drawn from the same unknown population. Within A I find 9 positives and in B 0 positives. I'd like to know if the 2 samples A and B are different, i.e. is there a way to find out whether the number of positives is significantly different in A and B? I'm currently using prop.test, but unfortunately some of my data contains less than 5 items in a group (like in the example above), and the test statistics may not hold:

> prop.test(c(9,0), c(111,10))

        2-sample test for equality of proportions with continuity correction

data:  c(9, 0) out of c(111, 10)
X-squared = 0.0941, df = 1, p-value = 0.759
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.02420252  0.18636468
sample estimates:
    prop 1     prop 2
0.08108108 0.00000000

Warning message:
Chi-squared approximation may be incorrect in: prop.test(c(9, 0), c(111, 10))

Do you have suggestions for an alternative test? many thanks for your help, +kind regards, Arne
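For counts this small, fisher.test() is the usual alternative - it is exact, so the fewer-than-5-per-cell concern behind the chi-squared warning does not apply. A sketch with the poster's numbers (positives in the first row, non-positives in the second):

```r
tab <- matrix(c(9, 102,   # sample A: 9 of 111 positive
                0, 10),   # sample B: 0 of 10 positive
              nrow = 2)
ft <- fisher.test(tab)    # exact test of equal proportions (odds ratio)
```

With 0 positives out of only 10 draws, the data are entirely consistent with a common proportion of about 9/121, so the exact p-value is large.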
[R] FDR in p.adjust
Hello, I've a question about the fdr method in p.adjust: What is the threshold of the FDR, and is it possible to change this threshold? As I understand the FDR (please correct me), it adjusts the p-values so that among the rejected null hypotheses less than N% (say the cutoff is 25%) are in fact true nulls. thanks a lot for your help, +regards, Arne
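p.adjust itself has no built-in threshold: it returns adjusted p-values, and the cutoff is chosen afterwards - rejecting everything with adjusted p <= 0.25 controls the FDR at 25%, with p <= 0.05 at 5%, and so on. A small sketch (the raw p-values are invented):

```r
p <- c(0.01, 0.02, 0.03, 0.04)    # illustrative raw p-values
q <- p.adjust(p, method = "fdr")  # Benjamini-Hochberg adjusted p-values
sig25 <- q <= 0.25                # apply whatever FDR cutoff you choose
```

So the "threshold" lives in the final comparison, not in the adjustment.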
[R] why does data frame subset return vector
Hello, I've a weird problem with a data frame. Basically it should be just one column with specific names coming from a data file (the file contains 2 columns, one for the rownames of the data frame, the other containing numeric values).

df.rr <- read.table("RR_anova.txt", header=T, comment.char="", row.names=1)
> df.rr[c(1,2,3),]
[1] 1.11e-16 1.11e-16 1.11e-16

Why are the rownames not displayed? The data file itself looks like this:

df.rr <- read.table("RR_anova.txt", header=T, comment.char="")
> df.rr[c(1,2,3),]
            QUAL   PVALUE
1    AJ224120_at 1.11e-16
2 rc_AA893000_at 1.11e-16
3 rc_AA946368_at 1.11e-16

and assigning the rownames explicitly works as I'd expect:

rownames(df.rr) <- df.rr$'QUAL'
> df.rr[c(1,2,3),]
                         QUAL   PVALUE
AJ224120_at       AJ224120_at 1.11e-16
rc_AA893000_at rc_AA893000_at 1.11e-16
rc_AA946368_at rc_AA946368_at 1.11e-16

Ok, now they are displayed, but it's a duplication to keep the QUAL column. Below I create a new data frame to skip the QUAL column, since it is already a rowname.

df.rr2 <- data.frame(PVALUE=df.rr, row.names=1)
> df.rr2[1:4,]
[1] 1.11e-16 1.11e-16 1.11e-16 1.11e-16

However, the rowname is still there ..., you just cannot see it:

> df.rr2["AJ224120_at",]
[1] 1.11e-16

The code below shows that sub-setting the df.rr data frame indeed creates a vector rather than a data frame, whereas sub-setting the 2-column data frame returns a new data frame (as I'd expect).

> df.rr[1:4,]
[1] 1.11e-16 1.11e-16 1.11e-16 1.11e-16
> is.vector(df.rr[1:4,])
[1] TRUE
> is.data.frame(df.rr[1:4,])
[1] FALSE
df.rr <- read.table("CLO_RR_anova.txt", header=T, comment.char="")
> is.data.frame(df.rr[1:4,])
[1] TRUE

Any explanation is appreciated. There must be a good reason for this I guess ... . On the other hand, is there a way to force the subset of the one-column data frame to be a data frame itself? I'd just like to see the rownames displayed, that's it ... thanks a lot for your help, Arne
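This is the documented drop argument of data-frame indexing: a result left with a single column is simplified to a vector unless drop = FALSE is given. A sketch with invented ids:

```r
df <- data.frame(PVALUE = c(1.11e-16, 2e-5, 0.03),
                 row.names = c("a_at", "b_at", "c_at"))  # invented ids
df[1:2, ]                # one column left -> drops to a plain numeric vector
df[1:2, , drop = FALSE]  # stays a data frame, rownames displayed
```

Note the empty slot between the two commas: rows, (all) columns, then drop.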
[R] sub data frame by expression
Hi All, I've the following data frame with 54 rows and 4 columns:

> x
                  Ratio  Dose Time Batch
R.010mM.04h.NEW    0.02 010mM  04h   NEW
R.010mM.04h.NEW.1  0.07 010mM  04h   NEW
...
R.010mM.24h.NEW.2  0.06 010mM  24h   NEW
R.010mM.04h.OLD    0.19 010mM  04h   OLD
...
R.010mM.04h.OLD.1  0.49 010mM  04h   OLD
R.100mM.24h.OLD    0.40 100mM  24h   OLD

I'd like to create a sub data frame containing all rows where Batch == OLD, keeping the 4 columns. Assume that I don't know the order of the rows (otherwise I could just do something like x[1:20,]). I've tried x[x$Batch == 'OLD'] or x[x[,4] == 'OLD'], but both generate errors. So I assume I've still not really understood the philosophy of indexing ... :-( What's the easiest way to do this, any suggestions? thanks a lot for your help, Arne
RE: [R] sub data frame by expression
Sorry, I just figured it out: x[x$Batch == 'OLD',] instead of x[x$Batch == 'OLD']. I didn't know this has to be in the same form as x[1:20,], where I already used the comma. Sorry for posting the previous message ... Arne

-----Original Message----- From: [EMAIL PROTECTED] Sent: 17 October 2003 12:12 To: [EMAIL PROTECTED] Subject: [R] sub data frame by expression

> I'd like to create a sub data frame containing all rows where Batch == "OLD",
> keeping the 4 columns. [...] I've tried x[x$Batch == 'OLD'] and
> x[x[,4] == 'OLD'], but both generate errors.
RE: [R] sub data frame by expression
Hi, thanks for your replies regarding the problem of selecting a sub data frame by expression. I'm starting to understand how indexing works in R. thanks for your replies, Arne

-----Original Message----- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]] Sent: 17 October 2003 12:38 To: Muller, Arne PH/FR Cc: [EMAIL PROTECTED] Subject: Re: [R] sub data frame by expression

On Fri, 17 Oct 2003 [EMAIL PROTECTED] wrote:

> I'd like to create a sub data frame containing all rows where Batch == "OLD",
> keeping the 4 columns. [...] I've tried x[x$Batch == 'OLD'] and
> x[x[,4] == 'OLD'], but both generate errors.

That subsets columns, not rows. Try

x[x$Batch == "OLD", ]

-- Brian D. Ripley, [EMAIL PROTECTED], Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/, University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), 1 South Parks Road, Oxford OX1 3TG, UK, Fax: +44 1865 272595
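For the archives, two equivalent ways to select those rows: the comma-indexing form from Prof Ripley's reply, and base R's subset() convenience wrapper. The data frame below is a toy stand-in for the 54-row frame:

```r
# Toy version of the batch data frame
x <- data.frame(Ratio = c(0.02, 0.19, 0.40),
                Dose  = c("010mM", "010mM", "100mM"),
                Time  = c("04h", "04h", "24h"),
                Batch = c("NEW", "OLD", "OLD"))

x[x$Batch == "OLD", ]        # logical vector before the comma selects rows
subset(x, Batch == "OLD")    # same result, no comma to forget
```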
[R] A data frame of data frames
Hello, I'm trying to set up the following data structure in R: a data frame with 7,000 rows and 4 columns. The row names have some special meaning (they are names of genes). The 1st column per row is itself a data frame, and columns 2 to 4 will keep numeric values. The data frame contained in the 1st column will have 54 rows (with special names) and 4 columns (the 1st column is a response, columns 2-4 are factors). Each of these response/factor data frames will be fed into a 3-way linear model for ANOVA. The other columns of the outer data frame will hold the p-values. Basically, running 7,000 ANOVAs is very quick, but reformatting the data so that it is suitable for the ANOVA takes a long time (45 minutes). So I'd just like to keep the generated data structure as a persistent R object. I haven't managed to store the 2nd data frame in the 1st column of the 1st data frame. From other languages such as C I'd know how to set up this kind of data structure (pointers), but I get stuck in R (I guess I'm still struggling with the R philosophy on how to represent data structures). Do you have any suggestions on how to do this? kind regards, Arne
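For anyone searching the archives: a data frame column cannot hold data frames, but a list can hold anything, including data frames. A hedged sketch (the gene name, factor layout, and file name below are illustrative) of a per-gene list that can be saved and reloaded:

```r
# One entry per gene: the 54-row design data frame plus a slot for p-values
per.gene <- list()
per.gene[["AJ224120_at"]] <- list(
  design = data.frame(Ratio = runif(54),
                      Dose  = gl(3, 18),    # 3 dose levels
                      Time  = gl(2, 27),    # 2 time points
                      Batch = gl(2, 27)),   # 2 batches
  pvals  = numeric(3))                      # main-effect p-values

# Persist the whole structure so the 45-minute reformatting runs only once
save(per.gene, file = "per_gene_anova.RData")
load("per_gene_anova.RData")                # restores 'per.gene'
```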
[R] New R - recompiling all packages
Hi All, I'm running R 1.7.1, and I've installed some additional packages such as Bioconductor. Do I have to re-install all the additional packages when upgrading to R 1.8.0 (i.e. are there compiled-in dependencies)? thanks for your help, Arne
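A sketch of the upgrade workflow in later versions of R (the checkBuilt argument may not exist in 1.8.0 itself, so treat this as an assumption to verify against ?update.packages on your installation):

```r
# Inspect which R version each installed package was built under
ip <- installed.packages()
ip[, c("Package", "Built")]

# Re-install anything built under an older R
update.packages(checkBuilt = TRUE, ask = FALSE)
```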
[R] updating via CRAN and http
Hello, thanks for the tips on updating packages for 1.8.0. The updating is a real problem for me, since I have to do it sort of manually using my web browser or wget. I'm behind a firewall that requires http/ftp authentication (username and password) for every request it sends to a server outside our intranet. Therefore all the nice tools for automatic updating (CRAN, CPAN, ...) don't work for me (I've tried). I understand that the non-paranoid rest of the world can't be bothered, but is there any intention to include such authentication in the update procedures of R? I think for ftp it's kind of tricky, but at least for http the authentication seems to be straightforward. kind regards, Arne
RE: [R] updating via CRAN and http
Sorry, I didn't mean it in a nasty way. I wouldn't have been surprised if the R-team had told me the authentication with the firewall is my problem (i.e. a special case that cannot be dealt with by the R-team). Yes, and of course I should have had a much closer look into the documentation. Thanks again for the hint + please forgive! +regards, Arne

-----Original Message----- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]] Sent: 08 October 2003 17:20 To: Muller, Arne PH/FR Cc: [EMAIL PROTECTED] Subject: Re: [R] updating via CRAN and http

On Wed, 8 Oct 2003 [EMAIL PROTECTED] wrote:

> I'm behind a firewall that requires http/ftp authentication (username and
> password) for every request it sends to a server outside our intranet. [...]
> is there any intention to include such authentication into the update
> procedures of R?

It's available for http: see ?download.file, and you can even configure that to use wget. Your comments are very much misplaced: we *have* bothered to provide the facilities for you.

-- Brian D. Ripley, [EMAIL PROTECTED], Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/, University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), 1 South Parks Road, Oxford OX1 3TG, UK, Fax: +44 1865 272595
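For reference, the pointer from ?download.file in sketch form: R can hand its downloads to wget, which knows how to authenticate against a proxy. The proxy host and credential values below are placeholders:

```r
# Route R's HTTP downloads through wget
options(download.file.method = "wget")

# wget picks up proxy credentials from ~/.wgetrc, e.g.:
#   http_proxy     = http://proxy.example.com:8080/
#   proxy_user     = myname
#   proxy_password = secret

download.file("http://cran.r-project.org/src/contrib/PACKAGES",
              destfile = "PACKAGES")
```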
[R] Jonckheere-Terpstra test
Hello, can anybody here explain what a Jonckheere-Terpstra test is and whether it is implemented in R? I just know it's a non-parametric test; otherwise I've no clue about it ;-( . Are there alternatives to this test? thanks for your help, Arne
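A brief sketch for the archives: the Jonckheere-Terpstra test is a non-parametric test for a trend across *ordered* groups; the base-R Kruskal-Wallis test is the usual alternative when no ordering is assumed. An implementation is available in the contributed clinfun package as jonckheere.test() (an assumption worth verifying on CRAN), so the commented line below is not run here:

```r
# Three groups with an increasing shift; Kruskal-Wallis (base R) tests the
# unordered alternative, Jonckheere-Terpstra would test the ordered one
set.seed(1)
x <- c(rnorm(10, 0), rnorm(10, 0.5), rnorm(10, 1))
g <- gl(3, 10, ordered = TRUE)

kruskal.test(x, g)

# if (require(clinfun)) jonckheere.test(x, as.numeric(g))
```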
[R] multi-dimensional hash
Hello, I was wondering what's the best data structure in R for a multi-dimensional lookup table, and how to implement it. I've several categories, say A, B, C, ..., and within each of these categories there are other categories such as a, b, c, ... . There can be up to 5 dimensions. The actual value for [A][a]... is then a vector. I'm looking forward to any suggestions. thanks very much for your help, Arne
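Two hedged sketches for the archives (keys and values are illustrative): an environment used as a true hash with a pasted composite key, and nested lists indexed one dimension at a time:

```r
# Option 1: an environment as a hash, keyed by a pasted composite key
h <- new.env(hash = TRUE)
key <- paste("A", "a", "x", sep = "/")
assign(key, c(1.2, 3.4), envir = h)
get(key, envir = h)             # returns the stored vector

# Option 2: nested lists, indexed dimension by dimension;
# [[<- auto-creates the intermediate levels
tab <- list()
tab[["A"]][["a"]] <- c(1.2, 3.4)
tab[["A"]][["a"]]
```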
[R] R book
Hi All, I'd be interested in your opinions of the book "Introductory Statistics with R" by Peter Dalgaard. Does it describe the R object concept, the language itself, and the statistical aspects well (I am not a statistician)? thanks for your opinion, Arne
RE: [R] No joy installing R with shared libs
Hi, I've experienced similar failures with the RSPerl installation, so I'd be interested if someone sorts out the library misery ... ;-) Arne

-----Original Message----- From: Laurent Faisnel [mailto:[EMAIL PROTECTED]] Sent: 09 September 2003 12:48 To: [EMAIL PROTECTED] Subject: Re: [R] No joy installing R with shared libs

> > Can some kind soul please give me a foolproof recipe for building R and
> > RSPython so that it actually works?
>
> I don't have a recipe, but one thought to help debug the process: try
> installing RPy [1]. RPy also provides access to R via Python and uses the
> libR.so library. If you can install and import rpy without problem, then it
> must be an issue with RSPython.

Hi, I had problems of the same kind recently and finally gave up. I tried to install RPy without success; errors with undetected libraries occurred while I was doing the "import rpy" from Python (especially with libblas). Since I was not sure R was correctly configured, I downloaded the latest version, R-1.7.1, and tried to install it with the shared-library configure option. I could not get out of numerous errors. Please tell me whether the problem you had calling RSPython is solved after installing RPy (if it was possible to install it). Good luck. Laurent
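For reference, the build steps for an R with the shared library (libR.so) that RSPython/RPy link against, sketched as a shell session (paths and version are illustrative; --enable-R-shlib is the configure flag that produces libR.so):

```shell
# Build R with libR.so so that embedding front-ends can link against it
tar xzf R-1.7.1.tgz
cd R-1.7.1
./configure --enable-R-shlib
make
make install
```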
[R] all values from a data frame
Hello, I've a data frame with 15 columns and 6000 rows, and I need the data in a single vector of size 90,000 (15 x 6000) for a t-test. Is there such a conversion function in R, or would I have to write my own loop over the columns? thanks for your help + kind regards, Arne
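No loop is needed; for the archives, a small sketch (toy data standing in for the 15 x 6000 frame) of two base-R ways to flatten an all-numeric data frame column by column:

```r
# Flatten a data frame of numeric columns into one long vector
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

v1 <- unlist(df, use.names = FALSE)   # column by column
v2 <- as.vector(as.matrix(df))        # same result for all-numeric frames
identical(v1, v2)                     # TRUE

length(v1)                            # ncol * nrow, here 9
```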