Re: [R] Need Help with robustbase package: fitnorm2 and plotnorm2
On Fri, 8 Jun 2007, M. Jankowski wrote:

This is my first post requesting help to this mailing list. I am new to R. My apologies for any breach in posting etiquette.

For future reference, telling us your version of R and exact OS would have helped here. The R posting guide suggests showing the output of sessionInfo(). Also, to help the readers: fitNorm2 (R is case-sensitive) is in 'prada', and the missing package is rrcov, not robustbase.

I am new to this language and just learning my way around. I am attempting to run some sample code and am confused by the error message:

Loading required package: rrcov
Error in fitNorm2(fdat[, "FSC-H"], fdat[, "SSC-H"], scalefac = ScaleFactor) :
  Required package rrcov could not be found.
In addition: Warning message:
there is no package called 'rrcov' in: library(package, lib.loc = lib.loc, character.only = TRUE, logical = TRUE,

that I get when I attempt to run the following sample snippet of code. The error above is taken from the code below. I am running Ubuntu Linux with all the R packages listed in the Synaptic package manager (universe). I loaded the prada Bioconductor package as instructed in the comments, and robustbase was downloaded and installed with the command sudo R CMD INSTALL robustbase_0.2-7.tar.gz; the robustbase folder is in /usr/local/lib/R/site-library/. When I type 'library(robustbase)' no error appears, so I believe robustbase is installed correctly. The sample code was taken from FCS-prada.pdf. The sample code was written in 2005; I understand that rrcov was made part of the robustbase package sometime in the past year. This may be the cause of the problem, but, if it is, I have no idea how to fix it.

That is not the case: rrcov is a separate package, and one prada depends on. So somehow you have managed to install prada without an essential dependency, 'rrcov'. That looks like a problem in the Debian/Ubuntu packaging of prada. (There is a list, R-sig-debian, for such issues.)
Running install.packages("rrcov") inside R should fix this for you: if your R is not current (i.e. 2.5.0) you may need to run R as root for that session. (There may be a Debian package for rrcov for your OS and R version, but without further details I cannot check.) In the current version of prada (1.12.0, for BioC 2.0 and R 2.5.0) rrcov is in Imports, so probably your version of BioC is not current either. [...] -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), 1 South Parks Road, Oxford OX1 3TG, UK. Fax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] evaluating variables in the context of a data frame
On Thu, 7 Jun 2007, Zack Weinberg wrote:

Given D = data.frame(o = gl(2, 1, 4)), this works as I expected:

evalq(o, D)
[1] 1 2 1 2
Levels: 1 2

but neither of these does:

f <- function(x, dat) evalq(x, dat)
f(o, D)
Error in eval(expr, envir, enclos) : object "o" not found
g <- function(x, dat) eval(x, dat)
g(o, D)
Error in eval(x, dat) : object "o" not found

What am I doing wrong? This seems to be what the help files say you do to evaluate arguments in the context of a passed-in data frame...

When you call f(o, D), the argument 'o' is evaluated in the current environment ('context' in R means something different). Because of lazy evaluation, it is not evaluated until evalq is called, but it is evaluated as if it had been evaluated greedily. g(quote(o), D) will work. -- Brian D. Ripley, [EMAIL PROTECTED]
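The standard idiom (a sketch of my own, not part of the reply above) is to capture the unevaluated argument inside the function with substitute(), so callers can write f(o, D) directly:

```r
# substitute() grabs the promise's expression before it is evaluated,
# so eval() can then look the symbol up inside the data frame.
f <- function(x, dat) eval(substitute(x), dat)

D <- data.frame(o = gl(2, 1, 4))
f(o, D)  # factor with levels 1 2: 1 2 1 2
```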
[R] match rows of data frame
Hi R-experts, I have a data frame (A), and a subset (B) of this data frame. I am trying to create a new data frame which gives me all the rows of B, plus the 5th next row (occurring in A). I have used the code below, but it gives me all 5 rows after each matching row; I only want the 5th.

FiveDaysLater <- A[c(sapply(match(rownames(B), rownames(A)), seq, length = 6)), ]

Any guidance much appreciated. Thank you. Alfonso Sammassimo Melbourne, Australia.
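A sketch of the fix (the toy A and B below are my own; the variable names are from the question): add 5 to the matched positions instead of expanding each match into a run of six rows.

```r
# A is the full data frame, B a subset of its rows (as in the question).
A <- data.frame(val = 1:20, row.names = paste0("day", 1:20))
B <- A[c(2, 7), , drop = FALSE]

# match() finds each B row's position in A; + 5 jumps to the 5th row after it.
idx <- match(rownames(B), rownames(A)) + 5
FiveDaysLater <- A[idx, , drop = FALSE]
rownames(FiveDaysLater)  # "day7"  "day12"
```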
[R] Barplots: Editing the frequency x-axis names
Hi, I have a timeSeries object (X) with monthly returns. I want to display the returns with a barplot, which I can fix easily. But my problem is labelling the x-axis: if I use the positions from the time series it gets very messy. I have tried rotating and changing the font size but it doesn't do the trick. I think the optimal solution for my purpose is to only display every second or third date, perhaps only every 12th month. But how do I do that? Thanks Tom
[R] How to partition sample space
Hi R-users, I need your help with the following problem. Suppose we have a regression problem containing 25 predictor variables for 1000 individuals. I want to divide the data matrix (1000 x 25) into two partitions, for training (70%) and testing (30%). For this reason, I sample 70% of the data into a training matrix and the remaining 30% into a testing matrix using pseudorandom numbers (for future analysis). I need an efficient solution so that both matrices can be generated in minimal time. Thanks in advance. Sabyasachi
Re: [R] How to partition sample space
Hi, you could use the sample function:

sample <- sample(1:1000)   # note: this shadows the sample() function
m.training <- m[sample[1:700], ]
m.test <- m[sample[701:1000], ]

Matthias

spime wrote: Hi R-users, I need your help with the following problem. [...]
[R] Sorting dataframe by different columns
Dear list, I have a very short question. Suppose a data frame of four columns:

df <- data.frame(w, x, y, z)

I want this ordered the following way: first by x, decreasing = FALSE, and secondly by z, decreasing = TRUE. How can this be done? Thanks Gunther
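A base-R sketch (my suggestion, not from the thread): order() accepts several keys, and wrapping a key in -xtfrm() reverses its direction whatever the column's type.

```r
# Toy data standing in for the poster's w, x, y, z columns.
df <- data.frame(x = c(2, 1, 2, 1), z = c("a", "b", "c", "d"))

# Ascending in x, descending in z; xtfrm() maps z to sortable numbers.
sorted <- df[order(df$x, -xtfrm(df$z)), ]
```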
Re: [R] Sorting dataframe by different columns
Probably the function sort.data.frame() posted on R-help some time ago can be useful; check: RSiteSearch("sort.data.frame"). I hope it helps. Best, Dimitris

Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm

- Original Message - From: Gunther Höning [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Friday, June 08, 2007 8:58 AM Subject: [R] Sorting dataframe by different columns [...]
Re: [R] Barplots: Editing the frequency x-axis names
Tom.O wrote: [...] I think you could use library(chron), e.g.:

x <- c(dates("02/27/92"), dates("02/27/95"))
y <- c(10, 50)
plot(x, y)

Regards Knut
Re: [R] Barplots: Editing the frequency x-axis names
Hi, thanks for the response, but can you be more specific with your example? I can't see that this will do the trick. What I'm looking for is a function that remembers each position but only displays every nth date. For example:

Date        Position  Display
2003-01-31  1         N
2003-02-28  2         N
2003-03-31  3         Yes
2003-04-30  4         N
2003-05-31  5         N
2003-06-30  6         Yes
2003-07-31  7         N
2003-08-31  8         N
2003-09-30  9         Yes

and so on until the present. I want to display all the returns in a barplot, but only label every quarterly date in the plot. Tom

Knut Krueger-5 wrote: [...]
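One way to get exactly this (a sketch of my own, not from the thread) is to keep every bar but blank out all except every third label via names.arg:

```r
# Monthly positions, but only quarterly labels are printed.
dates <- format(seq(as.Date("2003-01-01"), by = "month", length.out = 9), "%Y-%m")
returns <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Empty string for months 1, 2, 4, 5, ...; the date only for every 3rd month.
labs <- ifelse(seq_along(dates) %% 3 == 0, dates, "")
barplot(returns, names.arg = labs)
```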
Re: [R] Barplots: Editing the frequency x-axis names
Sorry, I forgot the quotes around the dates:

x <- c(dates("01/31/03"), dates("06/30/07"))

But I think your problem is the plot area. You must first define the plot area with type = "n" for no plotting; afterwards you can fill in the data. I did this with times(), but I am afraid the displayed dates/times will depend on your plot area and the settings with par(). Did you read the instructions for plot and par already? Have a look at ?plot and ?par. Regards Knut
[R] choose.dir
Hi all, I have written an R script under Windows using choose.dir(). Now I have seen that this function is missing on MacOS. Does anybody know an alternative? Antje
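One cross-platform alternative (my suggestion, not a reply from the digest) is the Tk directory chooser in the tcltk package, which ships with R on all platforms where Tcl/Tk is available:

```r
library(tcltk)

# Opens a native directory-picker dialog; returns "" if the user cancels.
dir <- tclvalue(tkchooseDirectory())
```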
Re: [R] Barplots: Editing the frequency x-axis names - double post
Sorry for double posting - it was the wrong e-mail address; I thought this one would run into the spam filter.
Re: [R] Tools For Preparing Data For Analysis
Hi, can you provide examples of data formats that are problematic to read and clean with R? The only problematic cases I have encountered were cases with multiline and/or varying-length records (optional information). Then it is sometimes a good idea to preprocess the data into a tabular format (one record per line). For this purpose I use awk (e.g. http://www.vectorsite.net/tsawk.html), which is very adept at processing ASCII data files (awk is much simpler to learn than perl, spss, sas, ...). I have never encountered a data file in ASCII format that I could not reformat with awk. With binary formats, it is another story... But, again, this is my limited experience; I would like to know if there are situations where using SAS/SPSS is really a better approach. Christophe Pallier

On 6/8/07, Robert Wilkins [EMAIL PROTECTED] wrote: As noted on the R-project web site itself (www.r-project.org - Manuals - R Data Import/Export), it can be cumbersome to prepare messy and dirty data for analysis with the R tool itself. I've also seen at least one S programming book (one of the yellow Springer ones) that says, more briefly, the same thing. The R Data Import/Export page recommends examples using SAS, Perl, Python, and Java. It takes a bit of courage to say that (when you go to a corporate software web site, you'll never see a page saying "This is the type of problem that our product is not the best at; here's what we suggest instead"). I'd like to provide a few more suggestions, especially for volunteers who are willing to evaluate new candidates. SAS is fine if you're not paying for the license out of your own pocket. But maybe one reason you're using R is you don't have thousands of spare dollars. Using Java for data cleaning is an exercise in sado-masochism; Java has a learning curve (almost) as difficult as C++.
There are different types of data transformation, and for some data preparation problems an all-purpose programming language is a good choice (i.e. Perl, or maybe Python/Ruby). Perl, for example, has excellent regular expression facilities. However, for some types of complex, demanding data preparation problems, an all-purpose programming language is a poor choice. For example: cleaning up and preparing clinical lab data and adverse event data - you could do it in Perl, but it would take way, way too much time. A specialized programming language is needed. And since data transformation is quite different from data query, SQL is not the ideal solution either. There are only three statistical programming languages that are well-known, all dating from the 1970s: SPSS, SAS, and S. SAS is more popular than S for data cleaning. If you're an R user with difficult data preparation problems, frankly you are out of luck, because the products I'm about to mention are new, unknown, and therefore regarded as immature. And while the founders of these products would be very happy if you kicked the tires, most people don't like to look at brand new products. Most innovators and inventors don't realize this; I've learned it the hard way. But if you are a volunteer who likes to help out by evaluating, comparing, and reporting upon new candidates, you could certainly help out R users and the developers of these products by kicking their tires. And there is a huge need for such volunteers.

1. DAP. This is an open source implementation of SAS. The founder: Susan Bassein. Find it at: directory.fsf.org/math/stats (GNU GPL)

2. PSPP. This is an open source implementation of SPSS. The relatively early version number might not give a good idea of how mature the data transformation features are; it reflects the fact that he has only started doing the statistical tests. The founder: Ben Pfaff, either a grad student or professor at the Stanford CS dept.
Also at: directory.fsf.org/math/stats (GNU GPL)

3. Vilno. This uses a programming language similar to SPSS and SAS, but quite unlike S. Essentially, it's a substitute for the SAS data step, and it also transposes data and calculates averages and such. (No t-tests or regressions in this version.) I created this, mainly during the years 2001-2006. It's version 0.85, and has a fairly low bug rate, in my opinion. The tarball includes about 100 or so test cases used for debugging - for logical calculation errors, but not for extremely high volumes of data. The maintenance of Vilno has slowed down because I am currently (desperately) looking for employment. But once I've found new employment and living quarters and settled in, I will continue to enhance Vilno in my spare time. The founder: that would be me, Robert Wilkins. Find it at: code.google.com/p/vilno (GNU GPL) (In particular, the tarball at code.google.com/p/vilno/downloads/list, since I have yet to figure out how to use Subversion.)

4. Who knows? It was not easy to find out about the existence of DAP and
Re: [R] Barplots: Editing the frequency x-axis names
On 6/8/07, Tom.O [EMAIL PROTECTED] wrote: [...] It's quite easy to do that with ggplot2; see below, or http://had.co.nz/ggplot2/scale_date.html for examples.

df <- data.frame(
  date = seq(Sys.Date(), len = 100, by = "1 day")[sample(100, 50)],
  price = runif(50)
)
qplot(date, price, data = df, geom = "line")
qplot(date, price, data = df, geom = "bar", stat = "identity")
qplot(date, price, data = df, geom = "bar", stat = "identity") + scale_x_date(major = "2 months")
qplot(date, price, data = df, geom = "bar", stat = "identity") + scale_x_date(major = "10 day", format = "%d-%m")
qplot(date, price, data = df, geom = "bar", stat = "identity") + scale_x_date(major = "5 day", format = "%d-%m")

Hadley
Re: [R] Conditional Sequential Gaussian Simulation
Steve, you can do this with the package gstat. Look at ?krige or ?predict.gstat. Post further questions on this topic to the R-sig-geo list; you'll get more response there. Cheers, Thierry

ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. +32 54/436 185 [EMAIL PROTECTED] www.inbo.be

Do not put your faith in what statistics say until you have carefully considered what they do not say. ~William W. Watt A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. ~M.J. Moroney

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Friedman, Steven Sent: Thursday, 7 June 2007 14:46 To: r-help@stat.math.ethz.ch Subject: [R] Conditional Sequential Gaussian Simulation

Hello, I'm wondering if there are any packages/functions that can perform conditional sequential Gaussian simulation. I'm following an article written by Grunwald, Reddy, Prenger and Fisher 2007. Modeling of the spatial variability of biogeochemical soil properties in a freshwater ecosystem. Ecological Modelling 201: 521-535, and would like to explore this methodology. Thanks Steve

Steve Friedman, PhD Everglades Division Senior Environmental Scientist, Landscape Ecology South Florida Water Management District 3301 Gun Club Road West Palm Beach, Florida 33406 email: [EMAIL PROTECTED] Office: 561-682-6312 Fax: 561-682-5980 If you are not doing what you truly enjoy it's your obligation to yourself to change.
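For the simulation itself, a sketch along the lines of the gstat documentation (the meuse example data and the variogram parameters below are illustrative assumptions, not from the thread): krige() switches from kriging to conditional sequential Gaussian simulation when nsim > 0.

```r
library(gstat)
library(sp)

data(meuse);      coordinates(meuse) <- ~ x + y       # observations
data(meuse.grid); gridded(meuse.grid) <- ~ x + y      # prediction grid

# An assumed spherical variogram model; in practice fit one with fit.variogram().
v <- vgm(psill = 0.6, model = "Sph", range = 900, nugget = 0.05)

# nsim = 4 draws four conditional Gaussian simulations instead of a kriged surface.
sims <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = v, nsim = 4)
```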
Re: [R] update packages with R on Vista: error
Thanks for pointing at this. But you know, the user area is writable: R is installing packages in /Documents/R/win-library, which works fine, so I find it absolutely natural that update should work as well. Especially since, when I install the packages, it gets the latest version and library() loads this latest version, but update still wants to update this latest package with the package I already installed, and fails... In my opinion the update on Windows is simply buggy. I think one should definitely not turn UAC off (it's a good security feature). Btw. MiKTeX 2.6 is able to deal with UAC - I can update my LaTeX packages without any problems even though they are in the Program Files directory (and on-the-fly installation also works)... Stefan

Original Message Subject: Re: [R] update packages with R on Vista: error From: R. Villegas [EMAIL PROTECTED] To: Stefan Grosse [EMAIL PROTECTED] Date: 07.06.2007 23:13

If R is installed within Program Files, one of Vista's security settings may interfere with the update process. The setting may be disabled globally by choosing: Windows (Start) menu, Control Panel, User Accounts and Family Safety (green title), User Accounts (green title), and "Turn User Account Control on or off" (very bottom). You will be prompted for permission to continue; click Continue. On the screen you will see a checkbox titled "Use User Account Control (UAC) to help protect your computer". Uncheck this and click the OK button to save the changes. Windows Vista will now allow programs, including R, to update files in Program Files. Rod.
[R] overplots - fixing scientific vs normal notation in output
Moving from S-Plus to R I encountered many great features and a much more stable system. Currently I am left with two problems that are handled differently:

1) I did lots of overplots in S-Plus using par(new=T, xaxs='d', yaxs='d') to fix the axes. What is the workaround in R?

2) In S-Plus I could fix scientific notation or normal notation in output. How can I handle this in R? I found no fix in the documentation.

I am using R version 2.4.1 (2006-12-18) on Windows XP SP2. Peter Lercher, M.D., M.P.H., Assoc Prof
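A sketch of what I believe are the R equivalents (my suggestion, not a reply from the digest): R's par() has no xaxs='d', but giving both plots the same explicit xlim/ylim before par(new = TRUE) pins the axes; and options(scipen = ) biases printed output towards fixed or scientific notation.

```r
# 1) Overplotting with fixed axes: share xlim/ylim across both plot() calls.
x <- 1:10; y1 <- x^2; y2 <- 10 * sqrt(x)
xl <- range(x); yl <- range(c(y1, y2))
plot(x, y1, type = "l", xlim = xl, ylim = yl)
par(new = TRUE)
plot(x, y2, type = "l", lty = 2, xlim = xl, ylim = yl, axes = FALSE, ann = FALSE)

# 2) Notation: a large scipen penalty makes R prefer fixed notation.
options(scipen = 100)
format(1e8)   # "100000000" rather than "1e+08"
```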
Re: [R] Tools For Preparing Data For Analysis
On 08-Jun-07 08:27:21, Christophe Pallier wrote: [...]

I want to join in with an enthusiastic "Me too!!". For anything which has to do with basic checking for the kind of messes that people can get data into when they put it on the computer, I think awk is ideal. It is very flexible (far more so than many, even long-time, awk users suspect), very transparent in its programming language (as opposed to, say, perl), fast, and with light impact on system resources (a rare delight in these days, when upgrading your software may require upgrading your hardware).

Although it may seem on the surface that awk is two-dimensional in its view of data (line by line, and per field in a line), it has some flexible internal data structures and recursive function capability, which allow a lot more to be done with the data that have been read in. For example, I've used awk to trace ancestry through a genealogy, given a data file where each line includes the identifier of an individual and the identifiers of its male and female parents (where known). And that was for pedigree dogs, where what happens in real life makes Oedipus look trivial.

"I have never encountered a data file in ASCII format that I could not reformat with awk. With binary formats, it is another story..."

But then it is a good idea to process the binary file using an instance of the creating software, to produce an ASCII file (say in CSV format).
"But, again, this is my limited experience; I would like to know if there are situations where using SAS/SPSS is really a better approach."

The main thing often useful for data cleaning that awk does not have is any associated graphics. It is -- by design -- a line-by-line text-file processor. While, for instance, you could use awk to accumulate numerical histogram counts, you would have to use something else to display the histogram. And for scatter-plots there's probably not much point in bringing awk into the picture at all (unless a preliminary filtration of mess is needed anyway). That being said, there can still be a use in extracting data fields from a file for submission to other software. Another kind of area where awk would not have much to offer is where, as part of your preliminary data inspection, you want to inspect the results of some standard statistical analyses. As a final comment, utilities like awk can be used far more fruitfully on operating systems (the unixoid family) which incorporate at ground level the infrastructure for plumbing together streams of data output from different programs. Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Jun-07 Time: 10:43:05 -- XFMail --
[R] Re: Sorting dataframe by different columns
See sort_df() in the reshape package. Justin BEM, Student Engineer in Statistics and Economics, BP 294 Yaoundé. Tel: (00237) 9597295.

- Original Message - From: Gunther Höning [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Friday, 8 June 2007, 07:58:53 Subject: [R] Sorting dataframe by different columns [...]
[R] icc from GLMM?
Dear R users, I would like to ask a question regarding the ICC (intraclass correlation), which many biologists refer to as repeatability. It is very useful to get the ICC for many reasons, and it is easy to do so from linear mixed-effects models; many packages like psy, psychometric, aod and irr have functions to calculate it:

icc = between-group variance / (between-group variance + residual variance)

where the residual variance is the within-group variance. However, I have yet to find a convincing reference on how to calculate the ICC from a GLMM. I have found the following candidates, with the between-group variance taken from the random intercept of the GLMM (i.e. the scaled between-group variance):

icc = between-group variance / (between-group variance + 1)
icc = between-group variance / (between-group variance + pi^2/3)
icc = between-group variance / (between-group variance + (pi^2/3) * dispersion parameter)

and, for a binomial GLMM:

icc = between-group variance / (between-group variance + 1/(p(1-p)))

I am a little confused about which one to trust and use. Or are there no easy formulas for this? I am guessing the formula would change depending on which distribution and link function you use. I want to calculate the ICC from a GLMM with Poisson errors and log link, and also binomial with logit link. Could anybody help me please? Many thanks for your help. Shinichi

-- Shinichi Nakagawa Dept of Animal & Plant Sciences University of Sheffield Tel: 0114-222-0113 Fax: 0114-222-0002
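For what it's worth, the latent-scale candidates above are trivial to compute once a between-group variance has been extracted from the fitted model; a sketch of my own (an illustration of the formula, not an answer on which denominator is correct):

```r
# Latent-scale ICC for a logit-link binomial GLMM: the level-1 residual
# variance of the standard logistic distribution is pi^2/3.
icc_logit <- function(var_between) var_between / (var_between + pi^2 / 3)

# A log-link Poisson GLMM has no such fixed constant; which denominator
# applies there is exactly the question being asked in this message.
icc_logit(2)
```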
[R] Re: How to partition sample space
Also try:

active.sample <- sample(1:1000, size = 700)   # sample() takes 'size', not 'n'
active.df <- thedf[active.sample, ]
test.df <- thedf[-active.sample, ]

Justin BEM, Student Engineer in Statistics and Economics, BP 294 Yaoundé. Tel: (00237) 9597295.

- Original Message - From: Matthias Kirchner [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Friday, 8 June 2007, 08:06:41 Subject: Re: [R] How to partition sample space [...]
[R] help.search and Bayesian regression
Hi there, two questions. 1) Is there any possibility to look up the help pages within R for more complex combinations of character strings, for example Bayesian AND regression but not necessarily Bayesian regression? 2) Is there a package/command that does fully Bayesian linear regression (if possible with variable selection)? Thanks, Christian *** --- *** Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
[R] Dependency 'Design' is not available
Dear R-users, When I installed the rattle package with the command install.packages("rattle", dependencies=TRUE), I got: Warning message: Dependency 'Design' is not available Is this warning serious? How can I avoid it? Thanks _ Dr. Ruixin ZHU Shanghai Center for Bioinformation Technology [EMAIL PROTECTED] [EMAIL PROTECTED] 86-21-13040647832
Re: [R] Sorting dataframe by different columns
maybe this page could give you some hints: http://www.ats.ucla.edu/STAT/r/faq/sort.htm Regards Knut
Re: [R] update packages with R on Vista: error
It was pointed out to me that my message might be considered impolite. It was not intended so. I was just trying to say that there should be some improvement, since the solutions offered were either not optimal for me (disabling security features) or were not working (FAQ). I apologize for any inconvenience caused by my frustration, possibly spiced up by my own inabilities. Stefan Original Message Subject: Re: [R] update packages with R on Vista: error From: Stefan Grosse [EMAIL PROTECTED] To: R. Villegas [EMAIL PROTECTED] Date: 08.06.2007 11:07 Thanks for pointing at this. But you know, the user directory is writable. R is installing packages in /Documents/R/win-library, which works fine, so I find it absolutely natural that update should work as well. Especially since when I install the packages it gets the latest version, library() loads this latest version, but update still wants to update this latest package with the package I already installed, and fails... In my opinion the update on Windows is simply buggy. I think one should definitely not turn UAC off (it's a good security feature). BTW, MiKTeX 2.6 is able to deal with UAC - I can update my LaTeX packages without any problems even though they are in the Program Files directory (and on-the-fly installation does work too)... Stefan
[R] world map matrix
Hi, Is it possible to make a world map matrix where land values are set to 0 and sea values to 1? Cheers, Antonio
Re: [R] Dependency 'Design' is not available
Ruixin ZHU wrote: Dear R-users, When I installed the rattle package with the command install.packages("rattle", dependencies=TRUE), I got Warning message: Dependency 'Design' is not available Version of R? OS? Please do read the posting guide! If R-2.5.0 under Windows: Design did not pass the checks under Windows and is not available for download. In this case please contact the package maintainer and convince him to fix the package. Uwe Ligges Is this warning serious? How to avoid this warning? Thanks _ Dr. Ruixin ZHU Shanghai Center for Bioinformation Technology [EMAIL PROTECTED] [EMAIL PROTECTED] 86-21-13040647832
Re: [R] match rows of data frame
try: FiveDaysLater <- A[match(rownames(B), rownames(A)) + 5, ] On 6/7/07, Alfonso Sammassimo [EMAIL PROTECTED] wrote: Hi R-experts, I have a data frame (A), and a subset (B) of this data frame. I am trying to create a new data frame which gives me all the rows of B, plus the 5th next row (occurring in A). I have used the code below, but it gives me all 5 rows after the matching row. I only want the 5th. FiveDaysLater <- A[c(sapply(match(rownames(B), rownames(A)), seq, length=6)), ] Any guidance much appreciated, Thank you. Alfonso Sammassimo Melbourne, Australia. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
Re: [R] world map matrix
On 6/8/2007 6:52 AM, Antonio Rodríguez wrote: Hi, Is it possible to make a world map matrix where land values are set to 0 and sea values to 1? It's not hard to produce a bitmap of a world map with the maps package, and then some image manipulation functions could convert it to 0's and 1's. I don't know if there's a more direct way. One minor problem you may encounter is that the default world map display isn't really rectangular: e.g. bits of Siberia that cross 180 degrees east are still displayed attached to Siberia rather than wrapping around and being displayed on the other side of the map. The display also doesn't go all the way to the south pole. I produced a couple of rectangular bitmaps covering 90 south to 90 north and 180 west to 180 east; they're included in the rgl package (and used to display globes in the persp3d example). Duncan Murdoch
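[Editor's note: a sketch of one more direct route, assuming the 'maps' package (the thread itself only establishes the bitmap route). map.where() returns the containing region's name for each lon/lat point, or NA over sea, which yields the 0/1 matrix without any image manipulation; treat the function's behaviour at grid edges as something to verify.]

```r
## Land/sea matrix on a 1-degree grid: land = 0, sea = 1.
## Assumes the 'maps' package and its map.where() lookup.
library(maps)
lon  <- seq(-179.5, 179.5, by = 1)
lat  <- seq(-89.5, 89.5, by = 1)
grid <- expand.grid(lon = lon, lat = lat)
sea  <- is.na(map.where("world", grid$lon, grid$lat))
m    <- matrix(as.integer(sea), nrow = length(lon),
               dimnames = list(lon, lat))
```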
Re: [R] Tools For Preparing Data For Analysis
On 6/7/07, Robert Wilkins [EMAIL PROTECTED] wrote: As noted on the R-project web site itself (www.r-project.org - Manuals - R Data Import/Export), it can be cumbersome to prepare messy and dirty data for analysis with the R tool itself. I've also seen at least one S programming book (one of the yellow Springer ones) that says, more briefly, the same thing. The R Data Import/Export manual recommends examples using SAS, Perl, Python, and Java. It takes a bit of courage to say that (when you go to a corporate software web site, you'll never see a page saying "This is the type of problem that our product is not the best at; here's what we suggest instead"). I'd like to provide a few more suggestions, especially for volunteers who are willing to evaluate new candidates. SAS is fine if you're not paying for the license out of your own pocket. But maybe one reason you're using R is that you don't have thousands of spare dollars. Using Java for data cleaning is an exercise in sado-masochism; Java has a learning curve (almost) as difficult as C++. There are different types of data transformation, and for some data preparation problems an all-purpose programming language is a good choice (i.e. Perl, or maybe Python/Ruby). Perl, for example, has excellent regular expression facilities. However, for some types of complex, demanding data preparation problems, an all-purpose programming language is a poor choice. For example: cleaning up and preparing clinical lab data and adverse event data - you could do it in Perl, but it would take way, way too much time. A specialized programming language is needed. And since data transformation is quite different from data query, SQL is not the ideal solution either. There are only three statistical programming languages that are well-known, all dating from the 1970s: SPSS, SAS, and S. SAS is more popular than S for data cleaning.
If you're an R user with difficult data preparation problems, frankly you are out of luck, because the products I'm about to mention are new, unknown, and therefore regarded as immature. And while the founders of these products would be very happy if you kicked the tires, most people don't like to look at brand new products. Most innovators and inventors don't realize this; I've learned it the hard way. But if you are a volunteer who likes to help out by evaluating, comparing, and reporting upon new candidates, you could certainly help out R users and the developers of these products by kicking their tires. And there is a huge need for such volunteers. 1. DAP This is an open source implementation of SAS. The founder: Susan Bassein Find it at: directory.fsf.org/math/stats (GNU GPL) 2. PSPP This is an open source implementation of SPSS. The relatively early version number might not give a good idea of how mature the data transformation features are; it reflects the fact that he has only started doing the statistical tests. The founder: Ben Pfaff, either a grad student or professor at the Stanford CS dept. Also at: directory.fsf.org/math/stats (GNU GPL) 3. Vilno This uses a programming language similar to SPSS and SAS, but quite unlike S. Essentially, it's a substitute for the SAS datastep, and it also transposes data and calculates averages and such. (No t-tests or regressions in this version.) I created this, mainly during the years 2001-2006. It's at version 0.85, and has a fairly low bug rate, in my opinion. The tarball includes about 100 or so test cases used for debugging - for logical calculation errors, but not for extremely high volumes of data. The maintenance of Vilno has slowed down, because I am currently (desperately) looking for employment. But once I've found new employment and living quarters and settled in, I will continue to enhance Vilno in my spare time.
The founder: that would be me, Robert Wilkins Find it at: code.google.com/p/vilno (GNU GPL) (In particular, the tarball at code.google.com/p/vilno/downloads/list, since I have yet to figure out how to use Subversion.) 4. Who knows? It was not easy to find out about the existence of DAP and PSPP. So who knows what else is out there. However, I think you'll find a lot more statistics software (regression, etc.) out there, and not so much data transformation software. Not many people work on data preparation software. In fact, the category is so obscure that there isn't one agreed term: data cleaning, data munging, data crunching, or just getting the data ready for analysis. Thanks for bringing up this topic. I think there is definitely a place for such languages, which I would regard as data-filtering languages, but I also think that trying to reproduce the facilities in SAS or SPSS for data analysis is redundant. Other responses in this thread have mentioned 'little language' filters like awk, which are fine for those who were raised in the Bell Labs tradition of programming
[R] data mining/text mining?
Dear R-user, Could anybody tell me the key difference between data mining and text mining? Please list some packages for data/text mining, and give me an example of text mining with R (any related materials will be highly appreciated), because the vignette written by Ingo Feinerer seems too concise for me. Thanks _ Dr. Ruixin ZHU Shanghai Center for Bioinformation Technology [EMAIL PROTECTED] [EMAIL PROTECTED] 86-21-13040647832
Re: [R] logical 'or' on list of vectors
Tim Bergsma said the following on 6/8/2007 5:57 AM: Suppose I have a list of logicals, such as returned by lapply: Theoph$Dose[1] <- NA; Theoph$Time[2] <- NA; Theoph$conc[3] <- NA; lapply(Theoph, is.na) Is there a direct way to execute logical "or" across all vectors? The following gives the desired result, but seems unnecessarily complex. as.logical(apply(do.call(rbind, lapply(Theoph, is.na)), 2, sum)) Regards, Tim How about: apply(sapply(Theoph, is.na), 1, any) HTH, --sundar
Re: [R] rlm results on trellis plot
Alan S Barnett wrote: How do I add to a trellis plot the best fit line from a robust fit? I can use panel.lm to add a least squares fit, but there is no panel.rlm function. How about using panel.abline() instead of panel.lmline()? fit1 <- coef(lm(stack.loss ~ Air.Flow, data = stackloss)); fit2 <- coef(rlm(stack.loss ~ Air.Flow, data = stackloss)); xyplot(stack.loss ~ Air.Flow, data = stackloss, panel = function(x, y, ...){ panel.xyplot(x, y, ...); panel.abline(fit1, type="l", col="blue"); panel.abline(fit2, type="l", col="red") }, aspect=1) -- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
Re: [R] logical 'or' on list of vectors
try the following: as.logical(rowSums(is.na(Theoph))) ## or !complete.cases(Theoph) I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Tim Bergsma [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Friday, June 08, 2007 2:57 PM Subject: [R] logical 'or' on list of vectors Suppose I have a list of logicals, such as returned by lapply: Theoph$Dose[1] <- NA; Theoph$Time[2] <- NA; Theoph$conc[3] <- NA; lapply(Theoph, is.na) Is there a direct way to execute logical "or" across all vectors? The following gives the desired result, but seems unnecessarily complex. as.logical(apply(do.call(rbind, lapply(Theoph, is.na)), 2, sum)) Regards, Tim
Re: [R] Formating the data
A Ezhil wrote: Hi All, I have a vector of length 48, something like: 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I would like to print (reformat) this vector as: 00110011 by simply removing the spaces between them. I have been trying with many options but am not able to do this task. I would greatly appreciate your suggestion on fixing this simple task. X <- rbinom(n=48, size=1, prob=.3); paste(X, collapse="") [1] "1001001000100011010010010111" print(paste(X, collapse=""), quote=FALSE) [1] 1001001000100011010010010111 Thanks in advance. Kind regards, Ezhil -- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
[R] Formating the data
Hi All, I have a vector of length 48, something like: 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I would like to print (reformat) this vector as: 00110011 by simply removing the spaces between them. I have been trying with many options but am not able to do this task. I would greatly appreciate your suggestion on fixing this simple task. Thanks in advance. Kind regards, Ezhil
Re: [R] logical 'or' on list of vectors
Tim Bergsma wrote: Suppose I have a list of logicals, such as returned by lapply: Theoph$Dose[1] <- NA; Theoph$Time[2] <- NA; Theoph$conc[3] <- NA; lapply(Theoph, is.na) Is there a direct way to execute logical "or" across all vectors? The following gives the desired result, but seems unnecessarily complex. as.logical(apply(do.call(rbind, lapply(Theoph, is.na)), 2, sum)) Is this what you want? apply(is.na(Theoph), 1, any) Regards, Tim -- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
[R] logical 'or' on list of vectors
Suppose I have a list of logicals, such as returned by lapply: Theoph$Dose[1] <- NA; Theoph$Time[2] <- NA; Theoph$conc[3] <- NA; lapply(Theoph, is.na) Is there a direct way to execute logical "or" across all vectors? The following gives the desired result, but seems unnecessarily complex. as.logical(apply(do.call(rbind, lapply(Theoph, is.na)), 2, sum)) Regards, Tim
[R] rlm results on trellis plot
How do I add to a trellis plot the best fit line from a robust fit? I can use panel.lm to add a least squares fit, but there is no panel.rlm function. -- Alan S Barnett [EMAIL PROTECTED] NIMH/CBDB
Re: [R] logical 'or' on list of vectors
a little simpler: apply(do.call(rbind, lapply(Theoph, is.na)), 2, any) or !complete.cases(Theoph) On 6/8/07, Tim Bergsma [EMAIL PROTECTED] wrote: Suppose I have a list of logicals, such as returned by lapply: Theoph$Dose[1] <- NA; Theoph$Time[2] <- NA; Theoph$conc[3] <- NA; lapply(Theoph, is.na) Is there a direct way to execute logical "or" across all vectors? The following gives the desired result, but seems unnecessarily complex. as.logical(apply(do.call(rbind, lapply(Theoph, is.na)), 2, sum)) Regards, Tim -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
[R] R is not a validated software package..
Dear All, discussing with a statistician from a pharmaceutical company I received this answer about the statistical package that I have planned to use: "As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system." Could someone suggest a 'polite' answer? TIA Giovanni -- dr. Giovanni Parrinello External Lecturer Medical Statistics Unit Department of Biomedical Sciences Viale Europa, 11 - 25123 Brescia Italy Tel: +390303717528 Fax: +390303717488 email: [EMAIL PROTECTED]
Re: [R] Formating the data
On Fri, 2007-06-08 at 06:13 -0700, A Ezhil wrote: Hi All, I have a vector of length 48, something like: 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I would like to print (reformat) this vector as: 00110011 by simply removing the spaces between them. I have been trying with many options but am not able to do this task. I would greatly appreciate your suggestion on fixing this simple task. Thanks in advance. dat <- scan() 1: 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 28: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 49: Read 48 items dat [1] 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [39] 1 1 1 1 1 1 1 1 1 1 print(dat, print.gap = 0) [1]00110011 Is that what you want? It is just altering how the data are printed. You still get the [1] at the start though. G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] Fwd: Using odesolve to produce non-negative solutions
On the 'lsoda' help page, I did not see any option to force some or all parameters to be nonnegative. Have you considered replacing the parameters that must be nonnegative with their logarithms? This effectively moves the 0 lower limit to (-Inf) and seems to have worked well for me in the past. Often, it can even make the log likelihood or sum of squares surface more elliptical, which means that the standard normal approximation for the sampling distribution of parameter estimates will likely be more accurate. Hope this helps. Spencer Graves p.s. Your example seems not to be self-contained. If I could have easily copied it from your email and run it myself, I might have been able to offer more useful suggestions. Jeremy Goldhaber-Fiebert wrote: Hello, I am using odesolve to simulate a group of people moving through time and transmitting infections to one another. In Matlab, there is a NonNegative option which tells the Matlab solver to keep the vector elements of the ODE solution non-negative at all times. What is the right way to do this in R?
Thanks, Jeremy P.S., Below is a simplified version of the code I use to try to do this, but I am not sure that it is theoretically right:
dynmodel <- function(t, y, p) {
  ## Initialize parameter values
  birth <- p$mybirth(t); death <- p$mydeath(t); recover <- p$myrecover
  beta <- p$mybeta; vaxeff <- p$myvaxeff; vaccinated <- p$myvax(t)
  vax <- vaxeff * vaccinated / 100
  ## If the state currently has negative quantities (shouldn't have),
  ## then reset to reasonable values for computing meaningful derivatives
  for (i in 1:length(y)) { if (y[i] < 0) { y[i] <- 0 } }
  S <- y[1]; I <- y[2]; R <- y[3]; N <- y[4]
  shat <- (birth * (1 - vax)) - (death * S) - (beta * S * I / N)
  ihat <- (beta * S * I / N) - (death * I) - (recover * I)
  rhat <- (birth * (vax)) + (recover * I) - (death * R)
  ## Do we overshoot into negative space? If so, shrink the derivative to
  ## bring the state to 0, then rescale the components that take the
  ## derivative negative
  if (shat + S < 0) {
    shat_old <- shat
    shat <- -1 * S
    scaled_transmission <- (shat / shat_old) * (beta * S * I / N)
    ihat <- scaled_transmission - (death * I) - (recover * I)
  }
  if (ihat + I < 0) {
    ihat_old <- ihat
    ihat <- -1 * I
    scaled_recovery <- (ihat / ihat_old) * (recover * I)
    rhat <- scaled_recovery + (birth * (vax)) - (death * R)
  }
  if (rhat + R < 0) { rhat <- -1 * R }
  nhat <- shat + ihat + rhat
  if (nhat + N < 0) { nhat <- -1 * N }
  ## return derivatives
  list(c(shat, ihat, rhat, nhat), c(0))
}
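[Editor's note: the log-transform idea suggested above can be sketched in a few lines. This is a toy least-squares fit with invented data, not the poster's epidemic model; it only illustrates optimising over log(beta) so the estimate is positive by construction:]

```r
## Optimise over log(beta) so that beta = exp(log.beta) is positive
## by construction -- no constraint handling needed in optim().
obs   <- c(2.1, 3.9, 8.2)              # invented observations
times <- 1:3
sse <- function(log.beta) {
  beta <- exp(log.beta)                # always > 0
  pred <- 2 * exp(beta * (times - 1))  # toy exponential-growth model
  sum((obs - pred)^2)
}
fit  <- optim(log(0.5), sse, method = "BFGS")
beta <- exp(fit$par)                   # back on the original scale
```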
Re: [R] character to time problem
--- Jason Barnhart [EMAIL PROTECTED] wrote: Hi John, a) The NA appears because '30/02/1995' is not a valid date. strptime('30/02/1995', "%d/%m/%Y") [1] NA I knew we should never have moved to the Gregorian Calendar! Thanks. I accidentally made up the date, but this means that I have some invalid dates in the file. Not a problem now I know what's happening. And our contract says someone else gets to fix them :) b) dates, which has the following classes, uses sort.POSIXlt, which in turn sets na.last to NA. ?order details how NAs are handled when ordering data via na.last. class(dates) [1] "POSIXt" "POSIXlt" methods(sort) [1] sort.default sort.POSIXlt sort.POSIXlt function (x, decreasing = FALSE, na.last = NA, ...) x[order(as.POSIXct(x), na.last = na.last, decreasing = decreasing)] <environment: namespace:base> After resetting the Feb. date the code works. HTH, -jason So it does. I had not thought to look at the sort.POSIXlt function. I don't quite understand what na.last is doing and don't seem to see the documentation. Is it sorting the NAs to the last place(s) in the vector and then dropping them? Thanks again - Original Message - From: John Kane [EMAIL PROTECTED] To: R R-help r-help@stat.math.ethz.ch Sent: Thursday, June 07, 2007 2:17 PM Subject: [R] character to time problem I am trying to clean up some dates and I am clearly doing something wrong. I have laid out an example that seems to show what is happening with the real data. The coding is lousy but it looks like it should have worked. Can anyone suggest a) why I am getting that NA appearing after the strptime() command and b) why the NA is disappearing in the sort()?
It happens with both na.rm=TRUE and na.rm=FALSE. aa <- data.frame(c("12/05/2001", "", "30/02/1995", NA, "14/02/2007", "M")); names(aa) <- "times"; aa[is.na(aa)] <- "M"; aa[aa == ""] <- "M"; bb <- unlist(subset(aa, aa[,1] != "M")); dates <- strptime(bb, "%d/%m/%Y"); dates; sort(dates) -- Session Info R version 2.4.1 (2006-12-18) i386-pc-mingw32 locale: LC_COLLATE=English_Canada.1252; LC_CTYPE=English_Canada.1252; LC_MONETARY=English_Canada.1252; LC_NUMERIC=C; LC_TIME=English_Canada.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: gdata Hmisc 2.3.1 3.3-2 (Yes, I know I'm out of date, but I don't like upgrading just as I am finishing a project.) Thanks
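[Editor's note: the na.last behaviour discussed in this thread is easy to see on a small base-R example; the dates below are invented for illustration:]

```r
## na.last = NA (the default, and what sort.POSIXlt uses) drops NAs;
## na.last = TRUE keeps them and places them last.
x <- as.Date(c("2001-05-12", NA, "1995-03-02"))
length(sort(x))             # 2: the NA is dropped
sort(x, na.last = TRUE)     # 3 elements, with the NA at the end
```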
[R] pointwise confidence bands or interval values for a non parametric sm.regression
Dear all, Is there a way to plot / calculate pointwise confidence bands or interval values for a non parametric regression like sm.regression? Thank you in advance. Regards, Martin
Re: [R] R is not a validated software package..
People, don't get angry at the pharma statistician; he is just trying to abide by an FDA requirement that is designed to ensure that tests reliably perform the same. There is no point in getting into which product is better. As far as the FDA rules are concerned, a validated system beats a better system any day of the week. Here is your polite answer. You can develop and try your software in R. Should they need to use those results in a report that will matter to the FDA, then you can work together with him to set up a validated environment for S-PLUS. You then have to commit to porting your code to S-PLUS. As I assume that you do not work in a regulated environment, you probably wouldn't have access to a validated SAS environment anyway. It is not usually enough to install a piece of software; you have to validate every step of the installation. Since AFAIK the FDA uses S-PLUS, it would be to your pharma person's advantage, to speed up submissions, if they also had a validated S-PLUS environment. http://www.msmiami.com/custom/downloads/S-PLUSValidationdatasheet_Final.pdf -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wensui Liu Sent: Friday, June 08, 2007 9:24 AM To: Giovanni Parrinello Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R is not a validated software package.. I would like to know the answer as well. To be honest, I really have a hard time understanding the mentality of clinical trial guys, and rather believe it is something related to job security. On 6/8/07, Giovanni Parrinello [EMAIL PROTECTED] wrote: Dear All, discussing with a statistician from a pharmaceutical company I received this answer about the statistical package that I have planned to use: "As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system." Could someone suggest a 'polite' answer? TIA Giovanni -- dr.
Giovanni Parrinello External Lecturer Medical Statistics Unit Department of Biomedical Sciences Viale Europa, 11 - 25123 Brescia Italy Tel: +390303717528 Fax: +390303717488 email: [EMAIL PROTECTED] -- WenSui Liu A lousy statistician who happens to know a little programming (http://spaces.msn.com/statcompute/blog)
Re: [R] logical 'or' on list of vectors
Thanks all for the many excellent suggestions! !complete.cases(Theoph) is probably the most succinct form for the current problem, while the examples with 'any' seem readily adaptable to similar situations. Kind regards, Tim. Dimitris Rizopoulos wrote: try the following: as.logical(rowSums(is.na(Theoph))) ## or !complete.cases(Theoph) I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Tim Bergsma [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Friday, June 08, 2007 2:57 PM Subject: [R] logical 'or' on list of vectors Suppose I have a list of logicals, such as returned by lapply: Theoph$Dose[1] <- NA Theoph$Time[2] <- NA Theoph$conc[3] <- NA lapply(Theoph, is.na) Is there a direct way to execute logical 'or' across all vectors? The following gives the desired result, but seems unnecessarily complex. as.logical(apply(do.call(rbind, lapply(Theoph, is.na)), 2, sum)) Regards, Tim
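For the archives: base R's Reduce() gives another direct way to 'or' a list of logical vectors elementwise. A sketch on the same Theoph example, checked against the !complete.cases() solution:

```r
# Elementwise logical 'or' across a list of logical vectors via Reduce()
Theoph <- datasets::Theoph
Theoph$Dose[1] <- NA
Theoph$Time[2] <- NA
Theoph$conc[3] <- NA

any_na <- Reduce(`|`, lapply(Theoph, is.na))

# agrees with the !complete.cases() answer from the thread
all(any_na == !complete.cases(Theoph))
```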
Re: [R] Ubu edgy + latest CRAN R + Rmpi = no go
On 7 June 2007 at 17:22, Tim Keitt wrote: | I'm just curious if anyone else has had problems with this | configuration. I added the CRAN repository to apt and installed 2.5.0 | with apt-get. I then did an install.packages(Rmpi) on cluster nodes. | Rmpi loads and lamhosts() shows the nodes, but mpi.spawn.Rslaves() | fails (something to do with temp files?). Rmpi works fine with the I have had similar issues at work. If you fix the lam packages at version 7.1.1, it works. It does not seem to work with 7.1.2 in the current Ubuntu, nor does it work with 7.1.4 (current upstream version). As other MPI tools seem to work, I would put the error on Rmpi, but I have not had time to pin this down. For what it's worth, a few of us are trying to revive the OpenMPI packages in Debian, and I have started on a port of Rmpi to ROpenMPI. No ETA for that. | Edgy-native version of R (2.3.x) and installing Edgy's r-cran-rmpi | with apt. (But I need some other packages that only work in 2.4+!) | Could this be a problem with the latest Ubu debs on CRAN? The Rmpi | author says his R 2.5 setup works fine. R itself is just fine on Ubuntu, thank you. Dirk | CC me please as I'm not | subscribed. | | THK | | -- | Timothy H. Keitt, University of Texas at Austin | Contact info and schedule at http://www.keittlab.org/tkeitt/ | Reprints at http://www.keittlab.org/tkeitt/papers/ | ODF attachment? See http://www.openoffice.org/ -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison
Re: [R] Tools For Preparing Data For Analysis
I had mentioned exactly the same thing to others and the feedback I got is - 'when you have a hammer, everything will look like a nail' ^_^. On 6/7/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote: Robert Wilkins wrote: As noted on the R-project web site itself ( www.r-project.org - Manuals - R Data Import/Export ), it can be cumbersome to prepare messy and dirty data for analysis with the R tool itself. I've also seen at least one S programming book (one of the yellow Springer ones) that says, more briefly, the same thing. The R Data Import/Export page recommends examples using SAS, Perl, Python, and Java. It takes a bit of courage to say that ( when you go to a corporate software web site, you'll never see a page saying This is the type of problem that our product is not the best at, here's what we suggest instead ). I'd like to provide a few more suggestions, especially for volunteers who are willing to evaluate new candidates. SAS is fine if you're not paying for the license out of your own pocket. But maybe one reason you're using R is you don't have thousands of spare dollars. Using Java for data cleaning is an exercise in sado-masochism, Java has a learning curve (almost) as difficult as C++. There are different types of data transformation, and for some data preparation problems an all-purpose programming language is a good choice ( i.e. Perl , or maybe Python/Ruby ). Perl, for example, has excellent regular expression facilities. However, for some types of complex demanding data preparation problems, an all-purpose programming language is a poor choice. For example: cleaning up and preparing clinical lab data and adverse event data - you could do it in Perl, but it would take way, way too much time. A specialized programming language is needed. And since data transformation is quite different from data query, SQL is not the ideal solution either. We deal with exactly those kinds of data solely using R. 
R is exceptionally powerful for data manipulation, just a bit hard to learn. Many examples are at http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf Frank There are only three statistical programming languages that are well-known, all dating from the 1970s: SPSS, SAS, and S. SAS is more popular than S for data cleaning. If you're an R user with difficult data preparation problems, frankly you are out of luck, because the products I'm about to mention are new, unknown, and therefore regarded as immature. And while the founders of these products would be very happy if you kicked the tires, most people don't like to look at brand new products. Most innovators and inventors don't realize this; I've learned it the hard way. But if you are a volunteer who likes to help out by evaluating, comparing, and reporting upon new candidates, well you could certainly help out R users and the developers of the products by kicking the tires of these products. And there is a huge need for such volunteers. 1. DAP This is an open source implementation of SAS. The founder: Susan Bassein Find it at: directory.fsf.org/math/stats (GNU GPL) 2. PSPP This is an open source implementation of SPSS. The relatively early version number might not give a good idea of how mature the data transformation features are; it reflects the fact that he has only started doing the statistical tests. The founder: Ben Pfaff, either a grad student or professor at Stanford CS dept. Also at: directory.fsf.org/math/stats (GNU GPL) 3. Vilno This uses a programming language similar to SPSS and SAS, but quite unlike S. Essentially, it's a substitute for the SAS datastep, and also transposes data and calculates averages and such. (No t-tests or regressions in this version). I created this, during the years 2001-2006 mainly. It's version 0.85, and has a fairly low bug rate, in my opinion.
The tarball includes about 100 or so test cases used for debugging - for logical calculation errors, but not for extremely high volumes of data. The maintenance of Vilno has slowed down, because I am currently (desperately) looking for employment. But once I've found new employment and living quarters and settled in, I will continue to enhance Vilno in my spare time. The founder: that would be me, Robert Wilkins Find it at: code.google.com/p/vilno ( GNU GPL ) ( In particular, the tarball at code.google.com/p/vilno/downloads/list , since I have yet to figure out how to use Subversion ). 4. Who knows? It was not easy to find out about the existence of DAP and PSPP. So who knows what else is out there. However, I think you'll find a lot more statistics software ( regression , etc ) out there, and not so much data transformation software. Not many people work on data preparation software. In fact, the category is so obscure that there isn't one agreed term: data
Re: [R] Tools For Preparing Data For Analysis
Is there an example available of this sort of problematic data that requires this kind of data screening and filtering? For many of us, this issue would be nice to learn about, and deal with within R. If a package could be created, that would be optimal for some of us. I would like to learn a tad more, if it were not too much effort for someone else to point me in the right direction? Cheers, Hank On Jun 8, 2007, at 8:47 AM, Douglas Bates wrote: On 6/7/07, Robert Wilkins [EMAIL PROTECTED] wrote: As noted on the R-project web site itself ( www.r-project.org - Manuals - R Data Import/Export ), it can be cumbersome to prepare messy and dirty data for analysis with the R tool itself. I've also seen at least one S programming book (one of the yellow Springer ones) that says, more briefly, the same thing. The R Data Import/Export page recommends examples using SAS, Perl, Python, and Java. It takes a bit of courage to say that ( when you go to a corporate software web site, you'll never see a page saying This is the type of problem that our product is not the best at, here's what we suggest instead ). I'd like to provide a few more suggestions, especially for volunteers who are willing to evaluate new candidates. SAS is fine if you're not paying for the license out of your own pocket. But maybe one reason you're using R is you don't have thousands of spare dollars. Using Java for data cleaning is an exercise in sado-masochism, Java has a learning curve (almost) as difficult as C++. There are different types of data transformation, and for some data preparation problems an all-purpose programming language is a good choice ( i.e. Perl , or maybe Python/Ruby ). Perl, for example, has excellent regular expression facilities. However, for some types of complex demanding data preparation problems, an all-purpose programming language is a poor choice. 
For example: cleaning up and preparing clinical lab data and adverse event data - you could do it in Perl, but it would take way, way too much time. A specialized programming language is needed. And since data transformation is quite different from data query, SQL is not the ideal solution either. There are only three statistical programming languages that are well-known, all dating from the 1970s: SPSS, SAS, and S. SAS is more popular than S for data cleaning. If you're an R user with difficult data preparation problems, frankly you are out of luck, because the products I'm about to mention are new, unknown, and therefore regarded as immature. And while the founders of these products would be very happy if you kicked the tires, most people don't like to look at brand new products. Most innovators and inventors don't realize this; I've learned it the hard way. But if you are a volunteer who likes to help out by evaluating, comparing, and reporting upon new candidates, well you could certainly help out R users and the developers of the products by kicking the tires of these products. And there is a huge need for such volunteers. 1. DAP This is an open source implementation of SAS. The founder: Susan Bassein Find it at: directory.fsf.org/math/stats (GNU GPL) 2. PSPP This is an open source implementation of SPSS. The relatively early version number might not give a good idea of how mature the data transformation features are; it reflects the fact that he has only started doing the statistical tests. The founder: Ben Pfaff, either a grad student or professor at Stanford CS dept. Also at: directory.fsf.org/math/stats (GNU GPL) 3. Vilno This uses a programming language similar to SPSS and SAS, but quite unlike S. Essentially, it's a substitute for the SAS datastep, and also transposes data and calculates averages and such. (No t-tests or regressions in this version). I created this, during the years 2001-2006 mainly.
It's version 0.85, and has a fairly low bug rate, in my opinion. The tarball includes about 100 or so test cases used for debugging - for logical calculation errors, but not for extremely high volumes of data. The maintenance of Vilno has slowed down, because I am currently (desperately) looking for employment. But once I've found new employment and living quarters and settled in, I will continue to enhance Vilno in my spare time. The founder: that would be me, Robert Wilkins Find it at: code.google.com/p/vilno ( GNU GPL ) ( In particular, the tarball at code.google.com/p/vilno/downloads/list , since I have yet to figure out how to use Subversion ). 4. Who knows? It was not easy to find out about the existence of DAP and PSPP. So who knows what else is out there. However, I think you'll find a lot more statistics software ( regression , etc ) out there, and not so much data transformation software. Not many people work on data preparation software. In fact, the category is so obscure that there isn't one agreed term: data cleaning ,
Re: [R] R is not a validated software package..
I'd like to know the answer as well. To be honest, I have a really hard time understanding the mentality of the clinical trial guys, and rather believe it is something related to job security. On 6/8/07, Giovanni Parrinello [EMAIL PROTECTED] wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I had planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest a 'polite' answer? TIA Giovanni -- dr. Giovanni Parrinello External Lecturer Medical Statistics Unit Department of Biomedical Sciences Viale Europa, 11 - 25123 Brescia Italy Tel: +390303717528 Fax: +390303717488 email: [EMAIL PROTECTED] -- WenSui Liu A lousy statistician who happens to know a little programming (http://spaces.msn.com/statcompute/blog)
Re: [R] character to time problem
Looks much better. I seldom use dates for much and didn't think to look at the sort.POSIXlt function. If I understand this correctly, sort.POSIXlt with na.last = FALSE is dropping all the NAs. Very nice. --- Gabor Grothendieck [EMAIL PROTECTED] wrote: Perhaps you want one of these: sort(as.Date(aa$times, "%d/%m/%Y")) [1] "1995-03-02" "2001-05-12" "2007-02-14" sort(as.Date(aa$times, "%d/%m/%Y"), na.last = TRUE) [1] "1995-03-02" "2001-05-12" "2007-02-14" NA NA [6] NA On 6/7/07, John Kane [EMAIL PROTECTED] wrote: I am trying to clean up some dates and I am clearly doing something wrong. I have laid out an example that seems to show what is happening with the real data. The coding is lousy but it looks like it should have worked. Can anyone suggest a) why I am getting that NA appearing after the strptime() command and b) why the NA is disappearing in the sort()? It happens with na.rm=TRUE and na.rm=FALSE. aa <- data.frame(c("12/05/2001", "", "30/02/1995", NA, "14/02/2007", "M")) names(aa) <- "times" aa[is.na(aa)] <- "M" aa[aa == ""] <- "M" bb <- unlist(subset(aa, aa[,1] != "M")) dates <- strptime(bb, "%d/%m/%Y") dates sort(dates) -- Session Info R version 2.4.1 (2006-12-18) i386-pc-mingw32 locale: LC_COLLATE=English_Canada.1252; LC_CTYPE=English_Canada.1252; LC_MONETARY=English_Canada.1252; LC_NUMERIC=C; LC_TIME=English_Canada.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: gdata Hmisc 2.3.1 3.3-2 (Yes, I know I'm out of date but I don't like upgrading just as I am finishing a project) Thanks
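A minimal illustration of the two behaviours in this thread (strings that cannot be parsed become NA, and sort() silently drops NAs unless na.last is set); the three-element vector is illustrative, not the poster's data:

```r
# "M" cannot be parsed with the format, so as.Date() returns NA for it
d <- as.Date(c("12/05/2001", "14/02/2007", "M"), format = "%d/%m/%Y")

sort(d)                  # default na.last = NA drops the NA entries
sort(d, na.last = TRUE)  # keeps the NAs, placed at the end
```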
Re: [R] rlm results on trellis plot
On 6/7/07, Alan S Barnett [EMAIL PROTECTED] wrote: How do I add to a trellis plot the best fit line from a robust fit? I can use panel.lm to add a least squares fit, but there is no panel.rlm function. It's not trellis, but it's really easy to do this with ggplot2: install.packages("ggplot2", dep = TRUE) library(ggplot2) p <- qplot(x, y, data = diamonds) p + geom_smooth(method = "lm") p + geom_smooth(method = "rlm") p + geom_smooth(method = "lm", formula = y ~ poly(x, 3)) see http://had.co.nz/ggplot2/stat_smooth.html for more examples. Hadley
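For readers who do want to stay within lattice/trellis: a custom panel function can call MASS::rlm() and add the fitted line with panel.abline(). A sketch (the stackloss data and the panel body are illustrative, not from the original post):

```r
library(lattice)
library(MASS)  # for rlm()

# Custom panel: draw the points, then overlay the robust-regression line
xyplot(stack.loss ~ Air.Flow, data = stackloss,
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(rlm(y ~ x))  # rlm inherits from lm, so abline works
       })
```

The same panel can be reused across conditioning groups, which is the usual reason for wanting a panel.rlm in the first place.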
Re: [R] Sorting dataframe by different columns
On the R wiki site there is a general-purpose function (sort.data.frame) that allows you to do this: sort(df, by = ~ x - z) See: http://wiki.r-project.org/rwiki/doku.php?id=tips:data-frames:sort Regards, Kevin On 6/8/07, Gunther Höning [EMAIL PROTECTED] wrote: Dear list, I have a very short question. Suppose a dataframe of four columns: df <- data.frame(w, x, y, z) I want this ordered the following way: first by x, decreasing = FALSE, and secondly by z, decreasing = TRUE. How can this be done? Thanks Gunther
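In base R, without the wiki helper, order() does the same job; xtfrm() converts a column to sortable numbers so its sign can be flipped for a descending key (which also works for factors and character columns). A sketch with made-up data standing in for w, x, y, z:

```r
# Hypothetical data in place of the poster's df <- data.frame(w, x, y, z)
df <- data.frame(w = 1:5,
                 x = c(2, 1, 2, 1, 3),
                 y = letters[1:5],
                 z = c(5, 3, 1, 4, 2))

# order by x ascending, then by z descending
df[order(df$x, -xtfrm(df$z)), ]
```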
Re: [R] overplots - fixing scientific vs normal notation in output
--- Peter Lercher [EMAIL PROTECTED] wrote: Moving from S-Plus to R I encountered many great features and a much more stable system. Currently, I am left with 2 problems that are handled differently: 1) I did lots of overplots in S-Plus using par(new=T, xaxs='d', yaxs='d') to fix the axes - What is the workaround in R? What does S-Plus do here? 2) In S-Plus I could fix scientific notation or normal notation in output - How can I handle this in R? I found no fix in the documentation. ?format() maybe? I am using R version 2.4.1 (2006-12-18) on Windows XP SR2 Peter Lercher, M.D., M.P.H., Assoc Prof
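For question 2, base R covers this with the scipen option and with format(); a short sketch:

```r
x <- 1e9

print(x)              # with default options this shows as 1e+09
options(scipen = 10)  # positive scipen biases printing toward fixed notation
print(x)              # now prints as 1000000000

# per-call control, independent of the global option
format(x, scientific = FALSE)
format(x, scientific = TRUE)
```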
Re: [R] R is not a validated software package..
Giovanni Parrinello wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I had planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest a 'polite' answer? TIA Giovanni Search the archives and you'll find a LOT of responses. Briefly, in my view there are no requirements, just some pharma companies that think there are. The FDA is required to accept all submissions, and they get some where only Excel was used, or Minitab, and lots more. There is a session on this at the upcoming R International Users Meeting in Iowa in August. The session will include discussions of federal regulation compliance for R, for those users who feel that such compliance is actually needed. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] matrix and data frame
hello, I have just a question before the weekend: I don't know how to paste matrices together. These matrices share one common column, and I'd like to join them by this column - not stacking one below the other, but side by side, at the right-hand side. thanks, good weekend
[R] How to make a table of a desired dimension
Hi ComRades, I want to make a matrix of frequencies from vectors of a continuous variable spanning different values. For example this code x <- c(runif(100, 10, 40), runif(100, 43, 55)) y <- c(runif(100, 7, 35), runif(100, 37, 50)) z <- c(runif(100, 10, 42), runif(100, 45, 52)) a <- table(ceiling(x)) b <- table(ceiling(y)) c <- table(ceiling(z)) a b c will give me three tables that start and end at different integer values, and besides, they have 'holes' in between different integer values. Is it possible to use 'table' to make these three tables have the same dimensions, filling in the absent labels with zeroes? In the example above, the desired tables should all start at 8, tables 'a' and 'c' should put a zero at labels '8' to '10', all should put zeros in the frequencies of the labels corresponding to the holes, and all should end at label '55'. The final purpose is to make a matrix and use 'matplot' to plot all the frequencies in one plot, such as # code valid only when 'a', 'b', and 'c' have the proper dimension p <- mat.or.vec(48, 4) p[,1] <- 8:55 p[,2] <- c(matrix(a)[1:48]) p[,3] <- c(matrix(b)[1:48]) p[,4] <- c(matrix(c)[1:48]) matplot(p) I read the help about 'table' but I couldn't figure out if dnn, deparse.level, or the other arguments could serve my purpose. Thanks for your help Rubén
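One base-R answer, sketched below: convert each vector to a factor with an explicit, shared set of levels before calling table(); absent levels then appear as zero counts, so all three tables line up with the same labels and no holes.

```r
x <- c(runif(100, 10, 40), runif(100, 43, 55))
y <- c(runif(100, 7, 35), runif(100, 37, 50))
z <- c(runif(100, 10, 42), runif(100, 45, 52))

lev <- 8:55  # one common label range for all three tables
a <- table(factor(ceiling(x), levels = lev))
b <- table(factor(ceiling(y), levels = lev))
c <- table(factor(ceiling(z), levels = lev))

p <- cbind(a, b, c)         # aligned frequency matrix, one row per label
matplot(lev, p, type = "l")
```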
Re: [R] Use R in a pipeline as a filter
On 7 June 2007 at 14:27, [EMAIL PROTECTED] wrote: | how can I use R in a pipeline like this | | $ ./generate-data | R --script-file=Script.R | ./further-analyse-data result.dat The 'r' in our 'littler' package can do that. One example we show on the littler webpage is $ ls -l /boot | awk '!/^total/ {print $5}' | \ r -e 'fsizes <- as.integer(readLines()); print(summary(fsizes)); stem(fsizes)' We use R's readLines to read from stdin, and you can of course also have r 'in the middle' if you take care of the output generated -- which our example doesn't do, as it prints straight to screen. Dirk -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison
Re: [R] character to time problem
The code in my post uses the Date class, not POSIX; sort.POSIXlt is never invoked. I suggest you read the Help Desk article in R News 4/1 for more. On 6/8/07, John Kane [EMAIL PROTECTED] wrote: Looks much better. I seldom use dates for much and didn't think to look at the sort.POSIXlt function. If I understand this correctly, sort.POSIXlt with na.last = FALSE is dropping all the NAs. Very nice. --- Gabor Grothendieck [EMAIL PROTECTED] wrote: Perhaps you want one of these: sort(as.Date(aa$times, "%d/%m/%Y")) [1] "1995-03-02" "2001-05-12" "2007-02-14" sort(as.Date(aa$times, "%d/%m/%Y"), na.last = TRUE) [1] "1995-03-02" "2001-05-12" "2007-02-14" NA NA [6] NA On 6/7/07, John Kane [EMAIL PROTECTED] wrote: I am trying to clean up some dates and I am clearly doing something wrong. I have laid out an example that seems to show what is happening with the real data. The coding is lousy but it looks like it should have worked. Can anyone suggest a) why I am getting that NA appearing after the strptime() command and b) why the NA is disappearing in the sort()? It happens with na.rm=TRUE and na.rm=FALSE. aa <- data.frame(c("12/05/2001", "", "30/02/1995", NA, "14/02/2007", "M")) names(aa) <- "times" aa[is.na(aa)] <- "M" aa[aa == ""] <- "M" bb <- unlist(subset(aa, aa[,1] != "M")) dates <- strptime(bb, "%d/%m/%Y") dates sort(dates) -- Session Info R version 2.4.1 (2006-12-18) i386-pc-mingw32 locale: LC_COLLATE=English_Canada.1252; LC_CTYPE=English_Canada.1252; LC_MONETARY=English_Canada.1252; LC_NUMERIC=C; LC_TIME=English_Canada.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: gdata Hmisc 2.3.1 3.3-2 (Yes, I know I'm out of date but I don't like upgrading just as I am finishing a project) Thanks
Re: [R] evaluating variables in the context of a data frame
On 6/7/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: f <- function(x, dat) evalq(x, dat) f(o, D) Error in eval(expr, envir, enclos) : object 'o' not found g <- function(x, dat) eval(x, dat) g(o, D) Error in eval(x, dat) : object 'o' not found What am I doing wrong? This seems to be what the help files say you do to evaluate arguments in the context of a passed-in data frame... When you call f(o, D), the argument 'o' is evaluated in the current environment ('context' in R means something different). Because of lazy evaluation, it is not evaluated until evalq is called, but it is evaluated as if it had been evaluated greedily. g(quote(o), D) will work. Thanks. After a bit more experimentation I figured out that this does what I want: h <- function(x, d) eval(substitute(x), d, parent.frame()) but I don't understand why the substitute() helps, or indeed why it has any effect at all... zw
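A compact sketch of what substitute() buys here (a small data frame D with a column 'o' is assumed, matching the thread): substitute(x) captures the unevaluated expression the caller supplied instead of forcing the promise in the caller's frame.

```r
D <- data.frame(o = 1:3)

# substitute(x) recovers the expression passed as 'x' (the symbol `o`),
# which eval() then looks up inside the data frame `d`
h <- function(x, d) eval(substitute(x), d, parent.frame())
h(o, D)  # 1 2 3

# without substitute(), evaluating `x` forces the promise for `o`
# in the caller's environment, where no object `o` exists
g <- function(x, d) eval(x, d)
try(g(o, D))  # error: object 'o' not found
```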
Re: [R] matrix and data frame
I'm not at all certain I understand your question, but try ?cbind Sarah On 6/8/07, elyakhlifi mustapha [EMAIL PROTECTED] wrote: hello, I have just a question before the weekend: I don't know how to paste matrices together. These matrices share one common column, and I'd like to join them by this column - not stacking one below the other, but side by side, at the right-hand side. thanks, good weekend -- Sarah Goslee http://www.functionaldiversity.org
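If the rows should be matched by the values in the shared column (not just by position), merge() rather than cbind() is the usual tool; a sketch with hypothetical data:

```r
# Two tables sharing an 'id' column (made-up data)
m1 <- data.frame(id = c(1, 2, 3), a = c(10, 20, 30))
m2 <- data.frame(id = c(2, 3, 4), b = c(200, 300, 400))

merge(m1, m2, by = "id")              # inner join: only ids present in both
merge(m1, m2, by = "id", all = TRUE)  # full join: unmatched rows kept, NA-filled
```

cbind() is only safe when the rows of the two objects are already in the same order.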
Re: [R] data mining/text mining?
Dear Ruixin: Among other differences, text mining deals with non-structured data while data mining mainly focuses on structured data. Many algorithms can be shared between them; however, some necessary data preprocessing is required for text mining. There are a lot of online resources out there. As to packages used for text mining in R, esp. for preprocessing, please check the following link: http://wwwpeople.unil.ch/jean-pierre.mueller/ I used that package a very long time ago and am not sure if it has been updated for the current version of R; otherwise, you might need to go back to an old version like R 1.1. If you want to do text mining for Chinese text (I guess :), there is additional work (i.e. word splitting) needed. I remember there is a researcher from Taiwan who does pretty good work on this and you can google that; I cannot remember the details. HTH, Weiwei On 6/8/07, Ruixin ZHU [EMAIL PROTECTED] wrote: Dear R-user, Could anybody tell me the key difference between data mining and text mining? Please make a list of packages for data/text mining. And give me an example of text mining with R (any related materials will be highly appreciated), because the vignette written by Ingo Feinerer seems too concise for me. Thanks _ Dr. Ruixin ZHU Shanghai Center for Bioinformation Technology [EMAIL PROTECTED] [EMAIL PROTECTED] 86-21-13040647832 -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III
Re: [R] R is not a validated software package..
Agree with Frank. As far as I know, the FDA doesn't encourage or discourage the use of any particular software. On 6/8/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote: Giovanni Parrinello wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I had planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest a 'polite' answer? TIA Giovanni Search the archives and you'll find a LOT of responses. Briefly, in my view there are no requirements, just some pharma companies that think there are. The FDA is required to accept all submissions, and they get some where only Excel was used, or Minitab, and lots more. There is a session on this at the upcoming R International Users Meeting in Iowa in August. The session will include discussions of federal regulation compliance for R, for those users who feel that such compliance is actually needed. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University -- WenSui Liu A lousy statistician who happens to know a little programming (http://spaces.msn.com/statcompute/blog)
Re: [R] Ubu edgy + latest CRAN R + Rmpi = no go
On 6/8/07, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On 7 June 2007 at 17:22, Tim Keitt wrote: | I'm just curious if anyone else has had problems with this | configuration. I added the CRAN repository to apt and installed 2.5.0 | with apt-get. I then did an install.packages("Rmpi") on cluster nodes. | Rmpi loads and lamhosts() shows the nodes, but mpi.spawn.Rslaves() | fails (something to do with temp files?). Rmpi works fine with the I have had similar issues at work. If you fix the lam packages at version 7.1.1, it works. It does not seem to work with 7.1.2 in the current Ubuntu, nor does it work with 7.1.4 (the current upstream version). As other MPI tools seem to work, I would put the error on Rmpi, but I have not had time to pin this down. For what it's worth, a few of us are trying to revive the OpenMPI packages in Debian, and I have started on a port of Rmpi to ROpenMPI. No ETA for that. | Edgy-native version of R (2.3.x) and installing Edgy's r-cran-rmpi | with apt. (But I need some other packages that only work in 2.4+!) | Could this be a problem with the latest Ubuntu debs on CRAN? The Rmpi R itself is just fine on Ubuntu, thank you. And very much appreciated. ;-) THK Dirk | author says his R 2.5 setup works fine. CC me please as I'm not | subscribed. | | THK | | -- | Timothy H. Keitt, University of Texas at Austin | Contact info and schedule at http://www.keittlab.org/tkeitt/ | Reprints at http://www.keittlab.org/tkeitt/papers/ | ODF attachment? See http://www.openoffice.org/ -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison -- Timothy H.
Keitt, University of Texas at Austin Contact info and schedule at http://www.keittlab.org/tkeitt/ Reprints at http://www.keittlab.org/tkeitt/papers/ ODF attachment? See http://www.openoffice.org/
Re: [R] overplots - fixing scientific vs normal notation in output
Peter Lercher wrote: -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Lercher Sent: Friday, June 08, 2007 3:07 AM To: r-help@stat.math.ethz.ch Subject: [R] overplots - fixing scientific vs normal notation in output Moving from S-PLUS to R I encountered many great features and a much more stable system. Currently, I am left with 2 problems that are handled differently: 1) I did lots of overplots in S-PLUS using par(new=TRUE, xaxs='d', yaxs='d') to fix the axes. What is the workaround in R? Since you are using the same axes, do you really need to overplot, instead of just using lines/points to add to the existing plot? R has not implemented xaxs='d', so on your additional plots just specify xlim and/or ylim directly. There are a couple of ways to do this. First, find the range of values over all of your plots, then use this as the argument to xlim and ylim for each plot. Second, create the first plot, then use par('usr') to find the limits of the coordinates, and use these values for xlim/ylim in further plots (with xaxs/yaxs='i' so the extra 4% is not added). Third, there are probably other ways, but the above should get you started. 2) In S-PLUS I could fix scientific notation or normal notation in output. How can I handle this in R? I found no fix in the documentation. Look at options('scipen'); this is not exactly the same fix as in S-PLUS, but it should solve most of your problems. I am using R version 2.4.1 (2006-12-18) on Windows XP SP2. Peter Lercher, M.D., M.P.H., Assoc Prof Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111
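Greg's second approach can be sketched as follows; this is a minimal example with made-up data, not code from the original thread:

```r
set.seed(1)
x1 <- rnorm(50); y1 <- rnorm(50)        # first data set
x2 <- rnorm(50, 2); y2 <- rnorm(50, 2)  # second data set to overplot

plot(x1, y1, col = "blue")  # first plot establishes the coordinate system
lim <- par("usr")           # c(xmin, xmax, ymin, ymax) of the current plot

par(new = TRUE)             # the next plot() call draws over the old one
plot(x2, y2, col = "red",
     xlim = lim[1:2], ylim = lim[3:4],
     xaxs = "i", yaxs = "i",  # "i" suppresses the extra 4% padding
     axes = FALSE, ann = FALSE)

# For problem 2: bias printed output away from scientific notation
options(scipen = 100)
```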
Re: [R] R is not a validated software package..
Sicotte, Hugues Ph.D. wrote: People, don't get angry at the pharma statistician; he is just trying to abide by an FDA requirement that is designed to ensure that tests perform reliably the same way. There is no point in arguing about which product is better: as far as the FDA rules are concerned, a validated system beats a better system any day of the week. There is no such requirement. Here is your polite answer: you can develop and try your software in R. Should they need to use those results in a report that will matter to the FDA, then you can work together with him to set up a validated environment for S-PLUS. You then have to commit to porting your code to S-PLUS. That doesn't follow. What matters is good statistical analysis practice, no matter which environment you use. Note that more errors are made in the data preparation / derived variables stage than are made by statistical software. Frank As I assume that you do not work in a regulated environment, you probably wouldn't have access to a validated SAS environment anyway. It is not usually enough to install a piece of software; you have to validate every step of the installation. Since AFAIK the FDA uses S-PLUS, it would be to your pharma person's advantage, to speed up submissions, if they also had a validated S-PLUS environment. http://www.msmiami.com/custom/downloads/S-PLUSValidationdatasheet_Final.pdf -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wensui Liu Sent: Friday, June 08, 2007 9:24 AM To: Giovanni Parrinello Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R is not a validated software package.. I would like to know the answer as well. To be honest, I really have a hard time understanding the mentality of clinical trial guys and rather believe it is something related to job security.
On 6/8/07, Giovanni Parrinello [EMAIL PROTECTED] wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I have planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest a 'polite' answer? TIA Giovanni -- dr. Giovanni Parrinello External Lecturer Medical Statistics Unit Department of Biomedical Sciences Viale Europa, 11 - 25123 Brescia Italy Tel: +390303717528 Fax: +390303717488 email: [EMAIL PROTECTED] -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] rlm results on trellis plot
I don't think the code below does what's requested, as it assumes a single overall fit for all panels, and I think the requester wanted separate fits by panel. This can easily be done, of course, by a minor modification:

xyplot(y ~ x | z,
       panel = function(x, y, ...) {
           panel.xyplot(x, y, ...)
           panel.abline(lm(y ~ x), col = "blue", lwd = 2)
           panel.abline(rlm(y ~ x), col = "red", lwd = 2)
       })

Note that the coefficients do not need to be explicitly extracted by coef(), as panel.abline() will do this automatically. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 Alan S Barnett wrote: How do I add to a trellis plot the best fit line from a robust fit? I can use panel.lm to add a least squares fit, but there is no panel.rlm function. How about using panel.abline() instead of panel.lmline()?

fit1 <- coef(lm(stack.loss ~ Air.Flow, data = stackloss))
fit2 <- coef(rlm(stack.loss ~ Air.Flow, data = stackloss))
xyplot(stack.loss ~ Air.Flow, data = stackloss,
       panel = function(x, y, ...) {
           panel.xyplot(x, y, ...)
           panel.abline(fit1, col = "blue")
           panel.abline(fit2, col = "red")
       }, aspect = 1)

-- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
Re: [R] R is not a validated software package..
Frank et al.: I believe this is a bit too facile. 21 CFR Part 11 does necessitate a software validation **process** -- but this process does not require any particular software. Rather, it requires that those using whatever software demonstrate to the FDA's satisfaction that the software does what it's supposed to do. This includes a lot more than assuring, say, the numerical accuracy of computations; I think it also requires demonstrating that the data are secure, that they are properly transferred from one source to another, etc. I assume that the statistical validation of R would be relatively simple, as R already has an extensive test suite, and it would simply be a matter of providing that test suite info. A bit more might be required, but I don't think it's such a big deal. I think Wensui Liu's characterization of clinical statisticians as having a mentality related to job security is a canard. Although I work in nonclinical, my observation is that clinical statistics is complex and difficult, not only because of many challenging statistical issues, but also because of the labyrinthine complexities of the regulated and extremely costly environment in which clinical statisticians work. It is certainly a job that I could not do. That said, probably the greatest obstacle to a change from SAS is neither obstinacy nor ignorance, but rather inertia: pharmaceutical companies have over the decades made a huge investment in SAS infrastructure to support the collection, organization, analysis, and submission of data for clinical trials. Converting this to anything else would be a herculean task involving huge expense, risk, and resources. R, S-PLUS (and much else -- e.g. numerous unvalidated data mining software packages) are routinely used by clinical statisticians to better understand their data and for exploratory analyses that supplement official analyses (e.g. for trying to justify collection of tissue samples or a pivotal study in a patient subpopulation).
But it is difficult for me to see how one could make a business case to change clinical trial analysis software infrastructure from SAS to S-PLUS, SPSS, or anything else. **DISCLAIMER** My opinions only. They do not in any way represent the views of my company or its employees. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr Sent: Friday, June 08, 2007 7:45 AM To: Giovanni Parrinello Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R is not a validated software package.. Giovanni Parrinello wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I have planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest a 'polite' answer? TIA Giovanni Search the archives and you'll find a LOT of responses. Briefly, in my view there are no such requirements, just some pharma companies that think there are. The FDA is required to accept all submissions, and they get some where only Excel was used, or Minitab, and lots more. There is a session on this at the upcoming R International Users Meeting in Iowa in August. The session will include discussions of federal regulation compliance for R, for those users who feel that such compliance is actually needed.
Re: [R] rlm results on trellis plot
On 6/7/07, Alan S Barnett [EMAIL PROTECTED] wrote: How do I add to a trellis plot the best fit line from a robust fit? I can use panel.lm to add a least squares fit, but there is no panel.rlm function. Well, panel.lmline (not panel.lm, BTW) is defined as:

panel.lmline <- function (x, y, ...)
{
    if (length(x) > 0)
        panel.abline(lm(as.numeric(y) ~ as.numeric(x)), ...)
}

So it's not much of a stretch to define

panel.rlmline <- function(x, y, ...)
    if (require(MASS) && length(x) > 0)
        panel.abline(rlm(as.numeric(y) ~ as.numeric(x)), ...)

The other replies have already shown you how you might use this in a call. -Deepayan
Re: [R] R is not a validated software package..
I may have overstated things a bit. See section VIII of http://www.fda.gov/CDER/GUIDANCE/2396dft.htm If you are analyzing data, your statistical package does not necessarily have to be validated. You may have to show that the statistical methods are adequate/appropriate, or that the results can be reproduced with different software, if you are using non-standard packages. By all tests, S-PLUS appears acceptable; I do not know about R. However, if your statistical method is an integral part of a test, then you do have to validate the system. This is becoming increasingly relevant for theragnostics. .. Which is why I said Should they need to use those results in a report [where] that will matter to the FDA.. (I added the where .. It makes more sense) -----Original Message----- From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED] Sent: Friday, June 08, 2007 11:08 AM To: Sicotte, Hugues Ph.D. Cc: Wensui Liu; Giovanni Parrinello; r-help@stat.math.ethz.ch Subject: Re: [R] R is not a validated software package.. Sicotte, Hugues Ph.D. wrote: People, don't get angry at the pharma statistician; he is just trying to abide by an FDA requirement that is designed to ensure that tests perform reliably the same way. There is no point in arguing about which product is better: as far as the FDA rules are concerned, a validated system beats a better system any day of the week. There is no such requirement. Here is your polite answer: you can develop and try your software in R. Should they need to use those results in a report that will matter to the FDA, then you can work together with him to set up a validated environment for S-PLUS. You then have to commit to porting your code to S-PLUS. That doesn't follow. What matters is good statistical analysis practice, no matter which environment you use. Note that more errors are made in the data preparation / derived variables stage than are made by statistical software.
Frank As I assume that you do not work in a regulated environment, you probably wouldn't have access to a validated SAS environment anyway. It is not usually enough to install a piece of software; you have to validate every step of the installation. Since AFAIK the FDA uses S-PLUS, it would be to your pharma person's advantage, to speed up submissions, if they also had a validated S-PLUS environment. http://www.msmiami.com/custom/downloads/S-PLUSValidationdatasheet_Final.pdf -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wensui Liu Sent: Friday, June 08, 2007 9:24 AM To: Giovanni Parrinello Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R is not a validated software package.. I would like to know the answer as well. To be honest, I really have a hard time understanding the mentality of clinical trial guys and rather believe it is something related to job security. On 6/8/07, Giovanni Parrinello [EMAIL PROTECTED] wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I have planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest a 'polite' answer? TIA Giovanni -- dr. Giovanni Parrinello External Lecturer Medical Statistics Unit Department of Biomedical Sciences Viale Europa, 11 - 25123 Brescia Italy Tel: +390303717528 Fax: +390303717488 email: [EMAIL PROTECTED]
-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] R is not a validated software package..
As I read 21 CFR 11, the regulation deals more with ensuring the security of the electronic health record itself. Thus, it seemed to me that so long as the software (SAS, R, S-PLUS, etc.) could not alter the database in any way, then you're fine (this may be naive, but that's how I understood it). The 'software validation' referred to in the document seems to be concerned with software that is directly related to the device (i.e. software that is part of the functionality of the device). How on earth could you really validate every process that a statistical software package is capable of anyway? It seems to me that we would be better off focusing our attention on the quality of the programming/data manipulation rather than on the 'validation of the software' (which, in terms of verifying its complete functionality, isn't possible anyway). I'm also curious that the open-source (i.e. non-black-box) nature of R hasn't struck more of a chord with regulatory bodies. I tried to make the case above to our software quality people, but they insisted on a full OQ/IQ/PQ plan (which I'm in the process of drafting - possibly my most boring task ever). In addition, my boss insisted on using S-PLUS instead of R because I couldn't convince him that R would be acceptable (even though the OQ/IQ/PQ requirements would have been the same and R would have saved money). That said, I would have had to write new OQ/IQ/PQ plans every time a new version of R became available (ouch). I have no complaints with regard to S-PLUS, but R would have been free. One last thought to add to Bert's - I think one other thing that is holding up the spread of S to the pharma/device companies is the availability of S programmers (SAS programmers are plentiful). I tried to get 'has experience in S' added to our next job opening, but my boss insisted that we would never find a person with experience in both SAS and S.
I countered by asking if we could just ask for S experience (and drop the SAS requirement), but he gave me a dirty look. :) Cody Hamilton, PhD Edwards Lifesciences
[R] compute new variable
Hello, maybe my question is stupid, but I would like to calculate a new variable for all cases in my dataset. Inspired by the dialog in Rcmdr I tried

Datenmatrix$cohigha <- with(Datenmatrix,
    mean(c(M2ORG, M5ORG, M8ORG, M11ORG), na.rm = TRUE))

As output I got the same number for all my cases (probably the overall mean across all cases), instead of a mean for each case. Can you help me with this problem? regards Matthias
[R] ievent.wait
I am working on a plot and would like to click on a few points and then have a line connect them. Could anyone help me with this or point me in a direction that would suit this? I know I would be using ievent.wait in iplots, but I am not sure about this. Thank you.
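If iplots is not a hard requirement, base R graphics can do this with locator(), which collects mouse clicks on the current plot; a minimal sketch (the interactive locator() call is shown commented out, since it needs a live graphics session):

```r
# Draw some points to click on
plot(1:10, runif(10))

# In an interactive session, collect clicks until right-click/Esc;
# type = "l" already joins the clicked positions with a line as you go:
#   clicks <- locator(type = "l")

# locator() returns a list with components x and y; lines() accepts
# any such list, so the same call works on hand-made coordinates too:
clicks <- list(x = c(2, 5, 8), y = c(0.2, 0.8, 0.4))
lines(clicks, col = "red", lwd = 2)  # connect the chosen points
```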
Re: [R] R is not a validated software package..
Bert Gunter wrote: Frank et al.: I believe this is a bit too facile. 21 CFR Part 11 does necessitate a software validation **process** -- but this process does not require any For database software and for medical devices - particular software. Rather, it requires that those using whatever software demonstrate to the FDA's satisfaction that the software does what it's supposed to do. This includes a lot more than assuring, say, the numerical accuracy of computations; I think it also requires demonstrating that the data are secure, that they are properly transferred from one source to another, etc. I assume that the statistical validation of R would be relatively simple, as R already has an extensive test suite, and it would simply be a matter of providing that test suite info. A bit more might be required, but I don't think it's such a big deal. I think Wensui Liu's characterization of clinical statisticians as having a mentality related to job security is a canard. Although I work in nonclinical, my observation is that clinical statistics is complex and difficult, not only because of many challenging statistical issues, but also because of the labyrinthine complexities of the regulated and extremely costly environment in which clinical statisticians work. It is certainly a job that I could not do. That said, probably the greatest obstacle to a change from SAS is neither obstinacy nor ignorance, but rather inertia: pharmaceutical companies have over the decades made a huge investment in SAS infrastructure to support the collection, organization, analysis, and submission of data for clinical trials. Converting this to anything else would be a herculean task involving huge expense, risk, and resources. R, S-PLUS (and much else -- e.g. numerous unvalidated data mining software packages) are routinely used by clinical statisticians to better understand their data and for exploratory analyses that supplement official analyses (e.g.
for trying to justify collection of tissue samples or a pivotal study in a patient subpopulation). But it is difficult for me to see how one could make a business case to change clinical trial analysis software infrastructure from SAS to S-PLUS, SPSS, or anything else. What I would love to have is some efficiency estimates for SAS macro programming as done in pharma vs. using a high-level language. My bias is that SAS macro programming, which costs companies more than SAS licenses, is incredibly inefficient. Frank **DISCLAIMER** My opinions only. They do not in any way represent the views of my company or its employees. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr Sent: Friday, June 08, 2007 7:45 AM To: Giovanni Parrinello Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R is not a validated software package.. Giovanni Parrinello wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I have planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest a 'polite' answer? TIA Giovanni Search the archives and you'll find a LOT of responses. Briefly, in my view there are no such requirements, just some pharma companies that think there are. The FDA is required to accept all submissions, and they get some where only Excel was used, or Minitab, and lots more. There is a session on this at the upcoming R International Users Meeting in Iowa in August. The session will include discussions of federal regulation compliance for R, for those users who feel that such compliance is actually needed.
Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] compute new variable
Matthias von Rad wrote: Hello, maybe my question is stupid, but I would like to calculate a new variable for all cases in my dataset. Inspired by the dialog in Rcmdr I tried Datenmatrix$cohigha <- with(Datenmatrix, mean(c(M2ORG, M5ORG, M8ORG, M11ORG), na.rm = TRUE)) As output I got the same number for all my cases (probably the overall mean across all cases), instead of a mean for each case. Can you help me with this problem?

Datenmatrix$cohigha <- rowMeans(Datenmatrix[, c("M2ORG", "M5ORG", "M8ORG", "M11ORG")],
                                na.rm = TRUE)

regards Matthias -- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
Re: [R] R is not a validated software package..
Bert, I just want to make sure what I said is not overstated so as to offend statisticians who use SAS. Actually, I use SAS daily and am able to use it pretty well. ^_^ What I meant was: 1) I don't understand the mentality; 2) using SAS instead of R might be related to job security; which is very different from their mentality is related to job security. On 6/8/07, Bert Gunter [EMAIL PROTECTED] wrote: Frank et al.: I believe this is a bit too facile. 21 CFR Part 11 does necessitate a software validation **process** -- but this process does not require any particular software. Rather, it requires that those using whatever software demonstrate to the FDA's satisfaction that the software does what it's supposed to do. This includes a lot more than assuring, say, the numerical accuracy of computations; I think it also requires demonstrating that the data are secure, that they are properly transferred from one source to another, etc. I assume that the statistical validation of R would be relatively simple, as R already has an extensive test suite, and it would simply be a matter of providing that test suite info. A bit more might be required, but I don't think it's such a big deal. I think Wensui Liu's characterization of clinical statisticians as having a mentality related to job security is a canard. Although I work in nonclinical, my observation is that clinical statistics is complex and difficult, not only because of many challenging statistical issues, but also because of the labyrinthine complexities of the regulated and extremely costly environment in which clinical statisticians work. It is certainly a job that I could not do. That said, probably the greatest obstacle to a change from SAS is neither obstinacy nor ignorance, but rather inertia: pharmaceutical companies have over the decades made a huge investment in SAS infrastructure to support the collection, organization, analysis, and submission of data for clinical trials.
Converting this to anything else would be a herculean task involving huge expense, risk, and resources. R, S-PLUS (and much else -- e.g. numerous unvalidated data mining software packages) are routinely used by clinical statisticians to better understand their data and for exploratory analyses that supplement official analyses (e.g. for trying to justify collection of tissue samples or a pivotal study in a patient subpopulation). But it is difficult for me to see how one could make a business case to change clinical trial analysis software infrastructure from SAS to S-PLUS, SPSS, or anything else. **DISCLAIMER** My opinions only. They do not in any way represent the views of my company or its employees. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr Sent: Friday, June 08, 2007 7:45 AM To: Giovanni Parrinello Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R is not a validated software package.. Giovanni Parrinello wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I have planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest a 'polite' answer? TIA Giovanni Search the archives and you'll find a LOT of responses. Briefly, in my view there are no such requirements, just some pharma companies that think there are. The FDA is required to accept all submissions, and they get some where only Excel was used, or Minitab, and lots more. There is a session on this at the upcoming R International Users Meeting in Iowa in August. The session will include discussions of federal regulation compliance for R, for those users who feel that such compliance is actually needed.
Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University -- WenSui Liu A lousy statistician who happens to know a little programming (http://spaces.msn.com/statcompute/blog)
[R] pnorm how to decide lower-tail true or false
Hi to all, maybe the last question was not clear enough. I have not found any hints on how to decide whether it should use lower.tail or not. As it is an extra R feature ( written in http://finzi.psych.upenn.edu/R/Rhelp02a/archive/66250.html ) I do not find anything about it in any of my statistics books. Regards Carmen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Batch processing in Windows
Hi, I am a complete newbie to R, so the following problem will probably be trivial for most of you guys: I get an error message every time I try to run an R file directly from the DOS shell. My R file (test.R) is intended to create a basic graph and has very simple code: x <- rep(1:10,1) y <- rep(1:10,1) plot(x,y) I am using the following command to call this file directly from the c:/ root: C:/R CMD BATCH e:/Documents Seb/3_/test.R And here is the error message (translated from French to English): 'R' is not recognized as an internal or external command, an executable script or a command file My OS is a French Windows XP sp2 and I am using R version 2.5.0. I wonder if the problem comes from an installation problem... Thank you in advance for your help. Sebastien __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to make a table of a desired dimension
You basically need to use table on factors with fixed pre-specified levels. For example: x <- c(runif(100,10,40), runif(100,43,55)) y <- c(runif(100,7,35), runif(100,37,50)) z <- c(runif(100,10,42), runif(100,45,52)) xx <- ceiling(x); yy <- ceiling(y); zz <- ceiling(z) mylevels <- min( c(xx, yy, zz) ) : max( c(xx, yy, zz) ) out <- cbind( table( factor(xx, levels=mylevels) ), table( factor(yy, levels=mylevels) ), table( factor(zz, levels=mylevels) ) ) You could replace the last command with simply sapply( list(xx, yy, zz), function(vec) table( factor(vec, levels=mylevels) ) ) Regards, Adai Rubén Roa-Ureta wrote: Hi ComRades, I want to make a matrix of frequencies from vectors of a continuous variable spanning different values. For example this code x <- c(runif(100,10,40),runif(100,43,55)) y <- c(runif(100,7,35),runif(100,37,50)) z <- c(runif(100,10,42),runif(100,45,52)) a <- table(ceiling(x)) b <- table(ceiling(y)) c <- table(ceiling(z)) a b c will give me three tables that start and end at different integer values, and besides, they have 'holes' in between different integer values. Is it possible to use 'table' to make these three tables have the same dimensions, filling in the absent labels with zeroes? In the example above, the desired tables should all start at 8, tables 'a' and 'c' should put a zero at labels '8' to '10', all should put zeros in the frequencies of the labels corresponding to the holes, and all should end at label '55'. The final purpose is to make a matrix and use 'matplot' to plot all the frequencies in one plot, such as #code valid only when 'a', 'b', and 'c' have the proper dimension p <- mat.or.vec(48,4) p[,1] <- 8:55 p[,2] <- c(matrix(a)[1:48]) p[,3] <- c(matrix(b)[1:48]) p[,4] <- c(matrix(c)[1:48]) matplot(p) I read the help about 'table' but I couldn't figure out if dnn, deparse.level, or the other arguments could serve my purpose. 
Thanks for your help Rubén __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
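Adai's suggestion above can be tried end-to-end with a short sketch; the key step is tabulating factors that all share one fixed set of levels, which fills the absent labels with zeros:

```r
set.seed(1)  # for reproducibility
x <- ceiling(c(runif(100, 10, 40), runif(100, 43, 55)))
y <- ceiling(c(runif(100, 7, 35),  runif(100, 37, 50)))
z <- ceiling(c(runif(100, 10, 42), runif(100, 45, 52)))

# One common set of levels spanning all three vectors
mylevels <- min(c(x, y, z)):max(c(x, y, z))

# table() on a factor with fixed levels reports every level,
# so labels that never occur get a count of zero
out <- sapply(list(x, y, z),
              function(vec) table(factor(vec, levels = mylevels)))

dim(out)  # one row per level (holes included), one column per vector
matplot(mylevels, out, type = "l")  # all three frequency profiles at once
```

Since the three columns now have identical row labels, no manual padding with mat.or.vec() is needed before matplot().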
Re: [R] Tools For Preparing Data For Analysis
Martin Henry H. Stevens sent the following at 08/06/2007 15:11: Is there an example available of this sort of problematic data that requires this kind of data screening and filtering? For many of us, this issue would be nice to learn about, and deal with within R. If a package could be created, that would be optimal for some of us. I would like to learn a tad more, if it were not too much effort for someone else to point me in the right direction? Cheers, Hank On Jun 8, 2007, at 8:47 AM, Douglas Bates wrote: On 6/7/07, Robert Wilkins [EMAIL PROTECTED] wrote: As noted on the R-project web site itself ( www.r-project.org - ... rest snipped ... OK, I can't resist that invitation. I think there are many kinds of problematic data. I handle some nasty textish things in perl (and I loved the purgatory quote) and I'm afraid I do some things in Excel and some cleaning I can handle in R, but I never enter data directly into R. However, one very common scenario I have faced all my working life is psych data from questionnaires or interviews in low budget work, mostly student research or routine entry of therapists' data. Typically you have an identifier, a date, some demographics and then a lot of item data. There's little money (usually zero) involved for data entry and cleaning but I've produced a lot of good(ish) papers out of this sort of very low budget work over the last 20 years. (Right at the other end of a financial spectrum from the FDA/validated s'ware thread but this is about validation again!) The problem I often face is that people are lousy data entry machines (well, actually, they vary ... enormously) and if they mess up the data entry we all know how horrible this can be. 
SPSS (boo hiss) used to have an excellent module, actually a standalone PC/Windoze program, that allowed you to define variables so they had allowed values and it would refuse to accept out of range or out of acceptable entries, it also allowed you to create checking rules and rules that would, in the light of earlier entries, set later values and not ask about them. In a rudimentary way you could also lay things out on the screen so that it paginated where the q'aire or paper data record did etc. The final nice touch was that you could define some variables as invariant and then set the thing so an independent data entry person could re-enter the other data (i.e. pick up q'aire, see if ID fits the one showing on screen, if so, enter the rest of the data). It would bleep and not move on if you entered a value other than that entered by the first person and you had to confirm that one of you was right. That saved me wasted weeks I'm sure on analysing data that turned out to be awful and I'd love to see someone build something to replace that. Currently I tend to use (boo hiss) Excel for this as everyone I work with seems to have it (and not all can install open office and anyway I haven't had time to learn that properly yet either ...) and I set up spreadsheets with validation rules set. That doesn't get the branching rules and checks (e.g. if male, skip questions about periods, PMT and pregnancies), or at least, with my poor Excel skills it doesn't. I just skip a column to indicate page breaks in the q'aire, and I get, when I can, two people to enter the data separately and then use R to compare the two spreadsheets having yanked them into data frames. I would really, really love someone to develop (and perhaps replace) the rather buggy edit() and fix() routines (seem to hang on big data frames in Rcmdr which is what I'm trying to get students onto) with something that did some or all of what SPSS/DE used to do for me or I bodge now in Excel. 
If any generous coding whiz were willing to do this, I'll try to alpha and beta test and write help etc. There _may_ be good open source things out there that do what I need but something that really integrated into R would be another huge step forward in being able to phase out SPSS in my work settings and phase in R. Very best all, Chris -- Chris Evans [EMAIL PROTECTED] Skype: chris-psyctc Professor of Psychotherapy, Nottingham University; Consultant Psychiatrist in Psychotherapy, Notts PDD network; Research Programmes Director, Nottinghamshire NHS Trust; *If I am writing from one of those roles, it will be clear. Otherwise* *my views are my own and not representative of those institutions* __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
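The compare-two-independent-entries step Chris describes needs only a few lines of base R once both spreadsheets are read into data frames; the frames below are invented for illustration:

```r
# Two independent entries of the same (hypothetical) questionnaire data
entry1 <- data.frame(id = 1:3, q1 = c(2, 5, 3), q2 = c(1, 4, 4))
entry2 <- data.frame(id = 1:3, q1 = c(2, 5, 3), q2 = c(1, 4, 2))

# Both files must describe the same cases in the same order
stopifnot(identical(entry1$id, entry2$id))

# Flag every cell where the two typists disagree
mismatch <- entry1 != entry2
which(mismatch, arr.ind = TRUE)  # row and column of each discrepancy
```

Each flagged cell can then be checked against the paper questionnaire, which catches exactly the double-entry errors the SPSS/DE module used to trap interactively.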
Re: [R] pointwise confidence bands or interval values for a non parametric sm.regression
Hi Martin, Do please, at least, read the documentation for the package you are using!: ?sm.options ## sub: display ## Example with(iris, sm.regression(Sepal.Length, Sepal.Width, display="se")) Regards, Mark Difford. M. P. Papadatos wrote: Dear all, Is there a way to plot / calculate pointwise confidence bands or interval values for a non parametric regression like sm.regression? Thank you in advance. Regards, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Batch processing in Windows
R isn't in your path. Either change your path to include it or place Rcmd.bat from batchfiles anywhere in your existing path: http://code.google.com/p/batchfiles/ and then: Rcmd BATCH ...whatever... On 6/8/07, Sébastien Bihorel [EMAIL PROTECTED] wrote: Hi, I am a complete newbie to R, so the following problem will probably be trivial for most of you guys: I get an error message every time I try to run an R file directly from the DOS shell. My R file (test.R) is intended to create a basic graph and has very simple code: x <- rep(1:10,1) y <- rep(1:10,1) plot(x,y) I am using the following command to call this file directly from the c:/ root: C:/R CMD BATCH e:/Documents Seb/3_/test.R And here is the error message (translated from French to English): 'R' is not recognized as an internal or external command, an executable script or a command file My OS is a French Windows XP sp2 and I am using R version 2.5.0. I wonder if the problem comes from an installation problem... Thank you in advance for your help. Sebastien __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tools For Preparing Data For Analysis
For windows users, EpiData Entry http://www.epidata.dk/ is an excellent (free) tool for data entry and documentation. --Dale On 6/8/07, Chris Evans [EMAIL PROTECTED] wrote: [... Chris's message, quoted in full above, snipped ...] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] pnorm how to decide lower-tail true or false
At 01:31 PM 6/8/2007, Carmen wrote: Hi to all, maybe the last question was not clear enough. I have not found any hints on how to decide whether it should use lower.tail or not. As it is an extra R feature ( written in http://finzi.psych.upenn.edu/R/Rhelp02a/archive/66250.html ) I do not find anything about it in any of my statistics books. Regards Carmen pnorm(z, lower.tail=TRUE) (the R default) gives the probability of a normal variate being at or below z. This is the value commonly called the cumulative distribution function at the point z, or the integral from -Inf to z of the gaussian density. pnorm(z, lower.tail=FALSE) gives the complement of the above, or 1 - cdf(z), and is the integral from z to Inf of the gaussian density. E.g., pnorm(1.96, lower.tail=TRUE) [1] 0.9750021 pnorm(1.96, lower.tail=FALSE) [1] 0.02499790 Use lower.tail=TRUE if you are, e.g., finding the probability at the lower tail of a confidence interval, or if you want the probability of values no larger than z. Use lower.tail=FALSE if you are, e.g., calculating the significance of a test statistic or the upper confidence limit, or you want the probability of values z or larger. You should use pnorm(z, lower.tail=FALSE) instead of 1-pnorm(z) because the former returns a more accurate answer for large z. This is a really simple issue, with no inherent complexity associated with it. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
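Robert's last point, that pnorm(z, lower.tail=FALSE) beats 1 - pnorm(z) far out in the tail, is easy to verify directly:

```r
z <- 1.96
p.lower <- pnorm(z, lower.tail = TRUE)   # P(Z <= z): the cdf at z
p.upper <- pnorm(z, lower.tail = FALSE)  # P(Z >  z): its complement
p.lower + p.upper                        # the two tails sum to 1

# For large z the subtraction 1 - pnorm(z) loses all precision,
# because pnorm(z) rounds to exactly 1 in double arithmetic
1 - pnorm(10)                  # 0: the true tail mass is swamped
pnorm(10, lower.tail = FALSE)  # ~7.6e-24: computed directly in the tail
```

The same reasoning applies to the other p* functions (pt, pchisq, ...), which also take a lower.tail argument.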
Re: [R] Batch processing in Windows
Alternatively, use the full path in your call to R as I do below: "F:\Program Files\R\R-2.4.1pat\bin\R.exe" CMD BATCH --vanilla --slave whatever.R HTH, Roger -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck Sent: Friday, June 08, 2007 1:51 PM To: Sébastien Bihorel Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Batch processing in Windows R isn't in your path. Either change your path to include it or place Rcmd.bat from batchfiles anywhere in your existing path: http://code.google.com/p/batchfiles/ and then: Rcmd BATCH ...whatever... On 6/8/07, Sébastien Bihorel [EMAIL PROTECTED] wrote: Hi, I am a complete newbie to R, so the following problem will probably be trivial for most of you guys: I get an error message every time I try to run an R file directly from the DOS shell. My R file (test.R) is intended to create a basic graph and has very simple code: x <- rep(1:10,1) y <- rep(1:10,1) plot(x,y) I am using the following command to call this file directly from the c:/ root: C:/R CMD BATCH e:/Documents Seb/3_/test.R And here is the error message (translated from French to English): 'R' is not recognized as an internal or external command, an executable script or a command file My OS is a French Windows XP sp2 and I am using R version 2.5.0. I wonder if the problem comes from an installation problem... Thank you in advance for your help. Sebastien __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glm() for log link and Weibull family
I need to be able to run a generalized linear model with a log() link and a Weibull family, or something similar, to deal with an extreme value distribution. I actually have a large dataset where this is apparently necessary. It has to do with recovery of forensic samples from surfaces, where as much powder as possible is collected. This apparently causes the results to conform to some type of extreme value distribution, so Weibull is a reasonable starting point for exploration. I have tried ('surface' and 'team' are factors) glm(surfcount ~ surface*team, data=powderd, family=Gamma(link='log')) but this doesn't quite do the trick. The standardized deviance residuals are still curved away from normal at the tails. Thanks for any info you can give on this nonstandard model. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] evaluating variables in the context of a data frame
On 6/8/2007 11:33 AM, Zack Weinberg wrote: On 6/7/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: f <- function(x, dat) evalq(x, dat) f(o, D) Error in eval(expr, envir, enclos) : object "o" not found g <- function(x, dat) eval(x, dat) g(o, D) Error in eval(x, dat) : object "o" not found What am I doing wrong? This seems to be what the helpfiles say you do to evaluate arguments in the context of a passed-in data frame... When you call f(o, D), the argument 'o' is evaluated in the current environment ('context' in R means something different). Because of lazy evaluation, it is not evaluated until evalq is called, but it is evaluated as if it had been evaluated greedily. g(quote(o), D) will work. Thanks. After a bit more experimentation I figured out that this does what I want: h <- function(x, d) eval(substitute(x), d, parent.frame()) but I don't understand why the substitute() helps, or indeed why it has any effect at all... Within the evaluation frame of h, x is a promise to evaluate an expression. substitute(x) extracts the expression. If you just use x, it gets evaluated in the frame from which h was called, rather than in a frame created from d. Duncan Murdoch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
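Duncan's explanation can be exercised with a toy data frame (D and its column o are invented here to mirror the thread):

```r
D <- data.frame(o = 1:3)

# eval(x, d): the promise for x is forced in the caller's frame,
# so the name 'o' is looked up there, not inside d -- this fails
g <- function(x, d) eval(x, d)
res.g <- try(g(o, D), silent = TRUE)
inherits(res.g, "try-error")  # TRUE

# substitute(x) pulls the unevaluated expression out of the promise;
# eval() then searches d first, falling back to the caller's frame
h <- function(x, d) eval(substitute(x), d, parent.frame())
h(o, D)      # 1 2 3
h(o * 2, D)  # 2 4 6
```

The parent.frame() argument matters too: it lets names not found in d (constants, caller-local variables) still resolve where the caller expects.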
Re: [R] R is not a validated software package..
Not to mention all the work that goes into PROC TEMPLATE and ANNOTATE to make SAS graphs presentable! I suspect that a lot of companies don't use SAS graphs or tables at all - they just export the data from SAS to Excel. -Cody Cody Hamilton, PhD Edwards Lifesciences What I would love to have is some efficiency estimates for SAS macro programming as done in pharma vs. using a high-level language. My bias is that SAS macro programming, which costs companies more than SAS licenses, is incredibly inefficient. Frank [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] wrapping lattice xyplot
This is an expanded version of the question I tried to ask last night - I thought I had it this morning, but it's still not working and I just do not understand what is going wrong. What I am trying to do is write a wrapper for lattice xyplot() that passes a whole bunch of its secondary arguments, so that I can produce similarly formatted graphs for several different data sets. This is what I've got: graph <- function (x, data, groups, xlab) { g <- eval(substitute(groups), data, parent.frame()) pg <- function(x, y, group.number, ...) { panel.xyplot(x, y, ..., group.number=group.number) panel.text(2, unique(y[x==2]), levels(g)[group.number], pos=4, cex=0.5) } xyplot(x, data=data, groups=substitute(g), type='l', ylab=list(cex=1.1, label='Mean RT (ms)'), xlab=list(cex=1.1, label=xlab), scales=list( x=list(alternating=c(1,1), tck=c(1,0)), y=list(alternating=c(1,0)) ), panel=panel.superpose, panel.groups=pg ) } pg is supposed to pick g up from the lexical enclosure. I have no idea whether that actually works, because it never gets that far. A typical call to this function looks like so: graph(est ~ pro | hemi, sm, obs, "Probe type") (where 'sm' is a data frame that really does contain all four columns 'est', 'pro', 'hemi', and 'obs', pinky swear) and, as it stands above, invariably gives me this error: Error in eval(expr, envir, enclos) : object "est" not found I tried substitute(x) (as that seems to have cured a similar problem with g) but then x is not a formula and method dispatch fails. Help? zw __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] evaluating variables in the context of a data frame
On 6/8/07, Duncan Murdoch [EMAIL PROTECTED] wrote: After a bit more experimentation I figured out that this does what I want: h <- function(x, d) eval(substitute(x), d, parent.frame()) but I don't understand why the substitute() helps, or indeed why it has any effect at all... Within the evaluation frame of h, x is a promise to evaluate an expression. substitute(x) extracts the expression. If you just use x, it gets evaluated in the frame from which h was called, rather than in a frame created from d. Thanks, that's helpful. Could you comment on the substitute() use in the message I just posted, which contains the actual code I'm trying to get to work? In addition to the question asked there, after your explanation I still do not understand why g <- ... xyplot ( ..., groups=g, ... ) should refuse to find g, while the same thing with groups=substitute(g) works (well, gets farther before blowing up). zw __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to find how many modes in 2 dimensions case
Hi, Does anyone know how to count the number of modes in 2 dimensions using kde2d function? Thanks Pat __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R is not a validated software package..
On Fri, 2007-06-08 at 16:02 +0200, Giovanni Parrinello wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I have planned to use: As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system. Could someone suggest me a 'polite' answer? TIA Giovanni The polite answer is that there is no such thing as 'FDA approved' software for conducting clinical trials. The FDA does not approve, validate or otherwise endorse software. If the pharma company in question has developed their own list of acceptable software applications that you must comply with, that is different, but is independent of any FDA requirements. As the saying used to be several decades ago, Nobody ever got fired for buying IBM. In the clinical trials realm today, the same could be said for SAS or Oracle Clinical. That is a political, and perhaps a corporate legal counsel driven risk aversion based issue, not a scientific one. It is also a human behavioral issue, as Bert noted, relative to fighting inertia, training or re-training issues and the pre-existing investment in internal processes and infrastructure. This will change over time as more statisticians, who have been trained in the use of R during their academic years, enter into industry positions. As others have noted, there is a PERCEPTION that somehow SAS is endorsed by the FDA or that it constitutes a 'gold standard' of sorts. This is a perception and not reality. That being said: There are a variety of relevant Guidance and Guideline documents that the FDA has put forth to address these issues. 
Most recently, the FDA approved final guidance for the use of computerized systems in clinical investigations (May 2007): http://www.fda.gov/OHRMS/DOCKETS/98fr/04d-0440-gdl0002.pdf In addition, there is a General Principles of Software Validation document: http://www.fda.gov/cdrh/comp/guidance/938.html The majority of the 21 CFR Part 11 requirements (audit trails, electronic signatures, etc.) are relevant to systems that manage source medical records. These would typically be database applications and medical devices, not statistical applications. In our shop for example, our Oracle 10g server has been implemented in accordance with these requirements. There is a 21 CFR Part 11 guidance document here: http://www.fda.gov/ohrms/dockets/98fr/5667fnl.pdf There are also all of the so-called FDA and ICH GxP (Good x Practice) documents: http://www.fda.gov/oc/gcp/guidance.html http://www.ich.org/cache/compo/475-272-1.html that provide a framework for operations in a regulated environment and for relevant statistical practice guidance. The 'x' above is replaced by words such as Clinical, Manufacturing, Laboratory, etc. There is even a draft guidance document on the use of Bayesian techniques for medical device trials: http://www.fda.gov/cdrh/osb/guidance/1601.html Some of the references in other posts have to do with software embedded in medical devices, which could be anything such as bedside ECG monitoring stations, diagnostic imaging systems, radiation therapy instrumentation and pacemakers. These are generally not relevant to this discussion. 
The bottom line, is that while there is a burden on the part of the 'software publisher' to utilize and document reasonable manufacturing, version control, software maintenance and quality processes, the overwhelming burden is on the END USER to determine that their statistical package is suitable for the application intended and to have written SOPs (Standard Operating Procedures) to define how they will validate their installation and use of the statistical software. This goes to some of the comments that Cody had relative to IQ/OQ/PQ documentation, which refers to Installation Qualification, Operational Qualification and Performance Qualification. For example, in the context of R, the use of make check-all and the retention of the output subsequent to compiling R from source code can be part of that documentation process. Bert referred to this in his comments. Beyond that, the details of such documentation will be driven by a variety of characteristics that are relevant to the nature of the environment (academic, commercial, clinical, pre-clinical, etc.) in which one is operating and related considerations. As Frank noted, there will be a session at useR!2007: http://user2007.org/ entitled The Use of R in Clinical Trials and Industry-Sponsored Medical Research. This session will take place on Friday, August 10 and I would invite any interested parties to attend the meetings. I think that you will find the subject matter quite enlightening. One closing comment: There is increasing use of R within the FDA itself and this will only further help to assuage the fears of prospective users over time. Best regards, Marc
Re: [R] glm() for log link and Weibull family
On Fri, 8 Jun 2007, Robert A. LaBudde wrote:

I need to be able to run a generalized linear model with a log() link and a Weibull family, or something similar, to deal with an extreme value distribution.

The Weibull with log link is not a GLM, but survreg() in package survival can fit it, as well as other extreme-value distributions.

I actually have a large dataset where this is apparently necessary. It has to do with recovery of forensic samples from surfaces, where as much powder as possible is collected. This apparently causes the results to conform to some type of extreme value distribution, so Weibull is a reasonable starting point for exploration. I have tried ('surface' and 'team' are factors)

    glm(surfcount ~ surface*team, data=powderd, family=Gamma(link='log'))

but this doesn't quite do the trick. The standardized deviance residuals are still curved away from normal at the tails. Thanks for any info you can give on this nonstandard model.

It's perfectly standard, just not a GLM.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
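Following Prof. Ripley's pointer, a minimal sketch of the survreg() fit. Since the poster's powderd data are not available, the data frame below is simulated stand-in data with the same column names; only the survreg() call itself reflects the advice given:

```r
library(survival)  # survreg() ships with the base R distribution

set.seed(1)
## Hypothetical stand-in for the poster's 'powderd' data: Weibull-
## distributed recovery amounts crossed by surface and team factors.
powderd <- data.frame(
  surface   = factor(rep(c("tile", "carpet"), each = 50)),
  team      = factor(rep(c("A", "B"), times = 50)),
  surfcount = rweibull(100, shape = 2, scale = 100)
)

## A Weibull accelerated-failure-time model: log(surfcount) is linear
## in the covariates, i.e. the Weibull-with-log-link model, which is
## not a GLM but is exactly what survreg() fits.
fit <- survreg(Surv(surfcount) ~ surface * team,
               data = powderd, dist = "weibull")
summary(fit)
```

Surv(surfcount) with no status argument treats every observation as an observed (uncensored) event, which matches ordinary regression data.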
[R] still trying to wrap xyplot - ignore previous
As you may not be surprised to hear, no sooner did I post the previous message than I realized I had made a really dumb mistake. I've now gotten a bit farther but am still stuck. New code:

    graph <- function (x, data, groups, xlab) {
        pg <- function(x, y, group.number, ...) fnord
        body(pg) <- substitute({
            panel.xyplot(x, y, ..., group.number=group.number)
            panel.text(2, unique(y[x==2]), levels(G)[group.number],
                       pos=4, cex=0.5)
        }, list(G=eval(substitute(groups), data, parent.frame())))
        print(xyplot(x, data=data, groups=substitute(groups), type='l',
                     ylab=list(cex=1.1, label='Mean RT (ms)'),
                     xlab=list(cex=1.1, label=xlab),
                     scales=list(x=list(alternating=c(1,1), tck=c(1,0)),
                                 y=list(alternating=c(1,0))),
                     panel=panel.superpose, panel.groups=pg))
    }

Questions:

1) The groups=substitute(groups) bit (in the call to xyplot) still doesn't work. As far as I can tell, xyplot wants the *symbol* which is the name of the factor (in the data frame) to group by. The above seems to wind up passing it the symbol 'groups', which causes the prepanel function to barf. I have not been able to find any way to evaluate one layer of 'groups' to get the symbol passed in, rather than the value of that symbol. Am I right? How do I give it what it wants?

2) Why do I have to do that dance with replacing the body of pg? The documentation leads me to believe this is a lexically scoped language; shouldn't it be able to pick G out of the enclosing frame?

zw
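On question 1, one common idiom (a minimal sketch, not the poster's full wrapper, so the labelling panel function is omitted) is to splice the caller's bare column name into the xyplot() call with substitute() before evaluating it, so that xyplot() receives the symbol itself rather than the symbol 'groups':

```r
library(lattice)  # ships with the base R distribution

## Minimal sketch of a wrapper that forwards an unevaluated 'groups'
## argument to xyplot(): build the call with the caller's bare symbol
## substituted in, then evaluate it in the wrapper's frame.
graphSketch <- function(x, data, groups) {
  cl <- substitute(xyplot(x, data = data, groups = g, type = "l"),
                   list(g = substitute(groups)))
  eval(cl)  # xyplot() now sees e.g. 'variety', not 'groups'
}

## Usage with a built-in lattice data set:
p <- graphSketch(yield ~ year | site, data = barley, groups = variety)
```

Here xyplot() evaluates the spliced-in symbol against `data` with its usual nonstandard evaluation, so the prepanel machinery gets the real grouping factor.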
Re: [R] Tools For Preparing Data For Analysis
Dale Steele wrote:

For Windows users, EpiData Entry (http://www.epidata.dk/) is an excellent (free) tool for data entry and documentation. --Dale

Note that EpiData seems to work well under Linux using Wine.

Frank
Re: [R] R is not a validated software package..
The fact that FDA statisticians are using R also assuages one of the main concerns that I have heard voiced about using R for FDA submissions - that there would be no statisticians available at the FDA to review R code, which would seriously delay the review of a submission.

Marc also brings up a good point by mentioning the FDA guidance on Bayesian submissions. If SAS were the only approved product, Bayesian trials would be in real trouble.

Cody Hamilton, PhD
Staff Biostatistician
Edwards Lifesciences

Disclaimer: As always, I am speaking for myself and not necessarily for Edwards Lifesciences.

One closing comment: There is increasing use of R within the FDA itself and this will only further help to assuage the fears of prospective users over time. Best regards, Marc
Re: [R] how to find how many modes in 2 dimensions case
Note that the number of modes (local maxima??) is a function of the bandwidth, so I'm not sure your question is even meaningful.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang
Sent: Friday, June 08, 2007 11:54 AM
To: R-help@stat.math.ethz.ch
Subject: [R] how to find how many modes in 2 dimensions case

Hi,

Does anyone know how to count the number of modes in 2 dimensions using the kde2d function?

Thanks
Pat
Re: [R] how to find how many modes in 2 dimensions case
Thanks for the reply. Maybe I should say "bumps": I can use persp() to show the density over the X and Y dimensions, and one peak is one mode, I think. I am trying to find an automatic way to detect how many peaks the density has.

Pat
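One hedged sketch of automating the peak count, keeping Bert's caveat in mind that the answer depends on the bandwidth passed to kde2d(). The idea is simply to scan the density grid for interior cells that strictly exceed all eight neighbours, above a small density floor to ignore near-zero regions; the bimodal data are simulated since the poster's data are unknown:

```r
library(MASS)  # for kde2d()

## Count local maxima of a kde2d() density grid: an interior cell is a
## "mode" if it strictly exceeds its 8 neighbours and a small density
## floor. NB: the count is a function of the kde2d() bandwidth.
countModes <- function(dens, floor = 0.05 * max(dens$z)) {
  z <- dens$z
  n <- 0
  for (i in 2:(nrow(z) - 1)) {
    for (j in 2:(ncol(z) - 1)) {
      nb <- z[(i - 1):(i + 1), (j - 1):(j + 1)]
      if (z[i, j] > floor && z[i, j] == max(nb) && sum(nb == z[i, j]) == 1)
        n <- n + 1
    }
  }
  n
}

## Simulated clearly-bimodal data (two well-separated clusters):
set.seed(1)
x <- c(rnorm(200, 0, 0.5), rnorm(200, 5, 0.5))
y <- c(rnorm(200, 0, 0.5), rnorm(200, 5, 0.5))
d <- kde2d(x, y, n = 50)
countModes(d)
```

Varying the h argument of kde2d() and re-running countModes() shows directly how the number of detected peaks changes with the bandwidth.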