Re: [R] How I can read the binary file with different type?
On Fri, 22 Aug 2008, Bingxiang Miao wrote:

> Hi all,
> I have a binary file of 8*100 bytes. The structure of the file is as
> follows: every eight bytes represent 2 data values: the first four bytes
> are a little-endian integer, the next four bytes a little-endian float32.
> The structure of the following bytes is the same as that of the first
> eight bytes. The function readBin only reads the binary file as one type,
> like this:
> readBin("my data file", what = "integer", size = 4, n = 200)
> or
> readBin("my data file", what = "numeric", size = 4, n = 200)
> But then only one of the two types can be read properly.

That's plain wrong, and the examples on the help page disprove it. Please do read the help before posting. The concept is to use multiple readBin calls on an open connection (as in the examples on the help page).

> Can anyone suggest how I should read this file?

We DO suggest you 'should' read the posting guide and the help pages.

> Thanks in advance.
> Miao

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
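A minimal sketch of the multiple-readBin approach suggested above, using the record layout the poster describes (100 records of int32 then float32, both little-endian). The file name and contents here are made up so the example is self-contained: it writes a small example file first, then reads it back with two readBin calls per record on one open connection.

```r
# Write an example file with the described layout: 100 x (int32, float32),
# both little-endian.
fn <- tempfile()
con <- file(fn, "wb")
for (i in 1:100) {
  writeBin(as.integer(i), con, size = 4, endian = "little")
  writeBin(i / 2, con, size = 4, endian = "little")  # stored as 4-byte float
}
close(con)

# Read it back: alternate readBin calls on a single open connection, so the
# read position advances through the interleaved int/float pairs.
con <- file(fn, "rb")
ints <- integer(100)
nums <- numeric(100)
for (i in 1:100) {
  ints[i] <- readBin(con, "integer", size = 4, n = 1, endian = "little")
  nums[i] <- readBin(con, "numeric", size = 4, n = 1, endian = "little")
}
close(con)
```

A single readBin call with n = 200 cannot interleave types, which is why the poster's attempts each decode only half the values correctly.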
Re: [R] Help Regarding 'integrate'
As my post showed, it is a scaling issue. The function has so small a peak that it is effectively 0 -- when scaled sensibly, integrate then works out of the box.

On Thu, 21 Aug 2008, Thomas Lumley wrote:

> On Thu, 21 Aug 2008, Moshe Olshansky wrote:
>
>> The phenomenon is most likely caused by numerical errors. I do not know
>> how 'integrate' works, but numerical integration over a very long
>> interval does not look like a good idea to me.
>
> Numerical integration over a long (or infinite) interval is fine. The
> problem here is that integrate appears not to find the peak of the
> integrand. The integrand is very flat near zero, which makes things
> harder. It seems to work well to split the integral somewhere not too far
> to the left of its peak. Splits at 20, 50, 100, or 200 give good
> agreement:
>
> integrate(f, 0, 100)
> 0.0001600355 with absolute error < 6.5e-10
> integrate(f, 100, Inf)
> 5.355248e-06 with absolute error < 3.4e-06
> integrate(f, 0, 200)
> 0.0001653797 with absolute error < 3.1e-05
> integrate(f, 200, Inf)
> 3.141137e-13 with absolute error < 1.2e-13
> integrate(f, 0, 50)
> 2.702318e-05 with absolute error < 1.3e-16
> integrate(f, 50, Inf)
> 0.0001383181 with absolute error < 8e-05
>
> Transforming the integral is only going to be helpful if you have a good
> idea of what it looks like and what the integrate() function is
> expecting. An automated transformation to a bounded interval won't help,
> since that is the first thing the underlying Fortran code does.
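A runnable version of the split suggested above, with the integrand from the original post; the split point 100 and the component values are the ones quoted in the thread.

```r
# The integrand from the original post: a noncentral chi-squared density
# times a sharply peaked factor.
f <- function(x) dchisq(x, 9, 77) * ((13.5 / x)^5) * exp(-13.5 / x)

# Split the infinite integral at 100, to the left of the peak, so the
# adaptive quadrature locates the mass in each piece.
left  <- integrate(f, 0, 100)
right <- integrate(f, 100, Inf)
total <- left$value + right$value
total   # about 1.654e-04, consistent with the splits reported above
```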
-thomas

I would do the following:

f1 <- function(x) {
  return(dchisq(x, 9, 77) * ((13.5 / x)^5) * exp(-13.5 / x))
}
f2 <- function(y) {
  return(dchisq(1 / y, 9, 77) * ((13.5 * y)^5) * exp(-13.5 * y) / y^2)  # substitute y = 1/x
}
I1 <- integrate(f1, 0, 1, abs.tol = 1.0e-20 * .Machine$double.eps^0.25)
I2 <- integrate(f2, 0, 1, abs.tol = 1.0e-20 * .Machine$double.eps^0.25)

--- On Thu, 21/8/08, A [EMAIL PROTECTED] wrote:

From: A [EMAIL PROTECTED]
Subject: [R] Help Regarding 'integrate'
To: r-help@r-project.org
Received: Thursday, 21 August, 2008, 4:44 PM

I have an R function defined as:

f <- function(x) {
  return(dchisq(x, 9, 77) * ((13.5 / x)^5) * exp(-13.5 / x))
}

Numerically integrating this function, I observed a couple of things:

A) Integrating the function f over the entire positive real line gives an error:

integrate(f, 0, Inf)
Error in integrate(f, 0, Inf) : the integral is probably divergent

B) Increasing the interval of integration actually decreases the value of the integral:

integrate(f, 0, 10^5)
9.813968e-06 with absolute error < 1.9e-05
integrate(f, 0, 10^6)
4.62233e-319 with absolute error < 4.6e-319

Since the function f is strictly positive, B) cannot occur. Also, the theory tells me that the infinite integral actually exists and is finite, so A) cannot occur either. That means there are certain problems with the usage of the function 'integrate' which I do not understand. The help document tells me that 'integrate' uses quadrature approximation to evaluate integrals numerically. Since I do not come from the numerical methods community, I would not know the pros and cons of various methods of quadrature approximation. One naive way that I thought of evaluating the above integral was by first locating the maximum of the function (maybe by using 'optimize' in R) and then applying Simpson's rule to an interval around the maximum. However, I am sure that the people behind the R project and other users have much better ideas, and I am sure the 'correct' method has already been implemented in R.
Therefore, I would appreciate it if someone can help me find it.

Thanks

Thomas Lumley
Assoc. Professor, Biostatistics, [EMAIL PROTECTED]
University of Washington, Seattle
Re: [R] R memory limitations under Windows vs UNIX
On Fri, 22 Aug 2008, Fiona Giannini wrote:

> Hi all,
> Our section at work is looking at buying a high-powered computer (32 Gb
> RAM, 64-bit) to be able to run models on large datasets and have more
> processing power for the software we commonly use. We mostly use R to fit
> GLMs and GAMs. Our department uses Windows as the standard OS (we are
> upgrading to Vista in the next few months), so most people would be in
> favour of having the same OS on this computer. I have concerns with
> Windows, as I'm not convinced it will make full use of the increased RAM.
> I have done some searching on the web and, as far as I can see, the most

Well, that *is* in the rw-FAQ -- so do we take it that, as an R for Windows user, you have yet to read it?

> amount of RAM under Windows I can expect to make use of is 4 Gb using
> Vista 64-bit, whereas UNIX will make use of closer to the full amount of
> RAM. Is this correct? Is there a way to make the Windows version of R use
> more of the available RAM? Or is going for a UNIX OS the better solution?

See the rw-FAQ for info on 64-bit builds of R under Windows.

> Any advice would be very much appreciated!
> Fiona
[R] Predictive Analytics event Sept 24-25, Chicago
Hi, I wanted to make sure you were all aware of these upcoming events. There is a seminar in Predictive Analytics on Sept 24-25 in Chicago, and two following in D.C. (Oct.) and San Francisco (Nov.). This is intensive training for managers, marketers, and IT people who need to make sense of customer data to predict buying behavior, profit, etc. Past attendees have given rave reviews. You can find more info at http://www.predictionimpact.com/predictive-analytics-training.html, e-mail [EMAIL PROTECTED], or call (415) 683-1146.

Thanks,
--Elise Johnson, Prediction Impact

--
View this message in context: http://www.nabble.com/Predictive-Analytics-event-Sept-24-25%2C-Chicago-tp19101661p19101661.html
Sent from the R help mailing list archive at Nabble.com.
[R] subset grouped data with quantile and NA's
I can't quite seem to solve a problem subsetting a data frame. Here's a reproducible example. Given a data frame:

dat <- data.frame(fac = rep(c("a", "b"), each = 100),
                  value = c(rnorm(130), rep(NA, 70)),
                  other = rnorm(200))

What I want is a new data frame (with the same columns as dat) excluding the top 5% of value, separately for "a" and "b". For example, this produces the results I'm after in an array:

sub <- tapply(dat$value, dat$fac, function(x)
  x[x < quantile(x, probs = 0.95, na.rm = TRUE)])

My difficulty is putting them into a data frame along with the other columns fac and other. Note that quantile will return different-length vectors due to different numbers of NAs for "a" and "b". There's something I'm just not seeing - can you help?

Many thanks.

David Carslaw
Institute for Transport Studies
University of Leeds
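One possible approach (not from the thread): compute each group's 95th percentile with ave(), which returns a vector aligned with the original rows, then subset the whole data frame in one step. Like the poster's tapply version, this drops the NA rows; keep them by changing the condition if desired.

```r
set.seed(1)  # assumed seed, for a reproducible example
dat <- data.frame(fac = rep(c("a", "b"), each = 100),
                  value = c(rnorm(130), rep(NA, 70)),
                  other = rnorm(200))

# Per-row cutoff: the 95th percentile of value within that row's group.
cutoff <- ave(dat$value, dat$fac,
              FUN = function(x) quantile(x, probs = 0.95, na.rm = TRUE))

# Keep rows strictly below their group's cutoff; all columns survive.
res <- dat[!is.na(dat$value) & dat$value < cutoff, ]
```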
[R] Parabolic cylinder function
Dear all, I need your advice, since I am looking for an implementation of the parabolic cylinder function in R. I found implementations of the hypergeometric functions (the Whittaker and the confluent hypergeometric functions) in the package fAsianOptions, but the parabolic cylinder function was unfortunately not there. Do you know of such an implementation? Thank you very much for your advice.

Cheers,
Zoe
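I am not aware of a ready-made parabolic cylinder function in a package either, but D_nu can be built from Kummer's confluent hypergeometric function M(a, b, z) via the standard identity D_nu(z) = 2^(nu/2) e^(-z^2/4) [ sqrt(pi)/Gamma((1-nu)/2) M(-nu/2, 1/2, z^2/2) - sqrt(2*pi) z / Gamma(-nu/2) M((1-nu)/2, 3/2, z^2/2) ]. The sketch below is illustrative only (all function names are my own, and the truncated series for M is adequate only for moderate z); the kummerM in fAsianOptions could be substituted for the hand-rolled series.

```r
# Truncated series for Kummer's M(a, b, z) = sum_k (a)_k/(b)_k z^k/k!
kummerM <- function(a, b, z, nterms = 200) {
  term <- 1
  s <- 1
  for (k in 0:(nterms - 1)) {
    term <- term * (a + k) / (b + k) * z / (k + 1)
    s <- s + term
  }
  s
}

# Reciprocal gamma, returning 0 at the poles (non-positive integers),
# where 1/gamma() in R would give NaN.
recip_gamma <- function(x) if (x <= 0 && x == round(x)) 0 else 1 / gamma(x)

# Parabolic cylinder function D_nu(z) via the identity quoted above.
Dnu <- function(nu, z) {
  2^(nu / 2) * exp(-z^2 / 4) *
    (sqrt(pi) * recip_gamma((1 - nu) / 2) * kummerM(-nu / 2, 1 / 2, z^2 / 2) -
     sqrt(2 * pi) * z * recip_gamma(-nu / 2) * kummerM((1 - nu) / 2, 3 / 2, z^2 / 2))
}
```

Sanity checks against closed forms: D_0(z) = exp(-z^2/4) and D_1(z) = z exp(-z^2/4).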
Re: [R] GAM-binomial logit link
> Dear all,
> I'm using a binomial distribution with a logit link function to fit a GAM
> model. I have 2 questions about it. First, I am not sure if I've chosen
> the most adequate distribution. I don't have presence/absence data (0/1),
> but I do have a rate whose values vary between 0 and 1. This means the
> response variable is continuous, even if within a limited interval.
> Should I use binomial?

I guess it is safer to use the option family = quasibinomial since, with a continuous [0,1] response, the empirical (conditional) variance of y can differ significantly from the corresponding theoretical binomial variance. You can find further references in Papke and Wooldridge (1996), 'Journal of Applied Econometrics' (vol. 11, p. 619-632).

> Secondly, in the numerical output I get negative values of the UBRE
> score. I would like to know if one should consider the lowest absolute
> value or the lowest real value to select the best model.

Hmmm... On the basis of the UBRE formula within gam{mgcv}, UBRE scores should be nonnegative. Please inspect the values of the single elements inside the formula for discovering possible problems.

> Thank you in advance for your help.
> Marina

Fabrizio Cipollini
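An illustrative sketch (not from the thread) of fitting a continuous rate in [0,1] with family = quasibinomial, in the spirit of the Papke-Wooldridge fractional-logit approach mentioned above. The data here are simulated under assumed parameters; glm() is used for brevity, and mgcv's gam() accepts the same family argument for smooth terms.

```r
set.seed(42)
n  <- 200
x  <- runif(n)
mu <- plogis(-1 + 2 * x)                        # true mean on (0, 1)
y  <- pmin(pmax(mu + rnorm(n, 0, 0.1), 0), 1)   # continuous rate in [0, 1]

# quasibinomial keeps the logit mean model but estimates the dispersion
# from the data instead of fixing it at the binomial value of 1.
fit <- glm(y ~ x, family = quasibinomial)
summary(fit)$dispersion   # freely estimated, typically != 1 for rates
```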
[R] Censored Poisson Data
Dear list,

I am wondering whether R allows one to perform a Poisson regression on count data which include censored observations, that is, observations of the form "2 events or more", "less than 2 events", or even "1 or 2 events". Can anyone give me a hint whether this is already possible and how to do it? The data have the form (obstime is the observation time, the offset in the Poisson regression):

obs  obstime  cmin  cmax
1    1        2     Inf
2    2        0     1
3    1        1     2

and so on.

Best wishes,
Justine
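I am not aware of a base-R fitter for interval-censored Poisson counts, but the likelihood is easy to write down directly: each observation contributes P(cmin <= Y <= cmax) = ppois(cmax, lambda) - ppois(cmin - 1, lambda), with lambda = exp(beta) * obstime. A hand-rolled maximum-likelihood sketch (all names hypothetical, intercept-only model, using the three example rows above):

```r
# Negative log-likelihood for interval-censored Poisson counts with a
# log link and obstime as exposure; ppois(Inf, .) = 1 and
# ppois(-1, .) = 0 handle the open-ended intervals.
negll <- function(beta, obstime, cmin, cmax) {
  lambda <- exp(beta) * obstime
  p <- ppois(cmax, lambda) - ppois(cmin - 1, lambda)
  -sum(log(p))
}

obstime <- c(1, 2, 1)
cmin    <- c(2, 0, 1)
cmax    <- c(Inf, 1, 2)

fit <- optim(0, negll, obstime = obstime, cmin = cmin, cmax = cmax,
             method = "BFGS")
exp(fit$par)   # estimated event rate per unit time
```

Covariates would enter by replacing the scalar beta with a linear predictor; for a packaged solution, a search for interval-censored count models on CRAN may turn something up.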
Re: [R] Simple look up
Please take a look at the manuals before posting (DO read the posting guide!)

Bart

PS: yourdataframe[yourdataframe$TAZ == 103, 2]

PDXRugger wrote:

> So I am hoping this solution is simple, which I believe it is. I would
> like to look up a value in one column of a data set and display the
> corresponding value in the 2nd column. For example:
>
> TAZ  VACANT ACRES
> 100  45
> 101  62
> 102  23
> 103  64
> 104  101
> 105  280
>
> So if I want to find TAZ 103, it would give me 64 as the corresponding
> value. Hope you guys can help.
> Cheers
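Bart's one-liner applied to the poster's example data (the data frame and column names below are made up to match the table):

```r
df <- data.frame(TAZ = 100:105,
                 VACANT_ACRES = c(45, 62, 23, 64, 101, 280))

df[df$TAZ == 103, 2]             # lookup by logical row index: 64
df$VACANT_ACRES[df$TAZ == 103]   # equivalent, addressing the column by name
```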
Re: [R] read.csv : double quoted numbers
On Thu, Aug 21, 2008 at 12:54 AM, Gabor Grothendieck [EMAIL PROTECTED] wrote:

> The problem is the commas, not the quotes:

First, thanks to everyone for taking out time to help me. As Mr. Gabor rightly pointed out, the commas were my problem and not the double quotes (in the numerical values). This solves my problem. But I have a couple more questions.

1) Is it recommended to set the default locale to C when working with R? I am asking this since the file that I am reading is .xls (but containing text only). The file command on a Linux terminal returns "ASCII English text, with very long lines, with CRLF line terminators". gsub fails on this file due to a particular value, "Cte d'Ivoire", which is present in the file. This happens when my locale is set to en_US.utf8, but when I set the locale to C, it (gsub) works. So is it okay to set the default locale to C?

2) The other problem is escaping the double quote in pipe and sed. I have tried almost all possible combinations that came to my limited mind:

a = scan(pipe(sed -e s/\//g WEOall.xls), sep=\t)
a = scan(pipe('sed -e s/\//g WEOall.xls'), sep=\t)
a = scan(pipe(sed -r s/\\//g WEOall.xls), sep=\t)
a = scan(pipe(sed -e s///g WEOall.xls), sep=\t)
a = scan(pipe('sed -e s///g WEOall.xls'), sep=\t)
a = scan(pipe(paste('sed -e s/\//g WEOall.xls')), sep=\t)
a = scan(pipe(paste('sed -e s///g WEOall.xls')), sep=\t)

For all of the above I am getting:

sh: Syntax error: Unterminated quoted string

sessionInfo()
R version 2.6.2 (2008-02-08)
i486-pc-linux-gnu
locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base

Thanks and Regards,
Aval S.
Lines.raw <- '1,001.23 1,008,000.456'
Lines <- readLines(textConnection(Lines.raw))
Lines <- gsub(",", "", Lines)
read.table(textConnection(Lines))
       V1      V2
1 1001.23 1008000

On Wed, Aug 20, 2008 at 2:27 PM, Aval Sarri [EMAIL PROTECTED] wrote:

> Hello;
> I am a new user of R, so pardon me. I am reading a .txt file that has
> around 50+ numeric columns with '\t' as separator. I am using the
> read.csv function along with colClasses, but it fails to recognize
> double-quoted numeric values. (My numeric values are something like
> 1,001.23; 1,008,000.456.) Basically read.csv fails with:
> scan() expected 'a real', got '1,044.059'.
>
> What I have tried and the problems with each:
>
> 1) I tried scan and pipe but am getting the following error message;
> that is, how do I replace all double quotes with nothing? I tried
> enclosing the sed command in single quotes, but that does not help.
> (Though the sed command works from the shell.)
>
> scan(pipe(sed -e s/\//g DataAll.txt), sep=\t)
> sh: Syntax error: Unterminated quoted string
>
> 2) On the mailing list, one solution I found was setAs(), described here:
> http://www.nabble.com/Re%3A--R--read.table()-and-scientific-notation-p6734890.html
>
> 3) Other than using as.is=TRUE and then doing as.numeric for the numeric
> columns, what is the solution? And how do I efficiently convert 50+
> columns to numeric using a regular expression? All my numeric column
> names start with the character 'X', so how do I use sapply and/or a
> regular expression to convert all columns starting with X to numeric?
> What is the alternate method to do so?
>
> Basically, 2 and 3 work, but which one is the efficient and correct way
> to do this? (Also, what is the most efficient way to apply field-level
> validation and conversion while reading a file? Does one have to read
> the file first, and only after that can validation and conversion
> happen?)
>
> Thanks for taking out time to read through the mail.
Thanks and Regards,
-Aval
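For question 3 above (converting every column whose name starts with "X" to numeric after reading with as.is = TRUE), one possible approach, not from the thread, with a made-up data frame standing in for the file:

```r
# Stand-in for the file read with as.is = TRUE: numeric columns arrive as
# character, with embedded thousands separators.
df <- data.frame(X1 = c("1,001.23", "2,002"),
                 X2 = c("3,000", "4"),
                 label = c("a", "b"),
                 stringsAsFactors = FALSE)

# Select columns by a name pattern, then strip commas and convert in place.
xcols <- grep("^X", names(df))
df[xcols] <- lapply(df[xcols], function(col) as.numeric(gsub(",", "", col)))
```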
Re: [R] skipping values when reading binary files
Pieter Beck wrote:

> Hi,
> I would like to skip a preset number of values when reading in a binary
> file (readBin). Is there any way to do this?

See ?seek

Uwe Ligges

> kind regards, and thanks in advance for the help.
> Pieter Beck
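A small self-contained sketch of the seek() suggestion (file and layout made up for illustration): skip the first 10 four-byte integers, then read the next 5.

```r
# Write 20 four-byte integers to a temporary file.
fn <- tempfile()
con <- file(fn, "wb")
writeBin(1:20, con, size = 4)
close(con)

# Reposition with seek() before reading: 10 values * 4 bytes each.
con <- file(fn, "rb")
seek(con, where = 10 * 4)
vals <- readBin(con, "integer", n = 5, size = 4)
close(con)
vals
```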
Re: [R] barplot with anchored bars
Paulo Barata wrote:

> Dear R list members,
> How do I produce barplots anchored to the x-axis (not floating above the
> x-axis) with a box around? With both of the following code snippets, the
> lower horizontal line of the box is below the y = 0 line:
>
> # first code
> x <- c(1, 2, 3, 4)
> barplot(x, yaxs = 'i')
> box()
>
> # second code
> x <- c(1, 2, 3, 4)
> op <- par(yaxs = 'i')
> barplot(x)
> box()
> par(op)

You have to set the ylim in that case. If you read the barplot.default code, you will find that rAdj is an adjustment value that can only be omitted by specifying ylim.

Uwe Ligges

> The parameter yaxs='i' does not seem to work with barplot. I use R
> version 2.7.1, running on Windows XP. Thank you very much.
>
> Paulo Barata
> Fundacao Oswaldo Cruz - Oswaldo Cruz Foundation
> Rua Leopoldo Bulhoes 1480 - 8A
> 21041-210 Rio de Janeiro - RJ
> Brazil
> E-mail: [EMAIL PROTECTED]
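Uwe's suggestion in runnable form: passing an explicit ylim suppresses barplot's internal range adjustment, so the bars sit on y = 0 and box() hugs the axis. A throwaway pdf device is opened so the snippet runs on a headless system.

```r
tf <- tempfile(fileext = ".pdf")
pdf(tf)                          # scratch device; any device works
x <- c(1, 2, 3, 4)
barplot(x, ylim = c(0, max(x)))  # explicit ylim anchors the bars at 0
box()
dev.off()
```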
Re: [R] read.csv : double quoted numbers
> gsub fails on this file due to a particular value, "Cte d'Ivoire", which
> is present in the file. This happens when my locale is set to en_US.utf8,
> but when I set the locale to C, it (gsub) works.

Have a look at ?file, especially the section on Encoding.

cu Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
[R] chisq
Hi, the chisq value for the .05 level of significance and 3 DF is 7.81 (one-sided). Is there a way to calculate it in R (given the level of significance and DF)? Marco
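The question went unanswered in this digest; qchisq() computes chi-squared quantiles directly from the significance level and degrees of freedom:

```r
# Upper 5% critical value of chi-squared with 3 degrees of freedom.
qchisq(0.95, df = 3)                       # about 7.81
qchisq(0.05, df = 3, lower.tail = FALSE)   # the same, stated as a tail area
```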
[R] Help on competing risk package cmprsk with time dependent covariate
Dear R users,

I'd like to assess the effect of a treatment covariate on disease relapse risk with the package cmprsk. However, the effect of this covariate on survival is time-dependent (assessed with cox.zph): no significant effect during the first year of follow-up, then after 1 year a favorable effect is observed on survival (a step function might be the correct way to say that?). For overall survival analysis I have used a time-dependent Cox model, which has confirmed this positive effect after 1 year. Now I'm moving to disease relapse incidence, and a similar time dependency seems to be present. What I'd like to have is this: for patients without treatment, the code for the treatment covariate is always 0; for patients who received treatment, I'd like to have it = 0 during the time interval 0 to 1 year, and equal to 1 after 1 year. Correct me if I'm wrong in trying to do so.

First, I have run the following script (R 2.7.1 under XP Pro) according to previous advice:

library(cmprsk)
attach(LAMrelapse)
fit1 <- crr(rel.t, rel.s, treatment, treatment,
            function(uft) cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)),
            failcode = 1, cencode = 0, na.action = na.omit,
            gtol = 1e-06, maxiter = 10)
fit1

where:
rel.t = time to event (in years)
rel.s = status: = 1 if disease relapse, = 2 if death from a non-disease-related cause (toxicity of previous chemotherapy), = 0 if alive and not in relapse
treatment = binary covariate (value 0 or 1) representing the treatment to test (different from the chemotherapy above, with no known toxicity)

I have not yet added other covariates to the model.
This script gave me the following result:

fit1 <- crr(relcmp.t, relcmp.s, treatment, treatment, function(uft)
    cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)), failcode = 1,
    cencode = 0, na.action = na.omit, gtol = 1e-006, maxiter = 10)
fit1
convergence: TRUE
coefficients: [1] -0.6808 0.7508
standard errors: [1] 0.2881 0.3644
two-sided p-values: [1] 0.018 0.039

...which I don't understand at all, since it looks as if the treatment covariate also had a significant effect in the first period of time!? This is absolutely not the case. So I'm surely wrong in a part of this script... cov2 and tf are pretty obscure to me in the help file of the package. I would really appreciate advice regarding these 2 terms. I was thinking that I might change:

cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0))

into:

cbind(ifelse(uft <= 1, 0, 1), ifelse(uft > 1, 1, 0))

But since I only have one covariate (treatment) to test, shouldn't I only write the following:

fit1 <- crr(rel.t, rel.s, treatment, treatment,
            function(uft) ifelse(uft <= 1, 0, 1),
            failcode = 1, cencode = 0, na.action = na.omit,
            gtol = 1e-06, maxiter = 10)

which gives me:

fit1
convergence: TRUE
coefficients: [1] 0.06995 -0.75080
standard errors: [1] 0.2236 0.3644
two-sided p-values: [1] 0.750 0.039

which, if I understand things correctly (I'm not sure at all!), confirms that before 1 year the effect of the treatment covariate is not significant, but is significant after 1 year of follow-up. But there I'm again not sure of the result I obtain... Any help would be greatly appreciated with cov2 and tf. Thanks if you have some time for this.

Philippe Guardiola
[R] Retrieving sample points
Hi everyone. My problem is about multivariate analysis. Given multivariate data X of dimension (n by p), where n is the number of observations and p the number of variables, I want a sample of size p+1 whose product of the eigenvalues of its covariance matrix is the minimum among all possible samples (n choose (p+1)). I have written a function which worked well, but the problem is that it does not work when the dimension of X is large (when n is greater than 25). I think the problem is memory size; is there any way I can increase the memory? Here is the code that I used:

bestunits <- function(x) {
  n <- nrow(x)
  p <- ncol(x)
  h <- p + 1
  nchoosek <- function(n, h) {
    factorial(n) / (factorial(h + 1) * factorial(n - h - 1))
  }
  vec11 <- matrix(nrow = nchoosek(n, h), ncol = p)
  vec12 <- matrix(nrow = nchoosek(n, h), ncol = h)
  for (i in 1:nchoosek(n, h)) {
    flag <- sample(1:n, h, replace = FALSE)
    z <- x[flag, ]
    y <- eigen(var(z), only.values = TRUE)$values
    vec11[i, ] <- y
    vec12[i, ] <- flag
  }
  product <- apply(vec11, 1, prod)
  best <- which.min(product)
  sample <- vec12[best, ]
  bestsample <- x[sample, ]
  return(bestsample)
}

Thanks

Oyeyemi, Gafar Matanmi
Department of Statistics
University of Ilorin
Ilorin, Nigeria
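An alternative sketch, not from the thread: for small n, combn() enumerates every size-(p+1) subset exactly, with no random resampling and no huge preallocated result matrices (only one criterion value is kept per subset). Note this does not remove the combinatorial wall: choose(n, p+1) itself explodes as n grows, which is consistent with the memory failure described above.

```r
# Exhaustive search over all size-(p+1) subsets for the one minimizing the
# product of the eigenvalues of its covariance matrix.
best_subset <- function(x) {
  p <- ncol(x)
  h <- p + 1
  subs <- combn(nrow(x), h)                 # one column per candidate subset
  crit <- apply(subs, 2, function(idx)
    prod(eigen(var(x[idx, , drop = FALSE]), only.values = TRUE)$values))
  x[subs[, which.min(crit)], , drop = FALSE]
}

set.seed(7)
x <- matrix(rnorm(6 * 2), nrow = 6)   # n = 6, p = 2 -> choose(6, 3) = 20 subsets
res <- best_subset(x)
```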
Re: [R] psychometric functions
Paul, thanks a lot for your advice! I got that Kuss et al. paper and started to read; this is more than I had expected! Very cool paper, and an algorithm implemented in R that seems to be even superior to psignifit in Matlab. So thanks again!

mario

__
Mario Maiworm
Biological Psychology and Neuropsychology
University of Hamburg
Von-Melle-Park 11
D-20146 Hamburg
Phone: +49 40 42838 8265
Fax: +49 40 42838 6591
http://bpn.uni-hamburg.de/Maiworm_e.html
http://cinacs.org
__

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Paul Artes
Sent: Thursday, August 21, 2008 5:37 PM
To: r-help@r-project.org
Subject: Re: [R] psychometric functions

There is a nice paper by Yssaad-Fesselier and Knoblauch on "Modelling Psychometric Functions in R":
http://hal.archives-ouvertes.fr/docs/00/13/17/99/PDF/B125.pdf

You might also be interested in this:
http://www.journalofvision.org/5/5/8/article.aspx
which comes from the same group as the psignifit toolbox for Matlab (methinks), but is a step ahead.

Best wishes,
Paul
[R] swap
Hello everybody,

I wonder if there is any swap function in R that does the following:

x <- seq(1, 10, 1)
x1 <- swap(x)
x1
1 8 3 4 5 6 7 2 9 10

Thank you very much!

Amor
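There is no such built-in that I know of, but a swap() that exchanges two positions is a one-liner. A sketch (the function name and the default positions 2 and 8 are chosen to match the example output above):

```r
# Exchange the elements at positions i and j of a vector.
swap <- function(x, i = 2, j = 8) {
  tmp  <- x[i]
  x[i] <- x[j]
  x[j] <- tmp
  x
}

swap(seq(1, 10, 1))   # 1 8 3 4 5 6 7 2 9 10
```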
Re: [R] Very confused with class
Hi Rich,

Richard M. Heiberger wrote:

> Dan,
> The real problem is the use of csv files. csv files don't handle missing
> values (#VALUE is most likely from Excel), dates, or other complications
> very well. Read your Excel file directly into R with one of the packages
> designed specifically for that purpose. I recommend RExcel (Windows
> only), which allows complete two-way communication between R and Excel.
> Missing values and dates are handled correctly. You can download the
> RExcelInstaller package from CRAN.
> Rich

I'm sure RExcel is an excellent technology. However, it is an unnecessarily complex technology in this instance. What I was trying to do was help the original poster read in tabular data stored in a standard text format, which is a fundamental skill for any R programmer. In general, I would encourage people (beginners especially) to avoid the use of hi-tech solutions when simple text-based solutions suffice. But when people do need more sophisticated integration of R and e.g. Excel, it's nice that the tools exist.

Dan
Re: [R] Help on competing risk package cmprsk with time dependent covariate
Hello, There is something I don't understand in your question. Is treatment a time-dependent covariate? That is, do patients receive the treatment at the beginning of the study or later? cmprsk cannot handle time-dependent covariates. But if treatment is a baseline covariate with a time-varying effect (i.e. the subdistribution hazard ratio varies with time), your solution to assess that is odd, because you would transform your baseline covariate into a time-dependent one, thus considering all the patients to receive no treatment during the first year. For sure, the treatment won't have any effect in the first year then. To assess a time-varying effect on competing risks, I would either follow the cmprsk documentation, including an interaction with functions of time, or use the comp.risk function in the timereg package, which fits more flexible models for the cumulative incidence functions. Best regards, Arthur Allignol Philippe Guardiola wrote: Dear R users, I'd like to assess the effect of a treatment covariate on disease relapse risk with the package cmprsk. However, the effect of this covariate on survival is time-dependent (assessed with cox.zph): no significant effect during the first year of follow-up, then after 1 year a favorable effect is observed on survival (a step function might be the correct way to put it?). For overall survival analysis I have used a time-dependent Cox model which has confirmed this positive effect after 1 year. Now I'm moving to disease relapse incidence and a similar time dependency seems to be present. What I'd like to have is this: for patients without treatment the code for the treatment covariate is always 0, and for patients who received treatment I'd like to have it = 0 during the time interval 0 to 1 year, and equal to 1 after 1 year. Correct me if I'm wrong in trying to do so.
First, I have run the following script (R2.7.1 under XP Pro) according to previous advice:

library(cmprsk)
attach(LAMrelapse)
fit1 <- crr(rel.t, rel.s, treatment, treatment,
            function(uft) cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)),
            failcode=1, cencode=0, na.action=na.omit, gtol=1e-06, maxiter=10)
fit1

where: rel.t = time to event (in years); rel.s = status, =1 if disease relapse, =2 if death from a non disease related cause (toxicity of previous chemotherapy), =0 if alive and not in relapse; treatment = binary covariate (value: 0 or 1) representing the treatment to test (different from the chemotherapy above, with no known toxicity). I have not yet added other covariates to the model. This script gave me the following result:

fit1 <- crr(relcmp.t, relcmp.s, treatment, treatment, function(uft) cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)), failcode = 1, cencode = 0, na.action = na.omit, gtol = 1e-006, maxiter = 10)
fit1
convergence: TRUE
coefficients: [1] -0.6808 0.7508
standard errors: [1] 0.2881 0.3644
two-sided p-values: [1] 0.018 0.039

...which I don't understand at all, since it looks as if the treatment covariate also had a significant effect in the first period of time!? This is absolutely not the case, so I'm surely wrong with part of this script... cov2 and tf are pretty obscure to me in the help file of the package; I would really appreciate advice regarding these 2 terms. I was thinking that I might change cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)) into cbind(ifelse(uft <= 1, 0, 1), ifelse(uft > 1, 1, 0)). But since I only have one covariate (treatment) to test, shouldn't I only write the following:

fit1 <- crr(rel.t, rel.s, treatment, treatment,
            function(uft) ifelse(uft <= 1, 0, 1),
            failcode=1, cencode=0, na.action=na.omit, gtol=1e-06, maxiter=10)

which gives me:

fit1
convergence: TRUE
coefficients: [1] 0.06995 -0.75080
standard errors: [1] 0.2236 0.3644
two-sided p-values: [1] 0.750 0.039

which, if I understand things correctly (I'm not sure at all!)
confirms that before 1 year the effect of the treatment covariate is not significant, but that it is significant after 1 year of follow-up. But there again I'm not sure of the result I obtain... any help with cov2 and tf would be greatly appreciated. Thanks if you have some time for this, Philippe Guardiola
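For readers puzzled by the same cov2/tf pair: per the crr help page, cov2 holds covariates that get multiplied by functions of time, and tf maps the vector of unique failure times to a matrix with one column per column of cov2. The sketch below is a hedged illustration on simulated stand-in data (the names rel.t, rel.s and treatment mirror the thread; the LAMrelapse data set itself is not available), not the poster's actual analysis.

```r
library(cmprsk)  # assumed installed
set.seed(1)

## Simulated stand-ins for the poster's variables
n <- 200
rel.t <- rexp(n)                         # time to event, in years
rel.s <- sample(0:2, n, replace = TRUE)  # 0 = censored, 1 = relapse, 2 = competing death
treatment <- rbinom(n, 1, 0.5)

## cov1 gives treatment a constant (baseline) effect; cov2 * tf(t) adds an
## extra treatment effect that switches on after 1 year, so the total
## log-subdistribution-hazard ratio is beta1 + beta2 * I(t > 1).
fit <- crr(rel.t, rel.s,
           cov1 = treatment,
           cov2 = treatment,
           tf = function(uft) matrix(as.numeric(uft > 1), ncol = 1),
           failcode = 1, cencode = 0)
fit$coef
```

With this parameterization the first coefficient is the treatment effect during the first year and the second is the change in that effect after 1 year, which may explain the sign pattern in the output quoted above.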
Re: [R] swap
I wonder if there is any swap function in R that does the following: x <- seq(1,10,12) This just makes x equal to 1. I'm guessing you meant something like x <- 1:10. x1 <- swap(x) x1 1 8 3 4 5 6 7 2 10 It's not terribly clear what you want swap to do. You can reorder the elements of x using sample, e.g.

sample(x)
[1] 6 10 8 4 2 5 1 9 3 7

Regards, Richie. Mathematical Sciences Unit HSL
[R] Coordinate systems for geostatistics in R
Hi, I read somewhere that when carrying out geostatistical analysis in R you should not use latitude and longitude... can anyone expand on this a little for me, and what would be the best coordinate system to use? I have my data in a geographic coordinate system, WGS84, decimal degrees... is this the wrong format for such analyses? I have also converted my data to the UTM projection and so have it in metres (ranging from 480,000 to 550,000 E and 170,000 to 230,000 N). If I were to use the UTM coordinates, should I be using the actual coordinates in metres, or should I convert them into an arbitrary coordinate system (i.e. from 0 to 1) somehow? I have noticed that running an analysis on the data gives different results depending on which type of system you use, so I want to make sure I have the correct system. I should also probably note that I am a geostatistical novice! Thanks, -- View this message in context: http://www.nabble.com/Coordinate-systems-for-geostatistics-in-R-tp19104598p19104598.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Large data sets with R (binding to hadoop available?)
Hi, Apart from database interfaces such as sqldf, which Gabor has mentioned, there are also packages specifically for handling large data: see the ff package, for instance. I am currently playing with parallelizing R computations via Hadoop. I haven't looked at Pig yet, though. Rory -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roland Rau Sent: 21 August 2008 20:04 To: Avram Aelony Cc: r-help@r-project.org Subject: Re: [R] Large data sets with R (binding to hadoop available?) Hi, Avram Aelony wrote: Dear R community, I find R fantastic and use R whenever I can for my data analytic needs. Certain data sets, however, are so large that other tools seem to be needed to pre-process data such that it can be brought into R for further analysis. Questions I have for the many expert contributors on this list are: 1. How do others handle situations of large data sets (gigabytes, terabytes) for analysis in R? I usually try to store the data in an SQLite database and interface via functions from the packages RSQLite (and DBI). No idea about Question No. 2, though. Hope this helps, Roland P.S. When I am sure that I only need a certain subset of large data sets, I still prefer to do some pre-processing in awk (gawk). 2.P.S. The size of my data sets is in the gigabyte range (not terabyte range). This might be important if your data sets are *really large* and you want to use sqlite: http://www.sqlite.org/whentouse.html *** The Royal Bank of Scotland plc. Registered in Scotland No 90312. Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB.
Authorised and regulated by the Financial Services Authority. This e-mail message is confidential and for use by the...{{dropped:22}}
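The SQLite workflow Roland describes can be sketched as follows; the database file name, table name, and columns are invented for illustration, and the point is only that the filtering happens on the database side, so R never has to hold the full table.

```r
library(RSQLite)  # assumed installed; provides the DBI interface to SQLite

## Write a biggish data frame to an on-disk SQLite database once...
con <- dbConnect(dbDriver("SQLite"), dbname = "bigdata.db")  # hypothetical file
dbWriteTable(con, "measurements",
             data.frame(id = 1:1e5, value = rnorm(1e5)),
             overwrite = TRUE)

## ...then pull only the rows needed for the analysis into R.
subset_df <- dbGetQuery(con, "SELECT * FROM measurements WHERE value > 2")
dbDisconnect(con)

nrow(subset_df)  # roughly 2% of the rows instead of all 100,000
```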
Re: [R] call perl
Maybe ?system might be what you need here? -Rory -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jay an Sent: 22 August 2008 03:08 To: r-help@r-project.org Subject: [R] call perl Hi, It may be an old question, but can anyone tell me how to call perl from R? Thanks, Y.
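The ?system route looks like this; a minimal sketch assuming a perl binary is on the PATH (the one-liner is just a placeholder for a real script, which would be invoked as e.g. system("perl myscript.pl")):

```r
## Run a Perl one-liner from R and capture its standard output
## as a character vector (intern = TRUE).
out <- system("perl -e 'print 2 + 2'", intern = TRUE)
out  # "4"
```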
Re: [R] help needed for HWE.exact in library genetics
You could follow the advice given when loading the library (see the code you posted for details) and use the enhanced genetics packages to cross-validate the results (ideally you should get the same answer). The results are not weird, though. You're working with SNPs, and having a homozygote with a frequency of zero isn't that unlikely; it depends on the minor allele frequency (MAF). In this instance it's the controls you appear to be concerned about, and for this locus the MAF is 0.0145, so in a sample of 173 individuals you would expect to see 0.0363 individuals with the recessive genotype (basic Mendelian genetics: 173 * 0.0145^2). Thus at best you might expect to see one person in a sample of that size, but it's not surprising that you've not seen any. You may also find the following references of interest... Guangyong Zou, Allan Donner, The Merits of Testing Hardy-Weinberg Equilibrium in the Analysis of Unmatched Case-Control Data: A Cautionary Note, Annals of Human Genetics, Volume 70, Issue 6, Pages 923-933 (basically advocates the use of a robust test for association, such as the trend test, over the two-stage design of testing for HW equilibrium and then testing for association). There is follow-up discussion of the article in Yik Y. Teo, Andrew E. Fry, Taane G. Clark, E. S. Tai, Mark Seielstad, On the Usage of HWE for Identifying Genotyping Errors, Annals of Human Genetics, Volume 71, Issue 5, Pages 701-703 ...and a rejoinder from Zou and Donner... Guangyong Zou, Allan Donner, The Reply of Zou and Donner to Teo's Commentary on their Manuscript, Annals of Human Genetics, Volume 71, Issue 5, Page 704. Neil Sun, Ying [BSD] - HGD wrote: library( genetics ) NOTE: THIS PACKAGE IS NOW OBSOLETE. The R-Genetics project has developed a set of enhanced genetics packages to replace 'genetics'. Please visit the project homepage at http://rgenetics.org for information.
-- View this message in context: http://www.nabble.com/help-needed-for-HWE.exact-in-library-%22genetics%22-tp19100974p19105112.html Sent from the R help mailing list archive at Nabble.com.
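The expected-count arithmetic above is easy to check directly; a one-line verification of the Hardy-Weinberg calculation:

```r
maf <- 0.0145  # minor allele frequency in the controls (from the post)
n <- 173       # number of control individuals

## Under Hardy-Weinberg equilibrium the rare homozygote has frequency maf^2,
## so the expected number of such genotypes in the sample is n * maf^2.
expected <- n * maf^2
round(expected, 4)  # about 0.036, i.e. typically zero observed
```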
[R] Bias reduction method
Dear all, I'm doing a simulation study of an M/M/1 queue and I would like to know if there are bias correction methods in R for time series (estimation of warm-up points or something else). Hope you can help me.
Re: [R] barplot with anchored bars
On Thu, 2008-08-21 at 15:47 -0300, Paulo Barata wrote: Dear R list members, How do I produce barplots anchored to the x-axis (not floating above the x-axis) with a box around? With both of the following pieces of code, the lower horizontal line of the box is below the y = 0 line:

# first code
x <- c(1,2,3,4)
barplot(x, yaxs='i')
box()

# second code
x <- c(1,2,3,4)
op <- par(yaxs='i')
barplot(x)
box()
par(op)

The parameter yaxs='i' does not seem to work with barplot. I use R version 2.7.1, running on Windows XP. Hi Paulo, Have a look at the barp function in the plotrix package. Jim
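Besides barp, a base-graphics workaround seems to do the trick: barplot's default ylim includes a small negative padding below zero, so supplying an explicit ylim that starts at 0, together with yaxs='i', anchors the bars on the box. A sketch (worth checking on your own device):

```r
x <- c(1, 2, 3, 4)

## yaxs = "i" suppresses the usual 4% axis expansion, and an explicit
## ylim starting at 0 removes barplot's default negative padding,
## so the bars sit exactly on the lower line of the box.
op <- par(yaxs = "i")
barplot(x, ylim = c(0, max(x) * 1.05))
box()
par(op)
```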
[R] simple generation of artificial data with defined features
Dear R colleagues, I am quite a newbie to R, fighting my stupidity to solve a probably quite simple problem of generating artificial data with defined features. I am conducting a study of inter-observer agreement in child bronchoscopy. One of the most important measures is Kappa according to Fleiss, which is conveniently available in R through the irr package. Unfortunately medical doctors like me don't really understand much of statistics, so I'd like to give the reader an easily understandable example of Fleiss' Kappa in the Methods part. To achieve this, I obtained a table with the results of the German election from 2005:

party    number of votes    percent
SPD      16194665           34.2
CDU      13136740           27.8
CSU      3494309            7.4
Gruene   3838326            8.1
FDP      4648144            9.8
PDS      4118194            8.7

I want to show the agreement of voters measured by Fleiss' Kappa. To calculate this with the kappam.fleiss function of irr, I need a data.frame like this: (id of 1st voter) (id of 2nd voter) party spd cdu. Of course I don't plan to calculate this with the millions of cases mentioned in the table above (I am working on a small laptop); a division by 1000 would be more than perfect for this example. The exact format of the table is generally not so important, as I could reshape nearly every format with the help of the reshape package. Unfortunately I could not figure out how to create such a fictive/artificial dataset as described above. Any data.frame would be nice that keeps at least the percentages. String IDs of parties could be substituted by numbers of course (would be even better for the function kappam.fleiss in irr!). I would appreciate any kind of help very much indeed. Greetings from Munich, Felix Mueller-Sarnowski
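One way to build such a fictive dataset, keeping the relative vote counts (scaled down by 1000 as suggested), is to repeat each party label the scaled number of times; a base-R sketch (the kappam.fleiss call at the end is only schematic and left commented out):

```r
votes <- c(SPD = 16194665, CDU = 13136740, CSU = 3494309,
           Gruene = 3838326, FDP = 4648144, PDS = 4118194)

## Scale down by 1000 and repeat each party label that many times:
## each row is one fictive voter, and the relative frequencies of the
## listed parties are preserved (up to rounding).
n <- round(votes / 1000)
voters <- data.frame(id = seq_len(sum(n)),
                     party = rep(names(votes), times = n))

round(100 * prop.table(table(voters$party)), 1)  # shares among these six parties

## kappam.fleiss() in irr expects a subjects x raters matrix of ratings,
## so pairs of fictive voters could be arranged e.g. as:
# library(irr)
# kappam.fleiss(matrix(sample(as.character(voters$party)), ncol = 2))
```

Note the printed shares are relative to these six parties only, so they are slightly higher than the percentages in the table, which include all other parties.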
[R] random effects in lmer
Dear all, I am running several generalized mixed models using lmer. The models typically look like this:

model2xx <- lmer(numbers ~ Treatment + b + (1|Genotype) + (1|Field) + (1|Genotype:Treatment), family=quasipoisson)

All factors are categorical. And the output looks like this:

Generalized linear mixed model fit using Laplace
Formula: numbers ~ Treatment + Dead + (1|Genotype) + (1|Field) + (1|Genotype:Treatment)
Family: quasipoisson(log link)
 AIC  BIC logLik deviance
1033 1051 -509.4     1019
Random effects:
Groups             Name        Variance Std.Dev.
Genotype:Treatment (Intercept) 0.40555  0.63683
Genotype           (Intercept) 1.16642  1.08001
Field              (Intercept) 1.23738  1.11238
Residual                       9.47740  3.07854
number of obs: 100, groups: Genotype1:Treatment1, 14; Genotype1, 5; Felt1, 4
Fixed effects:
            Estimate Std. Error t value
(Intercept)  3.20061    0.80010   4.000
Treatment a -0.05482    0.42619  -0.129
Treatment c  0.08316    0.46395   0.179
Dead No      0.27604    0.14873   1.856
Correlation of Fixed Effects:
            (Intr) Tretmnt1 Trtmnt1c
Treatment a -0.247
Treatment c -0.239  0.450
Dead No     -0.111 -0.132    0.003

I need some help to evaluate the importance of the random factors. The random factor of interest is Genotype. I have tried deleting random factors from the model and comparing the reduced model with the original model by log likelihood-ratio statistics. Is this an appropriate method for testing the random factors in lmer? Is it possible to evaluate how much of the total variation the random factor Genotype explains? I am a new user of lmer and my questions are probably very naive, but I appreciate any help. Thanks for the help. Line -- Line Johansen, Department of Biology, Norwegian University of Science and Technology, Høgskoleringen 15, No-7491 Trondheim. Phone: 73 55 12 94
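The likelihood-ratio comparison described above is usually done by fitting the model with and without the random term and passing both fits to anova(); a hedged sketch on invented Gaussian data (the poster's counts and quasipoisson family are not reproduced here). Note that the naive chi-square p-value for a variance component is conservative because the null value sits on the boundary of the parameter space, so halving it is a common rule of thumb; the share of variation explained by Genotype can be read off VarCorr() as its variance divided by the total.

```r
library(lme4)  # assumed installed
set.seed(42)

## Invented data standing in for the poster's design: 5 genotypes, 4 fields.
dat <- data.frame(Genotype = factor(rep(1:5, each = 20)),
                  Field = factor(rep(1:4, times = 25)),
                  Treatment = factor(rep(c("a", "c"), 50)))
dat$y <- rnorm(100) + rep(rnorm(5), each = 20)  # genotype-level noise added

full    <- lmer(y ~ Treatment + (1 | Genotype) + (1 | Field), data = dat, REML = FALSE)
reduced <- lmer(y ~ Treatment + (1 | Field), data = dat, REML = FALSE)

## Likelihood-ratio test for the Genotype variance component.
anova(reduced, full)
```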
Re: [R] subset grouped data with quantile and NA's
This will also remove the NAs from the output; you will have to change it if you want to keep the NAs. I wasn't sure what you wanted to do with them.

dat <- data.frame(fac = rep(c("a", "b"), each = 100),
                  value = c(rnorm(130), rep(NA, 70)),
                  other = rnorm(200))
# split the data
x.s <- split(dat, dat$fac, drop=TRUE)
# process the quantiles
x.l <- lapply(x.s, function(.fac){
    # remove NAs from the output -- need to change if you want to keep NAs
    .fac[(!is.na(.fac$value)) & (.fac$value <= quantile(.fac$value, prob=0.95, na.rm=TRUE)),]
})
# put back into a dataframe
dat.new <- do.call(rbind, x.l)

On Fri, Aug 22, 2008 at 3:35 AM, David Carslaw [EMAIL PROTECTED] wrote: I can't quite seem to solve a problem subsetting a data frame. Here's a reproducible example. Given a data frame:

dat <- data.frame(fac = rep(c("a", "b"), each = 100),
                  value = c(rnorm(130), rep(NA, 70)),
                  other = rnorm(200))

What I want is a new data frame (with the same columns as dat) excluding the top 5% of value, separately for a and b. For example, this produces the results I'm after in an array:

sub <- tapply(dat$value, dat$fac, function(x) x[x < quantile(x, probs = 0.95, na.rm = TRUE)])

My difficulty is putting them into a data frame along with the other columns fac and other. Note that quantile will return different length vectors due to different numbers of NAs for a and b. There's something I'm just not seeing - can you help? Many thanks. David Carslaw - Institute for Transport Studies, University of Leeds -- View this message in context: http://www.nabble.com/subset-grouped-data-with-quantile-and-NA%27s-tp19102795p19102795.html Sent from the R help mailing list archive at Nabble.com. -- Jim Holtman, Cincinnati, OH, +1 513 646 9390. What is the problem that you are trying to solve?
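For completeness, the change Jim alludes to, keeping the NA rows while still trimming the top 5% of the non-missing values, only needs the filter flipped so that NA rows pass through; a base-R variant on the same invented data:

```r
set.seed(1)
dat <- data.frame(fac = rep(c("a", "b"), each = 100),
                  value = c(rnorm(130), rep(NA, 70)),
                  other = rnorm(200))

x.s <- split(dat, dat$fac, drop = TRUE)
x.l <- lapply(x.s, function(.fac) {
  ## keep a row if its value is NA, or if it is at or below the 95% quantile;
  ## for NA values the first clause is TRUE, so the row is retained
  keep <- is.na(.fac$value) |
          .fac$value <= quantile(.fac$value, probs = 0.95, na.rm = TRUE)
  .fac[keep, ]
})
dat.new <- do.call(rbind, x.l)
table(dat.new$fac)  # each group loses only its top 5% of non-missing values
```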
Re: [R] Combining multiple datasets
Here is the way that I usually do it using lapply:

myData <- lapply(list.files(pattern=".RWL$"),
                 function(.file) read.rwl(.file, header=TRUE))
# combine into a single dataframe
myDF <- do.call(rbind, myData)

On Thu, Aug 21, 2008 at 10:42 PM, Alison Macalady [EMAIL PROTECTED] wrote: Hi, I've tried to figure this out using Intro to R and help(), to no avail - I am new at this. I'm trying to write a script that will read multiple files from a directory and then merge them into a single new data frame. The original data are in a tree-ring specific format, and so I've first used a function (read.rwl) from the dplR package to read each file, translate each into a more intuitive time series format, and then put each new data frame into a single object, demo.Rwl:

demo.Rwl <- list.files(pattern=".RWL$")
for(i in demo.Rwl) { x <- read.rwl(i, header=TRUE); assign(print(i, quote=FALSE), x) }

This part seems to work. Now I want to be able to put all of the individual data frames contained in demo.Rwl into a single data frame, merging by rows (rows are calendar years in the time series). I think I know how to do this by typing each data set name into a merge: merge(x, y, ..., by.x=0, all=TRUE). However, I'd like to be able to script something that will save me the time of typing in all of the individual data set names, as I will be repeating this operation many times with many different numbers of original input files. Is there a way to code a merge such that every individual data frame contained in demo.Rwl (a different number every time) gets combined into a single data set? Thanks for the help! Ali -- Jim Holtman, Cincinnati, OH, +1 513 646 9390. What is the problem that you are trying to solve?
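If a true merge by calendar year (row names) is wanted rather than rbind, the pairwise merge can be folded over the whole list with Reduce(); a base-R sketch on two invented stand-ins for read.rwl() results (data frames with years as row names):

```r
## Hypothetical stand-ins for two read.rwl() results: years as row names.
a <- data.frame(series1 = c(1.2, 0.8, 1.1), row.names = 2000:2002)
b <- data.frame(series2 = c(0.9, 1.3),      row.names = 2001:2002)

merge_by_year <- function(x, y) {
  m <- merge(x, y, by = "row.names", all = TRUE)  # outer join on the years
  rownames(m) <- m$Row.names  # merge() moves the row names into a column;
  m$Row.names <- NULL         # put them back and drop the helper column
  m
}

## Fold the pairwise merge over an arbitrarily long list of data frames.
combined <- Reduce(merge_by_year, list(a, b))
combined
#      series1 series2
# 2000     1.2      NA
# 2001     0.8     0.9
# 2002     1.1     1.3
```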
[R] nesting environments
Hello, I got the hint to use nested environments (for a GUI problem), but I'm not sure whether I understand what is meant (and I did not really succeed with reading the documentation - probably I looked for the wrong keywords?). That's what he wrote: "Another good way to control the scoping is nesting environments (e.g. through closures). That is, your object does not need to be global (in the workspace) but it can be in a parent environment. In fact, global variables won't even work when you put your code into a namespaced package, because the namespace will be locked." Can anybody explain it (preferably with a simple example)? That would be great! Thanks, Antje
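To illustrate what the quoted advice means: a closure is a function created inside another function, and it keeps a reference to the environment it was created in, so state can live there instead of in the global workspace. A minimal sketch:

```r
## make_counter() returns a function; count lives in the enclosing
## environment of that function, not in the global workspace.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # <<- assigns into the enclosing environment
    count
  }
}

counter <- make_counter()
counter()        # 1
counter()        # 2
exists("count")  # FALSE: the state is hidden inside the closure
```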
[R] swap
Hello Richie, I would like to do three (or k) swap steps, in each step swapping just 2 IDs recursively:

x <- 1:10
swap <- function(x){
  a <- sample(x,2)
  x[x==a[1]] <- swap[2]
  x[x==a[2]] <- swap[1]
  return(x)
}
swap(swap(swap(x))) -> mix

Is this possible? Thank you in advance! Amor
[R] Re : Help on competing risk package cmprsk with time dependent covariate
Hello again, I'm trying to use the timereg package as suggested (R2.7.1 on XP Pro). Here is my script, based on the example from timereg for a Fine-Gray model, in which relt = time to event, rels = status 0/1/2 (2 = competing, 1 = event of interest, 0 = censored), and random = the covariate I want to test:

library(timereg)
rel <- read.csv("relapse2.csv", header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "")
names(rel)
[1] "upn" "rels" "relt" "random"
times <- rel$relt[rel$rels==1]
fg1 <- comp.risk(Surv(relt, rels > 0) ~ const(random), rel, rel$rels, times[-1], causeS=1, resample.iid=1, model="prop")
summary(fg1)
fg2 <- comp.risk(Surv(relt, rels > 0) ~ random, rel, rel$rels, times[-1], causeS=1, resample.iid=1, model="prop")
summary(fg2)

Question 1: can you confirm that fg1 evaluates random under the hypothesis that it has a constant effect over time, and that its effect is assumed to be time-varying in fg2? Question 2: both summaries give me the following, which I don't understand at all; is there a mistake in my script?

Competing risks Model
Test for nonparametric terms
Test for non-significant effects
            sup| hat B(t)/SD(t) |  p-value H_0: B(t)=0
(Intercept)  0                     0
random       0                     0
Test for time invariant effects
            supremum test  p-value H_0: B(t)=b t
(Intercept)  0              0
random       0              0
            int (b(t)-g(t,gam))^2dt  p-value H_0: constant effect
(Intercept)  0                        0
random       0                        0
Call: comp.risk(Surv(relt, rels > 0) ~ random, rel, rel$rels, times[-1], causeS = 1, resample.iid = 1, model = "prop")

Any help is very welcome. Regards, Philippe G -Original Message- From: Arthur Allignol [EMAIL PROTECTED] To: Philippe Guardiola [EMAIL PROTECTED] Cc: R-help@r-project.org; phgua [EMAIL PROTECTED] Sent: Friday, 22 August 2008 11:53 Subject: Re: [R] Help on competing risk package cmprsk with
time dependent covariate Hello, There is something I don't understand in your question. Is treatment a time-dependent covariate? That is, do patients receive the treatment at the beginning of the study or later? cmprsk cannot handle time-dependent covariates. But if treatment is a baseline covariate with a time-varying effect (i.e. the subdistribution hazard ratio varies with time), your solution to assess that is odd, because you would transform your baseline covariate into a time-dependent one, thus considering all the patients to receive no treatment during the first year. For sure, the treatment won't have any effect in the first year then. To assess a time-varying effect on competing risks, I would either follow the cmprsk documentation, including an interaction with functions of time, or use the comp.risk function in the timereg package, which fits more flexible models for the cumulative incidence functions. Best regards, Arthur Allignol Philippe Guardiola wrote: Dear R users, I'd like to assess the effect of a treatment covariate on disease relapse risk with the package cmprsk. However, the effect of this covariate on survival is time-dependent (assessed with cox.zph): no significant effect during the first year of follow-up, then after 1 year a favorable effect is observed on survival (a step function might be the correct way to put it?). For overall survival analysis I have used a time-dependent Cox model which has confirmed this positive effect after 1 year. Now I'm moving to disease relapse incidence and a similar time dependency seems to be present. What I'd like to have is this: for patients without treatment the code for the treatment covariate is always 0, and for patients who received treatment I'd like to have it = 0 during the time interval 0 to 1 year, and equal to 1 after 1 year. Correct me if I'm wrong in trying to do so.
First, I have run the following script (R2.7.1 under XP Pro) according to previous advice:

library(cmprsk)
attach(LAMrelapse)
fit1 <- crr(rel.t, rel.s, treatment, treatment,
            function(uft) cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)),
            failcode=1, cencode=0, na.action=na.omit, gtol=1e-06, maxiter=10)
fit1

where: rel.t = time to event (in years); rel.s = status, =1 if disease relapse, =2 if death from a non disease related cause (toxicity of previous chemotherapy), =0 if alive and not in relapse; treatment = binary covariate (value: 0 or 1)
Re: [R] swap
Is this what you want:

> swap <- function(z){
+   a <- sample(length(z), 2)
+   z[a] <- z[rev(a)]
+   z
+ }
> swap(1:10)
 [1] 2 1 3 4 5 6 7 8 9 10
> swap(1:10)
 [1] 5 2 3 4 1 6 7 8 9 10
> swap(1:10)
 [1] 8 2 3 4 5 6 7 1 9 10
> swap(1:10)
 [1] 2 1 3 4 5 6 7 8 9 10
> swap(1:10)
 [1] 4 2 3 1 5 6 7 8 9 10
> swap(swap(swap(1:10)))
 [1] 8 10 3 5 4 6 7 1 9 2
> swap(swap(swap(1:10)))
 [1] 8 2 7 4 5 6 3 10 9 1

On Fri, Aug 22, 2008 at 8:57 AM, amor Gandhi [EMAIL PROTECTED] wrote: Hello Richie, I would like to do three (or k) swap steps, in each step swapping just 2 IDs recursively: x <- 1:10; swap <- function(x){ a <- sample(x,2); x[x==a[1]] <- swap[2]; x[x==a[2]] <- swap[1]; return(x) }; swap(swap(swap(x))) -> mix. Is this possible? Thank you in advance! Amor -- Jim Holtman, Cincinnati, OH, +1 513 646 9390. What is the problem that you are trying to solve?
Re: [R] nesting environments
Look at the example section of ?new.env. Also, the proto package on CRAN defines proto objects (which are just environments with slightly different semantics) and, in the Applications section of its home page, has links to examples of using proto objects to create GUIs with gWidgets: http://r-proto.googlecode.com On Fri, Aug 22, 2008 at 8:54 AM, Antje [EMAIL PROTECTED] wrote: Hello, I got the hint to use nested environments (for a GUI problem), but I'm not sure whether I understand what is meant (and I did not really succeed with reading the documentation - probably I looked for the wrong keywords?). That's what he wrote: "Another good way to control the scoping is nesting environments (e.g. through closures). That is, your object does not need to be global (in the workspace) but it can be in a parent environment. In fact, global variables won't even work when you put your code into a namespaced package, because the namespace will be locked." Can anybody explain it (preferably with a simple example)? That would be great! Thanks, Antje
Re: [R] nesting environments
Thanks a lot! I'll give it a closer look :-) Ciao, Antje Gabor Grothendieck wrote: Look at the example section of ?new.env. Also, the proto package on CRAN defines proto objects (which are just environments with slightly different semantics) and, in the Applications section of its home page, has links to examples of using proto objects to create GUIs with gWidgets: http://r-proto.googlecode.com
Re: [R] Coordinate systems for geostatistics in R
imicola wrote: Hi, I read somewhere that when carrying out geostatistical analysis in R you should not use latitude and longitude...can anyone expand on this a little for me, and what would be the best coordinate system to use? I have my data in a geographic coordinate system, WGS84, decimal degrees. Is this the wrong format for such analyses? I have also converted my data to the UTM projection, so I have it in metres (ranging from 480,000 to 550,000 E and 170,000 to 230,000 N). If I were to use the UTM coordinates, should I be using the actual coordinates in metres, or should I convert them to an arbitrary coordinate system (i.e. from 0 - 1) somehow? I have noticed that running an analysis on the data gives different results depending on which type of system you use, so I want to make sure I have the correct system. I should also probably note that I am a geostatistical novice! Thanks, Hi, I use the gstat package for geostatistics. For doing the analysis I don't think it is necessary to convert to UTM, but maybe just do it to be on the safe side. If you use the spatial objects provided by the sp package (http://cran.r-project.org/web/packages/sp/vignettes/sp.pdf) you can transform your data to other projections using the spTransform function. Questions regarding geostatistics and spatial data will get more answers on the r-sig-geo list. cheers, Paul -- Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +31302535773 Fax: +31302531145 http://intamap.geo.uu.nl/~paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
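For completeness, a sketch of Paul's suggestion using sp; this assumes the sp and rgdal packages are installed, and the points and UTM zone below are made up for illustration.

```r
# Sketch: reproject long/lat (WGS84) points to UTM so that distances
# are in metres, which is what variogram-based methods expect.
library(sp)
library(rgdal)

d <- data.frame(lon = c(-1.25, -1.10), lat = c(51.5, 51.6))
coordinates(d) <- ~ lon + lat                    # promote to SpatialPoints
proj4string(d) <- CRS("+proj=longlat +datum=WGS84")

# Zone 30 covers this (invented) longitude band.
d.utm <- spTransform(d, CRS("+proj=utm +zone=30 +datum=WGS84"))
coordinates(d.utm)                               # now in metres
```

There is no need to rescale the UTM metres to 0 - 1; gstat works with the actual coordinates.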
Re: [R] swap
Hello Richie, I would like to do three (or k) swap steps, in each step just swapping 2 IDs (recursive swapping):

x <- 1:10
swap <- function(x){
  a <- sample(x, 2)
  x[x == a[1]] <- swap[2]
  x[x == a[2]] <- swap[1]
  return(x)
}
swap(swap(swap(x))) -> mix

I tried my best with a response before, but if you want a sensible answer you are going to have to try harder explaining what you really want. What do you mean by 'swap step'? If you want to swap the position of two elements in a vector (as I suspect you might), then which positions do you want swapping? Do you specify them yourself (as inputs to the swap function perhaps), or should they be randomly generated? If you provide more context (the rest of your code, what you are trying to achieve etc.) the help will be better. Regards, Richie. Mathematical Sciences Unit HSL ATTENTION: This message contains privileged and confidential inform...{{dropped:20}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
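For what it's worth, the posted function also indexes the function name (`swap[2]`) instead of the sampled pair `a`, and matching by value breaks when x contains duplicates. A base-R sketch of one random swap, under the assumption that a 'swap step' means exchanging two randomly chosen positions:

```r
# Swap the values at two randomly chosen positions of x.
# Indexing by position (not by value) also works when x has duplicates.
swap <- function(x) {
  i <- sample(seq_along(x), 2)   # pick two positions
  x[i] <- x[rev(i)]              # exchange their contents
  x
}

set.seed(1)
x <- 1:10
mix <- swap(swap(swap(x)))   # three swap steps
sort(mix)                    # still 1:10 - only the order changes
```

Whether the three steps should be allowed to undo each other (or pick distinct positions each time) is exactly the kind of detail Richie is asking the poster to specify.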
[R] How to read malformed csv files with read.table?
Hi, how do I read files that have two header fields less than they have columns? The easiest solution would be to insert one or two additional header fields, but I have a lot of files and that would be quite a lot of awful work. Any ideas on how to solve that problem? ### R stuff: read.table("myfile.CSV", sep = "\t", header = TRUE) Error in read.table("myfile.CSV", sep = "\t", : more columns than column names count.fields("myfile.CSV", sep = "\t") [1] 10 12 12 12 12 12 12 12 12 12 12 [...] ### ugly sample (exported by SDL DataTable component): time/ms C550.KMS Cyt_b559.KMS Cyt_b563.KMS Cyt_f_.KMS P515FR.KMS Scatt.KMS Zea2.KMS PC P700 0 Point1 -599.500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 Point2 -598.000 -0.012 -0.013 0.040 0.013 0.027 0.010 0.022 0.000 0.000 0 Point3 -596.500 -0.015 -0.015 0.044 0.020 0.025 0.010 0.033 0.000 0.000 [...] Cheers, Martin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Re: Help on competing risk package cmprsk with time dependent covariate
Hello again, I'm trying to use the timereg package as you suggested (R 2.7.1 on XP Pro). Here is my script, based on the example from timereg for a Fine-Gray model, in which relt = time to event, rels = status 0/1/2 (2 = competing, 1 = event of interest, 0 = censored), random = the covariate I want to test:

library(timereg)
rel <- read.csv("relapse2.csv", header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "")
names(rel)
[1] upnrels relt random
times <- rel$relt[rel$rels == 1]
fg1 <- comp.risk(Surv(relt, rels > 0) ~ const(random), rel, rel$rels, times[-1], causeS = 1, resample.iid = 1, model = "prop")
summary(fg)
fg2 <- comp.risk(Surv(relt, rels > 0) ~ random, rel, rel$rels, times[-1], causeS = 1, resample.iid = 1, model = "prop")
summary(fg)

Question 1: can you confirm that fg1 evaluates random under the hypothesis that it has a constant hazard over time, and that it is assumed to be time dependent in fg2? Question 2: both summaries give me the following, which I don't understand at all; is there a mistake in my script? Competing risks Model Test for nonparametric terms Test for non-significant effects sup| hat B(t)/SD(t) | p-value H_0: B(t)=0 (Intercept) 0 0 random 0 0 Test for time invariant effects supremum test p-value H_0: B(t)=b t (Intercept) 0 0 random 0 0 int (b(t)-g(t,gam))^2 dt p-value H_0: constant effect (Intercept) 0 0 random 0 0 Call: comp.risk(Surv(relt, rels > 0) ~ random, rel, rel$rels, times[-1], causeS = 1, resample.iid = 1, model = "prop") any help is very welcome regards Philippe G ----- Original Message ----- From: Arthur Allignol [EMAIL PROTECTED] To: Philippe Guardiola [EMAIL PROTECTED] Cc: R-help@r-project.org; [EMAIL PROTECTED] Sent: Friday, 22 August 2008, 11:53:42 Subject: Re: [R] Help on competing risk package cmprsk with time dependent covariate Hello, Something I don't understand in your question. Is treatment a time-dependent covariate? That is, do patients receive the treatment at the beginning of the study or later? cmprsk cannot handle time-dependent covariates.
But if treatment is a baseline covariate with a time-varying effect (i.e. does the subdistribution hazard ratio vary with time?), your solution to assess that is odd, because you will transform your baseline covariate into a time-dependent one, thus considering all the patients to receive no treatment in the first year. For sure, the treatment won't have any effect during the first year. To assess a time-varying effect on competing risks, I would either follow the cmprsk documentation, including an interaction with functions of time, or use the comp.risk function in the timereg package, which fits more flexible models for the cumulative incidence functions. Best regards, Arthur Allignol Philippe Guardiola wrote: Dear R users, I'd like to assess the effect of a treatment covariate on a disease relapse risk with the package cmprsk. However, the effect of this covariate on survival is time-dependent (assessed with cox.zph): no significant effect during the first year of follow-up, then after 1 year a favorable effect is observed on survival (a step function might be the correct way to say that?). For overall survival analysis I have used a time-dependent Cox model which has confirmed this positive effect after 1 year. Now I'm moving to disease relapse incidence, and a similar time dependency seems to be present. What I'd like to have is this: for patients without treatment the code for the treatment covariate is always 0, and for patients who received treatment I'd like to have it = 0 during the time interval 0 to 1 year, and equal to 1 after 1 year. Correct me if I'm wrong in trying to do so.
First, I have run the following script (R 2.7.1 under XP Pro) according to previous advice:

library(cmprsk)
attach(LAMrelapse)
fit1 <- crr(rel.t, rel.s, treatment, treatment, function(uft) cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)), failcode = 1, cencode = 0, na.action = na.omit, gtol = 1e-06, maxiter = 10)
fit1

where: rel.t = time to event (in years) rel.s = status, = 1 if disease relapse, = 2 if death from a non disease related cause (toxicity of previous chemotherapy), = 0 if alive and not in relapse treatment = binary covariate (value: 0 or 1) representing the treatment to test (different from the chemotherapy above, with no known toxicity) I have not yet added other covariates to the model. This script gave me the following result: fit1 <- crr(relcmp.t, relcmp.s, treatment, treatment, function(uft) cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)), failcode = 1, cencode = 0, na.action = na.omit, gtol = 1e-06, maxiter = 10) fit1 convergence: TRUE coefficients: [1] -0.6808 0.7508
[R] draw a function over a histogram
Sorry, I am having some trouble with the graphics device. How can I draw a function over a histogram? Thanks so much. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] mapping a vector into predefined bins
Hi, I have a mapping M consisting of a vector of bin centers (say 0.01, 0.02, 0.03, ..., 1) with associated values (say red, orange, yellow, ...), stored as a 2-column matrix with column 1 the bin center and column 2 the value. I also have a vector D of data (say 0.9056, 0.4118, 0.2162, ...). I want to map each datum in D to the value associated with its bin in M (say 0.0111 -> red, 0.0198 -> orange, ...). Does R have a built-in function for this? Thanks for tips/suggestions. +glenn __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
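Base R's cut() (or findInterval()) does this kind of binning; a minimal sketch with made-up centers, values, and outer bin edges:

```r
# Map data to the value of the bin each datum falls in.
centers <- c(0.01, 0.02, 0.03)
values  <- c("red", "orange", "yellow")

# Bin edges halfway between adjacent centers (outer edges invented here).
edges <- c(0, head(centers, -1) + diff(centers) / 2, 1)

D   <- c(0.0111, 0.0198, 0.029)
bin <- cut(D, breaks = edges, labels = FALSE)  # bin index per datum
values[bin]                                    # "red" "orange" "yellow"
```

findInterval(D, edges) gives the same indices when the edges are sorted, and is vectorised over large D.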
[R] filtering out data
Greetings, Apologies for such a simple question, but I have been trying to figure this out for a few days now (I'm quite new to R), have read manuals, list posts, etc., and not found precisely what I need. I am accustomed to working in Stata, but I am currently working through the multilevel package right now. In Stata, I would include the statement if model1 == 1 at the end of my regression statement, but I can't seem to find the correct syntax for doing the same thing in R. I have successfully converted Stata datasets, merged, aggregated, run lm, etc., but this simple task is evading me entirely. Here's what I have tried (and various iterations): if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset) I get this: Warning message: In if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv + : the condition has length > 1 and only the first element will be used What it seems to do is just run lm on the first occurrence of model1 == 1, but then doesn't try it again. I tried sorting so that the model1 == 1 rows appear first, but that doesn't seem to be a great solution either. I'll just go back into Stata and create separate new datasets if I have to, but there HAS to be a more elegant way. Thank you for ANY feedback! Kate Kate Rohrbaugh Independent Project Analysis, Inc. 44426 Atwater Drive Ashburn, VA 20147 office - 703.726.5465 fax - 703.729.8301 email - [EMAIL PROTECTED] website - www.ipaglobal.com ( http://www.ipaglobal.com/ )
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] draw a function over a histogram
Daniela Garavaglia wrote: Sorry, I am having some trouble with the graphics device. How can I draw a function over a histogram? Thanks so much. Daniela, What function? Here is one example using density() and another using dnorm():

x <- rnorm(1000, 2, 2)
hist(x, prob = TRUE)
lines(density(x, na.rm = TRUE), col = "red")
library(MASS)
y <- fitdistr(x, "normal")
curve(dnorm(x, mean = y$estimate[1], sd = y$estimate[2]), add = TRUE)

HTH R. [[alternative HTML version deleted]] What about this?? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to read malformed csv files with read.table?
Try this. It will read the file, see if there is a difference, and add in the extra headers:

x <- "time/ms C550.KMS Cyt_b559.KMS Cyt_b563.KMS Cyt_f_.KMS P515FR.KMS Scatt.KMS Zea2.KMS PC P700
0 Point1 -599.500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0 Point2 -598.000 -0.012 -0.013 0.040 0.013 0.027 0.010 0.022 0.000 0.000
0 Point3 -596.500 -0.015 -0.015 0.044 0.020 0.025 0.010 0.033 0.000 0.000"

# find out how many dummy headers you have to add
x.c <- count.fields(textConnection(x))
x.diff <- x.c[2] - x.c[1]  # assume first line is short
x.connection <- textConnection(x)  # set up connection
if (x.diff > 0) {
  # read first line
  x.first <- readLines(x.connection, n = 1)
  # add dummy headers
  x.first <- paste(x.first, paste(LETTERS[1:x.diff], collapse = " "))
  pushBack(x.first, x.connection)  # push back the line so it is ready for read.table
}
input <- read.table(x.connection, header = TRUE)
closeAllConnections()

On Fri, Aug 22, 2008 at 10:19 AM, Martin Ballaschk [EMAIL PROTECTED] wrote: Hi, how do I read files that have two header fields less than they have columns? The easiest solution would be to insert one or two additional header fields, but I have a lot of files and that would be quite a lot of awful work. Any ideas on how to solve that problem? ### R stuff: read.table("myfile.CSV", sep = "\t", header = TRUE) Error in read.table("myfile.CSV", sep = "\t", : more columns than column names count.fields("myfile.CSV", sep = "\t") [1] 10 12 12 12 12 12 12 12 12 12 12 [...] ### ugly sample (exported by SDL DataTable component): time/ms C550.KMS Cyt_b559.KMS Cyt_b563.KMS Cyt_f_.KMS P515FR.KMS Scatt.KMS Zea2.KMS PC P700 0 Point1 -599.500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 Point2 -598.000 -0.012 -0.013 0.040 0.013 0.027 0.010 0.022 0.000 0.000 0 Point3 -596.500 -0.015 -0.015 0.044 0.020 0.025 0.010 0.033 0.000 0.000 [...] 
Cheers, Martin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to read malformed csv files with read.table?
on 08/22/2008 10:19 AM Martin Ballaschk said the following: how do I read files that have two header fields less than they have columns? The easiest solution would be to insert one or two additional header fields, but I have a lot of files and that would be quite a lot of awful work. Any ideas on how to solve that problem? you could use read.table with header = F, that way it will read the table without worrying about column names (they will end up in the first row of the data). Then, you can just delete the first row, or assign it to names(), or whatever. if all the columns in all your files have the same names, you can read them all with header=F and col.names=vectorofcolumnnames, and then delete first row (which will contain the incomplete col names from the file). hope this helps :) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] filtering out data
Exactly what do you want this statement to do: if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset) The result in the if is a vector of logical values (assuming that merged.dataset is longer than 1); that is the source of the error message. Do you want the regression to be done on every row? So what is the problem you are trying to solve? On Fri, Aug 22, 2008 at 10:47 AM, Kate Rohrbaugh [EMAIL PROTECTED] wrote: Greetings, Apologies for such a simple question, but I have been trying to figure this out for a few days now (I'm quite new to R), have read manuals, list posts, etc., and not found precisely what I need. I am accustomed to working in Stata, but I am currently working through the multilevel package right now. In Stata, I would include the statement if model1 == 1 at the end of my regression statement, but I can't seem to find the correct syntax for doing the same thing in R. I have successfully converted Stata datasets, merged, aggregated, run lm, etc., but this simple task is evading me entirely. Here's what I have tried (and various iterations): if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset) I get this: Warning message: In if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv + : the condition has length > 1 and only the first element will be used What it seems to do is just run lm on the first occurrence of model1 == 1, but then doesn't try it again. I tried sorting so that the model1 == 1 rows appear first, but that doesn't seem to be a great solution either. I'll just go back into Stata and create separate new datasets if I have to, but there HAS to be a more elegant way. Thank you for ANY feedback! Kate Kate Rohrbaugh Independent Project Analysis, Inc. 
44426 Atwater Drive Ashburn, VA 20147 office - 703.726.5465 fax - 703.729.8301 email - [EMAIL PROTECTED] website - www.ipaglobal.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
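For the record, the idiomatic equivalent of Stata's trailing if clause is the subset argument of lm(); a sketch on simulated data (the variable names follow the original post, the data are invented):

```r
# Fit the model only on rows where model1 == 1; no need to split
# the data set first.
set.seed(42)
merged.dataset <- data.frame(dv  = rnorm(20), iv1 = rnorm(20),
                             iv2 = rnorm(20), iv3 = rnorm(20),
                             model1 = rep(c(0, 1), 10))

model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset,
                   subset = model1 == 1)

length(residuals(model1.coeff))  # 10: only the model1 == 1 rows were used
```

subset(merged.dataset, model1 == 1) works the same way for building a filtered data frame explicitly.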
[R] testing for differences in sums - how to?
If I want to test for a salesman effect on average sale, I model it as Sale ~ Salesman. But what if I want to test for a salesman effect on total sales? How do I model that? Given a significant salesman effect, how do I follow up with the equivalent of Tukey's HSD to determine who the (indistinguishably) top salesmen are? If I add another factor, say region, to the design, how do I then model the result? Thanks, jkh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to read malformed csv files with read.table?
On Fri, 22 Aug 2008, Daniel Folkinshteyn wrote: on 08/22/2008 10:19 AM Martin Ballaschk said the following: how do I read files that have two header fields less than they have columns? The easiest solution would be to insert one or two additional header fields, but I have a lot of files and that would be quite a lot of awful work. Any ideas on how to solve that problem? you could use read.table with header = F, that way it will read the table without worrying about column names (they will end up in the first row of the data). Or, better, use header=FALSE, skip=1 and the col.names arg of read.table(). Then, you can just delete the first row, or assign it to names(), or whatever. if all the columns in all your files have the same names, you can read them all with header=F and col.names=vectorofcolumnnames, and then delete first row (which will contain the incomplete col names from the file). The trouble with that approach is that you will get all factor columns. hope this helps :) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
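Prof. Ripley's header=FALSE, skip=1, col.names suggestion, sketched on an inline two-line sample (the column names are invented):

```r
# Skip the short header line and supply the full set of column names,
# so column classes are detected normally (no all-factor columns).
txt <- "a b
1 2 3
4 5 6"

dat <- read.table(textConnection(txt), header = FALSE, skip = 1,
                  col.names = c("id", "x", "y"))

dat$x + dat$y   # numeric arithmetic works: 5 11
```

For real files, replace textConnection(txt) with the file name; col.names would come from a single vector shared by all the files.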
[R] Test of Homogeneity of Variances
I am testing the homogeneity of variances via bartlett.test and fligner.test. Using the following example, how should I interpret the p-value in order to accept or reject the null hypothesis? set.seed(5) x <- rnorm(20) bartlett.test(x, rep(1:5, each=4)) Bartlett test of homogeneity of variances data: x and rep(1:5, each = 4) Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778 fligner.test(x, rep(1:5, each=4)) Fligner-Killeen test of homogeneity of variances data: x and rep(1:5, each = 4) Fligner-Killeen: med chi-squared = 1.0819, df = 4, p-value = 0.8971 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] HELP: how to add weight to a [x,y] coordinate
There are several options. You can use the sunflowerplot function. You can plot solid points with a small alpha value (not all graphics devices support this), so that just a few points show up pale, but a lot show up as more opaque. You can use the symbols function (or bubbleplot, or my.symbols, or ..., from other packages). You can use the hexbin package and function (Bioconductor) (I think this is my preferred method). Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jan Akko Eleveld Sent: Thursday, August 21, 2008 3:11 PM To: r-help@r-project.org Subject: [R] HELP: how to add weight to a [x,y] coordinate Anyone who can help me with the following question? How can I add weight to [x,y] coordinates on a graph/scatterplot? Background: Monte Carlo simulation generated 730,000 [x,y] coordinates with a weight attached (from 0-0.5). Both x and y are rounded and fit on a raster with x-axis 0-170 months (smallest unit = 1 month) and y-axis 0-6 (smallest unit = 0.1). I would like every [x,y] to add its weight, so that after all 730,000 [x,y] are plotted, a maximum likelihood becomes apparent through the size of the points that are made up by accumulated weights. Thank you in advance for any thoughts! Best, Akko Eleveld __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
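Two of these options in a base-R sketch (random data standing in for the 730,000 simulated points):

```r
# Overplotting many rounded (x, y) points: sunflowerplot() draws one
# 'petal' per duplicate point, while alpha blending makes dense
# regions darker.
set.seed(7)
x <- round(runif(500, 0, 170))       # months, 1-month raster
y <- round(rnorm(500, 3, 0.8), 1)    # 0-6 scale, 0.1 steps

op <- par(mfrow = c(1, 2))
sunflowerplot(x, y, main = "sunflowerplot")
plot(x, y, pch = 16, col = rgb(0, 0, 0, 0.1),  # 10% opaque points
     main = "alpha blending")
par(op)
```

Neither of these uses the attached weight directly; for weight accumulation, summing the weights per raster cell (e.g. with tapply) and mapping the sums to point size via the symbols function would be one route, and hexbin handles the sheer point count best.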
Re: [R] Test of Homogeneity of Variances
What are your hypotheses? Once you state what they are, interpretation should be straightforward. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daren Tan Sent: Friday, August 22, 2008 11:18 AM To: [EMAIL PROTECTED] Subject: [R] Test of Homogeneity of Variances I am testing the homogeneity of variances via bartlett.test and fligner.test. Using the following example, how should I interpret the p-value in order to accept or reject the null hypothesis? set.seed(5) x <- rnorm(20) bartlett.test(x, rep(1:5, each=4)) Bartlett test of homogeneity of variances data: x and rep(1:5, each = 4) Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778 fligner.test(x, rep(1:5, each=4)) Fligner-Killeen test of homogeneity of variances data: x and rep(1:5, each = 4) Fligner-Killeen: med chi-squared = 1.0819, df = 4, p-value = 0.8971 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. This email message, including any attachments, is for th...{{dropped:6}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
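To spell the usual hypotheses out: H0 is that all group variances are equal, H1 that at least two differ. With the poster's own data the decision is then mechanical; a sketch:

```r
# Bartlett test of equal variances across 5 groups of 4 observations.
set.seed(5)
x <- rnorm(20)
g <- rep(1:5, each = 4)

p <- bartlett.test(x, g)$p.value   # about 0.78, as in the posted output
# Large p-value: no evidence against equal variances at the 5% level,
# so H0 (homogeneity) is not rejected.
p > 0.05
```

Note that failing to reject H0 is not the same as proving the variances equal; it only means the data are compatible with homogeneity.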
[R] lme questions re: repeated measures covariance structure
Hello, We are attempting to use nlme to fit a linear mixed model to explain bird abundance as a function of habitat: lme(abundance ~ habitat - 1, data = data, method = "ML", random = ~ 1 | sampleunit) The data consist of repeated counts of birds in sample units across multiple years, and we have two questions: 1) Is it necessary (and, if so, how) to specify the repeated measure (years)? As written, the above code does not. 2) How can we specify a Toeplitz heterogeneous covariance structure for this model? We have searched the help file for lme, and the R-help archives, but cannot find any pertinent information. If that's not possible, can we adapt an existing covariance structure, and if so how? Thanks, Mark [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
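nlme has no built-in Toeplitz class, but the two pieces the poster asks about have standard homes: the repeated measure goes in the form argument of a correlation structure, and per-year variances go in weights. A heavily hedged sketch (corAR1 is only a stand-in for the missing Toeplitz structure, and a year column is assumed to exist in the data):

```r
library(nlme)

# Sketch only - 'data' and 'year' are assumed from the post, not supplied.
# corAR1 indexes the repeated measure by year within sample unit;
# varIdent allows a different residual variance per year (the
# 'heterogeneous' part). nlme offers corAR1, corCompSymm, corSymm, etc.,
# but no Toeplitz class out of the box; corSymm (general) or a custom
# corStruct subclass would be the closest adaptations.
fit <- lme(abundance ~ habitat - 1,
           data = data,
           random = ~ 1 | sampleunit,
           correlation = corAR1(form = ~ year | sampleunit),
           weights = varIdent(form = ~ 1 | year),
           method = "ML")
```

Writing a genuine Toeplitz structure means defining a new corStruct class, which the nlme documentation describes but does not provide ready-made.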
Re: [R] Null and Alternate hypothesis for Significance test
The problem with trying to prove that 2 distributions are equal is that there is exactly one way in which they can be equal and an infinite number of ways that they can be different (and it's rather a large infinity). The traditional use of equality as the null hypothesis works, because we can assume that it is true and then compute the probability of seeing data as or more extreme than that observed. If we just want to assume that the 2 distributions are different and compute the probability of the similarity between the datasets, then we can assume differences that are small enough, but still non-zero, to easily result in the similarity of the data seen. It is not hard to come up with 2 distributions that are clearly different, but which would yield identical datasets. Given this, any test to show exact equality would need to have a p-value identically set at 1 (data observed is completely plausible with null hypothesis). Or we could get a significantly higher powered test of the correct size by generating a p-value from a uniform(0,1) distribution. Neither option has much scientific merit. Stepping away from proving equality, the more common approach is to prove equivalence, where the alternative hypothesis is that the 2 distributions are close enough even if not equal. Determining close enough is subjective and dependent on the scientific question. Close enough is often determined by visual inspection of graphs rather than a hypothesis test. If you insist on a hypothesis test, then you need to determine in advance what is meant by close enough, both in deciding the distance measure and how big that distance has to be before they are no longer equivalent. You asked about the KS distance measure; that is one option for choosing a distance, there are others, and which works best depends on you and the scientific question. Take for example 2 distributions F and G. 
F is uniform(0,1), meaning the density function is 1 between 0 and 1 and 0 elsewhere; the density of G is equal to 1 between 0 and 0.99, also equal to 1 between 99.99 and 100, and zero elsewhere. Are these two distributions equivalent? They have 99% overlap and the KS distance is small (0.01 if I remember correctly), but the means and variances are quite different. When generating random values from the two distributions we will see very similar numbers, with the exception that G will generate outliers near 100 about 1% of the time. Some people would consider these two distributions equivalent (they are pretty much the same if we discard the outliers), while others would consider the potential outliers (very extreme) to make them non-equivalent. You need to decide that based on the science from which your data comes. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Nitin Agrawal Sent: Thursday, August 21, 2008 2:59 PM To: r-help@r-project.org Subject: [R] Null and Alternate hypothesis for Significance test

Hi, I had a question about specifying the null hypothesis in a significance test. Advance apologies if this has already been asked previously or is a naive question. I have two samples A and B, and I want to test whether A and B come from the same distribution. The default null hypothesis would be H0: A = B. But since I am trying to prove that A and B indeed come from the same distribution, I think this is not the right choice for the null hypothesis (it should be one that is set up to be rejected). How do I specify a null hypothesis H0: A not equal to B for, say, a KS test? An example of how to do this in R would be greatly appreciated. On a related note: what is a good way to measure the difference between observed and expected PDFs? Is the D statistic of the KS test a good choice? Thanks!
Nitin
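Greg's F versus G example can be checked numerically in R. The sketch below is not from the thread; the sample size and seed are arbitrary choices for illustration.

```r
# F is uniform(0,1); G puts 99% of its mass uniformly on (0, 0.99)
# and the remaining 1% on (99.99, 100).
set.seed(1)
n <- 1e4
F.samp <- runif(n)
G.samp <- ifelse(runif(n) < 0.99,
                 runif(n, 0, 0.99),      # the 99% "overlap" component
                 runif(n, 99.99, 100))   # rare outliers near 100

D <- unname(ks.test(F.samp, G.samp)$statistic)  # close to the theoretical 0.01
means <- c(F = mean(F.samp), G = mean(G.samp))  # means differ markedly
```

The small KS statistic alongside the very different means is the point of the example: the distance measure has to be chosen to match what "equivalent" means for the scientific question at hand.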
Re: [R] swap
I'm guessing that the following (untested) does what is wanted:

function(x) {
    pos <- sample(length(x), 2, replace = FALSE)
    x[pos] <- x[pos[2:1]]
    x
}

Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User)

[EMAIL PROTECTED] wrote: Hello Richie, I would like to do three (or k) swap steps, in each step recursively swapping just 2 IDs:

x <- 1:10
swap <- function(x){
    a <- sample(x, 2)
    x[x == a[1]] <- swap[2]
    x[x == a[2]] <- swap[1]
    return(x)
}
swap(swap(swap(x))) -> mix

I tried my best with a response before, but if you want a sensible answer you are going to have to try harder explaining what you really want. What do you mean by 'swap step'? If you want to swap the position of two elements in a vector (as I suspect you might), then which positions do you want swapping? Do you specify them yourself (as inputs to the swap function perhaps), or should they be randomly generated? If you provide more context (the rest of your code, what you are trying to achieve etc.) the help will be better. Regards, Richie. Mathematical Sciences Unit HSL
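To connect Burns's suggestion back to the original question, here is a sketch (not from the thread) of the corrected function, named and composed three times as the poster's swap(swap(swap(x))) intended:

```r
swap <- function(x) {
  pos <- sample(length(x), 2, replace = FALSE)  # pick two positions at random
  x[pos] <- x[pos[2:1]]                         # exchange their contents
  x
}

set.seed(1)
x <- 1:10
mix <- swap(swap(swap(x)))  # three swap steps

# swapping only permutes the vector, so the set of values is unchanged:
sort(mix)  # 1 2 3 4 5 6 7 8 9 10
```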
Re: [R] lme questions re: repeated measures covariance structure
Dear Mark, I would include the repeated measure as the smallest stratum in the random effects specification:

random = ~1 | sampleunit/year

Setting up user-defined variance structures should be possible using, for example:

weights = varPower(form = ~habitat)

or also try out the available corStruct classes (found in Pinheiro and Bates 2000). HTH Christoph

Mark Na schrieb: [...]

-- Dr. rer.nat. Christoph Scherber University of Goettingen DNPW, Agroecology Waldweg 26 D-37073 Goettingen Germany phone +49 (0)551 39 8807 fax +49 (0)551 39 8806 Homepage http://www.gwdg.de/~cscherb1
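A runnable sketch of Christoph's suggestions on simulated data. The variable names come from the thread, but the data, effect sizes, and the choice of corAR1 are my own stand-ins: nlme has no built-in Toeplitz structure, so an available corStruct class is used purely for illustration.

```r
library(nlme)

# hypothetical repeated bird counts: 10 sample units x 4 years
set.seed(1)
d <- data.frame(sampleunit = gl(10, 4),
                year       = rep(2001:2004, 10),
                habitat    = gl(2, 20, labels = c("forest", "grass")))
d$abundance <- 5 + rep(rnorm(10), each = 4) + rnorm(40)  # unit effect + noise

# random intercept per sample unit, AR(1) correlation across years within unit:
fit <- lme(abundance ~ habitat - 1, data = d, method = "ML",
           random = ~ 1 | sampleunit,
           correlation = corAR1(form = ~ year | sampleunit))
```

Writing a genuinely new corStruct (e.g. Toeplitz) means subclassing the corStruct machinery as described in Pinheiro and Bates (2000).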
Re: [R] mapping a vector into predefined bins
Have a look at

?cut

Best wishes Wolfgang -- Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber

22/08/2008 14:14 Dr. G. Lawyer scripsit: Hi, I have a mapping M consisting of a vector of bin centers (say 0.01, 0.02, 0.03, ..., 1) with associated values (say red, orange, yellow, ...), stored as a 2-column matrix with column 1 the bin center and column 2 the value. I also have a vector D of data (say 0.9056, 0.4118, 0.2162, ...). I want to map each datum in D to the value associated with its bin in M (say 0.0111 -> red, 0.0198 -> orange, ...). Does R have a built-in function for this? Thanks for tips/suggestions. +glenn
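A sketch of the cut() approach for the poster's setup. Placing the bin boundaries halfway between centers is my assumption about the intended binning, and the colour values are stand-ins for column 2 of M:

```r
centers <- seq(0.01, 1, by = 0.01)          # column 1 of M (bin centers)
values  <- rainbow(length(centers))         # column 2 of M (stand-in values)

# breaks halfway between adjacent centers, open-ended at both extremes:
breaks <- c(-Inf, head(centers, -1) + diff(centers) / 2, Inf)

D   <- c(0.9056, 0.4118, 0.2162)
idx <- cut(D, breaks = breaks, labels = FALSE)  # bin index for each datum

mapped <- values[idx]   # mapped value for each datum
centers[idx]            # nearest bin center: 0.91 0.41 0.22
```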
Re: [R] filtering out data
If I understand correctly the result you are trying to achieve, I think what you may be looking for is this:

model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset[merged.dataset$model1 == 1, ])

This will give you a regression using only the data for which model1 == 1.

on 08/22/2008 11:07 AM jim holtman said the following: Exactly what do you want this statement to do:

if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset)

The result in the 'if' is a vector of logical values (assuming that merged.dataset is longer than 1); that is the source of the error message. Do you want the regression to be done on every row? So what is the problem you are trying to solve?

On Fri, Aug 22, 2008 at 10:47 AM, Kate Rohrbaugh [EMAIL PROTECTED] wrote: Greetings, Apologies for such a simple question, but I have been trying to figure this out for a few days now (I'm quite new to R), have read manuals, list posts, etc., and am not finding precisely what I need. I am accustomed to working in Stata, but I am currently working through the multilevel package right now. In Stata, I would include the statement "if model1 == 1" at the end of my regression statement, but I can't seem to find the correct syntax for doing the same thing in R. I have successfully converted Stata datasets, merged, aggregated, run lm, etc., but this simple task is evading me entirely. Here's what I have tried (and various iterations):

if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset)

I get this:

Warning message: In if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv + : the condition has length > 1 and only the first element will be used

What it seems to do is just run lm on the first occurrence of model1 == 1, but then doesn't try it again. I tried sorting so that the model1 == 1 rows appear first, but that doesn't seem to be a great solution either.
I'll just go back into Stata and create separate new datasets if I have to, but there HAS to be a more elegant way. Thank you for ANY feedback! Kate

Kate Rohrbaugh Independent Project Analysis, Inc. 44426 Atwater Drive Ashburn, VA 20147 office - 703.726.5465 fax - 703.729.8301 email - [EMAIL PROTECTED] website - www.ipaglobal.com
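For completeness, a sketch (variable names from the thread, data simulated) showing that indexing the data frame and lm()'s subset argument, the closest analogue of Stata's "if model1 == 1", give the same fit:

```r
set.seed(1)
merged.dataset <- data.frame(dv  = rnorm(20), iv1 = rnorm(20),
                             iv2 = rnorm(20), iv3 = rnorm(20),
                             model1 = rep(0:1, each = 10))

# explicit row indexing:
fit1 <- lm(dv ~ iv1 + iv2 + iv3,
           data = merged.dataset[merged.dataset$model1 == 1, ])

# lm()'s subset argument does the same restriction:
fit2 <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset,
           subset = model1 == 1)

all.equal(coef(fit1), coef(fit2))  # TRUE
```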
Re: [R] Test of Homogeneity of Variances
I am testing whether the sample variances are equal. When the p-value > 0.05 (alpha), should I accept the null hypothesis (that the sample variances are equal) or reject it? The two new examples below, each supposedly having the same sample variances, also puzzle me. Why are the p-values different?

bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
        Bartlett test of homogeneity of variances
data:  rep(rnorm(5), times = 4) and rep(1:5, each = 4)
Bartlett's K-squared = 0.8681, df = 4, p-value = 0.929

bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
        Bartlett test of homogeneity of variances
data:  rep(rnorm(5), times = 4) and rep(1:5, each = 4)
Bartlett's K-squared = 3.5599, df = 4, p-value = 0.4688

From: [EMAIL PROTECTED] To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Date: Fri, 22 Aug 2008 11:25:36 -0400 Subject: RE: [R] Test of Homogeneity of Variances

What are your hypotheses? Once you state what they are, interpretation should be straightforward.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Daren Tan Sent: Friday, August 22, 2008 11:18 AM To: [EMAIL PROTECTED] Subject: [R] Test of Homogeneity of Variances

I am testing the homogeneity of variances via bartlett.test and fligner.test. Using the following example, how should I interpret the p-value in order to accept or reject the null hypothesis?

set.seed(5)
x <- rnorm(20)
bartlett.test(x, rep(1:5, each=4))
        Bartlett test of homogeneity of variances
data:  x and rep(1:5, each = 4)
Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778

fligner.test(x, rep(1:5, each=4))
        Fligner-Killeen test of homogeneity of variances
data:  x and rep(1:5, each = 4)
Fligner-Killeen: med chi-squared = 1.0819, df = 4, p-value = 0.8971
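One likely explanation for the puzzle above (my reading, not a reply from the thread): each call re-evaluates rnorm(5), so the two tests are run on different random samples. Fixing the seed shows this:

```r
set.seed(42)
p1 <- bartlett.test(rep(rnorm(5), times = 4), rep(1:5, each = 4))$p.value

set.seed(42)  # same seed => same rnorm(5) draw => same p-value
p2 <- bartlett.test(rep(rnorm(5), times = 4), rep(1:5, each = 4))$p.value

identical(p1, p2)  # TRUE
```

Note also that with rep(x, times = 4) and groups rep(1:5, each = 4), each group is a different block of four consecutive values of the recycled vector, so the five group variances are not in fact identical.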
Re: [R] filtering out data
Thank you, thank you everyone! This last worked the best for what I need to do. The other options were definitely helpful in that they helped me understand how R is structured (something I haven't been able to get my head around yet) and gave me good ideas about how to manipulate data; however, they generated regressions of data I don't care about (i.e., when model1 == 0). Regards -- Kate

Kate Rohrbaugh Independent Project Analysis, Inc. 44426 Atwater Drive Ashburn, VA 20147 office - 703.726.5465 fax - 703.729.8301 email - [EMAIL PROTECTED] website - www.ipaglobal.com

Daniel Folkinshteyn [EMAIL PROTECTED] 8/22/2008 12:08 PM wrote: [...]
[R] WinBUGS with R
Dear Users, I am new to both of these things, so do not blame me too much... I am busy with semiparametric regression and use WinBUGS to sample posteriors. The code to call WinBUGS is as follows:

data <- list(y, X, n, m)   # My variables
inits.beta  <- rep(0, K)
inits.beta0 <- 0
inits <- function(){ list(beta = inits.beta, beta0 = inits.beta0, taueps = 1.0E-3) }
parameters <- list("sigma", "beta", "beta0", "y.star")
fitm <- bugs(data, inits, parameters, model.file = "model.bug",
             n.chains = 3, n.iter = n.iter, n.burnin = n.burnin, n.thin = n.thin,
             debug = FALSE, DIC = FALSE, digit = 5, codaPkg = FALSE,
             bugs.directory = "C:/Program Files/WinBUGS14/")

but I always get the following error:

Error in FUN(X[[1L]], ...) : .C(..): 'type' must be real for this format

I tried the web, but failed. Could anyone give me a clue? Best!
Re: [R] Combining multiple datasets
Here's a function that does what you asked; you may need to adjust the column names of your data frames before using it. If all your data frames are similar (same number of rows, same years) then try do.call('cbind', yourList).

# This function takes a list of data frames and merges them into one data frame.
# The data frames are assumed to have one common column name (the one used
# for joins) and customized name(s) for the remaining column(s).
# LDF = list of data frames to be merged
rmerge <- function(LDF, verbose = FALSE) {
    DF <- LDF[[1]]
    cat(paste("Started with", nrow(DF), "rows from", names(LDF)[1]), "\n")
    for (i in 2:length(LDF)) {
        DF <- merge(DF, LDF[[i]], all = TRUE)  # outer join
        if (verbose)
            cat(paste("Adding ", nrow(LDF[[i]]), " rows from ", names(LDF)[i],
                      " (", nrow(DF), " rows so far)...", sep = ""), "\n")
    }
    rownames(DF) <- NULL
    cat(paste("Got", nrow(DF), "rows after recursive merging"), "\n")
    DF
}

Alison Macalady wrote: Hi, I've tried to figure this out using Intro to R and help(), to no avail - I am new at this. I'm trying to write a script that will read multiple files from a directory and then merge them into a single new data frame. The original data are in a tree-ring specific format, and so I've first used a function (read.rwl) from the dplR package to read each file, translate each into a more intuitive time series format, and then put each new data frame into a single object, demo.Rwl:

demo.Rwl <- list.files(pattern = ".RWL$")
for (i in demo.Rwl) { x <- read.rwl(i, header = TRUE); assign(print(i, quote = FALSE), x) }

This part seems to work. Now, I want to be able to put all of the individual data frames contained in demo.Rwl into a single data frame, merging by rows (rows are calendar years in the time series).
I think I know how to do this by typing each data set name into a merge:

merge(x, y, ..., by.x = 0, all = TRUE)

However, I'd like to be able to script something that will save me the time of typing in all of the individual data set names, as I will be repeating this operation many times with many different numbers of original input files. Is there a way to code a merge such that every individual data frame contained in demo.Rwl (a different number every time) gets combined into a single data set? Thanks for the help! Ali
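A compact alternative to a hand-written loop (a sketch, with toy data frames standing in for the tree-ring series): base R's Reduce() folds merge() over a list, which is what the rmerge() function above does with its for loop.

```r
df1 <- data.frame(year = 2000:2002, a = 1:3)
df2 <- data.frame(year = 2001:2003, b = 4:6)
df3 <- data.frame(year = 2000:2003, c = 7:10)

# outer-join all three on the common "year" column:
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), list(df1, df2, df3))
merged$year  # 2000 2001 2002 2003
```

Rows present in only some data frames come back with NAs in the missing columns, matching the all = TRUE outer-join behaviour.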
Re: [R] Quantile regression with complex survey data
Just as a follow-up on T. Lumley's post, two citations that may be useful in reference to the application of quantile regression with survey samples are: Bassett and Saleh. 1994. L_1 estimation of the median of a survey population. Nonparametric Statistics 3: 277-283. (L_1 estimation of the median is 0.50 quantile regression.) Bassett et al. 2002. Quantile models and estimators for data analysis. Metrika 55: 17-26. (Describes weighted QR for a survey of school characteristics and student achievement scores.) Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: [EMAIL PROTECTED] tel: 970 226-9326 Stas Kolenikov [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 08/20/2008 01:14 PM To Cheng, Yiling (CDC/CCHP/NCCDPHP) [EMAIL PROTECTED] cc r-help@r-project.org Subject Re: [R] Quantile regression with complex survey data On Wed, Aug 20, 2008 at 8:12 AM, Cheng, Yiling (CDC/CCHP/NCCDPHP) [EMAIL PROTECTED] wrote: I am working on the NHANES survey data, and want to apply quantile regression on these complex survey data. Does anyone know how to do this? There are no references in the technical literature (thinking: Annals, JASA, JRSS B, Survey Methodology). Absolutely none. Zero. You might be able to apply the procedure mechanically and then adjust the standard errors, but God only knows what the population equivalent is of whatever that model estimates. If there is a population analogue at all. In general, quantile regression is a heavily model-based concept: for each value of the explanatory variables, there is a well defined distribution of the response, and quantile regression puts additional structure on it -- linearity of quantiles with respect to some explanatory variables. That does not mesh well with the design paradigm according to which survey estimation is usually conducted.
With the latter, the finite population and characteristics of every unit are assumed fixed, and randomness comes only from the sampling procedure. Within that paradigm, you can define the marginal distribution of the response (or any other) variable, but the conditional distributions may simply be unavailable because there are no units in the population satisfying the conditions. -- Stas Kolenikov, also found at http://stas.kolenikov.name Small print: I use this email account for mailing lists only.
[R] grep, gsub and metacharacters
Hello, I have an expression that I want to substitute for something else.

myvector <- c("ajkhfkiuwe", "Variable(kg/min)")

If I use the following code to extract "Variable(kg/min)" and substitute it with "va":

gsub(myvector[grep("var", myvector, ignore = T)], "va", myvector)

grep identifies the element of the vector that matches my query, but it won't substitute the value. I have been reading the help pages for regular expressions, but still can't figure out the syntax to match parentheses and forward slashes, which are considered metacharacters. As usual, thank you for your help, Judith
Re: [R] Survey Design / Rake questions
I *think* I'm making progress, but I'm still failing at the same step. My rake call fails with:

Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]], : Stratifying variables don't match

To my naïve eyes, it seems that my factors are in the wrong order. If so, how do I assert an ordering in my survey dataframe, or copy an image from the survey dataframe to my marginals dataframes? I'd prefer to pull the original marginals dataframe(s) from the survey dataframe so that I can automate that in production. If that's not my problem, where might I look for enlightenment? Neither ?why nor ?whatamimissing return citations. :-)

** How I fail Now **

SurveyData <- read.spss("C:/Data/R/orange_delivery.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
#===
temp <- sub(' +$', '', SurveyData$direction_)
SurveyData$direction_ <- temp
#===
SurveyData$NumStn <- abs(as.numeric(SurveyData$lineon) - as.numeric(SurveyData$lineoff))
mean(SurveyData$NumStn)
[1] 6.785276
### Kludge
SurveyData$NumStn <- pmax(1, SurveyData$NumStn)
mean(SurveyData$NumStn)
[1] 6.789877
SurveyData$NumStn <- as.factor(SurveyData$NumStn)
###
EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND")
XTTable <- xtabs(~direction_, EBSurvey)
XTTable
direction_
EASTBOUND
      345
WBSurvey <- subset(SurveyData, direction_ == "WESTBOUND")
XTTable <- xtabs(~direction_, WBSurvey)
XTTable
direction_
WESTBOUND
      307
#
EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
# svytable(~lineon+lineoff, EBDesign)
StnName <- c("Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman", "Valley College", "Laurel Canyon", "North Hollywood")
EBOnNewTots <- c(1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300, 100, 123.65, 0)
StnTraveld <- c(as.character(1:12))
EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50)
ByEBOn <- data.frame(StnName, Freq=EBOnNewTots)
ByEBNum <- data.frame(StnTraveld, Freq=EBNumStn)
RakedEBSurvey <- rake(EBDesign,
    list(~lineon, ~NumStn), list(ByEBOn, ByEBNum))

Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]], : Stratifying variables don't match

str(EBSurvey$lineon)
 Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
EBSurvey$lineon[1:5]
[1] Pierce College Warner Center Warner Center Warner Center De Soto
Levels: Warner Center De Soto Pierce College Tampa Reseda Balboa Woodley Sepulveda Van Nuys Woodman Valley College Laurel Canyon North Hollywood

str(ByEBOn$StnName)
 Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
ByEBOn$StnName[1:5]
[1] Warner Center De Soto Pierce College Tampa Reseda
Levels: Balboa De Soto Laurel Canyon North Hollywood Pierce College Reseda Sepulveda Tampa Valley College Van Nuys Warner Center Woodley Woodman

str(EBSurvey$NumStn)
 Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
EBSurvey$NumStn[1:5]
[1] 10 12 4 12 8
Levels: 1 2 3 4 5 6 7 8 9 10 11 12

str(ByEBNum$StnTraveld)
 Factor w/ 12 levels "1","10","11",..: 1 5 6 7 8 9 10 11 12 2 ...
ByEBNum$StnTraveld[1:5]
[1] 1 2 3 4 5
Levels: 1 10 11 12 2 3 4 5 6 7 8 9

** **

Robert Farley Metro www.Metro.net

-Original Message- From: Thomas Lumley [mailto:[EMAIL PROTECTED]] Sent: Thursday, August 21, 2008 13:55 To: Farley, Robert Cc: r-help@r-project.org Subject: Re: [R] Survey Design / Rake questions

On Tue, 19 Aug 2008, Farley, Robert wrote: While I'm trying to catch up on the statistical basis of my task, could someone point me to how I should fix my R error? The variables in the formula in rake() need to be the raw variables in the design object, not summary tables.
-thomas
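The str() output above suggests the core mismatch: the survey factor's levels are in route order while the marginals data frame's levels are alphabetical (the order a plain character-to-factor conversion produces). A sketch of forcing the marginals factor to use the survey's level ordering, with station names and totals abbreviated from the post:

```r
# survey-side factor with levels in route order:
route  <- c("Warner Center", "De Soto", "Pierce College")
lineon <- factor(c("Pierce College", "Warner Center", "De Soto"),
                 levels = route)

# marginals data frame: impose the survey's levels explicitly instead of
# relying on the alphabetical order of a default factor conversion:
ByEBOn <- data.frame(StnName = factor(route, levels = levels(lineon)),
                     Freq = c(1000, 600, 1200))

identical(levels(ByEBOn$StnName), levels(lineon))  # TRUE
```

With the level sets (and their order) identical, postStratify/rake can match the stratifying variables against the population margins.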
Re: [R] WinBUGS with R
artimon wrote: [...]

- Your example is not reproducible for us (without the data).
- Is WinBUGS running correctly (see its output with debug=TRUE)?
- What does traceback() tell us?

Which versions of R, WinBUGS and R2WinBUGS are in use? Uwe Ligges
Re: [R] How to read malformed csv files with read.table?
Hi folks, thank you for your friendly and immediate help!

Am 22.08.2008 um 17:14 schrieb Prof Brian Ripley: Or, better, use header=FALSE, skip=1 and the col.names arg of read.table().

My solution is reading the files without the headers (skip = 1) and separately reading the headers with scan:

scan("myfile.CSV", what = character(), sep = "\t", nlines = 1)

After throwing out the first two columns it should be possible to assign the scanned colnames to the data.frame colnames. Cheers Martin
Re: [R] grep, gsub and metacharacters
Try this:

myvector <- c("ajkhfkiuwe", "Variable(kg/min)")
gsub("\\(kg/min\\)", "va", myvector)

Does that come close to what you want? If not, maybe give an example of what you want the result to look like. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Judith Flores Sent: Friday, August 22, 2008 10:57 AM To: RHelp Subject: [R] grep, gsub and metacharacters [...]
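Two related alternatives (a sketch, not from the thread): fixed = TRUE treats the pattern as a literal string, so metacharacters like ( and / need no backslash-escaping, and grepl() lets you replace a whole element rather than a substring within it.

```r
myvector <- c("ajkhfkiuwe", "Variable(kg/min)")

# literal substring match: no escaping of ( ) / needed
gsub("(kg/min)", "", myvector, fixed = TRUE)   # "ajkhfkiuwe" "Variable"

# replace the entire matching element with "va":
out <- myvector
out[grepl("var", out, ignore.case = TRUE)] <- "va"
out                                            # "ajkhfkiuwe" "va"
```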
Re: [R] How to read malformed csv files with read.table?
On Fri, 22 Aug 2008, Martin Ballaschk wrote: Hi folks, thank you for your friendly and immediate help! On 22.08.2008 at 17:14, Prof Brian Ripley wrote: Or, better, use header=FALSE, skip=1 and the col.names arg of read.table(). My solution is reading the files without the headers (skip = 1) and separately reading the headers with scan: scan("myfile.CSV", what = character(), sep = "\t", nlines = 1). After throwing out the first two columns it should be possible to assign the scanned colnames to the data.frame colnames.

Yes, but if you read the header first you can set the col.names via the arg to read.table().

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] simple generation of artificial data with defined features
On the general question of how to create a dataset that matches the frequencies in a table, the function as.data.frame can be useful. It takes as argument an object of class 'table' and returns a data frame of frequencies. Consider for example table 6.1 of Fleiss et al (3rd Ed):

birth.weight <- c(10,15,40,135)
attr(birth.weight, "class") <- "table"
attr(birth.weight, "dim") <- c(2,2)
attr(birth.weight, "dimnames") <- list(c("A", "Ab"), c("B", "Bb"))
birth.weight
    B  Bb
A  10  40
Ab 15 135

summary(birth.weight)
Number of cases in table: 200
Number of factors: 2
Test for independence of all factors: Chisq = 3.429, df = 1, p-value = 0.06408

bw.dt <- as.data.frame(birth.weight)

Observations (rows) in this table can then be replicated according to their corresponding frequencies to yield the expanded dataset that conforms with the original table.

bw.dt.exp <- bw.dt[rep(1:nrow(bw.dt), bw.dt$Freq), -ncol(bw.dt)]
dim(bw.dt.exp)
[1] 200 2
table(bw.dt.exp)
     Var2
Var1   B  Bb
  A   10  40
  Ab  15 135

The above approach is not restricted to 2x2 tables, and it should be straightforward to generate datasets that conform to arbitrary n x m frequency tables. -Christos Hatzis

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Greg Snow Sent: Friday, August 22, 2008 12:41 PM To: drflxms; r-help@r-project.org Subject: Re: [R] simple generation of artificial data with defined features

I don't think that the election data is the right data to demonstrate Kappa; you need subjects that are classified by 2 or more different raters/methods. The election data could be considered as classifying the voters into which party they voted for, but you only have 1 rater. Maybe if you had some survey data that showed which party each voter voted for in 2 or more elections, that may be a good example dataset. Otherwise you may want to stick with the sample datasets.
There are other packages that compute Kappa values as well (I don't know if others calculate this particular version), but some of those take the summary data as input rather than the raw data, which may be easier if you just have the summary tables. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of drflxms Sent: Friday, August 22, 2008 6:12 AM To: r-help@r-project.org Subject: [R] simple generation of artificial data with defined features

Dear R-colleagues, I am quite a newbie to R, fighting my stupidity to solve a probably quite simple problem of generating artificial data with defined features. I am conducting a study of inter-observer agreement in child bronchoscopy. One of the most important measures is Kappa according to Fleiss, which is conveniently available in R through the irr package. Unfortunately medical doctors like me don't really understand much statistics. Therefore I'd like to give the reader an easily understandable example of Fleiss-Kappa in the Methods part. To achieve this, I obtained a table with the results of the German election of 2005:

party    number of votes  percent
SPD      16194665         34.2
CDU      13136740         27.8
CSU       3494309          7.4
Gruene    3838326          8.1
FDP       4648144          9.8
PDS       4118194          8.7

I want to show the agreement of voters measured by Fleiss-Kappa. To calculate this with the kappam.fleiss function of irr, I need a data.frame like this:

(id of 1st voter) (id of 2nd voter)
party    spd    cdu

Of course I don't plan to calculate this with the millions of cases mentioned in the table above (I am working on a small laptop). A division by 1000 would be more than perfect for this example. The exact format of the table is generally not so important, as I could reshape nearly every format with the help of the reshape package. Unfortunately I could not figure out how to create such a fictive/artificial dataset as described above.
Any data.frame would be nice that keeps at least the percentages. String IDs of parties could be substituted by numbers of course (which would be even better for the function kappam.fleiss in irr!). I would appreciate any kind of help very much indeed. Greetings from Munich, Felix Mueller-Sarnowski
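Combining the two suggestions above into one hedged sketch (vote counts taken from Felix's table): scale the counts down by 1000, then expand them into one row per fictive voter, which is exactly the rep()-by-frequency idea Christos describes.

```r
votes <- c(SPD = 16194665, CDU = 13136740, CSU = 3494309,
           Gruene = 3838326, FDP = 4648144, PDS = 4118194)
n <- round(votes / 1000)                  # laptop-sized sample

## one row per (fictive) voter, party label repeated by its scaled count
voters <- data.frame(id    = seq_len(sum(n)),
                     party = rep(names(votes), times = n))

## the marginal percentages are preserved up to rounding
round(100 * prop.table(table(voters$party)), 1)
```

A second "rater" column could then be added (identical, permuted, or perturbed) to feed kappam.fleiss with a known amount of agreement.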
Re: [R] How to read malformed csv files with read.table?
Hi Brian, on 22.08.2008 at 19:22, Prof Brian Ripley wrote: Yes, but if you read the header first you can set the col.names via the arg to read.table().

Thanks! I plan to do it like that (actually it will be stuffed into a loop to read a bunch of files); it seems to work:

headernames <- scan("test.CSV", what = character(), sep = "\t", nlines = 1, skip = 4)
my.table <- read.table("test.CSV", header = FALSE, skip = 5, col.names = c("crap.1", "crap.2", headernames))
head(my.table)
  crap.1 crap.2 time.ms C550.KMS Cyt_b559.KMS [...] etc.
1      0 Point1  -599.5    0.000        0.000
2      0 Point2  -598.0    0.019       -0.014
3      0 Point3  -596.5    0.025       -0.023
4      0 Point4  -595.0    0.034       -0.029
5      0 Point5  -593.5    0.049       -0.033
6      0 Point6  -592.0    0.068       -0.033

Cheers Martin
Re: [R] Test of Homogeneity of Variances
Have you read the documentation of either of the functions you are using?

?bartlett.test — Performs Bartlett's test of the null that the variances in each of the groups (samples) are the same. This explicitly tells you what is being tested, i.e. the null tested is that var1 = var2.

?rnorm — Generates random deviates, so the answer [i.e. the p-value] will (almost certainly) differ each time. Only by setting a seed for the random number generator will rnorm() generate the same numbers and therefore the same p-value. See ?set.seed:

set.seed(7)
bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
set.seed(7)
bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
set.seed(23)
bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))

HTH, Mark.

Daren Tan wrote: I am testing whether the sample variances are equal. When the p-value > 0.05 (alpha), should I accept the null hypothesis (sample variances are equal) or reject it? The two new examples, each having the same sample variances, also puzzle me. Why are the p-values different?

bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
Bartlett test of homogeneity of variances
data: rep(rnorm(5), times = 4) and rep(1:5, each = 4)
Bartlett's K-squared = 0.8681, df = 4, p-value = 0.929

bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
Bartlett test of homogeneity of variances
data: rep(rnorm(5), times = 4) and rep(1:5, each = 4)
Bartlett's K-squared = 3.5599, df = 4, p-value = 0.4688

From: [EMAIL PROTECTED] To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Date: Fri, 22 Aug 2008 11:25:36 -0400 Subject: RE: [R] Test of Homogeneity of Variances What are your hypotheses? Once you state what they are, interpretation should be straightforward.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daren Tan Sent: Friday, August 22, 2008 11:18 AM To: [EMAIL PROTECTED] Subject: [R] Test of Homogeneity of Variances

I am testing the homogeneity of variances via bartlett.test and fligner.test. Using the following example, how should I interpret the p-value in order to accept or reject the null hypothesis?

set.seed(5)
x <- rnorm(20)
bartlett.test(x, rep(1:5, each=4))
Bartlett test of homogeneity of variances
data: x and rep(1:5, each = 4)
Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778

fligner.test(x, rep(1:5, each=4))
Fligner-Killeen test of homogeneity of variances
data: x and rep(1:5, each = 4)
Fligner-Killeen: med chi-squared = 1.0819, df = 4, p-value = 0.8971
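A minimal illustration of Mark's point above: the same seed reproduces the same simulated data and hence the same p-value, while a different seed (almost certainly) gives a different one.

```r
set.seed(7)
p1 <- bartlett.test(rep(rnorm(5), times = 4), rep(1:5, each = 4))$p.value
set.seed(7)
p2 <- bartlett.test(rep(rnorm(5), times = 4), rep(1:5, each = 4))$p.value
set.seed(23)
p3 <- bartlett.test(rep(rnorm(5), times = 4), rep(1:5, each = 4))$p.value

p1 == p2   # TRUE: same seed, identical data, identical p-value
p1 == p3   # FALSE (with near certainty): different seed, different data
```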
Re: [R] Test of Homogeneity of Variances
You don't need to test whether the _sample_ variances are different: they already are. Statistical tests of hypotheses are not about sample statistics, but about distributional characteristics. It seems to me that reinforcement of some basic statistical concepts would do you quite a bit of good. If you don't have the basics, you're just going to get more and more puzzled every step of the way. Just a frank suggestion. Best, Andy

From: Daren Tan — I am testing whether the sample variances are equal. When the p-value > 0.05 (alpha), should I accept the null hypothesis (sample variances are equal) or reject it? The two new examples, each having the same sample variances, also puzzle me. Why are the p-values different?

bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
Bartlett test of homogeneity of variances
data: rep(rnorm(5), times = 4) and rep(1:5, each = 4)
Bartlett's K-squared = 0.8681, df = 4, p-value = 0.929

bartlett.test(rep(rnorm(5), times=4), rep(1:5, each=4))
Bartlett test of homogeneity of variances
data: rep(rnorm(5), times = 4) and rep(1:5, each = 4)
Bartlett's K-squared = 3.5599, df = 4, p-value = 0.4688

From: [EMAIL PROTECTED] To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Date: Fri, 22 Aug 2008 11:25:36 -0400 Subject: RE: [R] Test of Homogeneity of Variances What are your hypotheses? Once you state what they are, interpretation should be straightforward. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daren Tan Sent: Friday, August 22, 2008 11:18 AM To: [EMAIL PROTECTED] Subject: [R] Test of Homogeneity of Variances

I am testing the homogeneity of variances via bartlett.test and fligner.test. Using the following example, how should I interpret the p-value in order to accept or reject the null hypothesis?
set.seed(5)
x <- rnorm(20)
bartlett.test(x, rep(1:5, each=4))
Bartlett test of homogeneity of variances
data: x and rep(1:5, each = 4)
Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778

fligner.test(x, rep(1:5, each=4))
Fligner-Killeen test of homogeneity of variances
data: x and rep(1:5, each = 4)
Fligner-Killeen: med chi-squared = 1.0819, df = 4, p-value = 0.8971
[R] inner margins for lattice
I would like to control the inner margins of a lattice graph. The graph is a single superposed panel, but there is too much white space around the data for my liking. I've played around with some of the layout options, but so far have found only how to control the outer margins. R version 2.7.1 (2008-06-23), MS Vista OS. Thanks for any help. Todd Hatfield
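One hedged possibility (not from the thread itself): in lattice the inner white space around the data is governed by the axis.padding setting, which defaults to 8% on numeric axes and can be shrunk per plot via the lattice.options argument.

```r
library(lattice)

## Shrink the default 8% numeric-axis padding to 1% for this plot only;
## the mtcars variables are just stand-ins for the poster's data.
p <- xyplot(mpg ~ wt, data = mtcars,
            lattice.options = list(axis.padding = list(numeric = 0.01)))
class(p)   # a "trellis" object; print(p) draws it
```

Setting lattice.options(axis.padding = list(numeric = 0.01)) globally would affect all subsequent lattice plots instead.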
[R] R and GNUPLOT
Hello, I am trying to send commands to GNUPLOT to load a .plt file that was generated with a C++ software module called from R, and to replot. I am able to open the program with shell.exec (below) but I am at a loss as to what to do next. Any suggestions?

shell.exec("C:\\gnuplot\\bin\\wgnuplot.exe")

Thanks --- Tony Fristachi Battelle, Statistics & Information Analysis 505 King Avenue, Rm 11-7-056 Columbus, OH 43201 Email: [EMAIL PROTECTED] Phone: 614/424.4910 Fax: 614/458.4910 Cell: 513/375.3263 Web: http://www.battelle.org/solutions/default.aspx?Nav_Area=Tech&Nav_SectionID=3 "We are what we repeatedly do. Excellence then, is not an act, but a habit." - Aristotle
Re: [R] R and GNUPLOT
There are some interface commands in the TeachingDemos package (see ?gp.open) for communicating between R and gnuplot. These commands are pretty basic, but looking at the code may give you some ideas of what to do next in your project (or maybe what not to do). Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fristachi, Anthony Sent: Friday, August 22, 2008 12:43 PM To: r-help@R-project.org Subject: [R] R and GNUPLOT

Hello, I am trying to send commands to GNUPLOT to load a .plt file that was generated with a C++ software module called from R, and to replot. I am able to open the program with shell.exec, but I am at a loss as to what to do next. Any suggestions?

shell.exec("C:\\gnuplot\\bin\\wgnuplot.exe")
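Beyond TeachingDemos, one hedged do-it-yourself sketch: write the gnuplot commands from R to a command file, then hand that file to the gnuplot executable with system(). The .plt file name and the executable path below are placeholders, not values from the thread.

```r
## Commands gnuplot should run; "myplot.plt" is a placeholder name.
cmds <- c('load "myplot.plt"',
          "replot")
cmdfile <- tempfile(fileext = ".gp")
writeLines(cmds, cmdfile)

## On Windows this might then be (path is an assumption):
## system(paste('"C:\\gnuplot\\bin\\wgnuplot.exe"', cmdfile))
readLines(cmdfile)
```

This avoids keeping an interactive gnuplot session open; for a persistent session, a pipe() connection to the gnuplot binary is the usual alternative.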
[R] Sending ... to a C external
I'm trying to figure this out with Writing R Extensions but there's not a lot of detail on this issue. I want to write a (very simple, really) C external that will be able to take ... as an argument. (It's for optimizing a function that may have several parameters besides the ones being optimized.) I got the showArgs code (from R-exts) to compile and install, but it only gives me error messages when I run it. I think I'm supposed to pass it different arguments from what I'm doing, but I have no idea which ones. What exactly are CAR, CDR, and CADR anyway? Why did the R development team choose this very un-C-like set of commands? They are not explained much in R-exts. Thanks in advance for any info. John
Re: [R] lme questions re: repeated measures covariance structure
Great, thanks for your reply! I've just tracked down a copy of Pinheiro & Bates (2000), so I'll look at that too. Thanks again, Mark

On Fri, Aug 22, 2008 at 9:56 AM, Christoph Scherber [EMAIL PROTECTED] wrote: Dear Mark, I would include the repeated measure as the smallest stratum in the random-effects specification:

random = ~1 | sampleunit/year

Setting up user-defined variance structures should be possible using, for example, weights = varPower(form = ~habitat), or also try out the available corStruct() classes (found in Pinheiro and Bates 2000). HTH Christoph

Mark Na wrote: Hello, we are attempting to use nlme to fit a linear mixed model to explain bird abundance as a function of habitat:

lme(abundance ~ habitat - 1, data = data, method = "ML", random = ~1 | sampleunit)

The data consist of repeated counts of birds in sample units across multiple years, and we have two questions: 1) Is it necessary (and, if so, how) to specify the repeated measure (years)? As written, the above code does not. 2) How can we specify a Toeplitz heterogeneous covariance structure for this model? We have searched the help file for lme, and the R-help archives, but cannot find any pertinent information. If that's not possible, can we adapt an existing covariance structure, and if so, how? Thanks, Mark -- Dr. rer. nat.
Christoph Scherber University of Goettingen DNPW, Agroecology Waldweg 26 D-37073 Goettingen Germany phone +49 (0)551 39 8807 fax +49 (0)551 39 8806 Homepage http://www.gwdg.de/~cscherb1
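A runnable sketch of Christoph's nested random-effects suggestion on simulated data (the variable names follow the thread; the data-generating model is invented for illustration):

```r
library(nlme)
set.seed(1)

## 10 sample units, 3 years each measured twice, two habitats
d <- data.frame(sampleunit = rep(paste0("u", 1:10), each = 6),
                year       = rep(rep(2001:2003, each = 2), times = 10),
                habitat    = rep(c("forest", "marsh"), times = 30))
unit_eff <- rnorm(10)                       # random intercept per unit
d$abundance <- 5 + 2 * (d$habitat == "marsh") +
  unit_eff[as.integer(factor(d$sampleunit))] + rnorm(60, sd = 0.5)

## year nested within sample unit, as suggested in the reply
fit <- lme(abundance ~ habitat - 1, data = d, method = "ML",
           random = ~ 1 | sampleunit/year)
fixef(fit)   # one estimated mean per habitat
```

The weights = varPower(form = ~habitat) and corStruct() pieces would slot into the same lme() call once this baseline fits.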
[R] sd(NA)
Hi All: This was discussed on the R-devel list (see the thread starting at http://tolstoy.newcastle.edu.au/R/e3/devel/07/12/0560.html). It has to do with the behavior of sd() when the entire vector is NA. The behavior has changed between 2.6 and 2.7.1 as follows:

Run in version 2.7.1:
tt <- rep(NA, 10)
mean(tt, na.rm=T)
[1] NaN
sd(tt, na.rm=T)
Error in var(x, na.rm=na.rm) : no complete element pairs

Run in version 2.6.1:
tt <- rep(NA, 10)
mean(tt, na.rm=T)
[1] NaN
sd(tt, na.rm=T)
[1] NA

If I understand the discussion, 2.7.1 fails because na.rm=T removes the NAs so that the vector is then empty. What I have been unable to find in any of the mailing lists is how you are supposed to program around this in 2.7.1 when you are looping through a large number of variables (we have time series on a grid and are calculating the mean and sd of the time series at each grid point). Any suggestions much appreciated. -Roy M.

** The contents of this message do not reflect any position of the U.S. Government or NOAA. ** Roy Mendelssohn Supervisory Operations Research Analyst NOAA/NMFS Environmental Research Division Southwest Fisheries Science Center 1352 Lighthouse Avenue Pacific Grove, CA 93950-2097 e-mail: [EMAIL PROTECTED] (Note new e-mail address) voice: (831)-648-9029 fax: (831)-648-8440 www: http://www.pfeg.noaa.gov/ "Old age and treachery will overcome youth and skill." "From those who have been given much, much will be expected."
Re: [R] sd(NA)
Here's one approach:

try_default <- function(expr, default = NA, quiet = FALSE) {
  result <- default
  if (quiet) {
    tryCatch(result <- expr, error = function(e) {})
  } else {
    try(result <- expr)
  }
  result
}

failwith <- function(default = NULL, f, quiet = FALSE) {
  function(...) try_default(f(...), default, quiet = quiet)
}

sd2 <- failwith(NA, sd)
sd2(NA, na.rm=T)

sd3 <- failwith(NA, sd, quiet = T)
sd3(NA, na.rm=T)

Hadley -- http://had.co.nz/
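If all you need is the all-NA special case (rather than Hadley's general failwith wrapper), a simpler guard in base R also works; the helper name here is invented:

```r
safe_sd <- function(x) {
  ## return NA instead of erroring when nothing is left after na.rm
  if (all(is.na(x))) NA_real_ else sd(x, na.rm = TRUE)
}

safe_sd(rep(NA_real_, 10))   # NA, no error
safe_sd(c(1, 2, NA, 4))      # ordinary standard deviation of 1, 2, 4
```

This can then be applied over the grid of time series (e.g. with apply()) so that all-missing grid points come back as NA.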
Re: [R] Writing Rcmdr Plugins
Bernhard and Irina, while tailoring the Rcmdr menu directly does work, I strongly recommend against it. It will lead to mass confusion, most likely for the author, several months from now when it is installed on a different machine or when R or Rcmdr is updated. It is much cleaner to have a separate package. It is also much easier to write and maintain a separate package than it is to edit the original Rcmdr-menus.txt. While the startup cost to the author of writing an RcmdrPlugin.xxx package is about the same as the cost of modifying Rcmdr itself, the maintenance is fantastically simpler for a plugin package. It is clear to the user of a plugin package that your changes are not the original. When you modify the original Rcmdr, users will not be told explicitly that they are using a modified version, and that will lead to confusion on this list. I distribute a plugin (RcmdrPlugin.HH) on CRAN. I originally distributed it as a modification of the Rcmdr. John Fox invented plugins partially as a result of my experience. It is much better for both of us to have a clear separation of his responsibilities as designer of the Rcmdr, and mine as designer of several specific functions. Rich

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pfaff, Bernhard Dr. Sent: Thursday, August 21, 2008 06:01 To: G. Jay Kerns; Irina Ursachi Cc: r-help@r-project.org Subject: Re: [R] Writing Rcmdr Plugins

Dear Irina, though you asked explicitly for writing an RCommander plugin package, I just wanted to add that the former approach of tailor-making menus in the Commander still works.
That is, just copy your R file with the tcl/tk functions into the /etc directory of the RCommander and include your menu structure in the file Rcmdr-menus.txt. Best, Bernhard
Re: [R] History pruning
We have different starting points. Please be sure that your modularity allows a cleaned region as well as a history log to be the input to your next step. The history log is incomplete; lines sent to the *R* buffer by C-c C-n are explicitly excluded from history. Lines picked up from a saved transcript file aren't in history. Will your program handle this correctly:

aa <- bb +
cc

? It is valid code. Suppressing the line with cc will make it not valid code. Rich
[R] Building colorspace on RHEL5
Dear all, I'm having problems installing the colorspace package on Red Hat Enterprise Linux 5:

* Installing *source* package 'colorspace' ...
** libs
gcc -std=gnu99 -I/usr/lib/R/include -I/usr/lib/R/include -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -c colorspace.c -o colorspace.o
In file included from /usr/include/features.h:352, from /usr/include/stdlib.h:25, from /usr/lib/R/include/R.h:28, from colorspace.c:1:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
colorspace.c: In function 'CheckHex':
colorspace.c:411: warning: implicit declaration of function 'isxdigit'
colorspace.c: In function 'RGB_to_RColor':
colorspace.c:1024: warning: unused variable 'Zn'
colorspace.c:1024: warning: unused variable 'Yn'
colorspace.c:1024: warning: unused variable 'Xn'
colorspace.c: In function 'hex_to_RGB':
colorspace.c:1115: warning: passing argument 1 of 'decodeHexStr' discards qualifiers from pointer target type
colorspace.c: In function 'polarLAB_to_LAB':
colorspace.c:218: warning: control reaches end of non-void function
colorspace.c: In function 'XYZ_to_LUV':
colorspace.c:314: warning: control reaches end of non-void function
make: *** [colorspace.o] Error 1

I'm running 2.6.1 because lab computers can only have official rpms, and uname -a gives:

Linux rl102-07.lab.rice.edu 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Any ideas as to why the compile is failing? Thanks, Hadley -- http://had.co.nz/
Re: [R] Building colorspace on RHEL5
Hi Hadley, on 08/22/2008 03:52 PM hadley wickham wrote: Dear all, I'm having problems installing the colorspace package on Red Hat Enterprise Linux 5: ...

/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory

It looks like the above (gnu/stubs-32.h) is the key missing file with the required headers, preventing compilation; the remaining compiler warnings are incidental.
At least on F9:

$ locate stubs-32.h
/usr/include/gnu/stubs-32.h
$ rpm -q --whatprovides /usr/include/gnu/stubs-32.h
glibc-devel-2.8-8.i386

So:

# yum install glibc-devel

should take care of it, presuming that you are not missing anything else... HTH, Marc Schwartz
[R] Newbie programming help
All - Not sure if this is a real programming question, but here goes: I have data that looks like

Lake  Length  Weight
1     158     45
1     179     70
1     200     125
1     202     150
1     206     145
1     209     165
1     210     140
1     215     175
1     216     152
1     220     150
1     221     165
...

where Lake goes from 1 to 84 and the number of rows for each lake is variable (but ~20). I'm trying to do two things: 1) build a simple linear model of the form lm(log10(Weight) ~ log10(Length)) for every separate lake in the data set; 2) save the intercepts and slopes from each of these linear regressions into a separate data frame. Any ideas? I think it would probably require some kind of 'for' statement, but I'm just not that smart. Thanks for your help, SR

Steven H. Ranney Graduate Research Assistant (Ph.D) USGS Montana Cooperative Fishery Research Unit Montana State University PO Box 173460 Bozeman, MT 59717-3460 phone: (406) 994-6643 fax: (406) 994-7479
Re: [R] help needed for HWE.exact in library genetics
Sun, Ying asked: I have genotype data for both cases and controls and would like to calculate the HW p-value. However, since the count of one genotype is 0, I got a weird result. Would someone help me figure it out, or confirm it's right? Thanks a lot. ...

HWE.exact(g1)
Exact Test for Hardy-Weinberg Equilibrium
data: g1
N11 = 71, N12 = 9, N22 = 0, N1 = 151, N2 = 9, p-value = 1

Yes, that is correct. Double-check it by calculating the goodness-of-fit chi-square for the same hypothesis:

p <- 151/160; q <- 1-p; 80*c(p^2, 2*p*q, q^2)
[1] 71.253125  8.493750  0.253125

... David Duffy. -- | David Duffy (MBBS PhD) | email: [EMAIL PROTECTED] ph: INT+61+7+3362-0217 fax: -0101 | Epidemiology Unit, Queensland Institute of Medical Research | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A
Re: [R] Newbie programming help
On 8/22/2008 5:34 PM, Ranney, Steven wrote: ... I'm trying to do two things: 1) build a simple linear model of the form lm(log10(Weight) ~ log10(Length)) for every separate lake in the data set; 2) save the intercepts and slopes from each of these linear regressions into a separate data frame.

Assuming the data are in a data frame called mydf:

library(nlme)
fm1 <- lmList(log10(Weight) ~ log10(Length) | Lake, mydf)
coef(fm1)

See ?lmList. Or:

t(sapply(split(mydf, mydf$Lake), function(x) {
  coef(lm(log10(Weight) ~ log10(Length), data = x))
}))

-- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
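A self-contained version of the split()/sapply() idea above with simulated lake data (the column names follow the thread; the data-generating parameters are invented):

```r
set.seed(42)
mydf <- data.frame(Lake   = rep(1:3, each = 20),
                   Length = runif(60, 150, 250))
## weight roughly follows log10(W) = a + b*log10(L), with b near 3
mydf$Weight <- 10^(-4.5 + 3 * log10(mydf$Length) + rnorm(60, sd = 0.05))

## one (intercept, slope) row per lake
coefs <- t(sapply(split(mydf, mydf$Lake), function(x)
  coef(lm(log10(Weight) ~ log10(Length), data = x))))
coefs
```

as.data.frame(coefs) then gives the separate data frame of intercepts and slopes the poster asked for.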
[R] Using interactive plots to get information about data points
I have been experimenting with interactive packages such as iplots and playwith. Consider the following sample dataset:

  A B C D
1 5 5 9 3
2 8 4 1 7
3 0 7 2 2 6

Let's say I make a plot of variable A. I would like to be able to click on a data point (e.g. 3) and have a pop-up window tell me the corresponding value for variable D (e.g. 4). I am also trying to produce multiple small plots, for example four side-by-side boxplots for each of the four variables A, B, C, D. Any help would be greatly appreciated! Thanks!
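For the click-to-inspect part, base R's identify() already does most of this without extra packages. The data frame below only mirrors the shape of the posted example (the posted values are partly garbled, so these are stand-ins):

```r
df <- data.frame(A = c(5, 8, 0), B = c(5, 4, 7),
                 C = c(9, 1, 2), D = c(3, 7, 6))

# Interactive part (needs an interactive graphics device; commented out
# so the snippet also runs in batch mode):
# plot(df$A)
# identify(seq_along(df$A), df$A, labels = df$D)  # click a point -> shows D

# Four small plots side by side, one boxplot per variable:
op <- par(mfrow = c(1, 4))
for (v in names(df)) boxplot(df[[v]], main = v)
par(op)
```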
[R] How to handle ~ character after csv importation
Dear R users, I have to import some csv files in which the column headers contain the character ~. After the following import call, that character seems to be replaced by dots in the column names of my data frame. Moreover, I cannot query names(mydata) to find the index of a column whose header contains ~ or . :

mydata <- data.frame(read.table("mycsv.csv", sep = ",", header = TRUE, na.strings = ".", as.is = TRUE))
mydata
  P1 P2 P3 P1.P2 P1.P3 P2.P3 A B C
1  1  2  3     4     5     6 7 8 9
names(mydata)
[1] "P1" "P2" "P3" "P1.P2" "P1.P3" "P2.P3" "A" "B" "C"
grep("~", names(mydata))
integer(0)
grep(".", names(mydata))
[1] 1 2 3 4 5 6 7 8 9

How can I check for this character? And by the way, can anyone explain to me the result of the last grep call? Thank you in advance. Sebastien
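What happened: read.table() passes headers through make.names(), which replaces "~" with ".", and grep()'s first argument is a regular expression, in which "." matches any character. A minimal sketch (the csv file itself isn't available, so only make.names() is exercised; the commented read.table() call reuses the poster's file name):

```r
nm <- make.names(c("P1~P2", "P1~P3", "P2~P3", "A"))
nm                            # "P1.P2" "P1.P3" "P2.P3" "A" -- the "~" is gone
grep("~", nm)                 # integer(0): nothing left to find
grep(".", nm)                 # 1 2 3 4: regex "." matches ANY character
grep(".", nm, fixed = TRUE)   # 1 2 3: only names containing a literal dot

# To keep "~" in the names in the first place:
# mydata <- read.table("mycsv.csv", sep = ",", header = TRUE,
#                      na.strings = ".", as.is = TRUE, check.names = FALSE)
```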
[R] dimension values of an array
Question: I created an array with 164 columns and 70 rows. I would like every column to have the unit value 1 and the rows 0.1, 0.2, 0.3 up to 7. How can I define that? Thanks for your help! Good weekend! Akko
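One reading of the question is that the 0.1 ... 7 and 1 values are meant as row and column labels rather than cell contents; dimnames() handles that. A sketch under that assumption:

```r
# 70 x 164 matrix: rows labelled 0.1, 0.2, ..., 7; every column labelled 1.
m <- matrix(0, nrow = 70, ncol = 164,
            dimnames = list(seq(0.1, 7, by = 0.1),  # row labels
                            rep(1, 164)))           # column labels
dim(m)             # 70 164
rownames(m)[1:3]   # "0.1" "0.2" "0.3"
m["0.5", 10]       # cells can then be addressed by row label
```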
[R] nlme gls front end crash
Hi, when I try to run GLS, R crashes. I have seen on the mailing list that somebody encountered this problem before ( http://tolstoy.newcastle.edu.au/R/help/05/01/10915.html ), but there seemed to be no solution / the thread appears incomplete. The program runs gls1 (without correlation structure; code below) and crashes when I run gls2 (with correlation structure; code further below). Windows then shows a standard crash window (the one where Windows asks to send an error report to Microsoft) saying something like: "R GUI has encountered a problem and has to close. We are sorry for the inconvenience..." The crash seems to depend on the specific correlation structure; it does not crash with all correlation structures. My data has about 2000 rows. Is this a memory problem? Thanks for any advice. Daniel

The R code I am running is the following:

newdata.new <- groupedData(cash.end ~ rs | subj, data = data.new)
cs1 <- corARMA(value = c(0.9), p = 1, form = ~ rs | subj)
cs1 <- Initialize(cs1, data = newdata.new)
corMatrix(cs1)
gls1 <- gls(log(cash.end + 1) ~ factor(rs) + nogs + nols + ret + factor(poc) + posh + posl + negh + negl,
            data = newdata.new, na.action = na.omit)
gls2 <- gls(log(cash.end + 1) ~ factor(rs) + nogs + nols + ret + factor(poc) + posh + posl + negh + negl,
            data = newdata.new, na.action = na.omit, corr = cs1)

And the session info is:

R version 2.6.2 (2008-02-08)
i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] nlme_3.1-87
loaded via a namespace (and not attached):
[1] grid_2.6.2 lattice_0.17-4
- cuncta stricte discussurus
Re: [R] Visualization of two-mode matrix/ networks/ graphs
Jakob, could you please tell me how you got your two-mode data INTO R/statnet/sna, etc.? I can do many things with this data in UCINET, and even export it to Pajek, but I cannot get it to import even using the read.paj routine. I have about 620 actors involved in 230 events. When I convert it to an actor-only network, I also cannot get rid of the arrows! I am trying to import from a Pajek .net file. Success so far with drawing the two modes differently in UCINET, but I want to do more statnet work on these networks, and also use the event timestamps to show dynamic features of the network down the line. Anyone? -Brendan Casey

Jakob Mumm wrote: Hey all, I'm looking for a plotting method for two-mode networks. Given an n x m matrix, where n is the first set of entities/actors and m represents the second set, I want to colour the two sets of actors differently. I tried plot.network from the network package, but I did not succeed. Even the proposed solution from Gabor Grothendieck (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/87003.html) seems not well suited to the problem. Further, I'm wondering how to interpret "vertex.col: color for vertices; may be given as a vector or a vertex attribute name, if vertices are to be of different colors." For the latter (attribute-name) approach the documentation gives the example vertex.col = 2 + (network.vertex.names(nflo) == "Medici"), which is not a problem, but how does one use the former (vector) approach? The gplot command from the sna package should support two-mode matrices via gmode = "twomode" (see help for gplot), but then one has to work with graphs and will lose (as far as I understand it) information that is stored in the network class.
For example, take the following adjacency matrix, where 1-6 (rows) is the first set of entities and 7-9 (cols) the second:

  7 8 9
1 0 0 0
2 1 0 0
3 1 0 0
4 0 0 0
5 0 0 0
6 0 0 0

Thanks in advance, Jakob
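A hedged sketch of the gmode = "twomode" route with the posted matrix. It assumes the sna and network packages are installed; those plotting calls are commented out (and their argument details are from memory -- check ?gplot and ?network) so the snippet also runs without the packages. usearrows = FALSE addresses Brendan's arrow question.

```r
# Posted incidence matrix: actors 1-6 (rows) by events 7-9 (columns).
m <- matrix(0, nrow = 6, ncol = 3, dimnames = list(1:6, 7:9))
m["2", "7"] <- 1
m["3", "7"] <- 1
m

# library(sna)
# gplot(m, gmode = "twomode",       # rows and columns drawn as two modes
#       usearrows = FALSE,          # drop the arrowheads
#       displaylabels = TRUE)

# With the network package, vertex.col as a plain vector: one colour per
# vertex, row vertices first, then column vertices.
# library(network)
# cols <- c(rep("red", 6), rep("blue", 3))
# plot(network(m, bipartite = TRUE), vertex.col = cols)
```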
Re: [R] Sending ... to a C external
Le vendredi 22 août 2008 à 15:16 -0400, John Tillinghast a écrit [On Friday 22 August 2008, John Tillinghast wrote]: I'm trying to figure this out with Writing R Extensions, but there's not a lot of detail on this issue. I want to write a (really very simple) C external that will be able to take ... as an argument. (It's for optimizing a function that may have several parameters besides the ones being optimized.)

!!! That's a hard one. I have never undertaken this kind of job, but I expect that your ... argument, if you can reach it from C (which I don't know), will be bound to a Lisp-like structure, notoriously hard to decode in C. Basically, you'll have to write very low-level code (and duplicate a good chunk of the R parser-interpreter...). I'd rather handle the ... argument in a wrapper that calls the relevant C function with all arguments interpreted and bound. This wrapper would probably be an order of magnitude slower than C code, but two orders of magnitude easier to write (and maintain!). Since the parsing of the ... argument would be done *once* before the grunt work is handed to the C code, the slowdown would (probably) be negligible...

I got the showArgs code (from R-exts) to compile and install, but it only gives me error messages when I run it. I think I'm supposed to pass it different arguments from what I'm doing, but I have no idea which ones. What exactly are CAR, CDR, and CADR anyway? Why did the R development team choose this very un-C-like set of commands?

'Cause they are fundamental to the Lisp-like language that is S/R. Read on... They are not explained much in R-exts. At the risk of incurring the wrath of the R-help deities: in the beginning, John McCarthy created Lisp 1.5, which begat MACLISP, which begat ..., which begat Scheme... Notwithstanding a long inheritance story, full of sound, fury, holy wars and bastardry (sometimes evoking European royal lines...), all Lisp-like languages bear one mark (a dominant allele) of their common ancestor, which is the list structure.
This structure is implemented as a pair of pointers, the first pointing to the first element of the list, the second pointing to... the list of the rest of the members (or nothing, aka NIL). In McCarthy's implementation, which was machine code on an IBM 704 (this was 1957, mind you...), such a pair was spread across different parts of a CPU word, known in the technical documentation of this dinosaur as the "address register" and "decrement register" respectively. Hence the mnemonic names Contents of the Address Register and Contents of the Decrement Register (aka CAR and CDR) for these two pointers, names which were then used for the (Lisp) functions that access them. Those names have stuck, mainly for historical (sentimental? hysterical?) reasons. Even when reasonable synonyms were introduced (e.g. first and rest in Common Lisp), all the old hands (and the young dummies who wanted to emulate them) kept car and cdr close to their hearts. So, fifty-one years after McCarthy's stroke of genius, this piece of CS snobbery is still with us (and probably will be 'til the 64-bit Linux time counters roll over...). Despite its C-like syntax, its huge collection of array and array-like structures, and its loops, S/R is fundamentally a Lisp-like language. Most notably, the representation of executable code is accessible from the code itself: one can write a function which computes another function. Lisp was the first language explicitly created for this purpose, and it is no happenstance that many (most?) languages with this ability build on Lisp fundamentals. Many common ways of doing things in R/S have a Lisp smell: for example, it is no accident that {s|t|l}apply(), outer() and suchlike are *way* more efficient than for(), and they are strongly reminiscent of Lisp's mapcar...
BTW, this ability to compute on the language is probably the most fundamental point distinguishing S/R from all the rest of the statistical packages (SPSS, Stata, SAS and the rest of the crowd...). Now, I have to say that I'm not old enough to have first-hand knowledge of this history. I was born, but not weaned, when McCarthy unleashed Lisp on an unsuspecting world... I learned it ca. 1978-9, while discovering VLisp. While I can't really help you (I still think that processing ... at the C level is either hubris of the worst sort, pure folly, or a desperate case), I hope to have entertained you. Sincerely, Emmanuel Charpentier
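The wrapper strategy Emmanuel recommends can be sketched entirely in R: capture ... once, close over it, and hand the optimizer (or, eventually, your compiled routine) a plain function of the parameters. The objective function and fixed arguments below are invented for illustration.

```r
# Build a single-argument objective from a function plus fixed parameters.
make_objective <- function(f, ...) {
  extra <- list(...)                       # ... parsed once, at R level
  function(par) do.call(f, c(list(par), extra))
}

# Toy objective with two fixed parameters a and b.
f   <- function(par, a, b) sum((par - a)^2) + b
obj <- make_objective(f, a = 2, b = 1)

obj(c(1, 3))                 # (1-2)^2 + (3-2)^2 + 1 = 3
fit <- optim(c(0, 0), obj)   # optim never sees the ... machinery
round(fit$par)               # both parameters driven towards a = 2
```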
[R] Deleting NA in list()
Dear useRs, I would like to know how to delete the NA elements of a list(). The following is an example:

lt <- list(a = 1:3, b = NA, c = letters[1:3], d = NA)
for(i in length(lt):1) {
  if(is.na(lt[[i]])) lt[[i]] <- NULL
}

How can I simplify the for() loop using the (l)apply family? Thank you in advance.
= Dong-hyun Oh, Center of Excellence for Science and Innovation Studies, Royal Institute of Technology, Sweden
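Two loop-free replacements. Note that is.na() on a whole list element is only a safe test when, as in the example, the NA entries have length 1:

```r
lt <- list(a = 1:3, b = NA, c = letters[1:3], d = NA)

is_scalar_na <- function(x) length(x) == 1 && is.na(x)

lt1 <- lt[!sapply(lt, is_scalar_na)]             # sapply + logical indexing
lt2 <- Filter(function(x) !is_scalar_na(x), lt)  # base R's Filter()
names(lt1)   # "a" "c"
```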
[R] R
Hi, my name is Leidy. I am a systems engineering student and I do not speak English, so please excuse my language. I am working on an academic project that uses R through a Java interface, and I have run into a problem: I need programmatic access to the p-values printed by the Box-Cox test.

s <- box.cox.powers(resp, 2)
s
Box-Cox Transformation to Normality
Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
-0.0419 0.0274 -1.528 -38.0037
L.R. test, power = 0: 2.3469 df = 1 p = 0.1255
L.R. test, power = 1: 1732.1631 df = 1 p = 0

This is the code. I need to get the p-value of each test (power = 0 and power = 1), i.e. in this case 0.1255 and 0, but I cannot get them with the instruction above.
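The general R idiom for pulling a printed number out of a result object is to inspect the object with str() and read the component directly. box.cox.powers() comes from the (older) car package, which may not be installed here, so the pattern is shown with t.test(); apply str() to your own s object to find where its p-values live (the component names in the commented lines are not verified).

```r
res <- t.test(c(1.1, 2.3, 1.9, 2.8), mu = 0)
str(res)       # shows the stored components: $statistic, $p.value, ...
res$p.value    # the number the print method displays

# Same idea for the Box-Cox object:
# s <- box.cox.powers(resp, 2)
# str(s)       # look for the stored LR statistics / p-values
# unclass(s)   # fallback: everything the print method has access to
```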
[R] problem with rbind
I am trying to use rbind to stack two data sets on top of each other, but I am getting an extra X in the column headers, and the rows are numbered. How do I get rid of this problem? I appreciate your help.

x1 <- read.table(file = "data1.txt", header = TRUE, sep = "\t")
x2 <- read.table(file = "data2.txt", header = TRUE, sep = "\t")
y <- rbind(x1, x2)
y
  X0 X0.0.05 X0.05.0.1 X0.1.0.2 X0.2.0.3 X0.3.0.4 X0.4.0.5
1 0.1971 0.108 0.1 0.1 0.156 0.38 0.22
2 0.2024 0.106 0.0 0.1 0.15 0.1408173 0.55
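The X comes from make.names(): the headers begin with digits, and read.table() sanitizes them into valid R names by default; check.names = FALSE keeps them verbatim, and the row numbers are just printed row names. The original header strings below are guesses reconstructed from the mangled names:

```r
make.names(c("0", "0-0.05", "0.05-0.1"))
# "X0" "X0.0.05" "X0.05.0.1" -- matching the unwanted headers above

# Keep the headers verbatim (same files as in the post):
# x1 <- read.table("data1.txt", header = TRUE, sep = "\t", check.names = FALSE)
# x2 <- read.table("data2.txt", header = TRUE, sep = "\t", check.names = FALSE)
# y  <- rbind(x1, x2)
# print(y, row.names = FALSE)   # suppress the row numbers when printing
```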
Re: [R] Simple look up
WTF, mate. You think I'm about wasting time? I've been seeking a solution to this for two weeks. Excuse my lack of programming savvy, but this is why I posted.

bartjoosen wrote: Please take a look at the manuals before posting (DO read the posting guide!) Bart
PS: yourdataframe[yourdataframe$TAZ == 103, 2]

PDXRugger wrote: So I am hoping this solution is simple, which I believe it is. I would like to look up a value in one column of a data set and display the corresponding value in the 2nd column. For example:

TAZ VACANT ACRES
100 45
101 62
102 23
103 64
104 101
105 280

So if I want to find TAZ 103, it would give me 64 as the corresponding value. Hope you guys can help. Cheers
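Bart's one-liner, spelled out on the posted table (the column name VACANT.ACRES is an assumption; the post shows it with a space):

```r
df <- data.frame(TAZ          = 100:105,
                 VACANT.ACRES = c(45, 62, 23, 64, 101, 280))

df[df$TAZ == 103, 2]                  # logical indexing: 64
df$VACANT.ACRES[match(103, df$TAZ)]   # match(): same answer, first hit only
```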