[R] rgamma function
Hi, Has anyone encountered the problem of rgamma function in C? The following simplified program always dies for me, and I wonder if anyone can tell me the reason. #include Rmath.h #include time.h #include Rinternals.h SEXP generateGamma () { srand(time(NULL)); return (rgamma(5000,1)); } Has anyone encountered a similar problem before? Is there another way of generating Gamma random variable in C? P.S. I have no problem compiling and loading this function in R. Thanks for suggestions in advance! --Chandler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] set Rprofile.site,can't work
my system :debian in console: nano /home/tiger/R-2.15.1/etc/Rprofile.site here is my content: .First - function(){ cat(\nWelcome at, date(), \n) } # .Last - function(){ cat(\nGoodbye at , date(), \n) } when i save it ,reopen my R , why there is no Welcome at Sun Jul 15 07:53:58 2012 in my R? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Can't understand syntax
OK, I need help!! I've been searching, but I don't understand the logic of some this dataframe addressing syntax. What is this type of code called? test [[v3]] [is.na(test[[v2]])] -10 #choose column v3 where column v2 is == 4 and replace with 10 and where is it documented? The code below works for what I want to do (find the non-missing value in a row), but why? test - read.table(text= v1 v2 v3 result 3 NA NA NA NA 3 NA NA NA NA 3 NA , header=TRUE) test [[result]] [!(is.na(test[[v1]]))] - test [[v1]] [!(is.na (test[[v1]]))] test [[result]] [!(is.na(test[[v2]]))] - test [[v2]] [!(is.na (test[[v2]]))] test [[result]] [!(is.na(test[[v3]]))] - test [[v3]] [!(is.na (test[[v3]]))] thanks! On Fri, Jul 13, 2012 at 6:41 AM, Rui Barradas ruipbarra...@sapo.pt wrote: Hello, Check the structure of what you have, df and newdf. You will see that in df dateTime is of class POSIXlt and in newDf newDateTime is of class POSIXct. Solution: [...] df$dateTime - strptime(df$dateTime,%m/%d/%Y %H:%M) df$dateTime - as.POSIXct(df$dateTime) [...] Hope this helps, Rui Barradas Em 13-07-2012 10:24, vioravis escreveu: I have the following dataframe with the first column being of type datetime: dateTime - c(10/01/2005 0:00, 10/01/2005 0:20, 10/01/2005 0:40, 10/01/2005 1:00, 10/01/2005 1:20) var1 - c(1,2,3,4,5) var2 - c(10,20,30,40,50) df - data.frame(dateTime = dateTime, var1 = var1, var2 = var2) df$dateTime - strptime(df$dateTime,%m/%d/%Y %H:%M) I want to create 10 minute interval data as follows: minTime - min(df$dateTime) maxTime - max(df$dateTime) newTime - seq(minTime,maxTime,600) newDf - data.frame(newDateTime = newTime) newDf - merge(newDf,df,by.x = newDateTime,by.y = dateTime,all.x = TRUE) The objective here is to create a data frame with values from df for the datetime in df and NA for the missing ones. However, I am getting the following data frame with both Var1 and Var2 having all NAs. newDf newDateTime var1 var2 1 2005-10-01 00:00:00 NA NA 2 2005-10-01 00:10:00 NA NA 3 2005-10-01 00:20:00 NA NA 4 2005-10-01 00:30:00 NA NA 5 2005-10-01 00:40:00 NA NA 6 2005-10-01 00:50:00 NA NA 7 2005-10-01 01:00:00 NA NA 8 2005-10-01 01:10:00 NA NA 9 2005-10-01 01:20:00 NA NA Can someone help me on how to do the merge based on the two datetime columns? Thank you. Ravi -- View this message in context: http://r.789695.n4.nabble.com/** Merging-on-Datetime-Column-**tp4636417.htmlhttp://r.789695.n4.nabble.com/Merging-on-Datetime-Column-tp4636417.html Sent from the R help mailing list archive at Nabble.com. __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Charles Stangor Professor and Associate Chair [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ROC curves with ROCR
Hi, I don't really understand how ROCR works. Here's another example with a randomforest model: I have the training dataset(bank_training) and testing dataset(bank_testing) and I ran a randomForest as below: bankrf-randomForest(y~., bank_training, mtry=4, ntree=2, keep.forest=TRUE,importance=TRUE) bankrf.pred-predict(bankrf, bank_testing) library(ROCR) pred-prediction(bankrf.pred$y, bank_testing$y) Here I get the error that the prediction format is incorrect? Where is the mistake? Thanks in advance -- View this message in context: http://r.789695.n4.nabble.com/ROC-curves-with-ROCR-tp4636435.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] OT: Where's the new Tukey?
I'm looking for a single book that provides a deep, yet readable introduction to applied data analysis for general readers. I'm looking for coverage on things like understanding randomness, natural experiments, confounding, causality and correlation, data cleaning and transforms, lagging, residuals, exploratory graphics, curve fitting, descriptive stats Preferably with examples/case studies that illustrate the art and craft of data analysis. No proofs or heavy math. What have you got? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Loading in Large Dataset + variables via loop
Hello, I'm new to R with a (probably elementary) question. Suppose I have a dataset called /A/ with /n/ locations, and each location contains within it 3 time series of different variables (all of 100 years length); each time series is of a weather variable (for each location there is a temperature, precipitation, and pressure). For instance, location 1 has a temperature1 time series, a precip1 time series, and a pressure1 time series; location two has a temperature2, precip2, and pressure2 timeseries...That is, there are 100 rows, and (/n/*3)+1 columns. The extra column is the time. I want to load in this dataset and declare a variable for each time series. The columns are in order of location, so it goes temp1, precip1,pressure1, temp2,... and so forth in increasing column order. There are always 100 rows. Manually, Id have to do: temp1=A[,1] precip1=A[,2] pressure1=A[,3] temp2=A[,4] precip2=A[,5] pressure2=A[,6] temp3=A[,7] and so forth. Problem is, n is large, so I don't want to repeat this pattern forever. I figure I need a loop both for the variable name (ie.., the variable at a particular location) as well as for what column it reads from. Any help...? -- View this message in context: http://r.789695.n4.nabble.com/Loading-in-Large-Dataset-variables-via-loop-tp4636501.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] PLSR AND PCR ISSUES
Dear all, Please I am working on PCR and PLSR with pls package and my issue is the command to extract components. Please help with a solution. Thanks. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help for Fisher's exact test
Hi, R-help, I have a group of data from RNA-seq want to be analyzed by Fisher's exact test in R. I want to compare the significant difference of about 30, individuals in two different samples, and I have no idea how to use R, so could you please give me some suggestions or the scripts for Fisher's exact test? Thank you very much. Best, Guanfeng Wang [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] GAM Chi-Square Difference Test
We are using GAM in mgcv (Wood), relatively new users, and wonder if anyone can advise us on a problem we are encountering as we analyze many short time series datasets. For each dataset, we have four models, each with intercept, predictor x (trend), z (treatment), and int (interaction between x and z). Our models are Model 1: gama1.1 - gam(y~x+z+int, family=quasipoisson) ##no smooths Model 2: gama1.2 - gam(y~x+z+s(int, bs=cr), family=quasipoisson) ##smooth the interaction Model 3: gama1.3 - gam(y~s(x, bs=cr)+z+int, family=quasipoisson) ##smooth the trend Model 4: gama1.4 - gam(y~s(x, bs=cr)+z+s(int, bs=cr), family=quasipoisson) ##smooth trend and interaction We have three questions. One question is simple. We occasionally obtain edf =1 and Ref.df=1 for some smoothed predictors (x, int). Because Wood says that edf can be interpreted roughly as functional form (quadratic, cubic etc) + 1, this would imply x^0 functional form for the predictor, and that doesn't make a lot of sense. Does such a result for edf and rdf indicate a problem (e.g., collinearity) or any particular interpretation? The other two questions concern which model fits the data best. We do look at the usual various fit statistics (R^2, Dev, etc), but our question concerns using the anova function to do model comparisons, e.g., anova(gama2.1,gama2.2, test=Chisq). 1. Is there research on the power of the model comparison test? Anecdotally, the test seems to reject the null even in cases that would appear to have only small differences. These are not hugely long time series, ranging from about 17 to about 49, so we would not have thought them to yield large power. 2. More important, in a few cases, we are getting a result that looks like this: anova(gamb1.1,gamb1.2, test=Chisq) Analysis of Deviance Table Model 1: y ~ x + z + int Model 2: y ~ x + z + s(int, bs = cr) Resid. Df Resid. Dev Df Deviance P(|Chi|) 130 36.713 230 36.713 1.1469e-05 1.0301e-05 6.767e-05 *** We are inclined to think that the significance p value here is simply a result of rounding error in the computation of the df difference and deviance difference, and that we should treat this as indicating the models are not different from each other. Has anyone experienced this before? Is our interpretation reasonable? Thanks to anyone who is able to offer advice. Will Shadish -- View this message in context: http://r.789695.n4.nabble.com/GAM-Chi-Square-Difference-Test-tp4636523.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] alternate tick labels and tick marks with lattice xyplot
Hi, I would like to use xyplot to create a figure. Unfortunately, I cannot find documentation in xyplot to specify alternating the x-axis tick labels with the x-axis tick marks. I can do this with the regular R plot function as follows. #A small version of my data looks like this data-data.frame(matrix(ncol=3,nrow=12)) data[,1]-rep(c(1,2,3),c(4,4,4)) data[,2]-rep(c(1,2,3,4),3) data[,3]-runif(12,0,1) names(data)-c(Chromosome, BasePair, Pvalue) #using R's plot function, I would place the the chromosome label between the #tick marks as follows: v1-c(4,8) v2-c(2,6,10) data$indice-seq(1:12) plot(data$indice, -log10(data$Pvalue), type=l, xaxt=n, main=Result, xlab=Chromosome, ylab=expression(paste(-log[10], p-value))) axis(1, v1,labels=FALSE ) axis(1, v2, seq(1:3), tick=FALSE, cex.axis=.6) Can this be done with lattice xyplot? -- Leah Preus Biostatistician Roswell Park Cancer Institute [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] functions of vectors : loop or vectorization
I have a read a lot about the benefits of vectorization in R. I have a program that takes almost forever to run. A good way to see if I have learned something ... My problem can be summarized like this : I have a nonlinear function of several variables that I want to optimize over one letting the other describe a family of curves. In short, I wan't to optimize f(x,a,b) for several values of a and b. It is easily done with a loop. Here's an example : a = 1:5; b = 1:5; myfunction = function(x){y*x-(x+z)^2}; myresults = array(dim=c(length(a),length(b))); for(y in a){ for(z in b) { myresults[y,z] = optimize(myfunction,c(-10,10),maximum=TRUE)$maximum }}; myresults; [,1] [,2] [,3] [,4] [,5] [1,] -0.5 -1.5 -2.5 -3.5 -4.5 [2,] 0.0 -1.0 -2.0 -3.0 -4.0 [3,] 0.5 -0.5 -1.5 -2.5 -3.5 [4,] 1.0 0.0 -1.0 -2.0 -3.0 [5,] 1.5 0.5 -0.5 -1.5 -2.5 Of course, my real life problem is a bit more complicated and runs in days ... I didn't find a straightforward way to do this using the apply family. I did a small script that works. Here it is : c = 1:5; d = 1:5; myfunction2 = function(c,d){optimize(function(x){c*x-(x+d)^2},c(-10,10),maximum=TRUE)$maximum}; v.myfunction2 = Vectorize(myfunction2, c(c,d)); outer(c, d, v.myfunction2); all.equal(myresults,outer(c, d, v.myfunction2)); [1] TRUE I was quite happy with my trick of separating and wrapping the functions until I increased the size of the two input vectors and checked for the processing time. I made no gain. In that case : time.elapsed; time.elapsed2; Time difference of 0.0816 secs Time difference of 0.0792 secs When I changed the size of the vectors and added a logarithm here and there to complicate a bit, it doesn't change the problem. The two methods perform identically. Am I missing something ? Is there a better way to vectorize the problem to gain time ? How is it that my loop performs as well as outer ? Thanks in advance for your help. All the best, Julien -- View this message in context: http://r.789695.n4.nabble.com/functions-of-vectors-loop-or-vectorization-tp4636494.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R combining many vectors of predictable name into one date frame
G'day R (power) users, I have a many vectors, called: ib1 ib2 ib3 ... ib100 and I would like them in one data frame (df) such that: df ib1 ib2 ib3 ib4 . ib100 x x xxx x x xxx x x xxx I have attempted: hold.list - list(objects(pattern=ib)) df - data.frame(hold.list) but that didn't work also: do.call(rbind, (objects(pattern=ib))) and that also didn't work. I tried a whole pile of other things, where I also failed. The number of vectors might differ each time I want to make the data frame, so that in the example above, I have ib1 : ib100, but next time, I might only have ib1 : ib2 Below is my (probably somewhat embarrassing) example script for generating the vectors in the first place. Commented out toward the end are a few attempts at doing the job I wanted to do. temp - runif(100) tripID - rep(1:10, 10) uni - rep(1:4, 25) temp - data.frame(temp, tripID, uni) trips - unique(temp$tripID) uni - unique(temp$uni[temp$tripID==trips[1]]) for (jj in 1:length(uni)){ a - c() for (ii in 1:10){ a - c(a, IQR(temp$temp[temp$uni %in% sample(uni,jj)])) assign(paste(ib,jj,sep=), a) # ib is short for ibuttons. The number is how many were used to calc IQR } # hold.list - list(objects(patter=ib)) # trip - data.frame(list=hold.list # I am trying to put everything into a dataframe # do.call(rbind, list=hold.list) # do.call(rbind, list(objects(pattern=ib))) } thanks heaps if you can help. And sorry if this is mostly garble. This is my first crack at soliciting help from the list. cheers, mat -- Mathew Vickers PhD Student James Cook University CSRIO Australia, mate. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Quantile Regression - Testing for Non-causalities in quantiles
Dear all, I am searching for a way to compute a test comparable to Chuang et al. (Causality in Quantiles and Dynamic Stock Return-Volume Relations). The aim of this test is to check wheter the coefficient of a quantile regression granger-causes Y in a quantile range. I have nearly computed everything but I am searching for an estimator of the density of the distribution at several points of the distribution. As the quantreg-package of Roger Koenker is also able to compute confidence intervalls for quantile regression (which also contain data concerning the estimated density) I wanted to ask wether someone could tell me if it is possible to extract the density of the underlying distribution by using the quantreg package. I hope my question is not to confusing, thank you very, very much in adavanve I appreciate every comment=) Cheers Stefan -- View this message in context: http://r.789695.n4.nabble.com/Quantile-Regression-Testing-for-Non-causalities-in-quantiles-tp4636511.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] set Rprofile.site,can't work
On 12-07-14 7:54 PM, 水静流深 wrote: my system :debian in console: nano /home/tiger/R-2.15.1/etc/Rprofile.site here is my content: .First- function(){ cat(\nWelcome at, date(), \n) } # .Last- function(){ cat(\nGoodbye at , date(), \n) } when i save it ,reopen my R , why there is no Welcome at Sun Jul 15 07:53:58 2012 in my R? That works for me, so I'd guess you've put the changes in the wrong place. What do you have in your R_PROFILE environment variable? What about R_HOME? You should look at these from within R, using Sys.getenv(). Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to pool imputed data sets with latent class analysis and binary logistic regression
Dear All, I've used mice package for my latent class analysis and binary logistic regression I've imputed five data sets and with long format I've added new variable that shows latent class membership. And then in addition to other variables, I'll use binary logistic regression and try to pool the estimates. However I couldn't create data.frame to mids objects, and therefore it produced the error below: Error in pool(fit) : The object must have class 'mira' Do you have any suggestions? I'd appreciated if you have time and respond my e-mail. Bests, Niklas [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] HLOOKUP in R
Hi, Is there a function similar to excel's hlookup in R ? Thanks, Silje __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] significance test interquartile ranges
Dear Peter, thanks for your clarifications. Sample size is around 200 in each group. Would that justify your approach? I found a couple of more tests for scale on continous variables, ie. Mood Test Ansari-Bradley Test (that one is also implemented in R) Klotz Test Conover Test Would one of those be suitable to test for different dispersion (e.g. IQR or the like) in non-normal distributions? thanks, joerg Von: peter dalgaard [pda...@gmail.com] Gesendet: Samstag, 14. Juli 2012 10:01 Bis: Prof Brian Ripley Cc: Greg Snow; R-help; Schaber, Jörg Betreff: Re: [R] significance test interquartile ranges On Jul 14, 2012, at 08:16 , Prof Brian Ripley wrote: On 13/07/2012 21:37, Greg Snow wrote: A permutation test may be appropriate: Yes, it may, but precisely which one is unclear. You are testing whether the two samples have an identical distribution, whereas I took the question to be a test of differences in dispersion, with differences in location allowed. I do not think this can be solved without further assumptions. E.g people often replace the two-sample t-test by the two-sample Wilcoxon test as a test of differences in location, not realizing that the latter is also sensitive to other aspects of the difference (e.g. both dispersion and shape). (Brian knows this, of course, but I though it useful to insert a little quibbling.) Sensitive is perhaps a little misleading here. The test statistic in the Wilcoxon test is essentially an estimate of the probability that a random observation in one group is bigger than a random observation in the other group. It isn't hard to imagine situation where that quantity is unaffected by a dispersion change so the test is not sensitive in the sense that it can detect dispersion changes between sufficiently large samples. However, the point is that p values _rely on_ the null hypothesis that two distributions are exactly the same. This is mostly uncontroversial if you are testing for an irrelevant grouping, but if you need confidence intervals for the difference, you are implicitly assuming a location-shift model. The same thing is true for permutation tests in general: You need to be rather careful about what the assumptions are that allows you to interchange things. Asymptotically, the distribution of the IQR depends on the values of the density at the true quartiles. These could be different in the two groups, and easily completely unrelated to those of a pooled sample. I think that I would suggest finding an error estimate for the IQR (or maybe log IQR) in each group separately, perhaps by bootstrapping, and then compare between groups with an asymptotic z test. The main caveat is whether you have sufficiently large sample sizes for asymptotics to hold. Peter D. I nearly suggested (yesterday) doing the permutation test on differences from medians in the two groups. But really this is off-topic for R-help and needs interaction with a knowledgeable statistician to refine the question. 1. compute the ratio of the 2 IQR values (or other comparison of interest) 2. combine the data from the 2 samples into 1 pool, then randomly split into 2 groups (matching sample sizes of original) and compute the ratio of the IQR values for the 2 new samples. 3. repeat #2 a bunch of times (like for a total of 999 random splits) and combine with the original value. 4. (optional, but strongly suggested) plot a histogram of all the ratios and place a reference line of the original ratio on the plot. 5. calculate the proportion of ratios that are as extreme or more extreme than the original, this is the (approximate) p-value. I think it is an 'exact' (but random) p-value. On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg joerg.scha...@med.ovgu.de wrote: Hi, I have two non-normal distributions and use interquartile ranges as a dispersion measure. Now I am looking for a test, which tests whether the interquartile ranges from the two distributions are significantly different. Any idea? Thanks, joerg -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list
[R] read mignight as 24:00 and not as 0:00
Dear all, I have dataset which contains date and time in the format yearmonthdayhour. I can read in these data correctly as follows: mydata - read.csv(pm10_corine_gridcel_hourly_2011.csv, header = TRUE) mydata$date - as.POSIXct(strptime(mydata$date, format = %Y%m%d%H, tz=UTC)) However, midnight is defined as 24:00 in my original file (so the end of the day), while the POSIXct function changes this to 0:00 (the beginning of the next day). So, my data now go from January 1 2011 1:00 to Januari 1 2012 0:00, in stead of December 31 2011 24:00. summary(mydata$date) Min. 1st Qu.Median 2011-01-01 01:00:00 2011-04-02 06:45:00 2011-07-02 12:30:00 Mean 3rd Qu. Max. 2011-07-02 12:30:00 2011-10-01 18:15:00 2012-01-01 00:00:00 I would like to change this 0:00 to 24:00 again since I want to include these values in daily averages of the previous day (and not of the next day). So the day of the month should also be diminished by 1. I have tried extracting the hours which are 0 and converting them to 24, but then I can't paste them back in the date/time of the original data.fram again. Are there maybe other solutions? Thanks in advance, Sandy ifelse (as.POSIXlt(mydata[24,1])$hour = 0,as.POSIXlt(mydata[24,1])$hour = 24 -- View this message in context: http://r.789695.n4.nabble.com/read-mignight-as-24-00-and-not-as-0-00-tp4636423.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Maximum number of patterns and speed in grep
Here's some data (which should give you the error messages): # read in data data - read.csv(https://dl.dropbox.com/u/13631687/data.csv;, header = T, sep = ,) # first paste all data data1 - paste(data[,1], collapse = |) # second paste subsets of the data data2a - paste(data[1:750,1], collapse = |) data2b - paste(data[751:1500,1], collapse = |) # define the object to be searched text - c(the first is Santa Fe Gold Corp, the second is Starpharma Holdings) # match strapplyc(text, data1) strapplyc(text, data2a) strapplyc(text, data2b) Thanks in advance! Math Gabor Grothendieck wrote On Fri, Jul 13, 2012 at 9:40 AM, mdvaan lt;mathijsdevaan@gt; wrote: Thanks, I see that it is working in the sample data. My data, however, gives me an error message: data - strapplyc(text, batch[[l]]) Error in structure(.External(dotTcl, ..., PACKAGE = tcltk), class = tclObj) : [tcl] couldn't compile regular expression pattern: parentheses () not balanced. batch[[l]] is similar to your re string except that there is a larger variety of characters. I haven't been able to figure out which characters are causing trouble here. Any thoughts? Thank you very much. Math ... __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Note part on last line about posting reproducible code. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://r.789695.n4.nabble.com/Maximum-number-of-patterns-and-speed-in-grep-tp4635613p4636472.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] significance test interquartile ranges
Thanks for your suggestions! The Siegel Tukey test and the permutation test sound promising, indeed. I applied the wilcoxon test already, but understood that it mainly tests differences in the medians (location), even though being sensitive to all kinds of differences between distributions, similar to the K-S test. I once heard that the K-S test is more sensitive to differences in the tails between distributions, whereas the U-test is more sensitive to differences in location in general. Can some knowledgeable statistician comment on that? I do not understand the concern of Brian, saying that the permutation test suggested by Greg tests equality in distribution. When the test statistic is the ratio of IQRs, the permutation test calucates the p-value of this ratio under the null hypothesis that group label does not matter, i.e. that they are equal, right? But I am probable not knowledgeable statistician enough to judge that. best, joerg Von: Prof Brian Ripley [rip...@stats.ox.ac.uk] Gesendet: Samstag, 14. Juli 2012 08:16 Bis: Greg Snow Cc: Schaber, Jörg; R-help Betreff: Re: [R] significance test interquartile ranges On 13/07/2012 21:37, Greg Snow wrote: A permutation test may be appropriate: Yes, it may, but precisely which one is unclear. You are testing whether the two samples have an identical distribution, whereas I took the question to be a test of differences in dispersion, with differences in location allowed. I do not think this can be solved without further assumptions. E.g people often replace the two-sample t-test by the two-sample Wilcoxon test as a test of differences in location, not realizing that the latter is also sensitive to other aspects of the difference (e.g. both dispersion and shape). I nearly suggested (yesterday) doing the permutation test on differences from medians in the two groups. But really this is off-topic for R-help and needs interaction with a knowledgeable statistician to refine the question. 1. compute the ratio of the 2 IQR values (or other comparison of interest) 2. combine the data from the 2 samples into 1 pool, then randomly split into 2 groups (matching sample sizes of original) and compute the ratio of the IQR values for the 2 new samples. 3. repeat #2 a bunch of times (like for a total of 999 random splits) and combine with the original value. 4. (optional, but strongly suggested) plot a histogram of all the ratios and place a reference line of the original ratio on the plot. 5. calculate the proportion of ratios that are as extreme or more extreme than the original, this is the (approximate) p-value. I think it is an 'exact' (but random) p-value. On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg joerg.scha...@med.ovgu.de wrote: Hi, I have two non-normal distributions and use interquartile ranges as a dispersion measure. Now I am looking for a test, which tests whether the interquartile ranges from the two distributions are significantly different. Any idea? Thanks, joerg -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Compilation Error with Rcpp
Hello, I am trying to reproduce a code example from http://www.babelgraph.org/wp/?p=358 babelgraph when compiling the function to call the C++ code I get the following error: Error in compileCode(f, code, language = language, verbose = verbose) : Compilation ERROR, function(s)/method(s) not created! In addition: Warning message: running command 'C:/PROGRA~1/R/R-215~1.0/bin/i386/R CMD SHLIB file141c7ac23195.cpp 2 file141c7ac23195.cpp.err.txt' had status 1 Has anyone an idea what this means? Its not clear to me what the error would be. I doubt its a source code error, but am happy to provide the source if necessary. My sessioninfo: R version 2.15.0 (2012-03-30) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C [5] LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] inline_0.3.8 Rcpp_0.9.10 plyr_1.7.1 loaded via a namespace (and not attached): [1] tools_2.15.0 Thanks Sven -- View this message in context: http://r.789695.n4.nabble.com/Compilation-Error-with-Rcpp-tp4636522.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] HLOOKUP in R
What does Excel's HLOOKUP do? On Saturday, July 14, 2012, Silje Nord wrote: Hi, Is there a function similar to excel's hlookup in R ? Thanks, Silje __ R-help@r-project.org javascript:; mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Michael Sumner Hobart, Australia e-mail: mdsum...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] significance test interquartile ranges
On Jul 14, 2012, at 19:58 , Schaber, Jörg wrote: Dear Peter, thanks for your clarifications. Sample size is around 200 in each group. Would that justify your approach? It's certainly better than 10... I did a small check on the IgM data from the ISwR package (298 obs.) and found something somewhat amusing: Discretization effects can kick in rather profoundly with data sets of that magnitude. The IgM data are discretized to 1 decimal digit, which is fairly common for continuous data in practice table(IgM) IgM 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 2 2.1 3 7 19 27 32 35 38 38 22 16 16 6 7 9 6 2 3 3 3 2 2.2 2.5 2.7 4.5 1 1 1 1 summary(IgM) Min. 1st Qu. MedianMean 3rd Qu.Max. 0.100 0.500 0.700 0.803 1.000 4.500 IQR(IgM) [1] 0.5 However, if we want to look at the sample distribution of a quantile, we get some curious effects as the variation of the estimate is close to the discretization error. Try a simple bootstrap sample from the empirical CDF: medians - replicate(1,median(sample(IgM,replace=T))) table(medians) medians 0.6 0.65 0.7 0.75 0.8 136 9035 179 767 However, if we smoothen the empirical CDF by adding a little noise, we do get something that does look passably (although not perfectly) gaussian: x - IgM + runif(IgM, -.05,.05) medians2 - replicate(1,median(sample(x,replace=T))) hist(medians2) qqnorm(medians2) Interestingly, adding noise has the counterintuitive effect of reducing the standard error of the medians: sd(medians) [1] 0.02748966 sd(medians2) [1] 0.02347363 (It's not _that_ counterintuitive given that the definition of the median isn't quite the same for discrete data.) Back to the IQR. You can do much the same thing: iqrs - replicate(1,IQR(sample(IgM,replace=T))) table(iqrs) iqrs 0.3 0.375 0.4 0.45 0.475 0.5 0.55 0.575 0.6 6042 3885 7 640 5100 387 176 or, use the smoothed one replacing IgM by x (defined above). Now, what if we wanted to compare two IQRs? I'll cheat and reuse the same ECDF for both groups. i1 - replicate(1,IQR(sample(IgM,replace=T))) i2 - replicate(1,IQR(sample(IgM,replace=T))) qqnorm((i1-i2)/sd(i1-i2)) mean(abs(i1-i2)/sd(i1-i2) 2) [1] 0.9698 So, not really all that bad, but it is a bit fortuitous given the discreteness of the distribution. Same thing with the x comes out quite a bit nicer ix1 - replicate(1,IQR(sample(x,replace=T))) ix2 - replicate(1,IQR(sample(x,replace=T))) qqnorm((ix1-ix2)/sd(ix1-ix2)) mean(abs(ix1-ix2)/sd(ix1-ix2) 2) [1] 0.9546 So, my conclusion would be that yes, you can use bootstrap techniques with data of that size, but you need to watch out for discretization effects by checking the bootstrap sample distributions and you might want to add a little smoothing-noise for stability. As always with bootstrapping, beware that the simulation is never done under the null hypothesis, one merely hopes that the distribution of the resampled estimates around the observed estimate is sufficiently similar to that of the estimator around the true estimate that it can be used for tests and confidence intervals, implicitly using a location-shift argument. This gets particularly dubious when there are discretization effects because the jumps occur at values that do not depend on the parameters. (Pragmatically speaking, you might not be interested at all in differences in IQR which are comparable to discretization error, though.) I found a couple of more tests for scale on continous variables, ie. Mood Test Ansari-Bradley Test (that one is also implemented in R) Klotz Test Conover Test Would one of those be suitable to test for different dispersion (e.g. IQR or the like) in non-normal distributions? That is what they were designed to do... I'm not all that well acquainted with them, but given what I have seen from that general area and period, they should likely be studied with a critical eye to hidden assumptions. Quite a lot of work has been published with the general structure of let's do some sensible transformations of data and apply a nonparametric test, then call the whole procedure assumption-free (in those days, 1950s and 1960s, essentially, computer simulations were not readily available to show people the error of their ways...). -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] NaN in hurdle model please?
Simplify your model. Does your TandemRepeat have a lot of levels? Or is your sample size very small? Alain Dear all, I am fitting a hurdle model in the following way: HNB - hurdle(chro ~ as.factor(TandemRepeat)| as.factor(TandemRepeat), data =data_negbin_fin, dist = negbin) But the std. error for log(theta) = NA Count model coefficients (truncated negbin with log link): Estimate Std. Error z value Pr(|z|) Log(theta)13.5062 NANA NA And it gives the following error: In sqrt(diag(vc_count)[kx + 1]) : NaNs Can somebody help me please. Thanks so much Ana -- Dr. Alain F. Zuur First author of: 1. Analysing Ecological Data (2007). Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p. URL: www.springer.com/0-387-45967-7 2. Mixed effects models and extensions in ecology with R. (2009). Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer. http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9 3. A Beginner's Guide to R (2009). Zuur, AF, Ieno, EN, Meesters, EHWG. Springer http://www.springer.com/statistics/computational/book/978-0-387-93836-3 4. Zero Inflated Models and Generalized Linear Mixed Models with R. (2012) Zuur, Saveliev, Ieno. http://www.highstat.com/book4.htm Other books: http://www.highstat.com/books.htm Statistical consultancy, courses, data analysis and software Highland Statistics Ltd. 6 Laverock road UK - AB41 6FN Newburgh Tel: 0044 1358 788177 Email: highs...@highstat.com URL: www.highstat.com URL: www.brodgar.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Getting objects from quantmod ticker list
# Thank you, Michael: it works fine! -- View this message in context: http://r.789695.n4.nabble.com/Getting-objects-from-quantmod-ticker-list-tp4635708p4636440.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Multivariate apply.rolling()
# I've read that rollapply, and its wrapper apply.rolling() # from PerformanceAnalytics package, do not work with multivariate # time series neither their output can be a multivariate time series. # Then I was wondering if any other function like those exists, or # if I need to write my own function to perform multivariate # time serie rolling analysis. # Something like this: # Let 'X' be your multivariate time series: # output - matrix(NA, ncol = ncol(X), nrow = nrow(X)) # width - 199 # for(i in 1:(nrow(output) - width) { # data - X[i:(i + width),] # output[i,] - function(data) #} # rownames(output) - rownames(as.timeSeries(X)) # ...and this should be a (probably not efficient) way to do it. # Any better idea? # Thanks, -- View this message in context: http://r.789695.n4.nabble.com/Multivariate-apply-rolling-tp4636442.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Column create and Update using function
Hi Antony, There is still some confusion as to what you actually want as result. For example, your statement -In this i need to check each particular column values are between Max and Min value. If the coulmn value not coming between Max and Min, then i need to create another coulmn This implies to check for each columns in the dataset and create colname_QF My reply was based on these facts. As the Min and Max were assigned to 3 and 6, I was under the assumption that this is for the whole dataset. Then, you mentioned that it was only for column ABC. I guess the Min and Max were the ones you found from the original dataset for the first column. So, based on Min and Max for each column, your condition should be met. -If the coulmn value not coming between Max and Min, then i need to create another coulmn by adding column name header with _QF. and assign a string like RC for that particular row. If that is the case, try the code below: dat1-read.table(text= ABC XYZ PQR RDN SQT 2 4 3 2 8 5 4 8 3 9 7 1 3 4 1 3 2 4 3 2 4 6 5 7 4 ,sep=,header=TRUE) mindat1-apply(dat1,2,min) maxdat1-apply(dat1,2,max) minmaxdat-data.frame(rbind(mindat1,maxdat1)) rownames(minmaxdat)-1:nrow(minmaxdat) func1-function(x,y,z) {ifelse(y[[x]] max(z[[x]]) y[[x]] min(z[[x]]),y[[x]]-,RC)} dat2-data.frame(sapply(names(dat1),function(x) func1(x,dat1,minmaxdat))) colnames(dat2)-paste(colnames(dat2),QF,sep=_) dat3-data.frame(cbind(dat1,dat2)) dat3 ABC XYZ PQR RDN SQT ABC_QF XYZ_QF PQR_QF RDN_QF SQT_QF 1 2 4 3 2 8 RC RC RC 2 5 4 8 3 9 RC RC 3 7 1 3 4 1 RC RC RC RC 4 3 2 4 3 2 5 4 6 5 7 4 RC RC ## A.K. - Original Message - From: Rantony antony.akk...@ge.com To: r-help@r-project.org Cc: Sent: Friday, July 13, 2012 2:42 AM Subject: [R] Column create and Update using function Hi, here i have a Max and Min values Min -3 Max -6 and also a matrix like this, ABC XYZ PQR -- --- --- 2 4 3 5 4 8 7 1 3 In this i need to check each particular column values are between Max and Min value. If the coulmn value not coming between Max and Min, then i need to create another coulmn by adding column name header with _QF. and assign a string like RC for that particular row. For eg:- i need to checkout coulmn ABC. Here 2,5,6 are the values we need to checkout with Min,Max values. and here Min -3,Max -6 First need to create a new column called ABC_QF with current matrix. ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 Next, for 5 , it coming in between 3 to 6. so nothing to do. ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 Next, for 7 , its not coming in between 3 to 6. so put RC ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 RC --- This is the requirement. i did it using for-loop,it will check each value and it taking time when bulk of data come. Any hope to do using lappy,appy kind of functions ? Because at a time complete coulmn should get update. Could you please help me urgently ? - Thanks Antony. -- View this message in context: http://r.789695.n4.nabble.com/Column-create-and-Update-using-function-tp4636400.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Column create and Update using function
Hi, Try this: dat1-read.table(text= ABC XYZ PQR 2 4 3 5 4 8 7 1 3 ,sep=,header=TRUE) newdat-apply(dat1,2,function(x) ifelse(x6 x3,,RC)) colnames(newdat)-paste(colnames(newdat),QF,sep=_) dat2-data.frame(cbind(dat1,newdat)) ABC XYZ PQR ABC_QF XYZ_QF PQR_QF 1 2 4 3 RC RC 2 5 4 8 RC 3 7 1 3 RC RC RC dat2 A.K. - Original Message - From: Rantony antony.akk...@ge.com To: r-help@r-project.org Cc: Sent: Friday, July 13, 2012 2:42 AM Subject: [R] Column create and Update using function Hi, here i have a Max and Min values Min -3 Max -6 and also a matrix like this, ABC XYZ PQR -- --- --- 2 4 3 5 4 8 7 1 3 In this i need to check each particular column values are between Max and Min value. If the coulmn value not coming between Max and Min, then i need to create another coulmn by adding column name header with _QF. and assign a string like RC for that particular row. For eg:- i need to checkout coulmn ABC. Here 2,5,6 are the values we need to checkout with Min,Max values. and here Min -3,Max -6 First need to create a new column called ABC_QF with current matrix. ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 Next, for 5 , it coming in between 3 to 6. so nothing to do. ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 Next, for 7 , its not coming in between 3 to 6. so put RC ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 RC --- This is the requirement. i did it using for-loop,it will check each value and it taking time when bulk of data come. Any hope to do using lappy,appy kind of functions ? Because at a time complete coulmn should get update. Could you please help me urgently ? - Thanks Antony. -- View this message in context: http://r.789695.n4.nabble.com/Column-create-and-Update-using-function-tp4636400.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Arrange two columns into a five variable dataframe
Hi, You could use either one of these methods: #Method 1: #dat1 : data list1-split(dat1,dat1$group) dat2-data.frame(list1) dat2-data.frame(list1[[5]][1],list1[[4]][1],list1[[3]][1],list1[[2]][1],list1[[1]][1]) colnames(dat2)-c(rev(levels(dat1$group))) head(dat2) Group 1 Group 2 Group 3 Group 4 Group 5 1 40 46 21 35 16 2 37 42 40 37 19 3 44 65 44 49 19 4 47 46 54 46 32 5 47 58 36 63 33 6 47 42 40 39 33 #Method 2: #dat1:data library(reshape) dat3-data.frame(dat1,ID=rep(1:25,5)) dat4-reshape(dat3,idvar=ID,timevar=group,direction=wide) dat4-dat4[,-1] colnames(dat4)-rev(levels(dat3$group)) head(dat4) Group 1 Group 2 Group 3 Group 4 Group 5 1 40 46 21 35 16 2 37 42 40 37 19 3 44 65 44 49 19 4 47 46 54 46 32 5 47 58 36 63 33 6 47 42 40 39 33 #Method 3: #dat1: data dat3-data.frame(dat1,ID=rep(1:25,5)) library(reshape2) dat5-dcast(melt(dat3,id.vars=c(ID,group)),ID~variable+group) dat5-dat5[,-1] colnames(dat5)-levels(dat3$group) dat5-dat5[,c(5:1)] head(dat5) Group 1 Group 2 Group 3 Group 4 Group 5 1 40 46 21 35 16 2 37 42 40 37 19 3 44 65 44 49 19 4 47 46 54 46 32 5 47 58 36 63 33 6 47 42 40 39 33 identical(dat2,dat4) [1] TRUE identical(dat2,dat5) [1] TRUE A.K. - Original Message - From: darnold dwarnol...@suddenlink.net To: r-help@r-project.org Cc: Sent: Friday, July 13, 2012 11:37 PM Subject: [R] Arrange two columns into a five variable dataframe Hi, I hope that folks can give me some simple approaches to taking the data set below, which is accumulated in two columns called long and group, then arrange the data is the long column into a data frame containing five variables: Group 1, Group 2, Group 3, Group 4, and Group 5. I am hoping for a few different techniques which I can pass on to my students. Thanks David Arnold College of the Redwoods dput(flies) structure(list(long = c(40L, 37L, 44L, 47L, 47L, 47L, 68L, 47L, 54L, 61L, 71L, 75L, 89L, 58L, 59L, 62L, 79L, 96L, 58L, 62L, 70L, 72L, 74L, 96L, 75L, 46L, 42L, 65L, 46L, 58L, 42L, 48L, 58L, 50L, 80L, 63L, 65L, 70L, 70L, 72L, 97L, 46L, 56L, 70L, 70L, 72L, 76L, 90L, 76L, 92L, 21L, 40L, 44L, 54L, 36L, 40L, 56L, 60L, 48L, 53L, 60L, 60L, 65L, 68L, 60L, 81L, 81L, 48L, 48L, 56L, 68L, 75L, 81L, 48L, 68L, 35L, 37L, 49L, 46L, 63L, 39L, 46L, 56L, 63L, 65L, 56L, 65L, 70L, 63L, 65L, 70L, 77L, 81L, 86L, 70L, 70L, 77L, 77L, 81L, 77L, 16L, 19L, 19L, 32L, 33L, 33L, 30L, 42L, 42L, 33L, 26L, 30L, 40L, 54L, 34L, 34L, 47L, 47L, 42L, 47L, 54L, 54L, 56L, 60L, 44L ), group = structure(c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c(Group 5, Group 4, Group 3, Group 2, Group 1), class = factor)), .Names = c(long, group), row.names = c(NA, -125L), class = data.frame) -- View this message in context: http://r.789695.n4.nabble.com/Arrange-two-columns-into-a-five-variable-dataframe-tp4636503.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fw: Column create and Update using function
- Forwarded Message - From: arun smartpink...@yahoo.com To: Akkara, Antony (GE Energy, Non-GE) antony.akk...@ge.com Cc: R help r-help@r-project.org Sent: Friday, July 13, 2012 10:05 AM Subject: Re: [R] Column create and Update using function Hi, I thought you want to check all the columns at once using apply or similar functions. dat1[!(dat1$ABC6 dat1$ABC3),ABC_QF]-RC dat1[is.na(dat1)]- dat1 ABC XYZ PQR ABC_QF 1 2 4 3 RC 2 5 4 8 3 7 1 3 RC A.K. - Original Message - From: Akkara, Antony (GE Energy, Non-GE) antony.akk...@ge.com To: arun smartpink...@yahoo.com Cc: Sent: Friday, July 13, 2012 9:46 AM Subject: RE: [R] Column create and Update using function Hi Arun, Here I need to check only with one particular column, not with all columns. I tried with newdat-apply(dat1[,1],2,function(x) ifelse(x6 x3,,RC)) then I getting an error like this: Error in apply(dat1[, 1], 1, function(x) ifelse(x 6 x 3, , RC)) : dim(X) must have a positive length - thanks Antony. -Original Message- From: arun [mailto:smartpink...@yahoo.com] Sent: Friday, July 13, 2012 6:00 PM To: Akkara, Antony (GE Energy, Non-GE) Cc: R help Subject: Re: [R] Column create and Update using function Hi, Try this: dat1-read.table(text= ABC XYZ PQR 2 4 3 5 4 8 7 1 3 ,sep=,header=TRUE) newdat-apply(dat1,2,function(x) ifelse(x6 x3,,RC)) colnames(newdat)-paste(colnames(newdat),QF,sep=_) dat2-data.frame(cbind(dat1,newdat)) ABC XYZ PQR ABC_QF XYZ_QF PQR_QF 1 2 4 3 RC RC 2 5 4 8 RC 3 7 1 3 RC RC RC dat2 A.K. - Original Message - From: Rantony antony.akk...@ge.com To: r-help@r-project.org Cc: Sent: Friday, July 13, 2012 2:42 AM Subject: [R] Column create and Update using function Hi, here i have a Max and Min values Min -3 Max -6 and also a matrix like this, ABC XYZ PQR -- --- --- 2 4 3 5 4 8 7 1 3 In this i need to check each particular column values are between Max and Min value. If the coulmn value not coming between Max and Min, then i need to create another coulmn by adding column name header with _QF. and assign a string like RC for that particular row. For eg:- i need to checkout coulmn ABC. Here 2,5,6 are the values we need to checkout with Min,Max values. and here Min -3,Max -6 First need to create a new column called ABC_QF with current matrix. ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 Next, for 5 , it coming in between 3 to 6. so nothing to do. ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 Next, for 7 , its not coming in between 3 to 6. so put RC ABC XYZ PQR ABC_QF -- --- --- --- 2 4 3 RC 5 4 8 7 1 3 RC --- This is the requirement. i did it using for-loop,it will check each value and it taking time when bulk of data come. Any hope to do using lappy,appy kind of functions ? Because at a time complete coulmn should get update. Could you please help me urgently ? - Thanks Antony. -- View this message in context: http://r.789695.n4.nabble.com/Column-create-and-Update-using-function-tp4636400.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.table with numeric row names
Hi Peter, I copied the data from your email and run it again. dat1-read.table(text= 2.5 3.6 7.1 7.9 100 3 4 2 3 200 3.1 4 3 3 300 2.2 3.3 2 4 ,sep=,header=TRUE) dat1 X2.5 X3.6 X7.1 X7.9 100 3.0 4.0 2 3 200 3.1 4.0 3 3 300 2.2 3.3 2 4 colnames(dat1)-gsub(^[X](.*),\\1,colnames(dat1)) I am not sure what happened with your end. May be you could try readtable(, fill=TRUE) I guess Chi was able to read it as I understood from his email: (Thanks. It works very good.Chi A.K. - Original Message - From: peter dalgaard pda...@gmail.com To: arun smartpink...@yahoo.com Cc: kexinz zhangchic...@gmail.com; R help r-help@r-project.org Sent: Friday, July 13, 2012 10:27 AM Subject: Re: [R] read.table with numeric row names On Jul 13, 2012, at 04:27 , arun wrote: Hello, I saw your reply in nabble. Sorry about that. I thought the dataset had only few columns. #You can read first line of a file using: readLines(foo.txt,n=1)[1] #The more generic colname substitution dat1-read.table(text= 2.5 3.6 7.1 7.9 100 3 4 2 3 200 3.1 4 3 3 300 2.2 3.3 2 4 ,sep=,header=TRUE) (This didn't survive too well in mail: dat1-read.table(text= + 2.5 3.6 7.1 7.9 + 100 3 4 2 3 + 200 3.1 4 3 3 + 300 2.2 3.3 2 4 + ,sep=,header=TRUE) Error in read.table(text = \n 2.5 3.6 7.1 7.9 \n 100 3 4 2 3 \n 200 3.1 4 3 3 \n 300 2.2 3.3 2 4 \n , : more columns than column names Not sure exactly what happened there...) #The code should remove the X from the column names (row names?) However, adding check.names=FALSE should be more expedient. colnames(dat1)-gsub(^[X](.*),\\1,colnames(dat1)) dat1 2.5 3.6 7.1 7.9 100 3.0 4.0 2 3 200 3.1 4.0 3 3 300 2.2 3.3 2 4 plot(colMeans(dat1)~as.numeric(names(dat1)),xlab=Column_Name,ylab=Column_Mean) A.K. - Original Message - From: kexinz zhangchic...@gmail.com To: r-help@r-project.org Cc: Sent: Thursday, July 12, 2012 2:50 PM Subject: [R] read.table with numeric row names I have a text file like this 2.5 3.6 7.1 7.9 100 3 4 2 3 200 3.1 4 3 3 300 2.2 3.3 2 4 I used r - read.table(a.txt, header=T) The row names becomes X2.5, X3.6... What I need is the row names are numeric, so I can use the row names as numbers on x-axis for plotting. e.g. plot(colMeans(r)~names(r)), something like this. How to do this? Thanks. -- View this message in context: http://r.789695.n4.nabble.com/read-table-with-numeric-row-names-tp4636342.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R combining many vectors of predictable name into one date frame
Hello, Try the following. ib1 - 1:10 ib2 - rnorm(10) hold.list - objects(pattern=ib) df - sapply(hold.list, get) df Note that you don't need list(), and that sapply() returns a data.frame if possible. Also, 'df' is the name of an R function, use something else like 'df1'. Hope this helps Rui Barradas Em 14-07-2012 00:51, Mathew Vickers escreveu: G'day R (power) users, I have a many vectors, called: ib1 ib2 ib3 ... ib100 and I would like them in one data frame (df) such that: df ib1 ib2 ib3 ib4 . ib100 x x xxx x x xxx x x xxx I have attempted: hold.list - list(objects(pattern=ib)) df - data.frame(hold.list) but that didn't work also: do.call(rbind, (objects(pattern=ib))) and that also didn't work. I tried a whole pile of other things, where I also failed. The number of vectors might differ each time I want to make the data frame, so that in the example above, I have ib1 : ib100, but next time, I might only have ib1 : ib2 Below is my (probably somewhat embarrassing) example script for generating the vectors in the first place. Commented out toward the end are a few attempts at doing the job I wanted to do. temp - runif(100) tripID - rep(1:10, 10) uni - rep(1:4, 25) temp - data.frame(temp, tripID, uni) trips - unique(temp$tripID) uni - unique(temp$uni[temp$tripID==trips[1]]) for (jj in 1:length(uni)){ a - c() for (ii in 1:10){ a - c(a, IQR(temp$temp[temp$uni %in% sample(uni,jj)])) assign(paste(ib,jj,sep=), a) # ib is short for ibuttons. The number is how many were used to calc IQR } # hold.list - list(objects(patter=ib)) # trip - data.frame(list=hold.list # I am trying to put everything into a dataframe # do.call(rbind, list=hold.list) # do.call(rbind, list(objects(pattern=ib))) } thanks heaps if you can help. And sorry if this is mostly garble. This is my first crack at soliciting help from the list. cheers, mat __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rgamma function
On Jul 14, 2012, at 04:55 , Chandler Zuo wrote: Hi, Has anyone encountered the problem of rgamma function in C? The following simplified program always dies for me, and I wonder if anyone can tell me the reason. #include Rmath.h #include time.h #include Rinternals.h SEXP generateGamma () { srand(time(NULL)); return (rgamma(5000,1)); } Has anyone encountered a similar problem before? Is there another way of generating Gamma random variable in C? P.S. I have no problem compiling and loading this function in R. It doesn't even give off a warning?? The prototype in Rmath.h is double rgamma(double, double); and you should be returning an SEXP. As soon as something tries to interpret the double value as a pointer -- Poof! Notice that rgamma in C is not the same function as the R counterpart, in particular it isn't vectorized, so only generates one random number at a time. The long and the short of it is that you need to read up on sections 5.9 and 5.10 of Writing R Extensions. Thanks for suggestions in advance! --Chandler __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading in Large Dataset + variables via loop
Hello, Why do you need 9 variables in your environment if they are time series that correspond to the same period? You should use time series functions. #install.packages('zoo') library(zoo) # Make up a dataset Year - seq(from=as.Date(1901-01-01), by=year, length.out=100) dat - data.frame(matrix(rnorm(100*9), ncol=9), Year) # assign names. varNames - expand.grid(c(temp, precip, pressure), 1:3, stringsAsFactors=FALSE) varNames - as.vector(apply(colNames, 1, paste, collapse=)) varNames - c(varNames, Year) names(dat) - varNames head(dat) # and transform it into a time series of class 'zoo' z - zoo(dat[, 1:9], order.by=dat$Year) str(z) head(z) Another way would be, like you say, to use a loop to put the variables in a list. Something like lst - list() for(i in 1:9) lst[[i]] - dat[, i] names(lst) - varNames Note that I've used a dataset called 'dat' n place of your 'A'. You should post a data example, like the posting guide says. Using dput(). Hope this helps, Rui Barradas Em 14-07-2012 03:44, cmc0605 escreveu: Hello, I'm new to R with a (probably elementary) question. Suppose I have a dataset called /A/ with /n/ locations, and each location contains within it 3 time series of different variables (all of 100 years length); each time series is of a weather variable (for each location there is a temperature, precipitation, and pressure). For instance, location 1 has a temperature1 time series, a precip1 time series, and a pressure1 time series; location two has a temperature2, precip2, and pressure2 timeseries...That is, there are 100 rows, and (/n/*3)+1 columns. The extra column is the time. I want to load in this dataset and declare a variable for each time series. The columns are in order of location, so it goes temp1, precip1,pressure1, temp2,... and so forth in increasing column order. There are always 100 rows. Manually, Id have to do: temp1=A[,1] precip1=A[,2] pressure1=A[,3] temp2=A[,4] precip2=A[,5] pressure2=A[,6] temp3=A[,7] and so forth. Problem is, n is large, so I don't want to repeat this pattern forever. I figure I need a loop both for the variable name (ie.., the variable at a particular location) as well as for what column it reads from. Any help...? -- View this message in context: http://r.789695.n4.nabble.com/Loading-in-Large-Dataset-variables-via-loop-tp4636501.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Maximum number of patterns and speed in grep
On Fri, Jul 13, 2012 at 1:41 PM, mdvaan mathijsdev...@gmail.com wrote: Here's some data (which should give you the error messages): # read in data data - read.csv(https://dl.dropbox.com/u/13631687/data.csv;, header = T, sep = ,) # first paste all data data1 - paste(data[,1], collapse = |) # second paste subsets of the data data2a - paste(data[1:750,1], collapse = |) data2b - paste(data[751:1500,1], collapse = |) # define the object to be searched text - c(the first is Santa Fe Gold Corp, the second is Starpharma Holdings) # match strapplyc(text, data1) strapplyc(text, data2a) strapplyc(text, data2b) Thanks in advance! Although it seems that strapplyc can handle larger regular expressions than grep in R it seems neither can handle as many as in your example so process it in chunks: k - 3000 # chunk size f - function(from, text) { to - min(from + k - 1, nrow(data)) r - paste(data[seq(from, to), 1], collapse = |) r - gsub([().*?+{}], , r) strapply(text, r) } ix - seq(1, nrow(data), k) out - lapply(text, function(text) unlist(lapply(ix, f, text))) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] HLOOKUP in R
On Fri, Jul 13, 2012 at 9:25 PM, Silje Nord silje.nordg...@gmail.com wrote: Is there a function similar to excel's hlookup in R ? Try match(). I think it provides hlookup() functionality. Liviu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] variable (column) in a data frame
To the R help list, When using a data frame, there is no warning or error message when I refer to a non-existent variable inside the data frame. Example: ##-- a - c(1,2,3) b - c(11,22,33) df - data.frame(a,b) df ## correct: there is a column in df named 'a' ## the sum is correctly performed sum(df$a==2) ## incorrect: there is no column in df named 'aaa', ## but the sum is performed anyway without either warning or error sum(df$aaa==2) ##-- Is there some way to make R issue either a warning or an error message in such a situation? I am using R version 2.15.1 64-bit on Windows 7 Professional. Thank you very much. Paulo Barata - Paulo Barata ENSP - Fundação Oswaldo Cruz Rua Leopoldo Bulhões 1480 - 8A 21041-210 Rio de Janeiro - RJ Brazil E-mail: paulo.bar...@ensp.fiocruz.br __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] variable (column) in a data frame
This seems more or less correct to me. 1 sum(df$a==1) [1] 1 1 sum(df$a==2) [1] 1 1 sum(df$aaa==2) [1] 0 There is no df$aaa so the length is 0 which is what I think you are asking. What am I missing? John Kane Kingston ON Canada -Original Message- From: paulo.bar...@ensp.fiocruz.br Sent: Sun, 15 Jul 2012 11:30:37 -0300 To: r-help@r-project.org Subject: [R] variable (column) in a data frame To the R help list, When using a data frame, there is no warning or error message when I refer to a non-existent variable inside the data frame. Example: ##-- a - c(1,2,3) b - c(11,22,33) df - data.frame(a,b) df ## correct: there is a column in df named 'a' ## the sum is correctly performed sum(df$a==2) ## incorrect: there is no column in df named 'aaa', ## but the sum is performed anyway without either warning or error sum(df$aaa==2) ##-- Is there some way to make R issue either a warning or an error message in such a situation? I am using R version 2.15.1 64-bit on Windows 7 Professional. Thank you very much. Paulo Barata - Paulo Barata ENSP - Fundação Oswaldo Cruz Rua Leopoldo Bulhões 1480 - 8A 21041-210 Rio de Janeiro - RJ Brazil E-mail: paulo.bar...@ensp.fiocruz.br __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. GET FREE SMILEYS FOR YOUR IM EMAIL - Learn more at http://www.inbox.com/smileys Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and most webmails __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] variable (column) in a data frame
Dr. Dalgaard, Thank you. But pre-checking with is.null() or using with() doesn't solve the problem of catching spelling mistakes in the name of a variable inside a data frame, when using the df$var notation often in a program. Is there some way for R to behave, in relation to a variable inside a data frame, the same way it behaves for a variable not in a data frame? For example: ## a - c(1,2,3) ## the variable exists, we get a correct answer a==1 ## the variable does not exist, R rightly points this out aaa==1 ## My point is, if we make a spelling mistake in a program when referring to a variable inside a data frame, using the df$var notation, there seems to be no way of getting warned about that. Thank you once again. Paulo Barata - -- Original Message --- From: peter dalgaard pda...@gmail.com To: Paulo Barata paulo.bar...@ensp.fiocruz.br Sent: Sun, 15 Jul 2012 16:47:35 +0200 Subject: Re: [R] variable (column) in a data frame On Jul 15, 2012, at 16:30 , Paulo Barata wrote: To the R help list, When using a data frame, there is no warning or error message when I refer to a non-existent variable inside the data frame. Example: ##-- a - c(1,2,3) b - c(11,22,33) df - data.frame(a,b) df ## correct: there is a column in df named 'a' ## the sum is correctly performed sum(df$a==2) ## incorrect: there is no column in df named 'aaa', ## but the sum is performed anyway without either warning or error sum(df$aaa==2) ##-- Is there some way to make R issue either a warning or an error message in such a situation? You can pre-check for is.null(df$aaa) or use with(df, sum(aaa==2)). -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. --- End of Original Message --- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] variable (column) in a data frame
On Jul 15, 2012, at 17:41 , Paulo Barata wrote: Dr. Dalgaard, Thank you. But pre-checking with is.null() or using with() doesn't solve the problem of catching spelling mistakes in the name of a variable inside a data frame, when using the df$var notation often in a program. Is there some way for R to behave, in relation to a variable inside a data frame, the same way it behaves for a variable not in a data frame? For example: You could try reading the 2nd half of my one-line reply You can pre-check for is.null(df$aaa) or use with(df, sum(aaa==2)). -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can't understand syntax
Hello, Thank you, I'm gald it helped. Two notes. 1. I don't believe 1t's a problem with the documentation, though many times, and R is not an exception, there are books that explain in simpler terms what the docs alreay explain well. Check out the contributed link in http://cran.r-project.org/ (it's on the left, bottom-most). There are several books that though have specific areas, start with an introduction to R. 2. You've missinterpreted a point in my post, data.frames are list. Stricktly speakng they are also, like any other list, collections of list, but that's NOT the way they should be seen. It's more natural to see them as implementing the statistical concepts of variables and observations. In this case a column is a variable/vector (not list) and whithin a vector, we have observations. In the general case variables need not have the same number of observations; if they do, the list can become tabular, a data.frame. And we can speak of rows. Rule: call the columns variables and call the rows observations or, well, or columns and rows. But think of the columns as vectors. Like I've said above, a column is still a list, and can hold any type of data. Some questions are about keeping entire matrices as elements of a data.frame column, but the answer is yes, it is possible but NO, don't do that. Em 15-07-2012 16:05, Charles Stangor escreveu: Rui, Thank you SO MUCH!! This was exactly the explanation I needed Now I can see that dataframes are collections of lists where each column is a list. I find that R documentation is either very superficial or completely arcane but I'm getting it! Thanks again. Chuck On Sat, Jul 14, 2012 at 7:41 PM, Rui Barradas ruipbarra...@sapo.pt mailto:ruipbarra...@sapo.pt wrote: Hello, It's more simple than you believe it is. One thing at a time. First, in order to lighten the instructions, create index vectors. test2 - test # save 'test' for later na.v1 - is.na http://is.na(test[[v1]]) na.v2 - is.na http://is.na(test[[v2]]) na.v3 - is.na http://is.na(test[[v3]]) Now use them. test[[ result ]][ !na.v1 ] - test[[ v1 ]][ !na.v1 ] test[[ result ]][ !na.v2 ] - test[[ v2 ]][ !na.v2 ] test[[ result ]][ !na.v3 ] - test[[ v3 ]][ !na.v3 ] Note that above, for instance, n the first line, on each side of '-' we have two different types of indexing, in a certain sense. One, a data.frame is a list of a special type, each list member is a (random?) variable and all variables have the same number of observations. So test[[ result ]] refers to a vector of the data.frame. Another is the indexing of that vectors' elements. Imagine that we had assigned test.res - test[[ result ]] and then accessed the elements of 'test.res' with test.res[ !na.v1 ] - ...etc... That's what we are doing. Considering that a df is a list with a tabular form, we could also use the row/column type of indexing. Maybe this would be more intuitive. Equivalent, exactly equivalent to the code above is: test2[ !na.v1 , result ] - test2[ !na.v1 , v1 ] test2[ !na.v2 , result ] - test2[ !na.v2 , v2 ] test2[ !na.v3 , result ] - test2[ !na.v3 , v3 ] all.equal(test, test2) # TRUE Hope this helps, Rui Barradas Em 14-07-2012 21:22, Charles Stangor escreveu: OK, I need help!! I've been searching, but I don't understand the logic of some this dataframe addressing syntax. What is this type of code called? test [[v3]] [is.na http://is.na http://is.na(test[[v2]])] -10 #choose column v3 where column v2 is == 4 and replace with 10 and where is it documented? The code below works for what I want to do (find the non-missing value in a row), but why? test - read.table(text= v1 v2 v3 result 3 NA NA NA NA 3 NA NA NA NA 3 NA , header=TRUE) test [[result]] [!(is.na http://is.na http://is.na(test[[v1]]))] - test [[v1]] [!(is.na http://is.na http://is.na(test[[v1]]))] test [[result]] [!(is.na http://is.na http://is.na(test[[v2]]))] - test [[v2]] [!(is.na http://is.na http://is.na(test[[v2]]))] test [[result]] [!(is.na http://is.na http://is.na(test[[v3]]))] - test [[v3]] [!(is.na http://is.na http://is.na(test[[v3]]))] thanks! On Fri, Jul 13, 2012 at 6:41 AM, Rui Barradas ruipbarra...@sapo.pt mailto:ruipbarra...@sapo.pt mailto:ruipbarra...@sapo.pt mailto:ruipbarra...@sapo.pt wrote: Hello, Check the structure of what you have, df and newdf. You will see that in df dateTime is of class POSIXlt and in newDf newDateTime is of class POSIXct. Solution: [...] df$dateTime -
Re: [R] variable (column) in a data frame
On 2012-07-15 08:41, Paulo Barata wrote: Dr. Dalgaard, Thank you. But pre-checking with is.null() or using with() doesn't solve the problem of catching spelling mistakes in the name of a variable inside a data frame, when using the df$var notation often in a program. Is there some way for R to behave, in relation to a variable inside a data frame, the same way it behaves for a variable not in a data frame? For example: ## a - c(1,2,3) ## the variable exists, we get a correct answer a==1 ## the variable does not exist, R rightly points this out aaa==1 ## My point is, if we make a spelling mistake in a program when referring to a variable inside a data frame, using the df$var notation, there seems to be no way of getting warned about that. You could wean yourself from the $-habit. It's convenient but can lead to the problems you're experiencing (and this has been discussed before). For programming, if you're prone to make spelling errors, you should prefer df[, aaa]. See ?Extract. Peter Ehlers Thank you once again. Paulo Barata - -- Original Message --- From: peter dalgaard pda...@gmail.com To: Paulo Barata paulo.bar...@ensp.fiocruz.br Sent: Sun, 15 Jul 2012 16:47:35 +0200 Subject: Re: [R] variable (column) in a data frame On Jul 15, 2012, at 16:30 , Paulo Barata wrote: To the R help list, When using a data frame, there is no warning or error message when I refer to a non-existent variable inside the data frame. Example: ##-- a - c(1,2,3) b - c(11,22,33) df - data.frame(a,b) df ## correct: there is a column in df named 'a' ## the sum is correctly performed sum(df$a==2) ## incorrect: there is no column in df named 'aaa', ## but the sum is performed anyway without either warning or error sum(df$aaa==2) ##-- Is there some way to make R issue either a warning or an error message in such a situation? You can pre-check for is.null(df$aaa) or use with(df, sum(aaa==2)). -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. --- End of Original Message --- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Imposing more than one condition to if
Hi, I have a dataset which contains several time records for a number of days, plus a variable (light) that allows to determine night time (lihgt= 0) and daytime (light 0). I need to obtain get dusk time and dawn time for each day and place them in two columns. This is the starting point (d): day time light 1 1 20 1 12 10 1 11 6 1 9 0 1 6 0 1 12 0 ... 30 8 0 30 3 0 30 8 0 30 3 0 30 8 8 30 9 20 And this what I want to get: day time light dusk dawn 1 1 20 11 10 1 1210 11 10 1 11 6 11 10 1 9 0 11 10 1 6 0 11 10 1 12 0 11 10 ... 30 8 0 9 5 30 3 0 9 5 30 8 0 9 5 30 3 0 9 5 30 8 8 9 5 30 9 20 9 5 This is the code for data frame d: day= rep(1:30, each=10) n= length(dia); x= c(1:24) time= sample(x, 300, replace= T) light= rep(c(20,10,6,0,0,0,0,0,8,20), 30) d=data.frame(day,time,light) I'd need to impose a double condition like the next but if does not take more than one: attach(d) for (i in 1: n){ if (light[i-1]2 light[i]2){ d$dusk- time[i-1] } if (light[i-1]2 light[i]2){ d$dawn- time[i] } } detach(d) d Thank you for your help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can't understand syntax
Hello, again. Inline. Em 15-07-2012 16:17, Charles Stangor escreveu: Rui, Since you are so generous, may I ask you one more question? What is the deal with the text after the semicolon in the statement below? Is this an ifelse or something? Why is it needed ? obrigado. df1 - read.table(text= cola colb colc cold cole 1NA59 NA 17 226 NA 14 NA 33NA 11 15 19 448 12 NA NA , header=TRUE) df2 - read.table(text= cola colb colc cold cole 1 8 10 12 14 16 , header=TRUE) df1[[cola]][is.na(df1[[cola]])] - df2[[cola]]; df1[[cola]] #?? what's happening after the semi? It's printing df1[[cola]]. Just that. The sem-colon ends an instruction and starts a new one. If it's confusing, put what follows it in a new line. Rui Barradas df1 On Sat, Jul 14, 2012 at 7:41 PM, Rui Barradas ruipbarra...@sapo.pt mailto:ruipbarra...@sapo.pt wrote: Hello, It's more simple than you believe it is. One thing at a time. First, in order to lighten the instructions, create index vectors. test2 - test # save 'test' for later na.v1 - is.na http://is.na(test[[v1]]) na.v2 - is.na http://is.na(test[[v2]]) na.v3 - is.na http://is.na(test[[v3]]) Now use them. test[[ result ]][ !na.v1 ] - test[[ v1 ]][ !na.v1 ] test[[ result ]][ !na.v2 ] - test[[ v2 ]][ !na.v2 ] test[[ result ]][ !na.v3 ] - test[[ v3 ]][ !na.v3 ] Note that above, for instance, n the first line, on each side of '-' we have two different types of indexing, in a certain sense. One, a data.frame is a list of a special type, each list member is a (random?) variable and all variables have the same number of observations. So test[[ result ]] refers to a vector of the data.frame. Another is the indexing of that vectors' elements. Imagine that we had assigned test.res - test[[ result ]] and then accessed the elements of 'test.res' with test.res[ !na.v1 ] - ...etc... That's what we are doing. Considering that a df is a list with a tabular form, we could also use the row/column type of indexing. Maybe this would be more intuitive. Equivalent, exactly equivalent to the code above is: test2[ !na.v1 , result ] - test2[ !na.v1 , v1 ] test2[ !na.v2 , result ] - test2[ !na.v2 , v2 ] test2[ !na.v3 , result ] - test2[ !na.v3 , v3 ] all.equal(test, test2) # TRUE Hope this helps, Rui Barradas Em 14-07-2012 21:22, Charles Stangor escreveu: OK, I need help!! I've been searching, but I don't understand the logic of some this dataframe addressing syntax. What is this type of code called? test [[v3]] [is.na http://is.na http://is.na(test[[v2]])] -10 #choose column v3 where column v2 is == 4 and replace with 10 and where is it documented? The code below works for what I want to do (find the non-missing value in a row), but why? test - read.table(text= v1 v2 v3 result 3 NA NA NA NA 3 NA NA NA NA 3 NA , header=TRUE) test [[result]] [!(is.na http://is.na http://is.na(test[[v1]]))] - test [[v1]] [!(is.na http://is.na http://is.na(test[[v1]]))] test [[result]] [!(is.na http://is.na http://is.na(test[[v2]]))] - test [[v2]] [!(is.na http://is.na http://is.na(test[[v2]]))] test [[result]] [!(is.na http://is.na http://is.na(test[[v3]]))] - test [[v3]] [!(is.na http://is.na http://is.na(test[[v3]]))] thanks! On Fri, Jul 13, 2012 at 6:41 AM, Rui Barradas ruipbarra...@sapo.pt mailto:ruipbarra...@sapo.pt mailto:ruipbarra...@sapo.pt mailto:ruipbarra...@sapo.pt wrote: Hello, Check the structure of what you have, df and newdf. You will see that in df dateTime is of class POSIXlt and in newDf newDateTime is of class POSIXct. Solution: [...] df$dateTime - strptime(df$dateTime,%m/%d/%Y %H:%M) df$dateTime - as.POSIXct(df$dateTime) [...] Hope this helps, Rui Barradas Em 13-07-2012 10:24, vioravis escreveu: I have the following dataframe with the first column being of type datetime: dateTime - c(10/01/2005 0:00, 10/01/2005 0:20, 10/01/2005 0:40, 10/01/2005 1:00, 10/01/2005 1:20) var1 - c(1,2,3,4,5) var2 - c(10,20,30,40,50) df - data.frame(dateTime = dateTime, var1 = var1, var2 = var2) df$dateTime - strptime(df$dateTime,%m/%d/%Y %H:%M) I want to create 10 minute interval
Re: [R] minor axis ticks in trellis graphics?
On 2012-07-13 01:05, Martin Ivanov wrote: Dear R users, I need to add minor axis ticks to my graph. In traditional R this is easily achievable by simply adding a second axis with the minor ticks. But how to do that in trellis? I am already out of ideas. Any suggestions will be appreciated. Haven't seen a response yet, so I'll give it a shot, sure to be replaced by something much simpler by Deepayan when he finds the time. Here are two ways: 1. Assign appropriate values to the elements of the xscale.components list. I prefer this. ## make some data d - data.frame(x = 1:12, y = rnorm(12)) at.ticks - c(4,8) at.labels - c(2,6,10) the_labels - letters[1:3] library(lattice) ## define a function to modify the xscale components; ## this function will be used inside xyplot(). myxscale.components - function(...) { ans - xscale.components.default(...) ans$bottom$ticks$at - at.ticks ans$bottom$labels$at - at.labels ans$bottom$labels$labels - the_labels ans } ## do the plot xyplot(y ~ x, data = d, scales = list(tck = c(1,0)), xscale.components = myxscale.components) You can put the modifying function inside the xyplot call. See ?axis.components. 2. This is more like the base graphics way. We create the plot without the x-axis and then use the trellis.focus/unfocus functions in conjunction with the panel.axis() function. See ?panel.axis for details. Here's the function to apply after the xyplot call: myfocus - function(){ trellis.focus(panel, 1, 1, clip.off = TRUE, highlight = FALSE) ## put the ticks in panel.axis(side = bottom, at = at.ticks, labels = FALSE, ticks = TRUE, tck = 1, outside = TRUE ) ## put the labels in panel.axis(side = bottom, at = at.labels, labels = the_labels, ticks = FALSE, tck = 0, outside = TRUE, rot = 0 # optional; try it without ) trellis.unfocus() } xyplot(y ~ x, data = d, scales = list( y = list(tck = c(1,0)), x = list(tck = c(0,0), at = 1, label = # to give us some bottom space ))) ## Now add the axis ticks and labels myfocus() Peter Ehlers Best regards, Martin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] variable (column) in a data frame
Dr. Dalgaard, Thank you. You are right, with() is able to catch spelling errors in the name of variables inside a data frame. But couldn't some error or warning be included in R when referring to a non-existent variable inside a data frame with the df$var notation, without the use of with()? Is there any reason why R does not have such a kind of error message? Thank you again. Paulo Barata - -- Original Message --- From: peter dalgaard pda...@gmail.com To: Paulo Barata paulo.bar...@ensp.fiocruz.br Cc: r-help@r-project.org Sent: Sun, 15 Jul 2012 18:14:22 +0200 Subject: Re: [R] variable (column) in a data frame On Jul 15, 2012, at 17:41 , Paulo Barata wrote: Dr. Dalgaard, Thank you. But pre-checking with is.null() or using with() doesn't solve the problem of catching spelling mistakes in the name of a variable inside a data frame, when using the df$var notation often in a program. Is there some way for R to behave, in relation to a variable inside a data frame, the same way it behaves for a variable not in a data frame? For example: You could try reading the 2nd half of my one-line reply You can pre-check for is.null(df$aaa) or use with(df, sum(aaa==2)). -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. --- End of Original Message --- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] LiblineaR: read/write model files?
* Sam Steingold f...@tah.bet [2012-07-13 15:51:46 -0400]: How do I read/write liblinear models to files? E.g., if I train a model using the command line interface, I might want to load it into R to look the histogram of the weights. Or I might want to train a model in R and then apply it using a command line interface. read.liblinear - function (file) { cat(read.liblinear(,file,)\n) lines - readLines(file) stopifnot(lines[6]==w) parsed - strsplit(lines[1:5], ,fixed=TRUE) stopifnot(parsed[[1]][1] == solver_type) stopifnot(parsed[[2]][1] == nr_class) stopifnot(parsed[[3]][1] == label) stopifnot(parsed[[4]][1] == nr_feature) stopifnot(parsed[[5]][1] == bias) stopifnot(as.numeric(parsed[[2]][2]) + 1 == length(parsed[[3]])) stopifnot(as.numeric(parsed[[4]][2]) + 6 == length(lines)) ret - list(solver.type=parsed[[1]][2], label=parsed[[3]][2:length(parsed[[3]])], bias=as.numeric(parsed[[5]][2]), weight=as.numeric(lines[7:length(lines)])) nattr - length(ret$weight) n0 - length(which(ret$weight==0)) cat(solver.type:,ret$solver.type,\nlabel:,ret$label,\nbias:,ret$bias, \nweight(total:,nattr,; 0:,n0,=,(100*n0/nattr),%)\n) print(summary(ret$weight)) ret } -- Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000 http://www.childpsy.net/ http://palestinefacts.org http://honestreporting.com http://www.PetitionOnline.com/tap12009/ http://americancensorship.org Incorrect time synchronization. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Imposing more than one condition to if
No idea of how to do what you want but your data set is not working. I think that you want x= c(1:24) day= rep(1:30, each=10) time= sample(x, 300, replace= T) light= rep(c(20,10,6,0,0,0,0,0,8,20), 30) d=data.frame(day,time,light) n= length(day) John Kane Kingston ON Canada -Original Message- From: sgual...@yahoo.com Sent: Sun, 15 Jul 2012 09:32:33 -0700 (PDT) To: r-help@r-project.org Subject: [R] Imposing more than one condition to if Hi, I have a dataset which contains several time records for a number of days, plus a variable (light) that allows to determine night time (lihgt= 0) and daytime (light 0). I need to obtain get dusk time and dawn time for each day and place them in two columns. This is the starting point (d): day time light 1 1 20 1 12 10 1 11 6 1 9 0 1 6 0 1 12 0 ... 30 8 0 30 3 0 30 8 0 30 3 0 30 8 8 30 9 20 And this what I want to get: day time light dusk dawn 1 1 20 11 10 1 1210 11 10 1 11 6 11 10 1 9 0 11 10 1 6 0 11 10 1 12 0 11 10 ... 30 8 0 9 5 30 3 0 9 5 30 8 0 9 5 30 3 0 9 5 30 8 8 9 5 30 9 20 9 5 This is the code for data frame d: day= rep(1:30, each=10) n= length(dia); x= c(1:24) time= sample(x, 300, replace= T) light= rep(c(20,10,6,0,0,0,0,0,8,20), 30) d=data.frame(day,time,light) I'd need to impose a double condition like the next but if does not take more than one: attach(d) for (i in 1: n){ if (light[i-1]2 light[i]2){ d$dusk- time[i-1] } if (light[i-1]2 light[i]2){ d$dawn- time[i] } } detach(d) d Thank you for your help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. GET FREE SMILEYS FOR YOUR IM EMAIL - Learn more at http://www.inbox.com/smileys Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and most webmails __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Imposing more than one condition to if
Hello, There are obvious bugs in your code, you are testing for light 2 or ligth 2 but this would mean that dusk and dawn are undetermined for light == 2 and that they happen at light == 1. Without loops or compound logical conditions: f - function(x){ x$dawn - x$time[ which.min(x$light) ] x$dusk - x$time[ max(which(x$light == 0)) + 1 ] x } do.call(rbind, by(d, d$day, f)) Hope this helps, Rui Barradas Em 15-07-2012 17:32, Santiago Guallar escreveu: Hi, I have a dataset which contains several time records for a number of days, plus a variable (light) that allows to determine night time (lihgt= 0) and daytime (light 0). I need to obtain get dusk time and dawn time for each day and place them in two columns. This is the starting point (d): day time light 1 1 20 1 12 10 1 11 6 1 9 0 1 6 0 1 12 0 ... 30 8 0 30 3 0 30 8 0 30 3 0 30 8 8 30 9 20 And this what I want to get: day time light dusk dawn 1 1 20 11 10 1 1210 11 10 1 11 6 11 10 1 9 0 11 10 1 6 0 11 10 1 12 0 11 10 ... 30 8 0 9 5 30 3 0 9 5 30 8 0 9 5 30 3 0 9 5 30 8 8 9 5 30 9 20 9 5 This is the code for data frame d: day= rep(1:30, each=10) n= length(dia); x= c(1:24) time= sample(x, 300, replace= T) light= rep(c(20,10,6,0,0,0,0,0,8,20), 30) d=data.frame(day,time,light) I'd need to impose a double condition like the next but if does not take more than one: attach(d) for (i in 1: n){ if (light[i-1]2 light[i]2){ d$dusk- time[i-1] } if (light[i-1]2 light[i]2){ d$dawn- time[i] } } detach(d) d Thank you for your help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading in Large Dataset + variables via loop
Hello, Right, it should be 'varNames' in the apply. I guess I had something called colNames in my environment. I've just rm(list=ls()) and rerun the code, corrected. No errors this time. varNames is the result of expand.grid, therefore does have a dim attribute. The faulty instruction corrected is: varNames - as.vector(apply(varNames, 1, paste, collapse=)) Rui Barradas Em 15-07-2012 18:12, arun escreveu: Hi Rui, Getting some error messages: varNames - as.vector(apply(colNames, 1, paste, collapse=)) Error in apply(colNames, 1, paste, collapse = ) : dim(X) must have a positive length A.K. - Original Message - From: Rui Barradas ruipbarra...@sapo.pt To: cmc0605 colos...@gmail.com Cc: r-help@r-project.org Sent: Sunday, July 15, 2012 8:12 AM Subject: Re: [R] Loading in Large Dataset + variables via loop Hello, Why do you need 9 variables in your environment if they are time series that correspond to the same period? You should use time series functions. #install.packages('zoo') library(zoo) # Make up a dataset Year - seq(from=as.Date(1901-01-01), by=year, length.out=100) dat - data.frame(matrix(rnorm(100*9), ncol=9), Year) # assign names. varNames - expand.grid(c(temp, precip, pressure), 1:3, stringsAsFactors=FALSE) varNames - as.vector(apply(colNames, 1, paste, collapse=)) varNames - c(varNames, Year) names(dat) - varNames head(dat) # and transform it into a time series of class 'zoo' z - zoo(dat[, 1:9], order.by=dat$Year) str(z) head(z) Another way would be, like you say, to use a loop to put the variables in a list. Something like lst - list() for(i in 1:9) lst[[i]] - dat[, i] names(lst) - varNames Note that I've used a dataset called 'dat' n place of your 'A'. You should post a data example, like the posting guide says. Using dput(). Hope this helps, Rui Barradas Em 14-07-2012 03:44, cmc0605 escreveu: Hello, I'm new to R with a (probably elementary) question. Suppose I have a dataset called /A/ with /n/ locations, and each location contains within it 3 time series of different variables (all of 100 years length); each time series is of a weather variable (for each location there is a temperature, precipitation, and pressure). For instance, location 1 has a temperature1 time series, a precip1 time series, and a pressure1 time series; location two has a temperature2, precip2, and pressure2 timeseries...That is, there are 100 rows, and (/n/*3)+1 columns. The extra column is the time. I want to load in this dataset and declare a variable for each time series. The columns are in order of location, so it goes temp1, precip1,pressure1, temp2,... and so forth in increasing column order. There are always 100 rows. Manually, Id have to do: temp1=A[,1] precip1=A[,2] pressure1=A[,3] temp2=A[,4] precip2=A[,5] pressure2=A[,6] temp3=A[,7] and so forth. Problem is, n is large, so I don't want to repeat this pattern forever. I figure I need a loop both for the variable name (ie.., the variable at a particular location) as well as for what column it reads from. Any help...? -- View this message in context: http://r.789695.n4.nabble.com/Loading-in-Large-Dataset-variables-via-loop-tp4636501.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-implementation of Local Outlier Probabilities (LoOP)?
Dear all, Is anyone aware of an R implementation of LoOF (H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek; LoOP: Local Outlier Probabilities; In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), Hong Kong, China: 1649–1652, 2009.)? I found http://cran.r- project.org/web/packages/Rlof/index.html, but would prefer the p-value'ish measure provided by LoOP. Alternatives implemented in R would also be valuable ... Thank you for your consideration. Sincerely, Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] variable (column) in a data frame
Paulo Barata-3 wrote Dr. Dalgaard, Thank you. You are right, with() is able to catch spelling errors in the name of variables inside a data frame. But couldn't some error or warning be included in R when referring to a non-existent variable inside a data frame with the df$var notation, without the use of with()? Is there any reason why R does not have such a kind of error message? See this discussion: https://stat.ethz.ch/pipermail/r-help/2012-July/317562.html Berend -- View this message in context: http://r.789695.n4.nabble.com/variable-column-in-a-data-frame-tp4636561p4636579.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] HLOOKUP in R
Depending on what options of hlookup you want, 'match' will do exact matching and 'findInterval' will determine range/interval matching. What you need to do is follow the posting guide and provide an example of exactly what you data looks like and what you expect the result to be. On Fri, Jul 13, 2012 at 3:25 PM, Silje Nord silje.nordg...@gmail.com wrote: Hi, Is there a function similar to excel's hlookup in R ? Thanks, Silje __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] HLOOKUP in R
Try ?match Adapt it to your need On Saturday, July 14, 2012 12:55:33 AM UTC+5:30, Silje Nord wrote: Hi, Is there a function similar to excel's hlookup in R ? Thanks, Silje __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] permutation test on paired samples
Holger, Thanks for providing a reproducible example. However, since your space key only works sporadically, the below is a little hard to read... ;) On 2012-07-12 20:26, Holger Taschenberger wrote: Hi, I'm trying to run a permutation test on paired samples. First I tried the package exactRankTests: require(exactRankTests) x - c(1.83,0.50,1.62,2.48,1.68,1.88,1.55,3.06,1.30) y - c(0.878,0.647,0.598,2.05,1.06,1.29,1.06,3.14,1.29) The relevant output missing here is wilcox.test(x,y,paired = TRUE,alternative = greater) Wilcoxon signed rank test data: x and y V = 40, p-value = 0.01953 alternative hypothesis: true location shift is greater than 0 perm.test(y,x,paired = TRUE,exact = TRUE,alternative = greater) 1-sample Permutation Test (scores mapped into 1:m using rounded scores) data: y and x T = 41, p-value = 0.003906 alternative hypothesis: true mu is greater than 0 Firstly, you've interchanged the 'x' and 'y' in the second call. Secondly, and more important, the output says that (scores mapped into 1:m using rounded scores). In this case this can easily be avoided, and note the interchange of 'x' and 'y' to match your 'wilcox.test' call, using: yy - 1000 * y xx - 1000 * x perm.test(xx, yy, paired = TRUE, exact = TRUE, + alternative = greater) 1-sample Permutation Test data: xx and yy T = 4114, p-value = 0.01367 alternative hypothesis: true mu is greater than 0 So, now that we've computed the correct p-value, let's see how to obtain this using the 'coin' package: Then I wanted to use the package 'coin': require(coin) x - c(1.83,0.50,1.62,2.48,1.68,1.88,1.55,3.06,1.30) y - c(0.878,0.647,0.598,2.05,1.06,1.29,1.06,3.14,1.29) xydat - data.frame(y = c(y,x),x = gl(2,length(x)),block = factor(rep(1:length(x),2))) The relevant output missing here is wilcoxsign_test(y ~ x | block,data = xydat,alternative = greater,distribution = exact()) Exact Wilcoxon-Signed-Rank Test data: y by x (neg, pos) stratified by block Z = 2.0732, p-value = 0.01953 alternative hypothesis: true mu is greater than 0 oneway_test(y ~ x | block,data = xydat,alternative = greater,distribution = exact()) Exact 2-Sample Permutation Test data: y by x (1, 2) stratified by block Z = -2.1948, p-value = 0.6982 alternative hypothesis: true mu is greater than 0 Using 'oneway_test' in this way does *not* correspond to a paired test. The raw scores version of the Wilcoxon signed-rank test can be constructed using diff - x - y y - as.vector(t(cbind(abs(diff) * (diff 0), +abs(diff) * (diff = 0 x - factor(rep(c(neg, pos), length(diff)), + levels = c(pos, neg)) b - gl(length(diff), 2) oneway_test(y ~ x | b, alternative = greater, distr = exact) Exact 2-Sample Permutation Test data: y by x (pos, neg) stratified by b Z = 2.1948, p-value = 0.01367 alternative hypothesis: true mu is greater than 0 And, as you can see, this is equal to the 'perm.test' result. HTH, Henric While the results of the Wilcoxon test are the same for both packages are the same, those of the permutation test are very different. So, obviously I'm doing something wrong here. Can somebody please help? Thanks a lot, Holger __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Power analysis for Cox regression with a time-varying covariate
Hi Greg, Thanks for your response. So far I've just been asked to investigate what the analysis likely would involve. The hope was that there were be some sort of quick and easy canned approach. I don't really think this is the case though. If I'm asked to do the actual analysis itself, I'll start out using the steps you've listed and see where that takes me. Paul --- On Fri, 7/13/12, Greg Snow 538...@gmail.com wrote: From: Greg Snow 538...@gmail.com Subject: Re: [R] Power analysis for Cox regression with a time-varying covariate To: Paul Miller pjmiller...@yahoo.com Cc: r-help@r-project.org Received: Friday, July 13, 2012, 3:29 PM For something like this the best (and possibly only reasonable) option is to use simulation. I have posted on the general steps for using simulation for power studies in this list and elsewhere before, but probably never with coxph. The general steps still hold, but the complicated part here will be to simulate the data. I would recommend something along the lines of: 1. generate a value for the censoring time, possibly exponential or weibull (for simplicity I would make this not dependent on the covariates if reasonable). 2. generate a value for the covariate for the given time period (sample function possibly), then generate a survival time for this covariate value (possibly weibull distribution, or lognormal, exponential, etc.) If the survival time is less than the time period and censoring time then you have an event and a time to the event. If the survival time is longer than the censoring time, but not longer than the time period (for the covariate), then you have censoring and you can record the time to censoring. If the survival time is longer than the time period then you have the row information for that time period and can move on to the next time period where you will first randomly choose the covariate value again, then generate another survival time based on the covariate and given that they have already survived a given amount. Continue with this until you have an event or censoring time for each subject. On Fri, Jul 13, 2012 at 9:17 AM, Paul Miller pjmiller...@yahoo.com wrote: Hello All, Does anyone know where I can find information about how to do a power analysis for Cox regression with a time-varying covariate using R or some other readily available software? I've done some searching online but haven't found anything. Thanks, Paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] computing a subset using a loop
Dear all, I have a data frame with different variables and I want to build different subsets out of this data frame using some conditions and I want to use a loop because there will be a lot of subsets and this would be saving a lot of time. I try to give you an overview about my data frame. I have a data frame named Baumdaten and it has one column named transectID with different IDs (A_SEF ,A_LEF, B_SEF etc.) there is another column named Baumart with different species like Abies alba, Betula pendula, etc. I want to build now subsets and the first subset should be named A_2_SEF_Abies_alba and should contain all Abies alba that are living in A_2_SEF. So the normal code would be A_2_SEF_Abies_alba-subset(Baumdaten,Baumart==Abies albapointID==A_2_SEF) The following step would be to replace Abies alba with Betula pendula and so on after doing this for A_SEF I have to start with A_LEF so a lot of time is needing thats why I want to ask if it is possible doing this by using a loop? Hope you can understand my problem... -- View this message in context: http://r.789695.n4.nabble.com/computing-a-subset-using-a-loop-tp4636564.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] variable (column) in a data frame
Hi, I guess you can try this: #You will get the same result here: df$aaa==2 logical(0) !df$aaa==2 logical(0) #But it is different for the variable present in the dataframe df$a==4 [1] FALSE FALSE FALSE !df$a==4 [1] TRUE TRUE TRUE identical(df$aaa==2,!df$aaa==2) [1] TRUE identical(df$a==4,!df$a==4) [1] FALSE A.K. - Original Message - From: Paulo Barata paulo.bar...@ensp.fiocruz.br To: r-help@r-project.org Cc: Sent: Sunday, July 15, 2012 10:30 AM Subject: [R] variable (column) in a data frame To the R help list, When using a data frame, there is no warning or error message when I refer to a non-existent variable inside the data frame. Example: ##-- a - c(1,2,3) b - c(11,22,33) df - data.frame(a,b) df ## correct: there is a column in df named 'a' ## the sum is correctly performed sum(df$a==2) ## incorrect: there is no column in df named 'aaa', ## but the sum is performed anyway without either warning or error sum(df$aaa==2) ##-- Is there some way to make R issue either a warning or an error message in such a situation? I am using R version 2.15.1 64-bit on Windows 7 Professional. Thank you very much. Paulo Barata - Paulo Barata ENSP - Fundação Oswaldo Cruz Rua Leopoldo Bulhões 1480 - 8A 21041-210 Rio de Janeiro - RJ Brazil E-mail: paulo.bar...@ensp.fiocruz.br __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] computing a subset using a loop
I looks like you want to use the 'split' function which would create a list of dataframes with the various conditions: result - split(Baumdaten, list(Baumdaten$transectID, Baumdaten$Baumart), drop = TRUE) On Sun, Jul 15, 2012 at 11:31 AM, burton030 burto...@hotmail.de wrote: Dear all, I have a data frame with different variables and I want to build different subsets out of this data frame using some conditions and I want to use a loop because there will be a lot of subsets and this would be saving a lot of time. I try to give you an overview about my data frame. I have a data frame named Baumdaten and it has one column named transectID with different IDs (A_SEF ,A_LEF, B_SEF etc.) there is another column named Baumart with different species like Abies alba, Betula pendula, etc. I want to build now subsets and the first subset should be named A_2_SEF_Abies_alba and should contain all Abies alba that are living in A_2_SEF. So the normal code would be A_2_SEF_Abies_alba-subset(Baumdaten,Baumart==Abies albapointID==A_2_SEF) The following step would be to replace Abies alba with Betula pendula and so on after doing this for A_SEF I have to start with A_LEF so a lot of time is needing thats why I want to ask if it is possible doing this by using a loop? Hope you can understand my problem... -- View this message in context: http://r.789695.n4.nabble.com/computing-a-subset-using-a-loop-tp4636564.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] extracting rows and columns from a big matrix
Hi there and thanks in advance. I have a large symmetrical matrix stored in a text file. After load in R I would like to extract the same number of columns and rows (symmetrical submatrix) using their labels. I have tried this code in order to extract columns, but R console gives me the + sign at the end of the code, pointing out incomplete command, so it is not working: m-read.table(C:/backup/symmetrical.csv) n-subset(m, select=c(X1, X7, X12, X15, X22, X26, X31, X34, X39, X44, x51, X58) Therefore, I have no tried with row names yet. Any suggestions? Sorry for the inconvenience. I have read some information about this but always have the same problem with + and I do not have any idea to follow. Best, AJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting rows and columns from a big matrix
For a start, you are missing a quote and a parenthese on the statement; probably should be: (another quote was also missing) n-subset(m, select=c(X1, X7, X12,X15, X22, X26, X31, X34, X39, X44, X51, X58)) Not sure what you want with the rownames; an example would help and post with 'dput'. On Sun, Jul 15, 2012 at 2:47 PM, A J anxu...@hotmail.com wrote: Hi there and thanks in advance. I have a large symmetrical matrix stored in a text file. After load in R I would like to extract the same number of columns and rows (symmetrical submatrix) using their labels. I have tried this code in order to extract columns, but R console gives me the + sign at the end of the code, pointing out incomplete command, so it is not working: m-read.table(C:/backup/symmetrical.csv) n-subset(m, select=c(X1, X7, X12, X15, X22, X26, X31, X34, X39, X44, x51, X58) Therefore, I have no tried with row names yet. Any suggestions? Sorry for the inconvenience. I have read some information about this but always have the same problem with + and I do not have any idea to follow. Best, AJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting rows and columns from a big matrix
You're missing a ) You close c() but not subset(). That's the most common cause of incomplete commands: it often works to just keep typing ) return until you get the regular prompt. Sarah On Sun, Jul 15, 2012 at 2:47 PM, A J anxu...@hotmail.com wrote: Hi there and thanks in advance. I have a large symmetrical matrix stored in a text file. After load in R I would like to extract the same number of columns and rows (symmetrical submatrix) using their labels. I have tried this code in order to extract columns, but R console gives me the + sign at the end of the code, pointing out incomplete command, so it is not working: m-read.table(C:/backup/symmetrical.csv) n-subset(m, select=c(X1, X7, X12, X15, X22, X26, X31, X34, X39, X44, x51, X58) Therefore, I have no tried with row names yet. Any suggestions? Sorry for the inconvenience. I have read some information about this but always have the same problem with + and I do not have any idea to follow. Best, AJ -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Imposing more than one condition to if
try this: set.seed(1) day= rep(1:30, each=10) n= length(day); x= c(1:24) time= sample(x, 300, replace= T) light= rep(c(20,10,6,0,0,0,0,0,8,20), 30) d=data.frame(day,time,light) # create a dawn/dusk column to mark where it happens d$dawn - c(FALSE, (head(d$light, -1) 2) (tail(d$light, -1) 2)) d$dusk - c((head(d$light, -1) 2) (tail(d$light, -1) 2), FALSE) # now split and recombine to get the values for the day result - do.call(rbind, lapply(split(d, d$day), function(.day){ + # create new dataframe with the values + cbind(.day[, c('day', 'time', 'light')] + , dawn = .day$time[.day$dawn] + , dusk = .day$time[.day$dusk] + ) + })) result day time light dawn dusk 1.1 1720 16 14 1.2 1910 16 14 1.3 1 14 6 16 14 1.4 1 22 0 16 14 1.5 15 0 16 14 1.6 1 22 0 16 14 1.7 1 23 0 16 14 1.8 1 16 0 16 14 1.9 1 16 8 16 14 1.10 1220 16 14 2.11 2520 10 17 2.12 2510 10 17 2.13 2 17 6 10 17 2.14 2 10 0 10 17 2.15 2 19 0 10 17 2.16 2 12 0 10 17 2.17 2 18 0 10 17 2.18 2 24 0 10 17 2.19 2 10 8 10 17 2.20 2 1920 10 17 3.21 3 2320 21 16 3.22 3610 21 16 3.23 3 16 6 21 16 3.24 34 0 21 16 3.25 37 0 21 16 3.26 3 10 0 21 16 3.27 31 0 21 16 3.28 3 10 0 21 16 3.29 3 21 8 21 16 3.30 3920 21 16 4.31 4 1220 18 12 4.32 4 1510 18 12 4.33 4 12 6 18 12 4.34 45 0 18 12 4.35 4 20 0 18 12 4.36 4 17 0 18 12 4.37 4 20 0 18 12 4.38 43 0 18 12 4.39 4 18 8 18 12 4.40 4 1020 18 12 On Sun, Jul 15, 2012 at 12:32 PM, Santiago Guallar sgual...@yahoo.com wrote: Hi, I have a dataset which contains several time records for a number of days, plus a variable (light) that allows to determine night time (lihgt= 0) and daytime (light 0). I need to obtain get dusk time and dawn time for each day and place them in two columns. This is the starting point (d): day time light 1 1 20 1 12 10 1 11 6 1 9 0 1 6 0 1 12 0 ... 30 8 0 30 3 0 30 8 0 30 3 0 30 8 8 30 9 20 And this what I want to get: day time light dusk dawn 1 1 20 11 10 1 1210 11 10 1 11 6 11 10 1 9 0 11 10 1 6 0 11 10 1 12 0 11 10 ... 30 8 0 9 5 30 3 0 9 5 30 8 0 9 5 30 3 0 9 5 30 8 8 9 5 30 9 20 9 5 This is the code for data frame d: day= rep(1:30, each=10) n= length(dia); x= c(1:24) time= sample(x, 300, replace= T) light= rep(c(20,10,6,0,0,0,0,0,8,20), 30) d=data.frame(day,time,light) I'd need to impose a double condition like the next but if does not take more than one: attach(d) for (i in 1: n){ if (light[i-1]2 light[i]2){ d$dusk- time[i-1] } if (light[i-1]2 light[i]2){ d$dawn- time[i] } } detach(d) d Thank you for your help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting rows and columns from a big matrix
Sorry so much for mistakes. It was an example code and I commited some mistakes typing it. But meaning the original code is right (I have checked several times) I am not sure about how to solve the problem of extracting columns and rows using labels from a squared matrix. I have enclosed a text file with the idea in order to understand it better. Thanks again, and sorry for the inconvenience. Best, AJ Date: Sun, 15 Jul 2012 14:53:47 -0400 Subject: Re: [R] extracting rows and columns from a big matrix From: jholt...@gmail.com To: anxu...@hotmail.com CC: r-help@r-project.org For a start, you are missing a quote and a parenthese on the statement; probably should be: (another quote was also missing) n-subset(m, select=c(X1, X7, X12,X15, X22, X26, X31, X34, X39, X44, X51, X58)) Not sure what you want with the rownames; an example would help and post with 'dput'. On Sun, Jul 15, 2012 at 2:47 PM, A J anxu...@hotmail.com wrote: Hi there and thanks in advance. I have a large symmetrical matrix stored in a text file. After load in R I would like to extract the same number of columns and rows (symmetrical submatrix) using their labels. I have tried this code in order to extract columns, but R console gives me the + sign at the end of the code, pointing out incomplete command, so it is not working: m-read.table(C:/backup/symmetrical.csv) n-subset(m, select=c(X1, X7, X12, X15, X22, X26, X31, X34, X39, X44, x51, X58) Therefore, I have no tried with row names yet. Any suggestions? Sorry for the inconvenience. I have read some information about this but always have the same problem with + and I do not have any idea to follow. Best, AJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. Original Square Matrix X1 X7 X12 X15 X22 X26 X31 X34 X39 X44 X51 X1 1 2 3 4 5 6 7 8 9 10 11 X7 11 9 7 5 3 1 10 8 6 4 2 X12 3 4 7 8 5 7 2 9 1 3 2 X15 9 9 8 4 7 1 1 3 2 5 3 X22 6 7 7 4 4 2 9 8 8 1 1 X26 3 9 4 8 5 7 6 1 2 3 8 X31 1 2 1 3 1 4 1 5 1 6 1 X34 6 7 8 5 2 9 5 1 6 8 9 X39 4 8 7 4 6 5 1 9 2 7 5 X44 2 2 2 8 6 7 9 5 3 7 7 X51 9 9 9 6 6 4 8 7 2 1 3 Final Square Submatrix X1 X12 X22 X31 X1 1 3 5 7 X12 3 7 5 2 X22 6 7 4 9 X31 1 1 1 1__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read mignight as 24:00 and not as 0:00
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sandy Adriaenssens Sent: Friday, July 13, 2012 3:52 AM To: r-help@r-project.org Subject: [R] read mignight as 24:00 and not as 0:00 Dear all, I have dataset which contains date and time in the format yearmonthdayhour. I can read in these data correctly as follows: mydata - read.csv(pm10_corine_gridcel_hourly_2011.csv, header = TRUE) mydata$date - as.POSIXct(strptime(mydata$date, format = %Y%m%d%H, tz=UTC)) However, midnight is defined as 24:00 in my original file (so the end of the day), while the POSIXct function changes this to 0:00 (the beginning of the next day). So, my data now go from January 1 2011 1:00 to Januari 1 2012 0:00, in stead of December 31 2011 24:00. summary(mydata$date) Min. 1st Qu.Median 2011-01-01 01:00:00 2011-04-02 06:45:00 2011-07-02 12:30:00 Mean 3rd Qu. Max. 2011-07-02 12:30:00 2011-10-01 18:15:00 2012-01-01 00:00:00 I would like to change this 0:00 to 24:00 again since I want to include these values in daily averages of the previous day (and not of the next day). So the day of the month should also be diminished by 1. I have tried extracting the hours which are 0 and converting them to 24, but then I can't paste them back in the date/time of the original data.fram again. Are there maybe other solutions? Thanks in advance, Sandy ifelse (as.POSIXlt(mydata[24,1])$hour = 0,as.POSIXlt(mydata[24,1])$hour = 24 Sandy, You really haven't given us enough information to provide a solution, but here are some questions and suggestions. Do you have any times less than 01:00:00 ? You mention going from 01:00:00 to 24:00:00 in you data. I presume these are text fields and not time objects. Do you have fractional hours represented in your data, or are all times on the hour? 1. If your times are always on the hour no minutes or second, i.e. 01:00 to 24:00, then you could read them as is and then just subtract 1 hour from all date/time values. 2. If you have fractional hours, e.g. 00:32:00 or 11:45, then you could possible just read the date/time values and whenever the time is exactly 00:00:00, subtract 1 second from the value. this will at least get you just before midnight on the previous day. Whether either of these approaches will work for you depends on what your actual needs are. If this doesn't work for you, you will need to write back to R-help and explain more about what your actual needs are, and and provide more detail about you actual dates and times (see questions above. Hope this is somewhat helpful, Dan Daniel Nordlund Bothell, WA USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting rows and columns from a big matrix
Is this what you want: x - read.table(text = X1 X7 X12 X15 X22 X26 X31 X34 X39 X44 X51 + X1 1 2 3 4 5 6 7 8 9 10 11 + X7 11 9 7 5 3 1 10 8 6 4 2 + X12 3 4 7 8 5 7 2 9 1 3 2 + X15 9 9 8 4 7 1 1 3 2 5 3 + X22 6 7 7 4 4 2 9 8 8 1 1 + X26 3 9 4 8 5 7 6 1 2 3 8 + X31 1 2 1 3 1 4 1 5 1 6 1 + X34 6 7 8 5 2 9 5 1 6 8 9 + X39 4 8 7 4 6 5 1 9 2 7 5 + X44 2 2 2 8 6 7 9 5 3 7 7 + X51 9 9 9 6 6 4 8 7 2 1 3, header = TRUE) indx - c(X1, X12, X22, X31) x[indx, indx] X1 X12 X22 X31 X1 1 3 5 7 X12 3 7 5 2 X22 6 7 4 9 X31 1 1 1 1 On Sun, Jul 15, 2012 at 3:43 PM, A J anxu...@hotmail.com wrote: Sorry so much for mistakes. It was an example code and I commited some mistakes typing it. But meaning the original code is right (I have checked several times) I am not sure about how to solve the problem of extracting columns and rows using labels from a squared matrix. I have enclosed a text file with the idea in order to understand it better. Thanks again, and sorry for the inconvenience. Best, AJ Date: Sun, 15 Jul 2012 14:53:47 -0400 Subject: Re: [R] extracting rows and columns from a big matrix From: jholt...@gmail.com To: anxu...@hotmail.com CC: r-help@r-project.org For a start, you are missing a quote and a parenthese on the statement; probably should be: (another quote was also missing) n-subset(m, select=c(X1, X7, X12,X15, X22, X26, X31, X34, X39, X44, X51, X58)) Not sure what you want with the rownames; an example would help and post with 'dput'. On Sun, Jul 15, 2012 at 2:47 PM, A J anxu...@hotmail.com wrote: Hi there and thanks in advance. I have a large symmetrical matrix stored in a text file. After load in R I would like to extract the same number of columns and rows (symmetrical submatrix) using their labels. I have tried this code in order to extract columns, but R console gives me the + sign at the end of the code, pointing out incomplete command, so it is not working: m-read.table(C:/backup/symmetrical.csv) n-subset(m, select=c(X1, X7, X12, X15, X22, X26, X31, X34, X39, X44, x51, X58) Therefore, I have no tried with row names yet. Any suggestions? Sorry for the inconvenience. I have read some information about this but always have the same problem with + and I do not have any idea to follow. Best, AJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rgamma function
On Fri, Jul 13, 2012 at 7:55 PM, Chandler Zuo z...@stat.wisc.edu wrote: Hi, Has anyone encountered the problem of rgamma function in C? The following simplified program always dies for me, and I wonder if anyone can tell me the reason. #include Rmath.h #include time.h #include Rinternals.h SEXP generateGamma () { srand(time(NULL)); return (rgamma(5000,1)); } rgamma doesn't return an SEXP, it returns a double. Also, the srand() call is pointless. Has anyone encountered a similar problem before? Is there another way of generating Gamma random variable in C? P.S. I have no problem compiling and loading this function in R. Strange. You should get compiler warnings that the return type is incompatible. I get foo.c: In function ‘generateGamma’: foo.c:7: warning: implicit declaration of function ‘srand’ foo.c:8: error: incompatible types in return I thought the ANSI standard actually *required* a diagnostic for the incompatible return types. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] About dpik function
Hi there and thanks in advance. Nowadays I am working on the plug-in bandwidth selection with R. Firstly, my 1010 data is the return rate from Yahoo Finance. Secondly, my code is following: r=read.table(/Users/user/Desktop/research/a.txt,sep=,,header=TRUE) x-r[8:1010,] library(KernSmooth) dpik(x,scalest=minim,level=2L,kernel=normal,canonical=FALSE,gridsize=401L,range.x=range(x),truncate=TRUE) But the error happens like this: Error in Summary.factor(c(233L, 917L, 381L, 748L, 272L, 242L, 269L, 963L, : range not meaningful for factors I don't know what's wrong and i am a rookie, please help with that. Thanks! -- View this message in context: http://r.789695.n4.nabble.com/About-dpik-function-tp4636590.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting rows and columns from a big matrix
Hello, Try this: dat1-read.table(text= X1 X7 X12 X15 X22 X26 X31 X34 X39 X44 X51 X1 1 2 3 4 5 6 7 8 9 10 11 X7 11 9 7 5 3 1 10 8 6 4 2 X12 3 4 7 8 5 7 2 9 1 3 2 X15 9 9 8 4 7 1 1 3 2 5 3 X22 6 7 7 4 4 2 9 8 8 1 1 X26 3 9 4 8 5 7 6 1 2 3 8 X31 1 2 1 3 1 4 1 5 1 6 1 X34 6 7 8 5 2 9 5 1 6 8 9 X39 4 8 7 4 6 5 1 9 2 7 5 X44 2 2 2 8 6 7 9 5 3 7 7 X51 9 9 9 6 6 4 8 7 2 1 3 ,sep=, header=TRUE) #Inorder to get your final submatrix: #Either this: dat1[c(1,3,5,7),c(1,3,5,7)] # or dat1[(select=c(X1,X12,X22,X31)),(select=c(X1,X12,X22,X31))] X1 X12 X22 X31 X1 1 3 5 7 X12 3 7 5 2 X22 6 7 4 9 X31 1 1 1 1 #You can convert this data.frame to matrix dat2-as.matrix(dat1[(select=c(X1,X12,X22,X31)),(select=c(X1,X12,X22,X31))]) is.matrix(dat2) [1] TRUE A.K. - Original Message - From: A J anxu...@hotmail.com To: jholt...@gmail.com Cc: r-help@r-project.org Sent: Sunday, July 15, 2012 3:43 PM Subject: Re: [R] extracting rows and columns from a big matrix Sorry so much for mistakes. It was an example code and I commited some mistakes typing it. But meaning the original code is right (I have checked several times) I am not sure about how to solve the problem of extracting columns and rows using labels from a squared matrix. I have enclosed a text file with the idea in order to understand it better. Thanks again, and sorry for the inconvenience. Best, AJ Date: Sun, 15 Jul 2012 14:53:47 -0400 Subject: Re: [R] extracting rows and columns from a big matrix From: jholt...@gmail.com To: anxu...@hotmail.com CC: r-help@r-project.org For a start, you are missing a quote and a parenthese on the statement; probably should be: (another quote was also missing) n-subset(m, select=c(X1, X7, X12,X15, X22, X26, X31, X34, X39, X44, X51, X58)) Not sure what you want with the rownames; an example would help and post with 'dput'. On Sun, Jul 15, 2012 at 2:47 PM, A J anxu...@hotmail.com wrote: Hi there and thanks in advance. I have a large symmetrical matrix stored in a text file. After load in R I would like to extract the same number of columns and rows (symmetrical submatrix) using their labels. I have tried this code in order to extract columns, but R console gives me the + sign at the end of the code, pointing out incomplete command, so it is not working: m-read.table(C:/backup/symmetrical.csv) n-subset(m, select=c(X1, X7, X12, X15, X22, X26, X31, X34, X39, X44, x51, X58) Therefore, I have no tried with row names yet. Any suggestions? Sorry for the inconvenience. I have read some information about this but always have the same problem with + and I do not have any idea to follow. Best, AJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] computing a subset using a loop
http://r.789695.n4.nabble.com/file/n4636585/Baumdaten_aufbereitet.csv Baumdaten_aufbereitet.csv Here you have an overview about my data frame... -- View this message in context: http://r.789695.n4.nabble.com/computing-a-subset-using-a-loop-tp4636564p4636585.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] computing a subset using a loop
Hi, thanks for your reply but this code just gives me a list but no subsets but I need subsets because I want to do some calculations with these subsets and want do make some plots etc. Is there a solution for my problem? I ve posted an example for the first subset... http://r.789695.n4.nabble.com/file/n4636591/A_SEF_Abies_alba.csv A_SEF_Abies_alba.csv -- View this message in context: http://r.789695.n4.nabble.com/computing-a-subset-using-a-loop-tp4636564p4636591.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to extract p-value in GenMatch function
Dear R-Users, I have a problem on extracting T-Stat and P-Value. I have written R-code below library(Matching) data(lalonde) attach(lalonde) names(lalonde) Y - lalonde$re78 Tr - lalonde$treat glm1 - glm(Tr~age+educ+black+hisp+married+nodegr+re74+re75,family=binomial,data=lalonde) pscore.predicted - predict(glm1) rr1 - Match(Y=Y,Tr=Tr,X=glm1$fitted,estimand=ATT, M=1,ties=TRUE,replace=TRUE) summary(rr1) summary(rr1) Estimate... 2624.3 AI SE.. 802.19 T-stat. 3.2714 p.val.. 0.0010702 Original number of observations.. 445 Original number of treated obs... 185 Matched number of observations... 185 Matched number of observations (unweighted). 344 In above output, I can extract Estimate and AI SE with below code: rr1$est rr1$se But the problem is I could not extract T-statistic and P-value from the above output. Could you please someone help me to resolve this problem? Thanking you, Best Regards, Shyam Basnet SLU, Uppsala, Sweden [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] variable (column) in a data frame
On 2012-07-15 10:01, Paulo Barata wrote: Dear Peter, Thank you. I will try to modify my programming habits. But it seems there is a flaw in R, when it accepts a reference to a non-existent variable inside a data frame with the df$var notation. This should be corrected somehow. Paulo Barata Paulo, I understand your concerns and I do think that the best thing would be to excise the $ shortcut from the language or, at least, make y$x equivalent to y[[x, exact = TRUE]]. But, as has been pointed out before, that might not be easy. Nevertheless, even y[[x]] may not be the ultimate panacea. Consider your own example: df - data.frame(a = 1:3, b=11:13) sum(df[[aaa]] == 2) #[1] 0 which results from df[[aaa]] == 2 #logical(0) The safest extraction is y[ , x]: sum(df[ , aaa] == 2) #Error in `[.data.frame`(df, , aaa) : undefined columns selected But then, this comes down to whether one thinks that addressing a nonexistent variable should result in an error or should return NULL. The bottom line probably is that the $ behaviour will not change in the near future and one would simply be well advised to be aware of its behaviour. Every language has its quirks. Just be thankful that the R language isn't as big a mess as the English language (which I do love dearly). Peter Ehlers - -- Original Message --- From: Peter Ehlersehl...@ucalgary.ca To: Paulo Baratapaulo.bar...@ensp.fiocruz.br Cc: r-help@r-project.orgr-help@r-project.org, peter dalgaard pda...@gmail.com Sent: Sun, 15 Jul 2012 09:29:11 -0700 Subject: Re: [R] variable (column) in a data frame On 2012-07-15 08:41, Paulo Barata wrote: Dr. Dalgaard, Thank you. But pre-checking with is.null() or using with() doesn't solve the problem of catching spelling mistakes in the name of a variable inside a data frame, when using the df$var notation often in a program. Is there some way for R to behave, in relation to a variable inside a data frame, the same way it behaves for a variable not in a data frame? For example: ## a- c(1,2,3) ## the variable exists, we get a correct answer a==1 ## the variable does not exist, R rightly points this out aaa==1 ## My point is, if we make a spelling mistake in a program when referring to a variable inside a data frame, using the df$var notation, there seems to be no way of getting warned about that. You could wean yourself from the $-habit. It's convenient but can lead to the problems you're experiencing (and this has been discussed before). For programming, if you're prone to make spelling errors, you should prefer df[, aaa]. See ?Extract. Peter Ehlers Thank you once again. Paulo Barata - -- Original Message --- From: peter dalgaardpda...@gmail.com To: Paulo Baratapaulo.bar...@ensp.fiocruz.br Sent: Sun, 15 Jul 2012 16:47:35 +0200 Subject: Re: [R] variable (column) in a data frame On Jul 15, 2012, at 16:30 , Paulo Barata wrote: To the R help list, When using a data frame, there is no warning or error message when I refer to a non-existent variable inside the data frame. Example: ##-- a- c(1,2,3) b- c(11,22,33) df- data.frame(a,b) df ## correct: there is a column in df named 'a' ## the sum is correctly performed sum(df$a==2) ## incorrect: there is no column in df named 'aaa', ## but the sum is performed anyway without either warning or error sum(df$aaa==2) ##-- Is there some way to make R issue either a warning or an error message in such a situation? You can pre-check for is.null(df$aaa) or use with(df, sum(aaa==2)). -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. --- End of Original Message --- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. --- End of Original Message --- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to extract p-value in GenMatch function
On 2012-07-15 14:37, shyam basnet wrote: Dear R-Users, I have a problem on extracting T-Stat and P-Value. I have written R-code below library(Matching) data(lalonde) attach(lalonde) names(lalonde) Y- lalonde$re78 Tr- lalonde$treat glm1- glm(Tr~age+educ+black+hisp+married+nodegr+re74+re75,family=binomial,data=lalonde) pscore.predicted- predict(glm1) rr1- Match(Y=Y,Tr=Tr,X=glm1$fitted,estimand=ATT, M=1,ties=TRUE,replace=TRUE) summary(rr1) summary(rr1) Estimate... 2624.3 AI SE.. 802.19 T-stat. 3.2714 p.val.. 0.0010702 Original number of observations.. 445 Original number of treated obs... 185 Matched number of observations... 185 Matched number of observations (unweighted). 344 In above output, I can extract Estimate and AI SE with below code: rr1$est rr1$se But the problem is I could not extract T-statistic and P-value from the above output. Could you please someone help me to resolve this problem? You could look at the code for summary.Match to see that T-stat (not surprisingly) is calculated as est/se and p.val is calculated as (1 - pnorm(abs(est/se))) * 2. summary.Match() doesn't return these values, it just prints them. Peter Ehlers Thanking you, Best Regards, Shyam Basnet SLU, Uppsala, Sweden [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] computing a subset using a loop
Here is an example of using your data to split it into the subsets and then computing a summary of each subset. You have to remember that what is returned from 'split' is a 'list' of 'data.frames' that as the subsets that you want and then use use 'lapply' to process each of the subsets in the list. df - read.table(C:\\Documents and Settings\\kon9407\\My Documents\\Downloads\\Baumdaten_aufbereitet (1).csv + , sep = ';' + , as.is = TRUE + , header = TRUE + ) # split the data into a list of dataframe df.s - split(df, list(df$Plot..ID., df$Baumart), drop = TRUE) head(names(df.s), 20) [1] A_2_1.Abies alba A_2_2.Abies alba A_2_3.Abies alba A_2_4.Abies alba [5] A_2_5.Abies alba A_2_6.Abies alba A_3_1.Abies alba A_3_4.Abies alba [9] A_3_5.Abies alba A_3_6.Abies alba A_4_1.Abies alba A_4_3.Abies alba [13] A_4_4.Abies alba A_4_5.Abies alba A_4_6.Abies alba B_1_2.Abies alba [17] B_1_4.Abies alba B_1_5.Abies alba B_1_6.Abies alba B_2_4.Abies alba df.s[1] $`A_2_1.Abies alba` X Plot..ID. Alter.Neuer.Wald Hoehe..m. Radius..cm. Familie BaumartDeutsch 1 1 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne 2 2 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne 3 3 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne 4 4 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne 5 5 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne 6 6 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne 7 7 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne 8 8 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne 9 9 A_2_12 6475,64 Kieferngewaechse Abies alba Weisstanne Englisch Umfang..cm.DBH..cm. Gehoelz Bemerkungen Fotos Waldart pointID 1 European silver fir 38 12,09577569 0 NA SEF A_2_SEF 2 European silver fir NANA 1 NA SEF A_2_SEF 3 European silver fir NANA 1 NA SEF A_2_SEF 4 European silver fir NANA 1 NA SEF A_2_SEF 5 European silver fir NANA 1 NA SEF A_2_SEF 6 European silver fir NANA 1 NA SEF A_2_SEF 7 European silver fir NANA 1 NA SEF A_2_SEF 8 European silver fir NANA 1 NA SEF A_2_SEF 9 European silver fir NANA 1 NA SEF A_2_SEF transectID DBH_inch age 1 A_SEF 4,76211641338583 35,7158731003937 2 A_SEF NA NA 3 A_SEF NA NA 4 A_SEF NA NA 5 A_SEF NA NA 6 A_SEF NA NA 7 A_SEF NA NA 8 A_SEF NA NA 9 A_SEF NA NA lapply(df.s, summary) # notice the names of each of the subsets is printed $`A_2_1.Abies alba` X Plot..ID. Alter.Neuer.Wald Hoehe..m. Radius..cm. Min. :1 Length:9 Min. :2Min. :647 Length:9 1st Qu.:3 Class :character 1st Qu.:21st Qu.:647 Class :character Median :5 Mode :character Median :2Median :647 Mode :character Mean :5 Mean :2Mean :647 3rd Qu.:7 3rd Qu.:23rd Qu.:647 Max. :9 Max. :2Max. :647 FamilieBaumartDeutschEnglisch Umfang..cm. Length:9 Length:9 Length:9 Length:9 Min. :38 Class :character Class :character Class :character Class :character 1st Qu.:38 Mode :character Mode :character Mode :character Mode :character Median :38 Mean :38 3rd Qu.:38 Max. :38 NA's :8 DBH..cm.Gehoelz Bemerkungen Fotos Waldart Length:9 Min. :0. Length:9 Mode:logical Length:9 Class :character 1st Qu.:1. Class :character NA's:9 Class :character Mode :character Median :1. Mode :character Mode :character Mean :0.8889 3rd Qu.:1. Max. :1. pointID transectID DBH_inch age Length:9 Length:9 Length:9 Length:9 Class :character Class :character Class :character Class :character Mode :character Mode :character Mode :character Mode :character $`A_2_2.Abies alba` X Plot..ID. Alter.Neuer.Wald Hoehe..m. Radius..cm. Min. :12 Length:1 Min. :2Min. :660 Length:1 1st Qu.:12 Class :character 1st Qu.:2
Re: [R] read mignight as 24:00 and not as 0:00
Extract the date separately from the time initially, and keep it separate. When you want to process daily data, use that column. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Daniel Nordlund djnordl...@frontier.com wrote: -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sandy Adriaenssens Sent: Friday, July 13, 2012 3:52 AM To: r-help@r-project.org Subject: [R] read mignight as 24:00 and not as 0:00 Dear all, I have dataset which contains date and time in the format yearmonthdayhour. I can read in these data correctly as follows: mydata - read.csv(pm10_corine_gridcel_hourly_2011.csv, header = TRUE) mydata$date - as.POSIXct(strptime(mydata$date, format = %Y%m%d%H, tz=UTC)) However, midnight is defined as 24:00 in my original file (so the end of the day), while the POSIXct function changes this to 0:00 (the beginning of the next day). So, my data now go from January 1 2011 1:00 to Januari 1 2012 0:00, in stead of December 31 2011 24:00. summary(mydata$date) Min. 1st Qu.Median 2011-01-01 01:00:00 2011-04-02 06:45:00 2011-07-02 12:30:00 Mean 3rd Qu. Max. 2011-07-02 12:30:00 2011-10-01 18:15:00 2012-01-01 00:00:00 I would like to change this 0:00 to 24:00 again since I want to include these values in daily averages of the previous day (and not of the next day). So the day of the month should also be diminished by 1. I have tried extracting the hours which are 0 and converting them to 24, but then I can't paste them back in the date/time of the original data.fram again. Are there maybe other solutions? Thanks in advance, Sandy ifelse (as.POSIXlt(mydata[24,1])$hour = 0,as.POSIXlt(mydata[24,1])$hour = 24 Sandy, You really haven't given us enough information to provide a solution, but here are some questions and suggestions. Do you have any times less than 01:00:00 ? You mention going from 01:00:00 to 24:00:00 in you data. I presume these are text fields and not time objects. Do you have fractional hours represented in your data, or are all times on the hour? 1. If your times are always on the hour no minutes or second, i.e. 01:00 to 24:00, then you could read them as is and then just subtract 1 hour from all date/time values. 2. If you have fractional hours, e.g. 00:32:00 or 11:45, then you could possible just read the date/time values and whenever the time is exactly 00:00:00, subtract 1 second from the value. this will at least get you just before midnight on the previous day. Whether either of these approaches will work for you depends on what your actual needs are. If this doesn't work for you, you will need to write back to R-help and explain more about what your actual needs are, and and provide more detail about you actual dates and times (see questions above. Hope this is somewhat helpful, Dan Daniel Nordlund Bothell, WA USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Is it possible to start R-GUI in a windowed state under Windows OS
Is it possible to start R-Gui in a windowed state under windows? (I am running Windows 7 and Vista) I have the set the property for R icon to normal window option, but that has no effect. Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Package MuMIn (dredge): Error in ret[, ] - cbind(x, se, rep(if (is.null(df)) NA_real_ else df, : number of items to replace is not a multiple of replacement length.
I have also reinstalled the MuMIn package as suggested at... http://r.789695.n4.nabble.com/Error-message-number-of-items-to-replace-is-not-a-multiple-of-replacement-length-td3257893.html ...however, this made no difference. any help is appreciated. thank you -- View this message in context: http://r.789695.n4.nabble.com/Package-MuMIn-dredge-Error-in-ret-cbind-x-se-rep-if-is-null-df-NA-real-else-df-number-of-items-to-re-tp4636105p4636604.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting rows and columns from a big matrix
Hello, In my previous email, I used index to subset the data. Then, I looked at your code. I guess you wanted to try the subset function to get the same output. Try this: dat1-read.table(text= X1 X7 X12 X15 X22 X26 X31 X34 X39 X44 X51 X1 1 2 3 4 5 6 7 8 9 10 11 X7 11 9 7 5 3 1 10 8 6 4 2 X12 3 4 7 8 5 7 2 9 1 3 2 X15 9 9 8 4 7 1 1 3 2 5 3 X22 6 7 7 4 4 2 9 8 8 1 1 X26 3 9 4 8 5 7 6 1 2 3 8 X31 1 2 1 3 1 4 1 5 1 6 1 X34 6 7 8 5 2 9 5 1 6 8 9 X39 4 8 7 4 6 5 1 9 2 7 5 X44 2 2 2 8 6 7 9 5 3 7 7 X51 9 9 9 6 6 4 8 7 2 1 3 ,sep=, header=TRUE) subset(dat1,subset=row.names(dat1)%in% c(X1,X12,X22,X31),select=c(X1,X12,X22,X31)) X1 X12 X22 X31 X1 1 3 5 7 X12 3 7 5 2 X22 6 7 4 9 X31 1 1 1 1 A.K. - Original Message - From: A J anxu...@hotmail.com To: jholt...@gmail.com Cc: r-help@r-project.org Sent: Sunday, July 15, 2012 3:43 PM Subject: Re: [R] extracting rows and columns from a big matrix Sorry so much for mistakes. It was an example code and I commited some mistakes typing it. But meaning the original code is right (I have checked several times) I am not sure about how to solve the problem of extracting columns and rows using labels from a squared matrix. I have enclosed a text file with the idea in order to understand it better. Thanks again, and sorry for the inconvenience. Best, AJ Date: Sun, 15 Jul 2012 14:53:47 -0400 Subject: Re: [R] extracting rows and columns from a big matrix From: jholt...@gmail.com To: anxu...@hotmail.com CC: r-help@r-project.org For a start, you are missing a quote and a parenthese on the statement; probably should be: (another quote was also missing) n-subset(m, select=c(X1, X7, X12,X15, X22, X26, X31, X34, X39, X44, X51, X58)) Not sure what you want with the rownames; an example would help and post with 'dput'. On Sun, Jul 15, 2012 at 2:47 PM, A J anxu...@hotmail.com wrote: Hi there and thanks in advance. I have a large symmetrical matrix stored in a text file. After load in R I would like to extract the same number of columns and rows (symmetrical submatrix) using their labels. I have tried this code in order to extract columns, but R console gives me the + sign at the end of the code, pointing out incomplete command, so it is not working: m-read.table(C:/backup/symmetrical.csv) n-subset(m, select=c(X1, X7, X12, X15, X22, X26, X31, X34, X39, X44, x51, X58) Therefore, I have no tried with row names yet. Any suggestions? Sorry for the inconvenience. I have read some information about this but always have the same problem with + and I do not have any idea to follow. Best, AJ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RSQLite install problem
Hi, I'm trying to install RSQLite on R 2.14 Ubuntu 12.04 i686. The installation always gets stalled and ends up not working. I installed libsqlite3-dev but still no luck. Anyone know how to solve this? $ R CMD INSTALL RSQLite_0.11.1.tar.gz * installing to library /home/ubuntu/R/i686-pc-linux-gnu-library/2.14 * installing *source* package RSQLite ... ** package RSQLite successfully unpacked and MD5 sums checked checking for gcc... gcc -std=gnu99 checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc -std=gnu99 accepts -g... yes checking for gcc -std=gnu99 option to accept ISO C89... none needed checking how to run the C preprocessor... gcc -std=gnu99 -E checking for gcc... (cached) gcc -std=gnu99 checking whether we are using the GNU C compiler... (cached) yes checking whether gcc -std=gnu99 accepts -g... (cached) yes checking for gcc -std=gnu99 option to accept ISO C89... (cached) none needed checking for library containing fdatasync... none required configure: creating ./config.status config.status: creating src/Makevars ** libs gcc -std=gnu99 -I/usr/share/R/include -DRSQLITE_USE_BUNDLED_SQLITE -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_SOUNDEX -DSQLITE_MAX_VARIABLE_NUMBER=4 -DSQLITE_MAX_COLUMN=3 -DTHREADSAFE=0 -fpic -O3 -pipe -g -c RS-DBI.c -o RS-DBI.o gcc -std=gnu99 -I/usr/share/R/include -DRSQLITE_USE_BUNDLED_SQLITE -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_SOUNDEX -DSQLITE_MAX_VARIABLE_NUMBER=4 -DSQLITE_MAX_COLUMN=3 -DTHREADSAFE=0 -fpic -O3 -pipe -g -c RS-SQLite.c -o RS-SQLite.o gcc -std=gnu99 -I/usr/share/R/include -DRSQLITE_USE_BUNDLED_SQLITE -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_SOUNDEX -DSQLITE_MAX_VARIABLE_NUMBER=4 -DSQLITE_MAX_COLUMN=3 -DTHREADSAFE=0 -fpic -O3 -pipe -g -c param_binding.c -o param_binding.o gcc -std=gnu99 -I/usr/share/R/include -DRSQLITE_USE_BUNDLED_SQLITE -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_SOUNDEX -DSQLITE_MAX_VARIABLE_NUMBER=4 -DSQLITE_MAX_COLUMN=3 -DTHREADSAFE=0 -fpic -O3 -pipe -g -c sqlite-all.c -o sqlite-all.o sqlite-all.c:1:35: warning: extra tokens at end of #ifdef directive [enabled by default] ^Cmake: *** wait: No child processes. Stop. make: *** Waiting for unfinished jobs make: *** wait: No child processes. Stop. ** R ^C * removing /home/ubuntu/R/i686-pc-linux-gnu-library/2.14/RSQLite ubuntu@ip-10-99-65-94:~/R/kaggle/diabetes$ ubuntu@ip-10-99-65-94:~/R/kaggle/diabetes$ ubuntu@ip-10-99-65-94:~/R/kaggle/diabetes$ ubuntu@ip-10-99-65-94:~/R/kaggle/diabetes$ R CMD INSTALL RSQLite_0.11.1.tar.gz * installing to library /home/ubuntu/R/i686-pc-linux-gnu-library/2.14 * installing *source* package RSQLite ... ** package RSQLite successfully unpacked and MD5 sums checked checking for gcc... gcc -std=gnu99 checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc -std=gnu99 accepts -g... yes checking for gcc -std=gnu99 option to accept ISO C89... none needed checking how to run the C preprocessor... gcc -std=gnu99 -E checking for gcc... (cached) gcc -std=gnu99 checking whether we are using the GNU C compiler... (cached) yes checking whether gcc -std=gnu99 accepts -g... (cached) yes checking for gcc -std=gnu99 option to accept ISO C89... (cached) none needed checking for library containing fdatasync... none required configure: creating ./config.status config.status: creating src/Makevars ** libs gcc -std=gnu99 -I/usr/share/R/include -DRSQLITE_USE_BUNDLED_SQLITE -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_SOUNDEX -DSQLITE_MAX_VARIABLE_NUMBER=4 -DSQLITE_MAX_COLUMN=3 -DTHREADSAFE=0 -fpic -O3 -pipe -g -c RS-DBI.c -o RS-DBI.o gcc -std=gnu99 -I/usr/share/R/include -DRSQLITE_USE_BUNDLED_SQLITE -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_SOUNDEX -DSQLITE_MAX_VARIABLE_NUMBER=4 -DSQLITE_MAX_COLUMN=3 -DTHREADSAFE=0 -fpic -O3 -pipe -g -c RS-SQLite.c -o RS-SQLite.o gcc -std=gnu99 -I/usr/share/R/include -DRSQLITE_USE_BUNDLED_SQLITE -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_SOUNDEX -DSQLITE_MAX_VARIABLE_NUMBER=4 -DSQLITE_MAX_COLUMN=3 -DTHREADSAFE=0 -fpic -O3 -pipe -g -c param_binding.c -o param_binding.o gcc -std=gnu99 -I/usr/share/R/include -DRSQLITE_USE_BUNDLED_SQLITE
[R] enquiry
hi, i am new to r ,i have a xlsx data with me with 12 sheet in it and need to convert it to csv first and then need to convert it into time series ,so if u can pls guide me a little how to do it. Regards karan [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error in as.xts
Hi I got the following error using as.xts Error in xts(x, order.by = order.by, frequency = frequency, ...) : NROW(x) must match length(order.by) Here is how the data looks like d1 - read.csv(file.path(dataDir,AppendixA-FishCountsTable-2009.csv), as.is=T) d1[1:3,] dive_id date time species count sizesite depth level TRANSECT VIS_M 1 62 10/12/2009 12:44 E. lateralis 2 15 Hopkins15 B 1 4 2 62 10/12/2009 12:44 E. lateralis 1 22 Hopkins15 B 1 4 3 62 10/12/2009 12:44 E. lateralis 1 25 Hopkins15 B 1 4 diveData_2009 - as.xts( d1,order.by=as.POSIXct(strptime(paste(d$date, d$TIME ), %d/%m/%Y %H:%M) )) Error in xts(x, order.by = order.by, frequency = frequency, ...) : NROW(x) must match length(order.by) I could not figure out how to correct it Thank you for your help Yolande [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.