Re: [R] Get means of matrix
Hi: Here's another way to look at the problem. Instead of manually adding a new column after k datasets have been read in, read your individual data files into a list, as long as they all have the same variable names and the same class (in this case, data.frame). Then create a vector of names for the list components and use 'apply family' logic to get the column means, returning the combined results to a data frame or matrix. Here's a toy example to illustrate the point. Firstly, three data frames are created and saved to external files: # Create some artificial data and ship to external files d1 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25)) d2 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25)) d3 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25)) write.csv(d1, file = "d1.csv", row.names = TRUE, quote = FALSE) write.csv(d2, file = "d2.csv", row.names = TRUE, quote = FALSE) write.csv(d3, file = "d3.csv", row.names = TRUE, quote = FALSE) ### # Now, read them back in and store them in a list object # Vector of file names to process files <- paste0("d", 1:3, ".csv") # Create the list of data frames and assign names to list components L <- lapply(files, function(x) read.csv(x, header = TRUE)) names(L) <- paste0("d", 1:3) # Compute column means from each list component and row bind them # Method 1: base R do.call(rbind, lapply(L, colMeans)) # Method 2: plyr package library(plyr) ldply(L, colMeans) Dennis On Wed, Nov 18, 2015 at 2:19 AM, Jesús Para Fernándezwrote: > Hi everyone > > I have a dataframe "data" wich is the result of join multiple csv (400 rows > and 600cols every csv). The "data" dataframe has n rows and m columns (20 > rows and 600 cols) , and I have add a new colum, "csvdata", in which I > specify the number of csv at wich those data belong. > > So, the dataframe "data" looks like: > > x1x2 x3xncsvdata > 21 2332121 > 27 2139141 > 24 2230111 > .. > 21 2432 19 2 > 27 2139142 > .. > 27 22 3011n > > > > I want to store into a matrix the mean values of different substes of data of > every csv, for example: > > region1,1 (rows 1:20,columns 1:20) for every "csvdata" value > region 2,1 (rows 21:40,columns 1:20) para every "csvdata" value > > > And so on for hole data.frame. > > I have tryed: > > area1<-tapply(as.matrix(data[1:20,1]),datos$csvdata,mean,na.rm=T) > area2<-tapply(as.matrix(data[1:20,1]),datos$csvdata,mean,na.rm=T) > > But this error is the output I obtain: > > Error in tapply(data[1:30, ], datos$nueva, mean, na.rm = T) : > arguments must have same length > > I´m sure that it is not very complex to do it, but I have no idea of how to > do it. > > Thanks for all. > > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exporting column names with spaces with write.csv() XXXX
...or, when trying to read it back into R, read.csv(header = TRUE, text = ' "no good","no good at all" 1,4 2,5 3,6') no.good no.good.at.all 1 1 4 2 2 5 3 3 6 read.csv(header = TRUE, check.names = FALSE, text = ' "no good","no good at all" 1,4 2,5 3,6') no good no good at all 1 1 4 2 2 5 3 3 6 Thanks for the MWE, Jim. Dennis On Sat, Sep 5, 2015 at 7:57 PM, Jim Lemonwrote: > Hi Dan, > I don't get this behavior with R-3.2.2: > > da.df<-data.frame(1:3,4:6) > names(da.df)<-c("no good","no good at all") >> da.df > no good no good at all > 1 1 4 > 2 2 5 > 3 3 6 > write.csv(da.df,file="gloop.csv",row.names=FALSE) > > gives me this: > > "no good","no good at all" > 1,4 > 2,5 > 3,6 > > Are you sure that R didn't add the periods when you created the data frame? > They would then be written into the CSV file. > > Jim > > > > On Sun, Sep 6, 2015 at 12:09 PM, Dan Abner wrote: > >> Hi all, >> >> I have a number of columns with spaces in the column names exactly as >> I want them in R. When I go to export the data table to csv using >> write.csv(), the fn adds periods in place of the spaces. >> >> Is there a way to suppress the addition of the periods? >> >> Thanks, >> >> Dan >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] if else statement for rain data to define zero for dry and one to wet
I'm sorry, but I have to take issue with this particular use case of ifelse(). When the goal is to generate a logical vector, ifelse() is very inefficient. It's better to apply a logical condition directly to the object in question and multiply the result by 1 to make it numeric/integer rather than logical. To illustrate this, consider the following toy example. The function f1 replicates the suggestion to apply ifelse() columnwise (with the additional overhead of preallocating storage for the result), whereas the function f2 applies the logical condition on the matrix itself using vectorization, with the recognition that a matrix is an atomic vector with a dim attribute. set.seed(5290) # 1000 x 1000 matrix m - matrix(sample(c(0, 0.05, 0.2), 1e6, replace = TRUE), ncol = 1000) f1 - function(mat) { newmat - matrix(NA, ncol = ncol(mat), nrow = nrow(mat)) for(i in seq_len(ncol(mat))) newmat[, i] - ifelse(mat[, i] 0.1, 1, 0) newmat } f2 - function(mat) 1 * (mat 0.1) On my system, I got system.time(m1 - f1(m)) user system elapsed 0.140.000.14 system.time(m2 - f2(m)) user system elapsed 0.010.000.01 identical(m1, m2) [1] TRUE The all too common practice of using ifelse(condition, 1, 0) on an atomic vector is easily replaced by 1 * (condition), where the result of condition is a logical atomic object coerced to numeric. To reduce memory, one should better define f2 as f2 - function(mat) 1L * (mat 0.1) but doing so in this example no longer creates identical objects since typeof(m1) [1] double Thus, f1 is not only inefficient in terms of execution time, it's also inefficient in terms of storage. Given several recent warnings in this forum about the inefficiency of ifelse() and the dozens of times I've seen the idiom implemented in f1 as a solution over the last several years (to which I have likely contributed in my distant past as an R-helper), I felt compelled to say something about this practice, which BTW extends not just to 0/1 return values but to 0/x return values, where x is a nonzero real number. Dennis On Sat, Jun 6, 2015 at 12:50 AM, Jim Lemon drjimle...@gmail.com wrote: Hi rosalinazairimah, I think the problem is that you are using if instead of ifelse. Try this: wet_dry-function(x,thresh=0.1) { for(column in 1:dim(x)[2]) x[,column]-ifelse(x[,column]=thresh,1,0) return(x) } wet_dry(dt) and see what you get. Also, why can I read your message perfectly while everybody else can't? Jim -Original Message- From: roslina...@gmail.com Sent: Fri, 5 Jun 2015 16:49:08 +0800 To: r-help@r-project.org Subject: [R] if else statement for rain data to define zero for dry and one to wet Dear r-users, I have a set of rain data: X1950 X1951 X1952 X1953 X1954 X1955 X1956 X1957 X1958 X1959 X1960 X1961 X1962 1 0.0 0.0 14.3 0.0 13.5 13.2 4.0 0 3.3 0 0 0.0 2 0.0 0.0 21.9 0.0 10.9 6.6 2.1 0 0.0 0 0 0.0 3 25.3 6.7 18.6 0.8 2.3 0.0 8.0 0 0.0 0 0 11.0 4 12.7 3.4 37.2 0.9 8.4 0.0 5.8 0 0.0 0 0 5.5 5 0.0 0.0 58.3 3.6 21.1 4.2 3.0 0 0.0 0 0 15.9 I would like to go through each column and define each cell with value greater than 0.1 mm will be 1 and else zero. Hence I would like to attach the rain data and the category side by side: 1950 state 1 0.00 2 0.00 3 25.3 1 4 12.7 1 5 0.00 ... This is my code: wet_dry - function(dt) { cl - length(dt) tresh - 0.1 for (i in 1:cl) { xi - dt[,i] if (xi tresh ) 0 else 1 } dd - cbind(dt,xi) dd } wet_dry(dt) Results: wet_dry(dt) X1950 X1951 X1952 X1953 X1954 X1955 X1956 X1957 X1958 X1959 X1960 X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969 X1970 X1971 X1972 X1973 X1974 X1975 X1976 X1977 10.0 0.0 14.3 0.0 13.5 13.2 4.0 0.0 3.3 0.0 0.0 0.0 4.2 0.0 2.2 0.0 4.4 5.1 0 7.2 0.0 0.0 0.0 5.1 0 0.0 0 0.3 20.0 0.0 21.9 0.0 10.9 6.6 2.1 0.0 0.0 0.0 0.0 0.0 8.4 0.0 4.0 0.0 4.9 0.7 0 0.0 0.0 0.0 0.0 5.4 0 3.3 0 0.3 3 25.3 6.7 18.6 0.8 2.3 0.0 8.0 0.0 0.0 0.0 0.0 11.0 4.2 0.0 2.0 0.0 14.2 17.1 0 0.0 0.0 0.0 0.0 2.1 0 1.7 0 4.4 4 12.7 3.4 37.2 0.9 8.4 0.0 5.8 0.0 0.0 0.0 0.0 5.5 0.0 0.0 5.4 0.0 6.4 14.9 0 10.1 2.9 143.4 0.0 6.1 0 0.0 0 33.5 It does not work and give me the original data. Why is that? Thank you so much for your help. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
Re: [R] geom_text in ggplot (position)
This problem is discussed in Winston Chang's R Graphics Handbook, section 3.9. Adapting his code to this example: test - data.frame(variables = c(PE_35, PE_49), value1=c(13,3), value2=c(75,31), value3=c(7,17), value4 =c(5,49)) library(reshape2) library(ggplot2) library(plyr) m - melt(test, variables) m$cO - gl(2, 2, length = nrow(m), labels = c(A, B)) m$cat - gl(2, 4, labels = c(0, 1)) names(m)[3] - recuento mm - ddply(m, .(variables, cat), mutate, pos = cumsum(recuento) - 0.5 * recuento) ## this is the key part ggplot(mm, aes(x = cat, y = recuento, fill = cO)) + geom_bar(stat = identity) + geom_text(aes(y = pos, label = recuento)) + facet_wrap(~ variables) Dennis On Tue, May 12, 2015 at 9:50 AM, AURORA GONZALEZ VIDAL aurora.gonzal...@um.es wrote: Hello everybody. I have an esthetic question. I have managed to create a stacked and grouped bar plot but I don't manage with putting the text in the middle of the bar plots. Do you know how to write the numbers in that position? Thank you so much. Example code: test - data.frame(variables = c(PE_35, PE_49), value1=c(13,3), value2=c(75,31), value3=c(7,17), value4 =c(5,49)) library(reshape2) # for melt melted - melt(test, variables) melted$cO - c(A,A,B,B,A,A,B,B) melted$cat - '' melted[melted$variable == 'value1' | melted$variable == 'value2',]$cat - 0 melted[melted$variable == 'value3' | melted$variable == 'value4',]$cat - 1 names(melted)[3] - recuento library(ggplot2) ggplot(melted, aes(x = cat, y = recuento,ymax=max(recuento)*1.05, fill = cO)) + geom_bar(stat = 'identity', position = 'stack', col=black) + facet_grid(~ variables)+ geom_text(aes(label = recuento), size = 5, hjust = 0.5, vjust = 1, position =stack) -- Aurora González Vidal Sección Apoyo Estadístico. Servicio de Apoyo a la Investigación (SAI). Vicerrectorado de Investigación. Universidad de Murcia Edif. SACE . Campus de Espinardo. 30100 Murcia @. aurora.gonzal...@um.es T. 868 88 7315 F. 868 88 7302 www.um.es/sai www.um.es/ae [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R code/package for calculation of Wasserstein distance between two densities
Hi Ranjan: Try this: library(sos) findFn(Wasserstein) It appears there are three packages that might be relevant: HistDAWass, transport and TDA. HTH, Dennis On Sat, Apr 18, 2015 at 8:06 PM, Ranjan Maitra maitra.mbox.igno...@inbox.com wrote: Dear friends, Before reinventing the wheel, I was wondering if anyone can point me to code for calculating the Wasserstein distance between two densities. I am particularly interested in mixture densities (in functional form). I know that we have the earthmovers distance in R via the emdist package but it appears to me upon a quick look that this can not handle densities in functional form. So, I was wondering if anyone had any ideas on code for this problem. Many thanks and best wishes, Ranjan Can't remember your password? Do you need a strong and secure password? Use Password manager! It stores your passwords protects your account. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] transpose a data frame according to a specific variable
One way is to use the reshape2 package: library(reshape2) dcast(DF, id ~ Year, value.var = Day) Dennis On Mon, Feb 9, 2015 at 7:47 AM, jeff6868 geoffrey_kl...@etu.u-bourgogne.fr wrote: Dear R-users, I would like to transpose a large data.frame according to a specific column. Here's a reproductible example, it will be more understandable. At the moment, my data.frame looks like this example: DF - data.frame(id=c(A,A,A,B,B,B,C,C,C), Year=c(2001,2002,2003,2002,2003,2004,2000,2001,2002), Day=c(120,90,54,18,217,68,164,99,48)) I would like it being transformed to this (fake example again, still just for being understandable): finalDF - data.frame(id=c(A,B,C),2000=c(NA,NA,164),2001=c(120,NA,99), 2002=c(90,18,48),2003=c(54,217,NA),2004=c(NA,68,NA)) Any ideas for doing this easily? I haven't found any good answer on the web. Thanks for the help! -- View this message in context: http://r.789695.n4.nabble.com/transpose-a-data-frame-according-to-a-specific-variable-tp4702971.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] color heatmap according to value ranges
Hi: You get a gradient for your response variable because it is numeric. You need to convert it to factor. (Discrete color sets need to be mapped to discrete variables - a factor is one way to generate a discrete variable.) Here is one approach to get what you appear to be looking for. DF - data.frame(a = rep(1:3, each = 3), b = 1:3, d = 1:9) DF$e - cut(DF$d, c(0, 2, 3, 5, 10), rightmost = TRUE, labels = c(1-2, 2-3, 3-5, 5)) library(ggplot2) ggplot(DF, aes(x = a, y = b, fill = e)) + geom_tile() + scale_fill_manual(values = c(darkblue, blue, lightblue, white)) You need a border of some kind around the plot since the right side of the graph is the same color as the default background. If you're OK with the default graph above, that's fine. If you want to expand it to the edges, you need to draw your own border, something like ggplot(DF, aes(x = a, y = b, fill = e)) + geom_tile() + scale_fill_manual(values = c(darkblue, blue, lightblue, white)) + # eliminate the padding from each scale scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) + # draw the border theme(panel.border = element_rect(colour = black, fill = NA)) The problem that remains is an inability to distinguish the legend key color in the last category from the plot background. The simplest workaround is to change the color of the last category in the scale specification - one suggestion is a light gray, but you can always substitute another color: ggplot(DF, aes(x = a, y = b, fill = e)) + geom_tile() + scale_fill_manual(values = c(darkblue, blue, lightblue, grey90)) + scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) + theme(panel.border = element_rect(colour = black, fill = NA)) Dennis On Fri, Feb 6, 2015 at 7:13 AM, n.hub...@ncmls.ru.nl wrote: Probably the simplest thing there is, but I can't get it to work: Example for my data: a - c(1,1,1,2,2,2,3,3,3) b - c(1,2,3,1,2,3,1,2,3) c - c(1,2,3,4,5,6,7,8,9) df - data.frame(cbind(a,b,c)) I create a heat map with c being the values: ggplot(df, aes(df$a, df$b, fill = df$c)) + geom_raster() problem: The color coding is automatically a gradient. However, I would like to color in 4 fixed colors dependent on the value in c. For example: if c=2 color darkblue if 2c=3 color blue if 3c=5 color lightblue if c5 color white In addition I would like to show a legend that illustrates this color coding. It must be very easy, but I just can't figure it out. Only find commands that make gradients... Thanks a lot in advance! __ Dr. Nina C. Hubner scientist quantitative proteomics Department of Molecular Biology, Radboud University Nijmegen, The Netherlands e-mail: n.hub...@ncmls.ru.nl tel: +31-24-3613655 Visiting address: Department of Molecular Biology, RIMLS, 2nd floor Geert Grooteplein 26/28 6525 GA Nijmegen The Netherlands The Radboud University Medical Centre is listed in the Commercial Register of the Chamber of Commerce under file number 41055629. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Paste every two columns together
Hi: Don't know about performance, but this is fairly simple for operating on atomic vectors: x - c(A, A, G, T, C, G) apply(embed(x, 2), 1, paste0, collapse = ) [1] AA GA TG CT GC Check the help page of embed() for details. Dennis On Wed, Jan 28, 2015 at 3:55 PM, Kate Ignatius kate.ignat...@gmail.com wrote: I have genetic data as follows (simple example, actual data is much larger): comb = ID1 A A T G C T G C G T C G T A ID2 G C T G C C T G C T G T T T And I wish to get an output like this: ID1 AA TG CT GC GT CG TA ID2 GC TG CC TG CT GT TT That is, paste every two columns together. I have this code, but I get the error: Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1 conc - function(x) { s - seq(2, nchar(x), 2) paste0(x[s], x[s+1]) } combn - as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE) Thanks in advance! __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to determine power in my analysis?
Hi Kristi: I think this paper elucidates the problem Bert mentioned. A thorough and careful reading of the last two sections should clarify what post-hoc power is and is not. http://www.stat.uiowa.edu/files/stat/techrep/tr378.pdf Dennis On Sat, Nov 8, 2014 at 11:25 AM, Kristi Glover kristi.glo...@hotmail.com wrote: Hi Bert, Thanks for the message. So far I know we can test whether my sample size in my analysis is enough or not. It is also post hoc property. For example, we can calculate standard deviations, error variance etc in the data sets, and then we can use them to determine whether the sample size was enough or not with certain level of alpha and power. we can do it is some of the statistical programs, but I was not aware in R. thanks KG Date: Sat, 8 Nov 2014 10:55:56 -0800 Subject: Re: [R] how to determine power in my analysis? From: gunter.ber...@gene.com To: kristi.glo...@hotmail.com CC: r-h...@stat.math.ethz.ch Kristi: Power is a prespecified property of the design, not a post hoc property of the analysis (SAS procedures notwithstanding). So you're a day late and a dollar short. I suggest you consult with a local statistician about such matters, as you appear to be out of your depth. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. Clifford Stoll On Sat, Nov 8, 2014 at 3:49 AM, Kristi Glover kristi.glo...@hotmail.com wrote: Hi R Users, I was trying to determine whether I have enough samples and power in my analysis. Would you mind to provide some hints?. I found a several packages for power analysis but did not find any example data. I have two sites and each site has 4 groups. I wanted to test whether there was an effect of restoration activities and sites on the observed value. I used a two way factorial ANOVA and now I wanted to test the power of the analysis (whether the sample sizes are enough for the analysis? what are the alpha and power in the analysis using this data set? if it is not enough, how much samples should be collected for alpha 0.05 and power=0.8 and 0.9 for the analysis (two way factorial analysis). The example data:data-structure(list(observedValue = c(0.08, 0.53, 0.14, 0.66, 0.37, 0.88, 0.84, 0.46, 0.3, 0.61, 0.75, 0.82, 0.67, 0.37, 0.95, 0.73, 0.74, 0.69, 0.06, 0.97, 0.97, 0.07, 0.75, 0.68, 0.53, 0.72, 0.34, 0.12, 0.49, 0.77, 0.45, 0.07, 0.97, 0.34, 0.68, 0.48, 0.65, 0.7, 0.57, 0.66, 0.4, 0.29, 0.88, 0.36, 0.68, 0.32, 0.8, 0, 0.11, 0.48, 0.85, 0.94, 0.12, 0.12, 0, 0.89, 0.66, 0.2, 0.57, 0.09, 0.27, 0.81, 0.53, 0.09, 0.5, 0.41, 0.89, 0.47, 0.39, 0.85, 0.71, 0.89, 0.01, 0.71, 0.42, 0.72, 0.62, 0.3, 0.56, 0.99, 0.97, 0.03, 0.09, 0.27, 0.27, 0.94, 0.23, 0.97, 0.81, 0.95), condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c(goo! d! , ! medium, poor, verygood), class = factor), areas = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(Restored, unrestored), class = factor)), .Names = c(observedValue, condition, areas), class = data.frame, row.names = c(NA, -90L)) test= aov(observedValue~condition*areas,data=data)summary(test) power of the analysis? thanks for your help. Sincerely, KG [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot barplot error
Hi: There are several options, but you shot yourself in the foot by using cbind() to combine objects of different classes into a matrix, so everything ends up being a factor because a matrix is an atomic object with a dim attribute, and atomic objects must have a common class, so Flows and Category are coerced to factors. What you want is to combine your atomic objects into a data frame, since that is what ggplot() requires: #-- Date - c(2014-04-01,2014-04-02,2014-04-03,2014-04-04,2014-04-07, 2014-04-09,2014-04-10,2014-04-01,2014-04-02,2014-04-03, 2014-04-04,2014-04-07,2014-04-09,2014-04-10) Flows - c(479.6, 187.2, 148.6, 41.5, 123.5, 176.3, 68.3, 401.5, -164.2, -195.5, 35.1, -224.0, -58.6, -138.9) Category - c(Equity, Equity, Equity, Equity, Equity, Equity, Equity, Debt, Debt, Debt, Debt, Debt, Debt, Debt) # Use data.frame, not cbind DF - data.frame(Date, Flows, Category) #- Your ggplot() code will work, but probably not the way you intended. Here are a couple of versions of your initial plot: #-- library(ggplot2) # Initial plot, stacked ggplot(data = DF, aes(x = Date, y = Flows, fill = Category)) + geom_bar(stat=identity) # Initial plot, dodged ggplot(data = DF, aes(x = Date, y = Flows, fill = Category)) + geom_bar(stat=identity, position = dodge) #-- I'm guessing what you want is to 'stack' the dodged version above; if so, you need to plot the two groups separately and create a fill factor 'on the fly': #--- # Stacking the dodged plot - requires separate calls # to each subset so that zero is the anchor ggplot(data = DF, mapping = aes(x = Date, y = Flows)) + geom_bar(data = subset(DF, Category == Equity), stat = identity, aes(fill = Equity)) + geom_bar(data = subset(DF, Category == Debt), stat = identity, aes(fill = Debt)) + scale_fill_manual(Category, values = c(Equity = blue, Debt = red)) #--- If you want to plot on the actual dates rather than the given dates, and perhaps to change the date format since the year is common, here is one way to do it: #--- # Convert to Date format so that we can plot by actual date DF$Date - as.Date(as.character(DF$Date)) # Need scales package to change date format library(scales) ggplot(data = DF, mapping = aes(x = Date, y = Flows)) + geom_bar(data = subset(DF, Category == Equity), stat = identity, aes(fill = Equity)) + geom_bar(data = subset(DF, Category == Debt), stat = identity, aes(fill = Debt)) + scale_fill_manual(Category, values = c(Equity = blue, Debt = red)) + scale_x_date(labels = date_format(%m-%d)) #--- I hope this covers your use case. Dennis On Thu, May 15, 2014 at 4:32 AM, Vikram Bahure economics.vik...@gmail.com wrote: Hi, I am facing issue with ggplot plotting. I have given the data and the code below. I have also attached the graph below. Problem: On date *2014-04-07*, we have debt and equity flow, but it does not shows equity flow. Ideally what I would like is the stack for debt/equity starts from X axis. *## Data* *Date - c(2014-04-01,2014-04-02,2014-04-03,2014-04-04,2014-04-07,* * 2014-04-09,2014-04-10,2014-04-01,2014-04-02,2014-04-03,* * 2014-04-04,2014-04-07,2014-04-09,2014-04-10)* *Flows - c(479.6,187.2, 148.6, 41.5, 123.5, 176.3, 68.3, 401.5,* * -164.2, -195.5, 35.1, -224.0, -58.6, -138.9)* *Category - c(Equity, Equity, Equity, Equity, Equity, Equity,* * Equity, Debt, Debt, Debt, Debt, Debt,* * Debt, Debt)* *data - cbind(Date,Flows,Category)* *## GGplot* *ggplot(data = fii.df, aes(x = Date, y = FII, fill = Category)) + geom_bar(stat=identity) * Thanks in advance. Regards Vikram Bahure [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to plot data using ggplot
1. Look into the ggmap package if you want to overlay your data onto a map. 2. Re your color scale representation, define 'appropriately'. Do you mean a continuous range expressible in a colorbar or a discrete range, and if the latter, what intervals did you have in mind? Dennis On Fri, Apr 4, 2014 at 2:42 AM, drunkenphd enrr...@fimif.upt.al wrote: Hi, I have a list of cities and their coordinates, and also for each city I have a variable varA which I want to represent in a map using ggplot. For example : CityA lat 22.93977 lon 46.70663varA 545 CityB lat 23.93977 lon 46.70663varA 122 VarA values begin from 0 to 3000. I want the color scale to represent this range appropriately. Can you help Regards -- View this message in context: http://r.789695.n4.nabble.com/How-to-plot-data-using-ggplot-tp4688168.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating table with sequences of numbers based on the table
Less coding with plyr: tab - read.table(text=pop Freq 1 1 30 2 2 25 3 3 30 4 4 30 5 5 30 6 6 30 7 7 30,sep=,header=TRUE) # Function to do the work on each row f - function(pop, Freq) data.frame(ind = seq_len(Freq)) library(plyr) u - mdply(tab, f)[, -2] Dennis On Thu, Mar 13, 2014 at 8:01 AM, arun smartpink...@yahoo.com wrote: Hi, Try: Either tab - read.table(text=pop Freq 1 1 30 2 2 25 3 3 30 4 4 30 5 5 30 6 6 30 7 7 30,sep=,header=TRUE) indx - rep(1:nrow(tab),tab$Freq) tab1 - transform(tab[indx,],ind=ave(seq_along(indx),indx,FUN=seq_along))[,-2] #or tab2 - transform(tab[indx,],ind=unlist(sapply(tab$Freq,seq)))[,-2] identical(tab1,tab2) #[1] TRUE #or tab3 - transform(tab[indx,], ind= with(tab,seq_len(sum(Freq))-rep(cumsum(c(0L,Freq[-length(Freq)])),Freq)))[,-2] identical(tab1,tab3) #[1] TRUE A.K. I have a problem with transfering one table to another automatically. From table like this: tab pop Freq 1 1 30 2 2 25 3 3 30 4 4 30 5 5 30 6 6 30 7 7 30 I want to use number of individuals (freq) and then in next table just list them with following numbers (depending on total number of individuals) Like this: in popind 1 1 1 2 1 3 1 4 . . . . 1 30 2 1 2 2 2 3 2 4 . . 2 25 3 1 3 2 . . . . How can i do it? I think i have to use loops but so far I failed. Thank you in advance, Best, Malgorzata Gazda __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Math symbols in ggplot facets
Hi: You were close... library(ggplot2) m - mpg # Set the factor labels with plotmath code (note the ==) m$drv - factor(m$drv, labels = c(sigma[0] == sqrt(2), sigma[0] == 2 * sqrt(2), sigma[0] == 3 * sqrt(2))) ggplot(m, aes(x = displ, y = cty)) + geom_point() + facet_grid(. ~ drv, labeller = label_parsed) Dennis On Sat, Feb 22, 2014 at 3:24 AM, Lars Bishop lars...@gmail.com wrote: Hello, I would like to show in my facet labels the equivalent in LaTex of $\sigma_{0}= \sqrt{2}$. I think I'm close below, but not yet as it shows $(\sigma_{0}, \sqrt{2})$ m - mpg levels(m$drv) - c(sigma[0]=sqrt(2), sigma[0]=2 * sqrt(2), sigma[0]= 3 * sqrt(2)) ggplot(m, aes(x = displ, y = cty)) + geom_point() + facet_grid(. ~ drv, labeller = label_parsed) Thanks, Lars. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Legend text populated with calculated values and super script?
Here's a bquote version: x=c(1,2,3,4); y=c(1,2,3,4); z=c(1.25,1.5,2.5,3.5) # first stats based on data, used to populate legend wdt_n = 50; wdt_mbias = 0.58 wdt_mae = 2.1; wdt_R2 = 0.85 # second stats based on data, used to populate legend spas_n = 50; spas_mbias = 0.58 spas_mae = 2.1; spas_R2 = 0.85 wleg - bquote(paste(WDT (, N == .(wdt_n), , , Bias == .(wdt_mbias), , , MAE == .(wdt_mae), , , R^2 == .(wdt_R2), ))) sleg - bquote(paste(SPAS (, N == .(spas_n), , , Bias == .(spas_mbias), , , MAE == .(spas_mae), , , R^2 == .(spas_R2), ))) plot(x,y, col=red1, pch=1); lines(x,z, type=p, col=green4,pch=3) legend(topleft, legend = as.expression(c(sleg, wleg)), col=c(red1,green4), pch=c(1,3), cex=0.85) Dennis On Fri, Feb 7, 2014 at 4:58 PM, David Winsemius dwinsem...@comcast.net wrote: On Feb 7, 2014, at 7:54 AM, Douglas M. Hultstrand wrote: Hello, I am trying to generate a plot legend that contains calculated summary statistics, one statistic is R^2. I have tried several variations using the commands expression and bqoute as stated on the R help pages. I have not been able to get the R^2 super script correct along with the calculated statistics. I provided an example below, what I want is the legend (wdt_leg and spas_leg) as below but with the R^2 as superscript. # Example Data x=c(1,2,3,4); y=c(1,2,3,4); z=c(1.25,1.5,2.5,3.5) # first stats based on data, used to populate legend wdt_n = 50; wdt_mbias = 0.58 wdt_mae = 2.1; wdt_R2 = 0.85 # second stats based on data, used to populate legend spas_n = 50; spas_mbias = 0.58 spas_mae = 2.1; spas_R2 = 0.85 # create legend wdt_leg - paste(WDT (N = , wdt_n,, Bias = ,wdt_mbias,, MAE = ,wdt_mae, , R2 = , wdt_R2, ), sep=) spas_leg - paste(SPAS (N = , spas_n,, Bias = ,spas_mbias,, MAE = ,spas_mae, , R2 = , spas_R2, ), sep=) # create plot plot(x,y, col=red1, pch=1); lines(x,z, type=p, col=green4,pch=3) leg.txt - c(spas_leg, wdt_leg) legend(topleft, legend = leg.txt, col=c(red1,green4), pch=c(1,3), cex=0.85) sublist - structure(list(wdt_n = 50, wdt_mbias = 0.58, wdt_mae = 2.1, wdt_R2 = 0.85, spas_n = 50, spas_mbias = 0.58, spas_mae = 2.1, spas_R2 = 0.85), .Names = c(wdt_n, wdt_mbias, wdt_mae, wdt_R2, spas_n, spas_mbias, spas_mae, spas_R2)) legends -c( substitute( atop(WDT * group((, list(N == wdt_n, Bias == wdt_mbias, MAE == wdt_mae ), ) ), R^2 == wdt_R2 ) , env=sublist), substitute( atop(SPAS * group((, list(N == spas_n, Bias == spas_mbias, MAE == spas_mae ), )), R^2 == spas_R2 ), env=sublist) ) # I tried with: as.expression( lapply( exprlist, function(e) bquote(e) ) ) but failed repeatedly. # In order to get `substitute` to cook the bacon, you need to use real expressions, not text. plot(x,y, col=red1, pch=1); lines(x,z, type=p, col=green4,pch=3) legend(topleft, legend = as.expression(legends), col=c(red1,green4), pch=c(1,3), cex=0.85) -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] passing variable names to dplyr
David: I privately suggested he post to manipulatr because Hadley is more likely to see his question there first than in R-help. He originally posted here, noted the cross-posting and referral at manipulatr and responded back to this list when he got a successful reply from Hadley. I don't see that he's done anything wrong; in fact, he's been exceptionally polite. If you want to yell at anybody, yell at me. Dennis On Tue, Jan 28, 2014 at 3:33 PM, David Winsemius dwinsem...@comcast.net wrote: On Jan 27, 2014, at 7:45 AM, Bos, Roger wrote: All, I would like to figure out how to pass variable names to the dplyr function mutate. For example, this works because hp is one of the variable names on mtcars: mutate(mtcars, scale(hp)) Let's says I want to pass in the target variable instead of hard-coding the name, as follows: target - hp mutate(mtcars, scale(target)) That dones't work. I read somewhere about using lapply, but that suggestion didn't work for me either: target - lapply(hp, as.symbol) mutate(mtcars, scale(target)) Does anyone know how to do this? You cross-posted this to the manipulatr newsgroup (where it was addressed). Crossposting is a practice which is not appreciated in R lists. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] demonstrating R in introductory class using point-and-click software
Hi Ranjan: I think this is an important and well-posed question. AFAIR, sciviews provides a fairly sophisticated GUI for R, but I haven't looked at it in a while. The two packages I'd suggest you look at are John Fox's R Commander (package Rcmdr...and note all the plug-ins!) and Ian Fellows' Deducer package, both of which use a more sophisticated GUI than the R GUI console in Windows. Deducer runs on top of Java; it needs both rJava and JGR. If you're using 64-bit R, you also need 64-bit Java installed; if you're content with 32-bit R, then 32-bit Java is all you need. I know that Deducer runs pretty smoothly on 32-bit R, but for 64-bit, you'll need to be a bit more vigilant about the Java interface. For classroom purposes, I would think 32-bit R is sufficient. R Commander uses Tcl/Tk to program the GUI, and since a version of Tcl/Tk comes bundled with binary versions of R, you don't have to worry too much about the interface. Take notice that both packages have plug-in packages that you can load on top of the standard GUI. I find it a little disheartening that in this era, engineers in training are averse to programming, though. Sigh.. HTH, Dennis On Wed, Jan 15, 2014 at 7:22 PM, Ranjan Maitra maitra.mbox.igno...@inbox.com wrote: Dear friends, OK, I did not think that it would ever come down to this, but I am here with a question on what would be the best point-and-click approach to using R in the classroom in a way that the students can also follow and exhibit (on their own). So let me explain: I am teaching an introductory-level statistics class for introductory first- and second-year civil and industrial engineering students. This is a basic class following the book (not important): Basic Engineering Data Collection and Analysis by Stephen B. Vardeman and John Marcus. The class is very basic, and has traditionally relied on JMP and Excel (less prevalent) to illustrate data examples. I don't want to use either because I am a proponent of OSS, and also because I find these two too cumbersome to handle. Also, I don't think I have the time (and the students do not have the inclination, I am told) to handle even basic interactive programming. So, I was wondering if people with more experience would have suggestions on what would be best to use. I apologize if this has been discussed quite a bit here, but as I said before, I did not think that it would come to this, so I basically did not pay much attention. Thanks very much for suggestions and experiences! Best wishes, Ranjan -- Important Notice: This mailbox is ignored: e-mails are set to be deleted on receipt. Please respond to the mailing list if appropriate. For those needing to send personal or professional e-mail, please use appropriate addresses. GET FREE SMILEYS FOR YOUR IM EMAIL - Learn more at http://www.inbox.com/smileys Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and most webmails __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with removing extra legend elements in ggplot
The additional element comes from this code: geom_point(aes(color = VegType, shape = VegType, size = 10)) Take the size argument outside the aes() statement and the legend will disappear: geom_point(aes(color = VegType, shape = VegType), size = 10) The aes() statement maps a variable to a plot aesthetic. In this case you're mapping VegType to color and shape. You want to *set* the size aesthetic to a constant value, and that is done by assigning the value 10 to the size aesthetic outside of aes(). Dennis On Tue, Nov 19, 2013 at 4:35 PM, Matthew Van Scoyoc sco...@gmail.com wrote: No dice. I still get the 10 legend element. Thanks for the quick reply. Cheers, MVS = Matthew Van Scoyoc Graduate Research Assistant, Ecology Wildland Resources Department http://www.cnr.usu.edu/wild/ Ecology Center http://www.usu.edu/ecology/ Quinney College of Natural Resources http://cnr.usu.edu/ Utah State University Logan, UT mvansco...@aggiemail.usu.eduhttps://sites.google.com/site/scoyoc/ = Think SNOW! On Tue, Nov 19, 2013 at 5:12 PM, David Winsemius dwinsem...@comcast.netwrote: On Nov 19, 2013, at 3:44 PM, Matthew Van Scoyoc wrote: I can't get the fine tuning right with my legend. I get an extra legend element 10 which is the point size in my plot. Can someone help me get rid of this extra element? Additionally I would also like to reduce the size of the legend. If you want to reproduce my figure you can download my data in csv format here https://github.com/scoyoc/EcoSiteDelineation/blob/master/VegNMDS_scores.csv . Here is my code... veg.nmds.sc = read.csv(VegNMDS_scores.csv, header = T) nmds.fig = ggplot(data = veg.nmds.sc, aes(x = NMDS1, y = NMDS2)) nmds.fig + geom_point(aes(color = VegType, shape = VegType, size = 10)) + scale_colour_manual(name = Vegetation Type, values = c(blue, magenta, gray50, red, cyan3, green4, gold)) + scale_shape_manual(name = Vegetation Type, values = c(15, 16, 17, 18, 15, 16, 17)) + theme_bw() + theme(panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.key = element_rect(color = white) ) I have been messing around with theme(..., legend.key.size = unit(1, cm)) but I keep getting the error could not find function unit. I'm not sure why, isn't unit supposed to be part of the legend.key argument? Try this workaround to what sounds like a bug: library(grid) # then repeat the call. -- David Winsemius Alameda, CA, USA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] optimization
There are lots of errors in your code. In particular, the optimization routines do not like functions that ignore the parameters. I would like to nominate this delicious riposte as a fortune candidate. Anyone to second the motion? Dennis On Sat, Nov 16, 2013 at 1:26 PM, Prof J C Nash (U30A) nas...@uottawa.ca wrote: There are lots of errors in your code. In particular, the optimization routines do not like functions that ignore the parameters. And you have not provided out or out1 to the optimizer -- they are returned as elements of func(), but not correctly. Please try some of the examples for optim or optimx to learn how to structure your problem. JN On 13-11-16 06:00 AM, r-help-requ...@r-project.org wrote: Message: 19 Date: Fri, 15 Nov 2013 09:17:47 -0800 (PST) From: IZHAK shabsogh ishaqb...@yahoo.com To: r-help@r-project.org r-help@r-project.org Subject: [R] optimization Message-ID: 1384535867.58595.yahoomail...@web142506.mail.bf1.yahoo.com Content-Type: text/plain x1-c(5.548,4.896,1.964,3.586,3.824,3.111,3.607,3.557,2.989,18.053,3.773,1.253,2.094,2.726,1.758,5.011,2.455,0.913,0.890,2.468,4.168,4.810,34.319,1.531,1.481,2.239,4.204,3.463,1.727) y-c(2.590,3.770,1.270,1.445,3.290,0.930,1.600,1.250,3.450,1.096,1.745,1.060,0.890,2.755,1.515,4.770,2.220,0.590,0.530,1.910,4.010,1.745,1.965,2.555,0.770,0.720,1.730,2.860,0.760) x2-c(0.137,2.499,0.419,1.699,0.605,0.677,0.159,1.699,0.340,2.899,0.082,0.425,0.444,0.225,0.241,0.099,0.644,0.266,0.351,0.027,0.030,3.400,1.499,0.351,0.082,0.518,0.471,0.036,0.721) k-rep(1,29) x-data.frame(k,x1,x2) freg-function(y,x1,x2){ reg- rlm(y ~ x1 + x2 , data=x) return(reg) } func - function(x1,x2,b){ fit-freg(y,x1,x2) b-c(coef(fit)) dv-1+ b[2]*x2^b[3] dv1-b[2]*x2^b[3]*log(x2) out - ( x1/(1+ b[2]*x2^b[3])) out1- c(-x1*x2^b[3]/dv^2,-x1* dv1/dv^2) return(list( out,out1)) } optim(par=c(b[2],b[3]), fn=out, gr =out1, method = c(BFGS), lower = -Inf, upper = Inf, control = list(), hessian = T) can someone help me try running this code because i try many occasion but prove abortive. the aim is to optimize the parameter of the model that is parameter estimates using optimization [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matrix values linked to vector index
Attempting to follow the OP's conditions and assuming I understood them correctly, here is one way to wrap this up into a function: makeMat - function(x) { stopifnot(is.integer(x)) nr - length(x) nc - max(x) # Initialize a matrix of zeros m - matrix(0, nr, nc) # Conditionally replace with ones for(i in seq_len(nr)) if(x[i] != 0) m[i, 1:x[i]] - 1 m } ## Examples: x1 - 1:3 x2 - as.integer(c(2, 0, 4, 3, 1)) x3 - c(2, 1, 2.2) makeMat(x1) makeMat(x2) makeMat(x3) makeMat(4:6) On Fri, Oct 11, 2013 at 9:49 AM, arun smartpink...@yahoo.com wrote: Hi, In the example you showed: m1- matrix(0,length(vec),max(vec)) 1*!upper.tri(m1) #or m1[!upper.tri(m1)] - rep(rep(1,length(vec)),vec) #But, in a case like below, perhaps: vec1- c(3,4,5) m2- matrix(0,length(vec1),max(vec1)) indx - cbind(rep(seq_along(vec1),vec1),unlist(tapply(vec1,list(vec1),FUN=seq),use.names=FALSE)) m2[indx]- 1 m2 # [,1] [,2] [,3] [,4] [,5] #[1,]11100 #[2,]11110 #[3,]11111 A.K. Hi- I'd like to create a matrix of 0's and 1's where the number of 1's in each row defined by the value indexed in another vector, and where the (value-1) is back-filled by 0's. For example, given the following vector: vec= c(1,2,3) I'd like to produce a matrix with dimensions (length(vec), max(vec)): 1,0,0 1,1,0 1,1,1 Thank you! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] FW: Library update from version
In this case the OP probably does need to reinstall contributed packages since going from 2.15.x to 3.0.y entails a major version change in R. Dennis On Mon, Sep 30, 2013 at 11:49 AM, Steve Lianoglou lianoglou.st...@gene.com wrote: Hi, On Mon, Sep 30, 2013 at 11:12 AM, Cem Girit gi...@comcast.net wrote: Hello, I recently installed version 3.0.1 of R on to a computer. I have a working installation for a Statconn application using R version 2.15.0 on another computer. I have many libraries under this old installation. Can I just copy them into the new library from the old, or do I install each one of them under the new R? No, you shouldn't do that. Also how can I get a list of differences in two libraries so that I can use this list to update the new one? A bit of googling could have provided several answers. This post in particular has a few answers from some people who know what they're doing w/ R, so probably a good place to start: http://stackoverflow.com/questions/1401904/painless-way-to-install-a-new-version-of-r-on-windows HTH, -steve -- Steve Lianoglou Computational Biologist Bioinformatics and Computational Biology Genentech __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Combine multiple tables into one
Isn't this just a block diagonal matrix? library(pracma) blkdiag(table1, table2) [,1] [,2] [,3] [,4] [1,]1100 [2,]1200 [3,]0001 [4,]0004 Dennis On Wed, May 1, 2013 at 5:41 PM, David Winsemius dwinsem...@comcast.net wrote: add2blocks - function(m1, m2) { res - cbind(m1, matrix(0, dim(m2)[1], dim(m2)[2]) ) res - rbind(res, cbind( matrix(0, dim(m1)[1], dim(m1)[2]), m2) ) } new - add2block(table1, table2) new #--- [,1] [,2] [,3] [,4] row11100 row21200 row30001 row40004 On May 1, 2013, at 12:37 PM, arun wrote: Hi, May be this helps: dat1- as.data.frame(table1) dat2- as.data.frame(table2) names(dat2)-c(V3,V4) library(plyr) res-join(dat1,dat2,type=full) res[is.na(res)]- 0 res # V1 V2 V3 V4 #1 1 1 0 0 #2 1 2 0 0 #3 0 0 0 1 #4 0 0 0 4 combinedtable-as.matrix(res) colnames(combinedtable)- NULL combinedtable # [,1] [,2] [,3] [,4] #[1,]1100 #[2,]1200 #[3,]0001 #[4,]0004 A.K. Hi, I am trying to combine multiple tables into one, where the elements that are created as a result of the merge to be filled with zeroes. In other words, to go from this: #Create tables to combine row1 - c(1,1) row2 - c(1,2) row3 - c(0,1) row4 - c(0,4) table1 - rbind(row1,row2) table2 - rbind(row3, row4) table1 [,1] [,2] row111 row212 table2 [,1] [,2] row301 row404 To this: #What the combined table should look like if things worked out newrow1 - c(1,1,0,0) newrow2 - c(1,2,0,0) newrow3 - c(0,0,0,1) newrow4 - c(0,0,0,4) combinedtable - rbind(newrow1, newrow2, newrow3, newrow4) combinedtable [,1] [,2] [,3] [,4] newrow11100 newrow21200 newrow30001 newrow40004 I tried merge() but it doesn't make any sense here, and I also tried to melt() the orginal data that table1 and table2 are created from to re-created a combined table from one step further back from this but couldn't get it to form the right table. If that would be an easier solution, here is the starting point that table1 and table2 are created from and also the create.table() function I wrote to create them: create.table - function(a,b){ obs - unique(c(a,b)) squ - matrix(rep(0,length(obs)^2),ncol=length(obs)) for(i in 1:length(obs)){ for(j in 1:length(obs)){ squ[i,j] - sum((a==obs[i])*(b==obs[j]), na.rm=TRUE) } } squ } #Mock data sampleid - c(1:5) Q1before - c(0,0,1,0,1) Q1after - c(0,1,0,0,1) Q2before - c(1,0,0,0,0) Q2after - c(0,0,0,0,0) #This is what my real data looks like #It may be easier to reformat this df to a format that could then use my create.table() function to get to the combined table endpoint? mydf - as.data.frame(cbind(sampleid,Q1before, Q1after, Q2before, Q2after)) create.table(mydf$Q1before, mydf$Q1after) create.table(mydf$Q2before, mydf$Q2after) I hope at least some of this makes sense. Thank you for your input, you guys are always the best! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Arranging two different types of ggplot2 plots with axes lined up
library(ggplot2) library(gridExtra) #generate test precipitation data year-c(2000,2001,2002,2003,2004) precip-c(46,100,80,74,20) yp-data.frame(year, precip) #generate test fecal coliform data year2-c(2000,2000,2000,2000,2000,2000,2000,2000,2000,2000, 2001,2001,2001,2001,2001,2001,2001,2001,2001,2001, 2002,2002,2002,2002,2002,2002,2002,2002,2002,2002, 2003,2003,2003,2003,2003,2003,2003,2003,2003,2003, 2004,2004,2004,2004,2004,2004,2004,2004,2004,2004) fc-sample(1:1000, 50) yfc-data.frame(year2, fc) #make test precipitation plot yp_plot-ggplot(yp, aes(factor(year), y=precip)) + geom_point() + geom_line(aes(group = 1)) + labs(title = Site X \n , x = , y = Annual Precipitation (in.) \n ) + theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1), axis.text.y = element_text(angle = 90, hjust = 0.5, vjust = 0.5)) #make test fecal coliform plot yfc_plot-ggplot(yfc) + geom_boxplot() + aes(x=factor(year2), y=fc) + theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1), axis.text.y = element_text(angle = 90, hjust = 0.5, vjust = 0.5)) + xlab( \n Date) + ylab(Fecal coliforms (cfu/100 mL) \n ) + geom_smooth(stat='smooth', aes(group=1), size=1.5) + scale_y_log10() #arrange plots together grid.arrange(yp_plot, yfc_plot, ncol=1) On Fri, Apr 19, 2013 at 8:49 AM, Saalem Adera saalemad...@gmail.com wrote: Hi all, Thanks for the quick replies. Dennis - I tried rotating the y-axis tick labels 90 degrees and while the x-axes became the same width, the x-axis values still didn't line up with each other. So maybe I need to be more clear - how can I get the x-axis tick values to line up? For example, I want the 2000 x-axis tick in the upper and lower plots to be on top of each other (along with all the other x-axis ticks). Andrés - I tried the multiplot() function that you suggested, with the same result as when I used the grid.arrange() function - the x-axis tick values still don't line up. The reason I want the x-axis tick values to line up is so that with a quick look at the plots, a viewer will be able to understand the relationship between the data depicted by the boxplots and the data depicted by the line plot. Any other ideas on how to make this work? Thanks, Saalem On Thu, Apr 18, 2013 at 2:22 PM, Andrés Aragón Martínez armand...@gmail.com wrote: Hi Saalem, Check the following: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/ Regards, Andrés AM El 18/04/2013, a las 09:47, Saalem Adera saalemad...@gmail.com escribió: Hi all, I want to arrange two ggplot2 plots on the same page with their x-axes lined up - even though one is a boxplot and the other is a line plot. Is there a simple way to do this? I know I could do this using facetting if they were both the same type of plot (for example, if they were both boxplots), but I haven't been able to figure it out for two different types of plots. Below is my test case: library(ggplot2) library(gridExtra) #generate test precipitation data year-c(2000,2001,2002,2003,2004) precip-c(46,100,80,74,20) yp-data.frame(year, precip) #generate test fecal coliform data year2-c(2000,2000,2000,2000,2000,2000,2000,2000,2000,2000, 2001,2001,2001,2001,2001,2001,2001,2001,2001,2001, 2002,2002,2002,2002,2002,2002,2002,2002,2002,2002, 2003,2003,2003,2003,2003,2003,2003,2003,2003,2003, 2004,2004,2004,2004,2004,2004,2004,2004,2004,2004) fc-sample(1:1000, 50) yfc-data.frame(year2, fc) #make test precipitation plot yp_plot-ggplot(yp) + geom_point() + geom_line() + aes(year, y=precip) + opts(title=Site X \n , axis.text.x=theme_text(angle=45, hjust=1, vjust=1)) + ylab(Annual Precipitation (in.) \n ) + xlab() #make test fecal coliform plot yfc_plot-ggplot(yfc) + geom_boxplot() + aes(x=as.factor(year2), y=fc) + opts(axis.text.x=theme_text(angle=45, hjust=1, vjust=1)) + xlab( \n Date) + ylab(Fecal coliforms (cfu/100 mL) \n ) + geom_smooth(stat='smooth', aes(group=1), size=1.5) + scale_y_log10() #arrange plots together grid.arrange(yp_plot, yfc_plot, ncol=1) You can see that I got the plot areas to line up using grid.arrange(), but the x-axes are still off. I'd really appreciate any help I can get. Thanks, Saalem [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2: less than equal sign
Hi: Here's a way you could do it entirely within ggplot2. The annotation functions have a parse = argument which allows you to pass character string representations of math expressions, but there is no such thing in the scale functions, so you need a different approach. library(ggplot2) df - data.frame(vis=c(0,0,1,1) , count=c(10,15,20,10) , grp=c(0,1,0,1)) # Generate a list of expressions that will become the legend labels lbs - list(expression(x = 10), expression(x 10)) ggplot(df, aes(x = factor(vis), y=count , fill=factor(grp))) + geom_bar(stat =identity) + scale_fill_manual(breaks = levels(factor(df$grp)), values = c(blue, orange), labels = lbs) The specifications in scale_fill_manual() are: - breaks: the values to go on the horizontal axis - values: the vector of fill colors for each level of grp - labels: the legend labels Notice that the labels = takes a list of expressions as its argument. This approach gives you more control over the legend and choice of fill colors at the expense of a couple of lines of code. To change the axis and legend titles, one can use the labs() function; e.g., last_plot() + labs(x = Visibility, y = Frequency, fill = Threshold) Dennis On Thu, Mar 28, 2013 at 2:22 PM, soon yi soon...@ymail.com wrote: Hi I am trying to add a less than equal sign to a plot. I have previously done this using unicode but is not working in this instance. Any suggestions would be great thanks example code: library(ggplot2) df-data.frame(vis=c(0,0,1,1) , count=c(10,15,20,10) , grp=c(0,1,0,1)) df$grp -factor(df$grp ,levels=c(0,1) , labels =c(x \u2265 10 , x 10)) ggplot(df, aes(x = factor(vis), y=count , fill=grp)) + geom_bar(stat = identity) -- View this message in context: http://r.789695.n4.nabble.com/ggplot2-less-than-equal-sign-tp4662784.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.?
Hi: The other day I ran 100K simulations, each of which returned a 20 x 4 data frame. I stored these in a list object. When attempting to rbind them into a single large data frame, my first thought was to try plyr: library(plyr) bigD - ldply(L, rbind) # where L is the list object I quit at around a half hour. Ditto for do.call(rbind, L). [Sorry, I didn't time it - these are approximate times.] I then checked to see if the data.table package could do this, and lo and behold, I discovered the rbindlist() function. When applied to my list object, it ran correctly in under a second. Here's the actual example with some names changed to mask the application: g - gs[1:10] # gs is a list of lists length(g) [1] 10 class(g) [1] list dim(g[[1]]) [1] 20 4 dim(g[[10]]) [1] 20 4 library(data.table) system.time(bigD - rbindlist(g)) user system elapsed 0.450.020.47 dim(bigD) [1] 200 4 class(bigD) [1] data.table data.frame Dennis On Tue, Feb 26, 2013 at 7:05 PM, David Kulp dk...@fiksu.com wrote: On Feb 26, 2013, at 9:33 PM, Anika Masters anika.mast...@gmail.com wrote: Thanks Arun and David. Another issue I am running into are memory issues when one of the data frames I'm trying to rbind to or merge with are very large. (This is a repetitive problem, as I am trying to merge/rbind thousands of small dataframes into a single very large dataframe.) I'm thinking of creating a function that creates an empty dataframe to which I can add data, but will need to first determine and ensure that each dataframe has the exact same columns, in the exact same location. Before I write any new code, is there any pre-existing functions or code that might solve this problem of merging small or medium sized dataframes with a very large dataframe.) Consider plyr. Memory issues can be a problem, but it's a piece of cake to write a one liner that iterates over a list of data frames and returns them all rbind'd together. Or just: do.call(rbind, list.of.data.frames). If memory is a serious problem then I think it's best to write your own code that appends each row by index - which avoids copying entire data frames in memory. On Tue, Feb 26, 2013 at 2:00 PM, David L Carlson dcarl...@tamu.edu wrote: Clumsy but it doesn't require any packages: merge2 - function(x, y) { if(all(union(names(x), names(y)) == intersect(names(x), names(y{ rbind(x, y) } else merge(x, y, all=TRUE) } merge2(df1, df2) df3 - df1 merge2(df1, df3) -- David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77843-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of arun Sent: Tuesday, February 26, 2013 1:14 PM To: Anika Masters Cc: R help Subject: Re: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.? Hi, You could also try: library(gtools) smartbind(df2,df1) # a b d #1 7 99 12 #2 7 99 12 When df1!=df2 smartbind(df1,df2) # a b d x y c #1 7 99 12 NA NA NA #2 NA 34 88 12 44 56 A.K. - Original Message - From: Anika Masters anika.mast...@gmail.com To: r-help@r-project.org Cc: Sent: Tuesday, February 26, 2013 1:55 PM Subject: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.? #I want to merge or join 2 dataframes (df1 df2) into a 3rd (mydf). I want the 3rd dataframe to contain 1 row for each row in df1 df2, and all the columns in both df1 df2. The solution should work even if the 2 dataframes are identical, and even if the 2 dataframes do not have the same column names. The rbind.fill function seems to work. For learning purposes, are there other good ways to solve this problem, using merge or other functions other than rbind.fill? #e.g. These 3 examples all seem to work correctly and as I hoped: df1 - data.frame(matrix(data=c(7, 99, 12) , nrow=1 , dimnames = list( NULL , c('a' , 'b' , 'd') ) ) ) df2 - data.frame(matrix(data=c(88, 34, 12, 44, 56) , nrow=1 , dimnames = list( NULL , c('d' , 'b' , 'x' , 'y', 'c') ) ) ) mydf - merge(df2, df1, all.y=T, all.x=T) mydf #e.g. this works: library(reshape) mydf - rbind.fill(df1, df2) mydf #This works: library(reshape) mydf - rbind.fill(df1, df2) mydf #But this does not (the 2 dataframes are identical) df1 - data.frame(matrix(data=c(7, 99, 12) , nrow=1 , dimnames = list( NULL , c('a' , 'b' , 'd') ) ) ) df2 - df1 mydf - merge(df2, df1, all.y=T, all.x=T) mydf #Any way to get mere to work for this final example? Any other good solutions? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __
Re: [R] Plot a Matrix as an Image with ggplot
Hi: See if the following works for you: library(reshape2) library(ggplot2) tdm - melt(testData) ggplot(tdm, aes(x = Var2, y = Var1, fill = factor(value))) + labs(x = MHz, y = Threshold, fill = Value) + geom_raster() + scale_fill_manual(breaks = levels(factor(tdm$value)), values = c(white, black)) + theme(plot.background = element_rect(fill = grey90), legend.background = element_rect(fill = grey90)) + scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) It was necessary to create a different plot background color because you wanted to remove the padding, which blended much of the plot boundary with the original white background, and had to do something similar to the legend background so that you could see the white legend box. The expand = c(0, 0) argument to the two scale functions answers your second question, and converting value from numeric to vector answers the first. Dennis On Fri, Feb 15, 2013 at 3:24 PM, Alaios ala...@yahoo.com wrote: You were right, the following two work testData-matrix(data=round(runif(25)),nrow=5,ncol=5,dimnames=list(Var1=c(1:5),Var2=c(1:5))) ggplot(melt(testData), aes(Var2,Var1, fill=value))+xlab(MHz) + ylab(Threshold) +geom_raster() + scale_fill_gradient(low=#FF, high=#00) still there are two minor unresolved issues. a. How to reduce the extra space in the plot between legend and the tiles? b. How to make color bar depict only the two values of the game, 0 and 1 , instead of 0, 0.25, 0.50, 0.75, 1? I would like to thank you in advanec for your help Regards Alex - Original Message - From: Dennis Murphy djmu...@gmail.com To: Alaios ala...@yahoo.com Cc: Sent: Friday, February 15, 2013 10:19 AM Subject: Re: [R] Plot a Matrix as an Image with ggplot Your aesthetic is fill, not color. Change scale_color_gradient to scale_fill_gradient and you'll get what you expect. D. On Thu, Feb 14, 2013 at 11:31 PM, Alaios ala...@yahoo.com wrote: Hi, thanks I changed slightly the code to be reproducible from everyone . I have tried ggplot but I need a bit of help to tweak it a bit So you can run the following in your computer testData-matrix(data=round(runif(25)),nrow=5,ncol=5,dimnames=list(1:5,1:5)) p-ggplot(melt(testData), aes(Var2,Var1, fill=value))+xlab(MHz) + ylab(Threshold) +geom_raster() p What I want to improve is a: make the colorbar so only two specific colors lets say black and white and only two values 0 and 1 and somewhere the string as title of the color bar text to appear. I have tried something like p+ scale_color_gradient(low=red,high=blue) Fehler in unit(tic_pos.c, mm) : 'x' and 'units' must have length 0 ggplot(melt(testData), aes(Var2,Var1, fill=value,colorbin=2))+xlab(MHz) + ylab(Threshold) +geom_raster() but this did not affect colorbar entries. b. reduce/remove the grayish border that appears between the legend and the image plot Could you please help me with these two? Regards Alex From: John Kane jrkrid...@inbox.com To: Alaios ala...@yahoo.com; R help r-help@r-project.org Sent: Thursday, February 14, 2013 4:35 PM Subject: RE: [R] Plot a Matrix as an Image with ggplot The R-help list is rather picky about what attached. None of your attachments arrived. The str() info is useful but please supply some sample data The easiest way to supply data is to use the dput() function. Example with your file named testfile: dput(testfile) Then copy the output and paste into your email. For large data sets, you can just supply a representative sample. Usually, dput(head(testfile, 100)) will be sufficient. How are you writing the code/or what format is the original email. Your code in the body of the text is badly messed up -- you probably need to post only in text. HTML etc is automatically dropped. John Kane Kingston ON Canada -Original Message- From: ala...@yahoo.com Sent: Thu, 14 Feb 2013 07:15:05 -0800 (PST) To: r-help@r-project.org Subject: [R] Plot a Matrix as an Image with ggplot Dear all, I am trying to plot a matrix I have as an image str(matrixToPlot) num [1:21, 1:66] 0 0 0 0 0 0 0 0 0 0 . that contains only 0s and 1s, where the xlabel will be Labeled as str(xLabel) num [1:66] 1e+09 1e+09 1e+09 1e+09 1e+09 ... and the yLabels will be labeled as str(yLabel) num [1:21] -88 -87 -86 -85 -84 -83 -82 -81 -80 -79 ... I have found on the internet that I can do something like that with ggplot2. For example you can run the following library(reshape2)library(ggplot2)m =matrix(rnorm(20),5)ggplot(melt(m),aes(Var1,Var2,fill=value))+geom_raster() What I see missing here is to get my matrix and transform it to a data frame with labels for x and y axis so as ggplot can print the values nice. If you get the idea this matrix will be printed as a two tile
Re: [R] Why replacement has length zero? And How can I fix it?
Hi: No offense, but this is code is inefficient on several levels. Firstly, a loop is unnecessary (see below); secondly, ifelse() is a vectorized function and you're attempting to apply it to each row of DataSet. Try this instead, given dput(DataSet) structure(list(Age = c(-0.552450789929175, -2.06988485889524, -1.1290497440272, -0.268777167923272, -0.012748181465016, 0.554761979685086, 0.179401201802416, 0.0973216900180107, -0.914782595075178, -1.53708818506571 ), Gender = c(-0.0705367578252014, 0.969426004829301, 0.0556317051882303, 1.33716753655214, -0.638382962845772, 0.480939398350137, -0.755053441639217, 1.25393329162734, 1.45933385954704, 0.672871276697159), SES = c(-1.88102131807556, 0.529181612783923, 0.271835979074789, -0.656415898217512, -0.517622943602359, 0.639094817790355, -1.08205962796926, -0.193116856300625, 0.707374091321319, 1.98180909069287), ISS = c(-1.74208565286036, -1.235721877648, 0.459828219284939, -0.87781916182417, -0.429262433047679, 0.379335629940134, 0.949088546564315, 0.689758637278236, -1.14832161185134, 1.40363651773545 ), IQTe = c(-0.266523341317252, -0.32477349599139, -0.75723700635683, 0.74556355929622, -0.238794158234391, 0.0906173696194765, 0.331200954821995, -0.373009866084321, 0.417237160703897, 0.755603172535694), PreTe = c(-0.743400287678305, -1.1501727014007, -0.668278397501659, -0.0766957410591086, -0.232360222985187, 0.320050787942651, 0.474114337341715, -0.0181639478062117, -0.0905594374421625, 0.517614989060291)), .Names = c(Age, Gender, SES, ISS, IQTe, PreTe), row.names = c(NA, -10L), class = data.frame) u - runif(10) u [1] 0.72 0.57 0.71 0.64 0.38 0.53 0.14 0.14 0.82 0.12 # Remember two things: # (i) ifelse() is a vectorized function; # (ii) logical statements return either TRUE (1) or FALSE (0). # As a result, # less efficient: IAP0 - ifelse(DataSet$SES 0, ifelse(u 0.75, 1, 0), ifelse(u = 0.25, 1, 0)) # more efficient: IAP1 - ifelse(DataSet$SES 0, u 0.75, u = 0.25) * 1 IAP1 [1] 0 0 0 0 0 0 1 1 1 0 identical(IAP0, IAP1) [1] TRUE # Verification: cbind(DataSet$SES, u, IAP1) This result conforms to what one would expect with u as the realized uniform random vector. Moreover, the code is transparent: if SES 0, then apply the test u 0.75, else apply the test u = 0.25). The post-multiplication by 1 ensures that the result is numeric rather than logical. Dennis On Sat, Feb 2, 2013 at 12:57 PM, soon yi soon...@ymail.com wrote: Hi for the loop section runif needs curved brackets Try IAP -NA for (i in 1:Sample.Size){ if (DataSet$SES[i]0) { IAP[i] - ifelse(runif(1)0.75, 1, 0) # High SES, higher chance to be in Treatment # } else { IAP[i] - ifelse(runif(1)=0.25, 1, 0) # Low SES, lower chance to be in Treatment # } } # End loop # IAP IAP zjiaqi19880219 wrote Hi, all, I am working on a project to run a simulation. I am now working on the main body of the simulation body, I generate the data and all work well before I identify a variable IAP, there is always an error message as replacement has length zero, and I do not know how to fix it. Can anyone help? Here is the code: library (mvtnorm) options(digits=2) # Set variable name # namelist1 - c(Age, Gender, SES, ISS, IQTe, PreTe) # Generate Data # Sample.Size - 10 # Variable Sample Size # CorMat - matrix(c(1.0 ,0.0 ,0.0 , 0.5, 0.2, 0.5, 0.0, 1.0 ,0.0, 0.0, 0.0, 0.0, 0.0 ,0.0 ,1.0, 0.5, 0.4, 0.5, 0.5, 0.0, 0.5, 1.0, 0.4, 0.7, 0.2, 0.0, 0.4, 0.4, 1.0, 0.9, 0.5, 0.0, 0.5, 0.7, 0.9, 1.0),ncol=6) # Correlation Matrix # DataSet - rmvnorm(Sample.Size, mean=c(0,0,0,0,0,0), sigma=CorMat) # Draw Sample.Size covariate set all mean 0 # DataSet - as.data.frame(DataSet) names(DataSet) - namelist1 DataSet # For Treatment IAP # IAP - matrix(0, nrow=Sample.Size) for (i in 1:Sample.Size){ IAP[i] - if (DataSet$SES0) { ifelse(runif[1]0.75, 1, 0) # High SES, higher chance to be in Treatment # } IAP[i] - if (DataSet$SES=0) { ifelse(runif[1]=0.25, 1, 0) # Low SES, lower chance to be in Treatment # } } # End loop # IAP Error in IAP[i] - if (DataSet$SES 0) { : replacement has length zero In addition: Warning message: In if (DataSet$SES 0) { : the condition has length 1 and only the first element will be used Thanks! -- View this message in context: http://r.789695.n4.nabble.com/Why-replacement-has-length-zero-And-How-can-I-fix-it-tp4657373p4657380.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
Re: [R] facet plot
Hi: Try DF - structure(list(Gene = structure(1:5, .Label = c(NM_001001130, NM_001001144, NM_001001152, NM_001001160, NM_001001176 ), class = factor), T0h = c(68L, 0L, 79L, 1L, 0L), T0.25h = c(95L, 1L, 129L, 1L, 0L), T0.5h = c(56L, 4L, 52L, 2L, 0L), T1h = c(43L, 0L, 50L, 0L, 0L), T2h = c(66L, 1L, 24L, 0L, 0L), T3h = c(62L, 1L, 45L, 1L, 0L), T6h = c(68L, 1L, 130L, 0L, 0L), T12h = c(90L, 4L, 154L, 0L, 0L), T24h = c(63L, 1L, 112L, 0L, 0L), T48h = c(89L, 2L, 147L, 1L, 0L)), .Names = c(Gene, T0h, T0.25h, T0.5h, T1h, T2h, T3h, T6h, T12h, T24h, T48h), class = data.frame, row.names = c(1, 2, 3, 4, 5)) library(reshape2) library(ggplot2) # stack the time-related variables by Gene Dfm - melt(DF, id = Gene) # extract the numeric time values Dfm$time - as.numeric(gsub([a-zA-Z], , as.character(Dfm$variable))) ggplot(Dfm, aes(x = time, y = value)) + geom_point() + facet_wrap(~ Gene) Dennis On Thu, Jan 31, 2013 at 2:37 AM, Jose Iparraguirre jose.iparragui...@ageuk.org.uk wrote: Hi Robin, You didn't provide the list with a clear structure of your data, but I hope I understood it properly. If not, it'd be easy for you to adapt what follows. I guess your data looks like this: Gene T0h T0.25h T0.5h T1h T2h T3h T6h T12h T24h T48h 1 NM_001001130 68 9556 43 66 62 68 90 63 89 2 NM_0010011440 1 40111412 3 NM_001001152 7912952 50 24 45 130 154 112 147 4 NM_0010011601 1 20010001 5 NM_0010011760 0 00000000 ... If this is the case, first reshape your data (say, gene.df) to a long format: Gene Time 1 NM_001001130 68 2 NM_0010011440 3 NM_001001152 79 4 NM_0010011601 5 NM_0010011760 6 NM_0010011771 7 NM_0010011780 8 NM_0010011790 9 NM_001001182 4691 ... Then, you create the ggplot: sp - ggplot(gene.df, aes(x=Gene, y=Time)) + geom_point(shape=1) And use the function facet_wrap(): sp + facet_wrap( ~ Gene,ncol=3) You can set a different number of columns and a different shape for the markers, of course. Hope this helps. Regards, José José Iparraguirre Chief Economist Age UK -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robin Mjelle Sent: 31 January 2013 09:06 To: r-help@r-project.org Subject: [R] facet plot Hi all, I want to plot a facet plot with column names as x and column values as y. One plot for each row. here is part of my dataset: Gene T0h T0.25h T0.5h T1h T2h T3h T6h T12h T24h T48h NM_001001130 68 95 56 43 66 62 68 90 63 89 NM_001001144 0 1 4 0 1 1 1 4 1 2 NM_001001152 79 129 52 50 24 45 130 154 112 147 NM_001001160 1 1 2 0 0 1 0 0 0 1 NM_001001176 0 0 0 0 0 0 0 0 0 0 NM_001001177 1 3 2 3 0 1 1 0 2 0 NM_001001178 0 0 0 0 0 0 0 0 0 0 NM_001001179 0 0 0 0 0 0 0 0 0 0 NM_001001180 1 0 0 1 0 0 0 0 0 0 NM_001001181 350 539 424 470 441 451 554 553 419 548 NM_001001182 4691 6458 5686 6761 5944 4486 6030 5883 6009 7552 NM_001001183 22 23 12 21 25 17 34 40 33 32 NM_001001184 138 147 111 100 54 61 61 77 57 99 NM_001001185 2 3 4 3 4 5 9 2 4 4 NM_001001186 231 337 187 168 148 178 275 319 181 279 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Wrap Up and Run 10k is back! Also, new for 2013 – 2km intergenerational walks at selected venues. So recruit a buddy, dust off the trainers and beat the winter blues by signing up now: http://www.ageuk.org.uk/10k Milton Keynes | Oxford | Sheffield | Crystal Palace | Exeter | Harewood House, Leeds | Tatton Park, Cheshire | Southampton | Coventry Age UK Improving later life http://www.ageuk.org.uk --- Age UK is a registered charity and company limited by guarantee, (registered charity number 1128267, registered company number 6825798). Registered office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA. For the purposes of promoting Age UK Insurance, Age UK is an Appointed Representative of Age UK Enterprises Limited, Age UK is an Introducer Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth Access for the purposes of introducing potential annuity and health cash plans customers respectively. Age UK Enterprises Limited, JLT Benefit Solutions Limited and Simplyhealth Access are all authorised and regulated by the Financial Services Authority. -- This email and any files transmitted with it are confidential and intended solely for the use of the individual
Re: [R] How to select a subset data to do a barplot in ggplot2
Hi: The simplest way to do it is to modify the input data frame by taking out the records not having status live or dead and then redefining the factor in the new data frame to get rid of the removed levels. Calling your input data frame DF rather than data, DF - structure(list(FID = c(1L, 1L, 2L, 2L, 2L, 2L, 6L, 6L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 17L, 17L, 17L), IID = c(4621L, 4628L, 4631L, 4632L, 4633L, 4634L, 4675L, 4679L, 4716L, 4719L, 4721L, 4726L, 4728L, 4730L, 4732L, 4783L, 4783L, 4784L), STATUS = structure(c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 2L, 2L, 2L), .Label = c(dead, live, nosperm), class = factor)), .Names = c(FID, IID, STATUS), class = data.frame, row.names = c(NA, -18L )) # The right hand side above came from dput(DF), where DF was created by # DF - read.table(textConnection(your posted data), header = TRUE) # Consider using dput() to represent your data in the future. # Retain the records with status live or dead only DF2 - DF[DF$STATUS %in% c(live, dead), ] # This does not get rid of the original levels... levels(DF2$STATUS) # ...so redefine the factor DF2$STATUS - factor(DF2$STATUS) str(DF2) 'data.frame': 16 obs. of 3 variables: $ FID : int 1 1 2 2 2 2 6 6 10 10 ... $ IID : int 4621 4628 4631 4632 4633 4634 4675 4679 4716 4719 ... $ STATUS: Factor w/ 2 levels dead,live: 2 1 2 2 2 2 2 1 1 2 ... # now plot: # (1) FID numeric ggplot(DF2, aes(x = FID, fill = STATUS)) + geom_bar() # (2) FID factor ggplot(DF2, aes(x = factor(FID), fill = STATUS)) + geom_bar() The second one makes more sense to me, but you may have reasons to prefer the first. Dennis On Thu, Dec 13, 2012 at 4:38 AM, Yao He yao.h.1...@gmail.com wrote: FID IID STATUS 14621live 14628dead 24631live 24632live 24633live 24634live 64675live 64679dead 104716dead 104719live 104721dead 114726live 114728nosperm 114730nosperm 124732live 174783live 174783live 174784live __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate() runs out of memory
Hi: This should give you some idea of what Steve is talking about: library(data.table) dt - data.table(x = sample(10, 1000, replace = TRUE), y = rnorm(1000), key = x) dt[, .N, by = x] system.time(dt[, .N, by = x]) ...on my system, dual core 8Gb RAM running Win7 64-bit, system.time(dt[, .N, by = x]) user system elapsed 0.120.020.14 .N is an optimized function to find the number of rows of each data subset. Much faster than aggregate(). It might take a little longer because you have more columns that suck up space, but you get the idea. It's also about 5-6 times faster if you set a key variable in the data table than if you don't. Dennis On Fri, Sep 14, 2012 at 12:26 PM, Sam Steingold s...@gnu.org wrote: I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17 columns). I want to get the result of table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x) alas, aggregate has been running for ~30 minute, RSS is 14G, VIRT is 24.3G, and no end in sight. both V1 and V2 are characters (not factors). Is there anything I could do to speed this up? Thanks. -- Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000 http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/ http://dhimmi.com http://think-israel.org http://iris.org.il WinWord 6.0 UNinstall: Not enough disk space to uninstall WinWord __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fill binary matrices with random 1s
Hi: Here's one approach. I assume that your first 1000 matrices have a single 1 in each matrix, the next set of 1000 have two 1's, ..., and the last one has 99 1's. (No point in doing all 1's since they're all constant.) If that's the case, then try the following. # Each row represents a different 'density' of 1's # upper triangle of m is 0 m - matrix(0, 100, 100) m[lower.tri(m)] - 1 diag(m) - 1 m - m[-100, ] # remove row of all 1's # Functions to operate on a single matrix # Function to permute a vector of 0's and 1's # and reshape it into a 10 x 10 matrix randomMatrix - function(x) matrix(sample(x), 10, 10) # Generate a 10 x 10 x 1000 array marray - function(x) replicate(1000, randomMatrix(x)) # Create a vector of names to which to assign the results # of simulating from each row of m: arraynames - paste('array', 1:99, sep = '') # apply the marray() function to each row of m and assign # to the corresponding index of arraynames for(i in seq_along(arraynames)) assign(arraynames[i], marray(m[i, ])) HTH, Dennis On Tue, Nov 29, 2011 at 4:32 AM, Grant McDonald grantforacco...@hotmail.co.uk wrote: Dear all, I am finding difficulty in the following, I would like to create an empty matrix e.g. 10x10 of 0s and sequentially fill this matrix with randomly placed a 1s until it is saturated. Producing 100 matrices of sequentially increasing density., This process needs to be randomized 1000 times., I assume i should run this along the following lines, 1) Create 1000 matrices all zeros, 2) add a random 1 to all matrices, 3) run function on all 1000 matrices and output results to a vector table (i.e. calculate density of matric at each step for all 100 matrices. )., 4) add another 1 to the previous 1000 matrices in a random position., repeat till all matrices saturated., I have looked through histories on random fill algorithms but all packages I can find nothing as simple as the random fill I am looking for., sorry for bothering, Thank you for any help in advance. Something that starts along the lines of the following? Sorry this example is atrocious. matrixfill - function(emptymatrix, K=fullmatrix, time=100, from=0, to=time) { N - numeric(time+1) N[1] - emptymatrix for (i in 1:time) N[i+1] - N[i]+place random 1 in a random xy position until K. Calculate Density of matrix [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help needed in reproducing a plot
Hi: This looks like the one: library('lattice') data(Oats, package = 'MEMSS') print(xyplot(yield ~ nitro | Block, Oats, groups = Variety, type = c(g, b), auto.key = list(lines = TRUE, space = 'top', columns = 3), xlab = Nitrogen concentration (cwt/acre), ylab = Yield (bushels/acre), aspect = 'xy')) Look at the source code of the implementation document in the same place that you found the pdf. I found it from the html help page for the lme4 package, but there are other places you could find it.) The code for the plots in the document are contained therein. Dennis On Tue, Nov 29, 2011 at 7:43 AM, syrvn ment...@gmx.net wrote: Hello, can anybody tell me how to produce a plot like the one in http://cran.r-project.org/web/packages/lme4/vignettes/Implementation.pdf on page 13, Figure 6? The data is stored in: library(nlme) data(Oats) Cheers -- View this message in context: http://r.789695.n4.nabble.com/Help-needed-in-reproducing-a-plot-tp4119603p4119603.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] cumsum in 3d arrays
Hi: Could you supply a small reproducible example with the output that you expect? For example, what output would you expect from the following: a - array(1:24, c(2, 2, 3)) ? Dennis On Mon, Nov 28, 2011 at 12:32 AM, zloncaric zlonca...@biologija.unios.hr wrote: Thank you for your time and help. I'm quite aware that this problem seems as basic stuff, and of course I've read several R and matlab manuals, and also consulted the D. Hiebeler, Matlab / R Reference and several others, but none of the suggested solutions seems to give me the right result. I've tried the combination of apply and cumsum but the result is wrong. -- View this message in context: http://r.789695.n4.nabble.com/cumsum-in-3d-arrays-tp4110470p4114510.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] simplify source code
Hi: Here's one way you could do it. I manufactured some fake data with a simple model to illustrate. This assumes you are using the same model formula with the same starting values and remaining arguments for each response. dg - data.frame(x = 1:10, y1 = sort(abs(rnorm(10))), y2 = sort(abs(rnorm(10))), y3 = sort(abs(rnorm(10 # Model: y = b0 + b1 exp(x/theta) vars - c('y1', 'y2', 'y3') # Function to create the model formula by plugging in the # response y and run the model mfun - function(y) { form - as.formula(paste(y, 'cbind(1, exp(x/th))', sep = ' ~ ')) nls(form, data = dg, start = list(th = 0.3), algorithm = 'plinear') } # Generate a list of model objects: mlist - lapply(vars, mfun) # To see what they contain: str(mlist[[1]]) str(summary(mlist[[1]])) # Extract a few features from each: # The first two return matrices, the third returns a list do.call(rbind, lapply(mlist, function(m) coef(m))) do.call(rbind, lapply(mlist, function(m) deviance(m))) lapply(mlist, function(m) summary(m)$cov.unscaled) To get more control over the output format, the plyr package can come in handy. For example, to get data frames for the first two extractions above, one would do library('plyr') ldply(mlist, function(m) coef(m)) ldply(mlist, function(m) deviance(m)) # ldply() means list input, data frame output (ld). # For the third extraction, one has a list input and a list output: llply(mlist, function(m) summary(m)$cov.unscaled) HTH, Dennis On Sat, Nov 26, 2011 at 2:30 PM, Christof Kluß ckl...@email.uni-kiel.de wrote: Hi I would like to shorten mod1 - nls(ColName2 ~ ColName1, data = table, ...) mod2 - nls(ColName3 ~ ColName1, data = table, ...) mod3 - nls(ColName4 ~ ColName1, data = table, ...) ... is there something like cols = c(ColName2,ColName3,ColName4,...) for i in ... mod[i-1] - nls(ColName[i] ~ ColName1, data = table, ...) I am looking forward to help Christof __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data.table merge equivalent for all.x
Hi: There may well be a more efficient way to do this, but here's one take. library('data.table') # Want to merge by D in the end, so set D as part of the key: t1 - data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = ID, D) t2 - data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = ID2, D) # The J expression produces sums of B (the non-key variable) for each D group # .SD denotes 'sub-data'. The result 'junk' is a data table. junk - t2[, lapply(.SD, sum), by = D] tables() # junk has no key # set a key for junk so that it can be merged setkey(junk, 'D') # t1 and junk have a common key variable D, so the left join is merge(t1, junk, by = 'D', all.x = TRUE) # check against t1 junk HTH, Dennis On Sat, Nov 26, 2011 at 3:59 PM, ONKELINX, Thierry thierry.onkel...@inbo.be wrote: Dear all, I'm trying to use data.table to summarise a table and merge it to another table. Here is what I would like to do, but by using data.table() in a proper way. library(data.table) tab1 - data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = ID) tab2 - data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = ID2) junk - aggregate(tab2[, B], by = list(D = tab2[, D]), FUN = sum) merge(tab1, junk, by = D, all.x = TRUE) This my attempt using data.table() junk - tab2[, mean(B), by = D] tab1[junk] Best regards, Thierry [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Copula Fitting Using R
Hi: This is the type of question for which the sos package can come to the rescue: library('sos') findFn('Gumbel Clayton copula') It appears that the QRMlib package would be a reasonable place to start. Dennis On Thu, Nov 24, 2011 at 7:29 PM, cahaya iman qaisfay...@gmail.com wrote: Hi, Is anybody using Copula package for fitting copulas to own data? I have two marginals Log Normal with (parameters 1.17 and 0.76) and Gamma ( 2.7 and 1.05) Which package I should use to fit Gumbel and Clayton Copulas? Thanks, fayyad [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Looping and paste
Hi: There are two good reasons why the loop solution is not efficient in this (and related) problem(s): (i) There is more code and less transparency; (ii) the vectorized solution is four times faster. Here are the two proposed functions: # Vectorized version m1 - function(v) paste(v, ' to ', v + 50, ' mN', sep = '') # Loop version: m2 - function(v) { out - rep(NA, length(v)) for(i in seq_along(v)) out[i] - paste(v[i], ' to ', v[i] + 50, ' mN', sep = '') out } BndY - seq(from = 18900, to = 19700, by = 50) identical(m1(BndY), m2(BndY)) [1] TRUE # Put them to the test: system.time(replicate(1, m1(BndY))) user system elapsed 0.670.000.67 system.time(replicate(1, m2(BndY))) user system elapsed 2.670.002.67 The vectorized version is four times faster and produces the same output as the loop version. Experiments with a longer test vector (501 elements) maintained the timing ratio. Dennis On Wed, Nov 23, 2011 at 7:00 PM, markm0705 markm0...@gmail.com wrote: Thank you On Thu, Nov 24, 2011 at 7:31 AM, B77S [via R] ml-node+s789695n4102066...@n4.nabble.com wrote: out - vector(list) Ylab - for(i in 1:length(BndY)) { out[i] - paste(BndY[i], to ,BndY[i],mN) } Ylab - do.call(c, out) markm0705 wrote Dear R helpers I'm trying to make up some labels for plot from this vector BndY-seq(from = 18900,to= 19700, by = 50) using Ylab-for(i in BndY) {c((paste(i, to ,i+50,mN)))} but the vector created is NULL However if i use for(i in BndY) {print(c(paste(i, to ,i+50,mN)))} I can see the for loop is making the labels I'm looking for but not sure on my error in assigning them to a vector Thanks in advance -- If you reply to this email, your message will be added to the discussion below: http://r.789695.n4.nabble.com/Looping-and-paste-tp4101892p4102066.html To unsubscribe from Looping and paste, click herehttp://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4101892code=bWFya20wNzA1QGdtYWlsLmNvbXw0MTAxODkyfDExNDQyODMxMDM= . NAMLhttp://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.InstantMailNamespacebreadcrumbs=instant+emails%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml -- View this message in context: http://r.789695.n4.nabble.com/Looping-and-paste-tp4101892p4102553.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AlgDesign - $D $A $Ge $Dea
See ?AlgDesign::optFederov and look for the 'Value' section, where these elements of the output list are defined. Dennis On Wed, Nov 23, 2011 at 7:15 PM, 蓁蓁 李 tszhenzh...@hotmail.com wrote: Hi, I am wondering how I should interpreate the output of optFederov() in AlgDesign. Specially I want to know what is $D, $A, $Ge and $Dea, which one I can use as an efficiency to say how good the optimal design is. I only know when a orthogonal design comes, $D = 1. I red the pdf document -- vignette(AlgDesign) [Just type: vignette(AlgDesign) in R, you will get this pdf document] On page 20 section 4.2.1.3, it says: the D and G efficiencies of the following ... are 98% and 89%. I tried the code as shown on page 20 a few times and output showed different designs but the figure of $D, $A, $Ge and $Dea are similar. I don't see $D is close to 0.98, but $Ge is 0.89. One of the output as below: dat = gen.factorial(3, 3, center = T, varNames = c(A, B, C)) desC = optFederov(~quad(A,B,C), dat, nT = 14, evaluateI = T, nR = 100) desC $D [1] 0.4630447 $A [1] 3.22 $I [1] 9.945833 $Ge [1] 0.893 $Dea [1] 0.887 $design A B C 1 -1 -1 -1 3 1 -1 -1 5 0 0 -1 7 -1 1 -1 9 1 1 -1 11 0 -1 0 13 -1 0 0 15 1 0 0 17 0 1 0 19 -1 -1 1 21 1 -1 1 23 0 0 1 25 -1 1 1 27 1 1 1 $rows [1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 Thanks very much! J [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dataframe indexing by number of cases per group
A very similar question was asked a couple of days ago - see the thread titled Removing rows in dataframe w'o duplicated values - in particular, the responses by Dimitris Rizopoulos and David Winsemius. The adaptation to this problem is df[ave(as.numeric(df$group), as.numeric(df$group), FUN = length) 4, ] groupx 1 A 3.903747 2 A 3.599547 3 A 2.449991 4 A 2.740639 5 A 4.268988 6 B 8.649600 7 B 5.493841 8 B 1.892154 9 B 6.781754 10 B 1.459250 11 B 6.749522 HTH, Dennis On Thu, Nov 24, 2011 at 4:02 AM, Johannes Radinger jradin...@gmx.at wrote: Hello, assume we have following dataframe: group -c(rep(A,5),rep(B,6),rep(C,4)) x - c(runif(5,1,5),runif(6,1,10),runif(4,2,15)) df - data.frame(group,x) Now I want to select all cases (rows) for those groups which have more or equal 5 cases (so I want to select all cases of group A and B). How can I use the indexing for such questions? df[??]... I think it is probably quite easy but I really don't know how to do that at the moment. maybe someone can help me... /johannes -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Thank you
Well said. +1 Dennis On Thu, Nov 24, 2011 at 6:43 AM, Bert Gunter gunter.ber...@gene.com wrote: ... and while I am at it, as this is the U.S. Thanksgiving... My sincere thanks to the many R developers and documenters who contribute large amounts of their personal time and effort to developing, improving, and enhancing the accessibility of R for data analysis and science. I believe it is fair to say that R has had as much or more impact than Gosset's Student's T, and I fear that academics who do much of this work do not receive the professional recognition they deserve. I continue to be amazed and humbled by their high quality and consummate professionalismism -- I could not live without R. Kind regards and best wishes to all, -- Bert -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lines and points in xyplot()
Hi: Try this: library('lattice') xyplot(y ~ x, type = c('g', 'p'), panel = function(x, y, ...){ panel.xyplot(x, y, ...) panel.lines(x, predict(fm), col = 'black', lwd = 2) } ) HTH, Dennis On Wed, Nov 23, 2011 at 9:18 AM, Doran, Harold hdo...@air.org wrote: Given the following data, I want a scatterplot with the data points and the predictions from the regression. Sigma - matrix(c(1,.6,1,.6), 2) mu - c(0,0) dat - mvrnorm(5000, mu, Sigma) x - dat[,1] * 50 + 200 y - dat[,2] * 50 + 200 fm - lm(y ~ x) ### This gives the regression line, but not the data xyplot(y ~ x, type = c('g', 'p'), panel = function(x, y){ panel.lines(x, predict(fm)) } ) ### This gives both data but as point xyplot(y + predict(fm) ~ x, type = c('g', 'p'), ) I know I can add an abline easily, but my problem is a bit more complex and the code above is just an example. What is the best way for the predicted data to form a solid line and let the data points remain as points Harold [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] zeros to NA's - faster
Matrix multiplication, maybe? all_data %*% iu a b c [1,] 1 4 0 [2,] 2 5 0 [3,] 3 6 0 I have no idea if this is a general solution or not, but it works in this case. If you need something else, perhaps a more realistic example would help. Dennis On Wed, Nov 23, 2011 at 11:41 AM, Ben quant ccqu...@gmail.com wrote: Hello, Is there a faster way to do this? Basically, I'd like to NA all values in all_data if there are no 1's in the same column of the other matrix, iu. Put another way, I want to replace values in the all_data columns if values in the same column in iu are all 0. This is pretty slow for me, but works: all_data = matrix(c(1:9),3,3) colnames(all_data) = c('a','b','c') all_data a b c [1,] 1 4 7 [2,] 2 5 8 [3,] 3 6 9 iu = matrix(c(1,0,0,0,1,0,0,0,0),3,3) colnames(iu) = c('a','b','c') iu a b c [1,] 1 0 0 [2,] 0 1 0 [3,] 0 0 0 fun = function(x,d){ vals = d[,x] i = iu[,x] if(!any(i==1)){ vals = rep(NA,times=length(vals)) }else{ vals } vals } all_data = sapply(colnames(iu),fun,all_data) all_data a b c [1,] 1 4 NA [2,] 2 5 NA [3,] 3 6 NA ...again, this work, but is slow for a large number of columns. Have anything faster? Thanks, ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Removing rows in dataframe w'o duplicated values
Hi: Here's one way: do.call(rbind, lapply(L, function(d) if(nrow(d) 1) return(d))) id value value2 1.1 1 5 1 1.2 1 6 4 1.3 1 7 3 3.5 3 5 4 3.6 3 4 3 HTH, Dennis On Tue, Nov 22, 2011 at 9:43 AM, AC Del Re de...@wisc.edu wrote: Hi, Is there an easy way to remove dataframe rows without duplicated values of a specified column ('id')? e.g., dat - data.frame(id = c(1,1,1,2,3,3), value = c(5,6,7,4,5,4), value2 = c(1,4,3,3,4,3)) dat id value value2 1 1 5 1 2 1 6 4 3 1 7 3 4 2 4 3 5 3 5 4 6 3 4 3 This is sample data and the real data has hundreds of rows. In this case, only row 4 does not have a duplicated id and I would like to remove it without using: dat$id[4] - NULL Any help is appreciated! AC [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Removing rows in dataframe w'o duplicated values
Sorry, you need this first: L - split(dat, dat$id) do.call(rbind, lapply(L, function(d) if(nrow(d) 1) return(d))) D. On Tue, Nov 22, 2011 at 10:38 AM, Dennis Murphy djmu...@gmail.com wrote: Hi: Here's one way: do.call(rbind, lapply(L, function(d) if(nrow(d) 1) return(d))) id value value2 1.1 1 5 1 1.2 1 6 4 1.3 1 7 3 3.5 3 5 4 3.6 3 4 3 HTH, Dennis On Tue, Nov 22, 2011 at 9:43 AM, AC Del Re de...@wisc.edu wrote: Hi, Is there an easy way to remove dataframe rows without duplicated values of a specified column ('id')? e.g., dat - data.frame(id = c(1,1,1,2,3,3), value = c(5,6,7,4,5,4), value2 = c(1,4,3,3,4,3)) dat id value value2 1 1 5 1 2 1 6 4 3 1 7 3 4 2 4 3 5 3 5 4 6 3 4 3 This is sample data and the real data has hundreds of rows. In this case, only row 4 does not have a duplicated id and I would like to remove it without using: dat$id[4] - NULL Any help is appreciated! AC [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R ignores number only with a nine under 10000
Hi: Strictly a guess, but the following might be helpful. The call below assumes that the referent data frame is test1, which consists of a single column named x. Modify as appropriate. test_lab - with(test1, cut(x, c(0, 18954, 37791, 56951, 75944, 84885, 113835), labels = c('I8456', 'ref', 'Cyprus1', 'KE3870', 'KE3873', 'KE3926OT'))) cut() creates a factor from a numeric variable. The second argument consists of the cut points and the third argument generates the labels to be associated with values falling between the cut points. See ?cut for more details, and pay attention to the options. The object test_lab is a vector external to test1; if you want it to be a column of test1, then add it to the data frame in one of the usual ways. HTH, Dennis On Mon, Nov 21, 2011 at 7:42 AM, set asta...@hotmail.com wrote: Hello R users, I'm trying to replace numerical values in a datamatrix with strings. R does this except for numbers under 1 starting with a 9 (eg 98, 970, 9504 etc). This is really weird and I wondered whether someone had encountered such a problem or knows the solution. I'm using the next script: test_1 - read.table(5+ref_15clusters3.csv, header = TRUE, sep = ,, colClasses = numeric) test_1[test_1 94885 test_1 = 113835] = KE3926OT test_1[test_1 != 0 test_1 = 18954] = I8456 test_1[test_1 75944 test_1 = 94885] = KE3873 test_1[test_1 56951 test_1 = 75944] = KE3870 test_1[test_1 37991 test_1 = 56951] = Cyprus1 test_1[test_1 18954 test_1 = 37991] = ref write.table(test_1, file = test_replace7.txt, quote = FALSE, sep=\t) Thanks, Set -- View this message in context: http://r.789695.n4.nabble.com/R-ignores-number-only-with-a-nine-under-1-tp4091936p4091936.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding two or more columns of a data frame for each row when NAs are present.
Hi: Does this work for you? yy Q20 Q21 Q22 Q23 Q24 1 0 1 2 3 4 2 1 NA 2 3 4 3 2 1 2 3 4 rowSums(yy, na.rm = TRUE) 1 2 3 10 10 12 # Use a subset of the variables in yy: selectVars - paste('Q', c(20, 21, 24), sep = '') rowSums(yy[, selectVars], na.rm = TRUE) 1 2 3 5 5 7 HTH, Dennis On Sun, Nov 20, 2011 at 12:38 PM, Ian Strang hamame...@ntlworld.com wrote: I am fairly new to R and would like help with the problem below. I am trying to sum and count several rows in the data frame yy below. All works well as in example 1. When I try to add the columns, with an NA in Q21, I get as NA as mySum. I would like NA to be treated as O, or igored. I wrote a function to try to count an NA element as 0, Example 3 function. It works with a few warnings, Example 4, but still gives NA instead of the addition when there is an NA in an element. In Example 6 7, I tried using sum() but it just sums the whole data frame, I think, How do I add together several columns giving the result for each row in mySum? NA should be treated as a 0. Please, note, I do not want to sum all the columns, as I think rowSums would do, just the selected ones. Thanks for your help. Ian, yy - read.table( header = T, sep=,, text = ## to create a data frame + Q20, Q21, Q22, Q23, Q24 + 0,1, 2,3,4 + 1,NA,2,3,4 + 2,1, 2,3,4) + yy Q20 Q21 Q22 Q23 Q24 1 0 1 2 3 4 2 1 NA 2 3 4 3 2 1 2 3 4 x - transform( yy, ## Example 1 + mySum = as.numeric(Q20) + as.numeric(Q22) + as.numeric(Q24), + myCount = as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24)) + ) + x Q20 Q21 Q22 Q23 Q24 mySum myCount 1 0 1 2 3 4 6 3 2 1 NA 2 3 4 7 2 3 2 1 2 3 4 8 3 + x - transform( yy, Example 2 + mySum = as.numeric(Q20) + as.numeric(Q21) + as.numeric(Q24), + myCount = as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24)) + ) + x Q20 Q21 Q22 Q23 Q24 mySum myCount 1 0 1 2 3 4 5 3 2 1 NA 2 3 4 NA 2 3 2 1 2 3 4 7 3 NifAvail - function(x) { if (is.na(x)) x-0 else x - x ### Example 3 + return(as.numeric(x)) + } #end function + NifAvail(5) [1] 5 + NifAvail(NA) [1] 0 x - transform( yy, + mySum = NifAvail(Q20) + NifAvail(Q22) + NifAvail(Q24), ### Example 4 + myCount = as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24)) + ) Warning messages: 1: In if (is.na(x)) x - 0 else x - x : the condition has length 1 and only the first element will be used 2: In if (is.na(x)) x - 0 else x - x : the condition has length 1 and only the first element will be used 3: In if (is.na(x)) x - 0 else x - x : the condition has length 1 and only the first element will be used x Q20 Q21 Q22 Q23 Q24 mySum myCount 1 0 1 2 3 4 6 3 2 1 NA 2 3 4 7 2 3 2 1 2 3 4 8 3 x - transform( yy, + mySum = NifAvail(Q20) + NifAvail(Q21) + NifAvail(Q24), Example 5 + myCount = as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24)) + ) Warning messages: 1: In if (is.na(x)) x - 0 else x - x : the condition has length 1 and only the first element will be used 2: In if (is.na(x)) x - 0 else x - x : the condition has length 1 and only the first element will be used 3: In if (is.na(x)) x - 0 else x - x : the condition has length 1 and only the first element will be used x Q20 Q21 Q22 Q23 Q24 mySum myCount 1 0 1 2 3 4 5 3 2 1 NA 2 3 4 NA 2 3 2 1 2 3 4 7 3 x - transform( yy, Example 6 + mySum = sum(as.numeric(Q20), as.numeric(Q21), as.numeric(Q23), na.rm=T), + myCount = as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24)) + ) + x Q20 Q21 Q22 Q23 Q24 mySum myCount 1 0 1 2 3 4 14 3 2 1 NA 2 3 4 14 2 3 2 1 2 3 4 14 3 x - transform( yy, # Example 7 + mySum = sum(as.numeric(Q20), as.numeric(Q22), as.numeric(Q23), na.rm=T), + myCount = as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24)) + ) + x Q20 Q21 Q22 Q23 Q24 mySum myCount 1 0 1 2 3 4 18 3 2 1 NA 2 3 4 18 2 3 2 1 2 3 4 18 3 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list
Re: [R] Continuasly Compunded Returns with quantmod-data
Hi: Start by writing a small utility function to compute the CCR (not to be confused with the rock band): ccr - function(x) 100 * (log(x[-1]) - log(x[-length(x)])) # Small test: x - round(rnorm(10, 5, 0.5), 2) ccr(x) # Works on vectors... # Load up the stock data: library(quantmod) getSymbols(GOOG,from=2011-11-01) GOOG1-GOOG[,1] # Try it directly: ccr(GOOG[, 1]) GOOG.Open 2011-11-02 0 2011-11-03 0 2011-11-04 0 2011-11-07 0 2011-11-08 0 2011-11-09 0 2011-11-10 0 2011-11-11 0 2011-11-14 0 2011-11-15 0 2011-11-16 0 2011-11-17 0 ## Hmm, no go. Try this instead: apply(GOOG1, 2, ccr) GOOG.Open 2011-11-02 0.82403900 2011-11-03 0.35839274 2011-11-04 1.10123942 2011-11-07 -0.03033316 2011-11-08 2.60843853 2011-11-09 -0.78136988 2011-11-10 0.27598990 2011-11-11 -0.76704898 2011-11-14 1.10809039 2011-11-15 0.78637365 2011-11-16 -0.11756255 2011-11-17 -0.33220719 2011-11-18 -1.32834757 If you look at the structure of GOOG1 [str(GOOG1)], you'll see that it is a one column matrix, so apply() is one way to go, especially if you want to run the function on multiple columns of a matrix. HTH, Dennis On Sun, Nov 20, 2011 at 3:52 PM, barb mainze...@hotmail.com wrote: Hey guys, i want to calculate the continuasly compounded returns for stock prices. Formula for CCR: R_t = ln(P_t/P_{t-1})*100 With R: First i have to modify the vectors, so that they have the same length and we start at the second observation. log(GOOG1[-1]/GOOG1[1:length(GOOG1)-1])*100 That does work with normal vectors. My Questions: 1) I want to use this for stock prices. so i use: library(quantmod) getSymbols(GOOG,from=2011-11-01) GOOG1-GOOG[,1] If i use my formula i get only the value 1 for every observation :( Thanks for your time and help! I appreciate it Regards Tonio -- View this message in context: http://r.789695.n4.nabble.com/Continuasly-Compunded-Returns-with-quantmod-data-tp4090014p4090014.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Deleting multiple rows from a data matrix based on exp value
Without a reproducible example this is just a guess, but try Matrix[apply(Matrix, 1, function(x) any(x 1.11)), ] This will retain all rows of Matrix where at least one value in a row is above the threshold 1.11. If that doesn't do what you want, please provide a small reproducible example and a clearer statement of the desired output. It's not at all clear to me what 'exp. values' means - I can devise at least three meanings off the top of my head, none of which may conform to what you mean. HTH, Dennis On Sun, Nov 20, 2011 at 2:45 PM, Peter Davidsen pkdavid...@gmail.com wrote: Dear List, I have a data matrix that consists of ~4500 rows and 25 columns (i.e. an exprSet object that I converted via the 'exprs' function into a data matrix) Now I want to remove/delete the rows where all exp. values in that particular row are below or equal to a specific cut-off value (e.g 1.11) I have tried using several commands to address this issue: Matrix[rowSums(Matrix = 1.11) = 1.11, ] or Matrix[ !apply(Matrix=1.11,1,all), ] The above commands seem to work fine when I generate a small test matrix like: M - matrix(c(2,5,8,0.4,0.8,0.5,4,12,3), nrow=3, byrow=T) However, non of the two commands decrease the number of rows in my real matrix! Any help would be appreciated Kind regards, Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Apply functions along layers of a data matrix
Hi: Here are two ways to do it; further solutions can be found in the doBy and data.table packages, among others. library('plyr') ddply(daf, .(id), colwise(mean, c('v1', 'v2', 'v3', 'v4'))) aggregate(cbind(v1, v2, v3, v4) ~ id, data = daf, FUN = mean) # Result of each: id v1 v2 v3 v4 1 1 6 21 36 51 2 2 7 22 37 52 3 3 8 23 38 53 4 4 9 24 39 54 5 5 10 25 40 55 Dennis On Fri, Nov 18, 2011 at 5:05 AM, saschav...@gmail.com wrote: Hello How can I apply functions along layers of a data matrix? Example: daf - data.frame( 'id' = rep(1:5, 3), matrix(1:60, nrow=15, dimnames=list( NULL, paste('v', 1:4, sep='') )), rep = rep(1:3, each=5) ) The data frame daf contains 3 repetitions/layers (rep) of 4 variables of 5 persons (id). For some reason, I want to calculate various statistics (e.g., mean, median) *along* the repetitions. The mean calculation, for example, would produce the means of daf[1, 'v1'] *along* the 3 repetition: (daf[1, 'v1'] + daf[6, 'v1'] + daf[11, 'v1']) / 3 That is to say, each of the calculations would result in a data frame with 4 variables (and the id) of the 5 persons: id v1 v2 v3 v4 1 1 6 21 36 51 2 2 7 22 37 52 3 3 8 23 38 53 4 4 9 24 39 54 5 5 10 25 40 55 Currently, I do this in a loop, but I was wondering about a quick and ressource-friendly way to achieve this? Thanks *S* -- Sascha Vieweg, saschav...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] couting events by subject with black out windows
Hi: Here's a Q D solution that could be improved. It uses the plyr package. Starting from your data1 data frame, library('plyr') dseq - seq(as.Date('2010-01-01'), as.Date('2011-06-05'), by = '30 days') # Use the cut() function to create a factor whose levels are demarcated # by the dates in dseq: # See ?cut for labeling options data1[['tf']] - cut(data1$Date, dseq) ddply(subset(data1, event == 1L), .(tf), summarise, Date.min = min(Date)) tf Date.min 1 2010-01-01 2010-01-01 2 2010-01-31 2010-02-12 3 2010-05-01 2010-05-03 4 2011-03-27 2011-04-21 The value of tf is the left endpoint of the time interval. This isn't your desired output in two respects: (1) summarise won't carry along extra variables, so ID gets dropped; (2) you have 2010-03-01 as the first date of a 30-day period, but according to the way I defined the 30-day intervals, Mar. 1 is the last day of an interval, so that's why it's not included [2010-2-12 precedes it]. You can always change the definitions. If you group by months instead, you get the output you expected. Hope this is enough to get you started.. Dennis On Fri, Nov 18, 2011 at 3:22 PM, Chris Conner connerpha...@yahoo.com wrote: I large datset that includes subjects(ID), Dates and events that need to be counted. Not every date includes an event, and I need to only count one event per 30days, per subject. So in essence, I need to create a 30-day black out period during which time an event cannot be counted for each subject. The reason is that a rule has been set up, whereby a subject can only be counted once per 30 day period (the 30 day window includes the day the event of interest is counted). The solution should count only the following events per subject(per the 30-day blackout rule): ID Date auto1 1/1/2010 auto2 2/12/2010 auto2 4/21/2011 auto3 3/1/2010 auto3 5/3/2010 I have created a multistep process to do this, but it is extremely clumsy (detailed below). I have to believe that one of you has a much more elegant solution. Thank you all in advance for any help ## example data data1 - structure(list(ID = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,3L, 4L, 4L, 4L, 4L, 4L), .Label = c(, auto1, auto2, auto3), class = factor), Date = structure(c(14610, 14610, 14627,14680, 14652, 14660, 14725, 15085, 15086, 14642, 14669, 14732,14747, 14749), class = Date), event = c(1L, 1L, 1L, 0L, 1L,1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L)), .Names = c(ID, Date,event), class = data.frame, row.names = c(NA, 14L)) ## remove non events data2 - data1[data1$event==1,] library(doBy) ## create a table of first events step1 - summaryBy(Date~ID, data = data2, FUN=min) step1$Date30 - step1$Date.min+30 step2 - merge(data2, step1, by.x=ID, by.y=ID) ## use an ifelse to essentially remove any events that shouldn't be counted step2$event - ifelse(as.numeric(step2$Date) = step2$Date.min as.numeric(step2$Date) = step2$Date30, 0, step2$event) ## basically repeat steps above until I get an error (no more events) data3 - step2[step2$event==1,] data3- data3[,1:3] step3 - summaryBy(Date~ID, data = data3, FUN=min) step3$Date30 - step3$Date.min+30 step4 - merge(data3, step3, by.x=ID, by.y=ID) step4$event - ifelse(as.numeric(step4$Date) = step4$Date.min as.numeric(step4$Date) = step4$Date30, 0, step4$event) ## then I rbind the keepers ## in this case steps 1 and 3 above final - rbind(step1,step3) ## then reformat final - final[,1:2] final$Date.min - as.Date(final$Date.min,origin=1970-01-01) ## again, extremely clumsy, but it works... HELP! :) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reshape data.frame
This is very straightforward using the reshape2 package: library('reshape2') dc - dcast(a, name ~ year, value_var = 'amount') name 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1a123456789 10 NA NA NA NA 2b 11 12 13 14 15 16 17 18 19 20 21 22 23 24 1985 1 NA 2 25 dcast() returns a data frame; a companion function acast() returns a matrix. If you want to change the names afterward, use names(dc)[-1] - paste('X', 1971:1985, sep = '.') HTH, Dennis On Fri, Nov 18, 2011 at 4:04 PM, Trevor Davies davies.tre...@gmail.com wrote: A late friday afternoon coding question. I'm having a hard time thinking of the correct search terms for what I want to do. If I have a df like this: a - data.frame(name=c(rep('a',10),rep('b',15)),year=c(1971:1980,1971:1985),amount=1:25) name year amount 1 a 1971 1 2 a 1972 2 3 a 1973 3 4 a 1974 4 5 a 1975 5 6 a 1976 6 7 a 1977 7 8 a 1978 8 9 a 1979 9 10 a 1980 10 11 b 1971 11 12 b 1972 12 13 b 1973 13 14 b 1974 14 15 b 1975 15 16 b 1976 16 17 b 1977 17 18 b 1978 18 19 b 1979 19 20 b 1980 20 21 b 1981 21 22 b 1982 22 23 b 1983 23 24 b 1984 24 25 b 1985 25 and I'd like to reshape it so it is like this: X.1971 X.1972 X.1973 X.1974 X.1975 X.1976 X.1977 X.1978 X.1979 X.1980 X.1981 a 1 2 3 4 5 6 7 8 9 10 NA b 11 12 13 14 15 16 17 18 19 20 21 X.1982 X.1983 X.1984 X.1985 a NA NA NA NA b 22 23 24 25 Thanks for the assist. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] split list of characters in groups of 2
Hi: Here's one way: apply(matrix(var.names, ncol = 2, byrow = TRUE), 1, function(x) paste(x[1], x[2], sep = ',')) [1] a,b c,d e,f HTH, Dennis On Wed, Nov 16, 2011 at 9:46 PM, B77S bps0...@auburn.edu wrote: hi, If i have a list of things, like this var.names - c(a, b, c, d, e, f) how can i get this: a, b, c, d, e, f thanks ahead of time. -- View this message in context: http://r.789695.n4.nabble.com/split-list-of-characters-in-groups-of-2-tp4079031p4079031.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Introducing \n's so that par.strip.text can produce multiline strips in lattice
Hi: This worked for me - I needed to modify some of the strip labels to improve the appearance a bit and also reduced the strip font size a bit to accommodate the lengths of the strings. The main thing was to change \\n to \n. Firstly, I created a new variable called Indic as a character variable and then did some minor surgery on three of the strings: Indic - as.character(imports$Indicator) Indic[3 + 6 *(0:5)] - Chemicals and related\n products imports Indic[4 + 6 *(0:5)] - Pearls, semiprecious \nprecious stones imports Indic[5 + 6 *(0:5)] - Metaliferrous ores \nmetal scrap imports # Read Indic into the imports data frame as a factor: imports$Indic - factor(Indic) # Redo the plot: barchart(X03/1000 ~ time | Indic, data = imports[which(imports$time != 1), ], horiz = FALSE, scales = list(x = list(rot=45, labels=paste(Mar,2007:2011))), par.strip.text=list(lineheight=1, lines=2, cex = 0.8)) Dennis On Wed, Nov 16, 2011 at 11:25 PM, Ashim Kapoor ashimkap...@gmail.com wrote: Dear all, I have the following data, which has \\n in place of \n. I introduced \n's in the csv file so that I could use it in barchart in lattice. When I did that and read it into R using read.csv, it read it as \\n. My question is how do I introduce \n in the middle of a long string of quoted text so that lattice can make multiline strips. Hitting Enter which is supposed to introduce \n's does'nt work because when I goto the middle of the line and press enter Open Office thinks that I am done with editing my text and takes me to the next line. dput(imports) structure(list(Indicator = structure(c(5L, 4L, 2L, 12L, 8L, 7L, 5L, 4L, 2L, 12L, 8L, 7L, 5L, 4L, 2L, 12L, 8L, 7L, 5L, 4L, 2L, 12L, 8L, 7L, 5L, 4L, 2L, 12L, 8L, 7L, 5L, 4L, 2L, 12L, 8L, 7L ), .Label = c(, Chemicals and related\\n products imports, Coal export, Gold imports, Gold silver imports, Iron ore export, Iron steel imports, Metaliferrous ores metal scrap imports, Mica export, Ores minerals\\nexport, Other ores \\nminerals export, Pearls precious \\n semiprecious stones imports, Processed minerals\\n export ), class = factor), Units = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(, Rs.crore), class = factor), Expression = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(, Ival), class = factor), time = c(7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 1, 1, 1, 1, 1, 1), X03 = c(66170.46, 65337.72, 62669.86, 33870.17, 36779.35, 27133.25, 71829.14, 67226.04, 75086.89, 29505.61, 31750.99, 32961.26, 104786.39, 95323.8, 134276.63, 76263, 36363.61, 41500.36, 140440.36, 135877.91, 111269.69, 76678.27, 36449.89, 36808.06, 162253.77, 154346.72, 124895.76, 142437.03, 42872.16, 43881.85, 109096.024, 103622.438, 101639.766, 71750.816, 36843.2, 36456.956), id = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L)), row.names = c(1.7, 2.7, 3.7, 4.7, 5.7, 6.7, 1.8, 2.8, 3.8, 4.8, 5.8, 6.8, 1.9, 2.9, 3.9, 4.9, 5.9, 6.9, 1.10, 2.10, 3.10, 4.10, 5.10, 6.10, 1.11, 2.11, 3.11, 4.11, 5.11, 6.11, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1 ), .Names = c(Indicator, Units, Expression, time, X03, id), class = data.frame, reshapeLong = structure(list(varying = structure(list( X03 = c(X03.07, X03.08, X03.09, X03.10, X03.11, X03.1)), .Names = X03, v.names = X03, times = c(7, 8, 9, 10, 11, 1)), v.names = X03, idvar = id, timevar = time), .Names = c(varying, v.names, idvar, timevar))) On which I want to run barchart(X03/1000~time|Indicator, data=imports[which(imports$time!=1),], horiz=F, scales=list(x=list(rot=45,labels=paste(Mar,2007:2011))), par.strip.text=list(lineheight=1,lines=2)) Many thanks, Ashim. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Theil decomposition
Looking at the help pages in the ineq package, the Theil() function simply returns the value of the index. You can look at the source code of the relevant functions to see what they actually compute: library('ineq') ineq # the main function Theil# for the Theil index Neither function is long - ineq() is a wrapper for the index of choice. The code for Theil is very straightforward. It appears that if you want to decompose the index, you'll have to roll your own function to do so. If you know how to compute it componentwise, you could write a function to get the necessary information in each group, and then use that information to compute the overall index. Here's a reproducible example using the chickwts data that computes means and sample sizes by feed type, then uses the result to compute a weighted mean: library('plyr') (v - ddply(chickwts, .(feed), summarise, m = mean(weight), n = length(weight))) with(v, weighted.mean(m, n)) # [1] 261.3099 This is a template of how you might produce the components you need for the Theil index and them pull them together into a weighted sum or product, if that's what you need to do. HTH, Dennis On Tue, Nov 15, 2011 at 10:35 PM, Kitty Lee lee.ki...@yahoo.com wrote: I came across the package 'ineq' that computes a variety of inequality measures (e.g. gini, theil etc). I want to compute the Theil index (racial segregation) and decompose the total into sub-components (by geog levels). I think the package doesn't report the decomposition (correct me if I'm wrong). Just wonder is that available elsewhere? K. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] geom_bar with missing data in package ggplot
Hi: Here's one way, but it puts the two countries side by side rather than stacked (I'm not a big fan of stacked bar charts except in certain contexts). The first version uses the original data, but one can see immediately that there is no distinction between NA and 0: ggplot(g, aes(x = Date, y = value, fill = var2)) + geom_bar(position = 'dodge', stat = 'identity') + facet_wrap(~ variable, nrow = 1) + scale_fill_manual('Country', breaks = levels(g$var2), values = c('red', 'blue')) + opts(legend.position = c(0.87, 0.88), legend.background = theme_rect(fill = 'white')) To compensate, I copied the data to a new object g2 and imputed a small negative value to replace the zero: g2 - g g2$value[8] - -0.01 ggplot(g2, aes(x = Date, y = value, fill = var2)) + geom_bar(position = 'dodge', stat = 'identity') + facet_wrap(~ variable, nrow = 1) + scale_fill_manual('Country', breaks = levels(g2$var2), values = c('red', 'blue')) + opts(legend.position = c(0.87, 0.88), legend.background = theme_rect(fill = 'white')) An additional improvement could be made by keeping the original data and adding some text that indicates where the NAs reside; to do this, we need to offset the date a bit to decently locate the text: ggplot(g, aes(x = Date, y = value, fill = var2)) + geom_bar(position = 'dodge', stat = 'identity') + facet_wrap(~ variable, nrow = 1) + scale_fill_manual('Country', breaks = levels(g$var2), values = c('red', 'blue')) + geom_text(aes(x = as.Date('2001-3-31'), y = 1, label = 'NA'), size = 6) + opts(legend.position = c(0.87, 0.88), legend.background = theme_rect(fill = 'white')) HTH, Dennis On Wed, Nov 16, 2011 at 1:55 AM, Aidan Corcoran aidan.corcora...@gmail.com wrote: Dear all, I was hoping someone could help with a ggplot question. I would like to generate a faceted bar chart, but missing data are causing problems. g-structure(list(Date = structure(c(11322, 11687, 12052, 11322, 11687, 12052, 11322, 11687, 12052, 11322, 11687, 12052), class = Date), variable = c(Govt Revenues to GDP, Govt Revenues to GDP, Govt Revenues to GDP, Govt Revenues to GDP, Govt Revenues to GDP, Govt Revenues to GDP, Structural Budget Position, Structural Budget Position, Structural Budget Position, Structural Budget Position, Structural Budget Position, Structural Budget Position ), var2 = c(United States, United States, United States, Japan, Japan, Japan, United States, United States, United States, Japan, Japan, Japan), value = c(NA, 34.288, 31.831, 29.636, 30.539, 29.093, NA, 0, -2.7, -7.4, -5.7, -7)), .Names = c(Date, variable, var2, value ), row.names = c(21L, 22L, 23L, 169L, 170L, 171L, 206L, 207L, 208L, 354L, 355L, 356L), class = data.frame) gp - ggplot(g, aes(Date, value)) gp- gp + geom_line() gp -gp + facet_grid(var2 ~ variable) gp this works, but trying to get a bar chart version gp - ggplot(g, aes(Date, value)) gp- gp + geom_bar(stat=identity) gp -gp + facet_grid(var2 ~ variable) gp gives the error Error in if (!is.null(data$ymin) !all(data$ymin == 0)) warning(Stacking not well defined when ymin != 0, : missing value where TRUE/FALSE needed Is there something I can do to have a gap for missing data, as happens with the line version? More generally, I may also have missing data between present data: e.g. is.na(g[5,4])-TRUE and I would like if possible to simply see gaps at these points. Thanks for any help. Aidan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] create list of names where two df contain == values
Hi: I think you're overthinking this problem. As is usually the case in R, a vectorized solution is clearer and provides more easily understood code. It's not obvious to me exactly what you want, so we'll try a couple of variations on the same idea. Equality of floating point numbers is a difficult computational problem (see R FAQ 7.31), but if it makes sense to define a threshold difference between floating numbers that practically equates to zero, then you're in business. In your example, the difference in numb1 for letter h in the two data frames is far from zero, so define 'equal' to be a difference 10 ^{-6}. Then: # Return the entire matching data frame df.1[abs(df.1$numb1 - df.2$numb1) 0.01, ] Letters numb1 extra.colid 1a 0.3735462 1 CG234 2b 1.1836433 2 CG232 3c 0.1643714 3 CG441 4d 2.5952808 4 CG128 5e 1.3295078 5 CG125 6f 0.1795316 6 CG182 7g 1.4874291 7 CG982 9i 1.5757814 9 CG282 10 j 0.694611610 CG154 # Return the matching letters only as a vector: df.1[abs(df.1$numb1 - df.2$numb1) 0.01, 'Letters' ] If you want the latter object to remain a data frame, use drop = FALSE as an extra argument after 'Letters'. If you want to create a list object such that each letter comprises a different list component, then the following will do - the as.character() part coerces the factor Letters into a character object: as.list(as.character(df.1[abs(df.1$numb1 - df.2$numb1) 0.01, 'Letters' ])) HTH, Dennis On Wed, Nov 16, 2011 at 5:03 AM, Rob Griffin robgriffin...@hotmail.com wrote: Hello again... sorry to be posting yet again, but I hadn't anticipated this problem. I am trying to now put the names found in one column in data frame 1 (lets call it df.1[,1]) in to a list from the rows where the values in df.1[,2] match values in a column of another dataframe (df.2[3]) I tried to write this function so that it put the list of names (called Iffy) where the 2 criteria (df.1[141] and df.2[21]) matched but I think its too complex for a beginner R-enthusiast ify-function(x,y,a,b,c) if(x[[,a]]==y[[,b]]) {list(x[[,c]])} else {NULL} Iffy-apply( df.1, 1, FUN=ify, x=df.1, y=df.2, a=2, b=3, c=1 ) But this didn't work... Error in FUN(newX[, i], ...) : unused argument(s) (newX[, i]) Here is a dataset that replicates the problem, you'll notice the h criteria values are different between the two dataframes and therefore it would produce a list of the 9 letters where the two criteria columns matched (a,b,c,d,e,f,g,i,j): df.1-data.frame(rep(letters[1:10])) colnames(df.1)[1]-(Letters) set.seed(1) df.1$numb1-rnorm(10,1,1) df.1$extra.col-c(1,2,3,4,5,6,7,8,9,10) df.1$id-c(CG234,CG232,CG441,CG128,CG125,CG182,CG982,CG541,CG282,CG154) df.1 df.2-data.frame(rep(letters[1:10])) colnames(df.2)[1]-(Letters) set.seed(1) df.2$extra.col-c(1,2,3,4,5,6,7,8,9,10) df.2$numb1-rnorm(10,1,1) df.2$id-c(CG234,CG232,CG441,CG128,CG125,CG182,CG982,CG541,CG282,CG154) df.2[8,3]-12 df.1 df.2 Your patience is much appreciated, Rob __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] access sublists by a variable
Hi: Try dict[[a]] rather than dict$a, and read the section on lists in the Introduction to R manual to learn how to properly index them. HTH, Dennis On Wed, Nov 16, 2011 at 7:16 AM, Antje Gruner n...@allesjetzt.net wrote: Dear list, I'd like to access the elements of a list within another list with the help of a variable. dict - list( 24 = c(1,2,3,6,12,24,48,72,96,120,144,168,720), 168 = c(1,12,24,48,72,96,120,144,168,336,504,672), 8760 = c(1,24,168,730,4380,8760) ) dict$24 works fine, but a - 24 dict$a returns NULL Unfortunately dict[a] does not work for me, because I see no possibility to access the inner elements of list 24 in that case i.e. dict[a][1] returns the whole list and not the first element. What is the correct syntax to access those elements with the help of a variable? Thanks in advance Antje __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] equal spacing of the polygons in levelplot key (lattice)
OK, how about this instead? # library('lattice') levs - as.vector(quantile(volcano,c(0,0.1,0.5,0.9,0.99,1))) levq - seq(min(levs), max(levs), length = 6) levelplot(volcano, at = levs, colorkey = list(at = levq, labels = list(at = levq, labels = levs) )) Dennis On Wed, Nov 16, 2011 at 10:27 AM, Andy Bunn andy.b...@wwu.edu wrote: -Original Message- From: Dennis Murphy [mailto:djmu...@gmail.com] Sent: Tuesday, November 15, 2011 8:54 PM To: Andy Bunn Cc: r-help@r-project.org Subject: Re: [R] equal spacing of the polygons in levelplot key (lattice) Hi: Does this work? Thanks Dennis. This almost works. Is there a way to make the rectangles in the key the same size? In this example five rectangles of the same area evenly arrayed? Can the key be coerced into being categorical? The data I want to work with are not spatial but it occurs to me that this is a common mapping task (e.g., in this example you might want to label these colors 'low', 'kind of low', 'medium low', etc. or map land covers or such.) I'll look at the sp or raster plotting equivalent. # library('lattice') levs - as.vector(quantile(volcano,c(0,0.1,0.5,0.9,0.99,1))) levelplot(volcano, at = levs, colorkey = list(labels = list(at = levs, labels = levs) )) HTH, Dennis On Tue, Nov 15, 2011 at 1:12 PM, Andy Bunn andy.b...@wwu.edu wrote: Given the example: R (levs - quantile(volcano,c(0,0.1,0.5,0.9,0.99,1))) 0% 10% 50% 90% 99% 100% 94 100 124 170 189 195 R levelplot(volcano,at=levs) How can I make the key categorical with the size of the divisions equally spaced in the key? E.g., five equal size rectangles with labels at levs c(100,124,170,189,195)? Apologies if this is obvious. -A R version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 14.0 year 2011 month 10 day 31 svn rev 57496 language R version.string R version 2.14.0 (2011-10-31) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] boxplot strange behavior
And you got a reply on the ggplot2 list, which is why you're asked not to cross-post. For those who are wondering, geom_boxplot() in the ggplot2 package will by default plot outside points along the same line as the boxplot whiskers at their actual values. The gentleman jittered the original points in a separate call to geom_point(), so the result is that the outside points are plotted twice - once along the whisker lines (in black) from geom_boxplot() and once as jittered points with alpha transparency from geom_point(). Since jitter can be positive or negative in either the horizontal direction, confusion ensues... Dennis On Wed, Nov 16, 2011 at 11:16 AM, Giovanni Azua azuag...@student.ethz.ch wrote: Hello, I generate box plots from my data like this: qplot(x=xxx,y=column,data=data,geom=boxplot) + xlab(xxx) + ylab(ylabel) + theme_bw() + scale_y_log10() + geom_jitter(alpha=I(1/10)) The problem is that I see lot of points above the maximum at the same level as some outliers. It looks very weird as I expected the outliers to be few and specially not see any points other than the outliers below the minimum or above the maximum indicator. Can anyone explain what's going on? Attached I send a snapshot, see the second level having some outliers, separate from the outliers there are also some points which seem not to be outliers and that are above the maximum indicator? is this a bug or am I missing anything? TIA, Best regards, Giovanni PS: I sent this to the ggplot2 list too (sorry for the double post but I am kind of under pressure with this) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] equal spacing of the polygons in levelplot key (lattice)
Hi: Does this work? # library('lattice') levs - as.vector(quantile(volcano,c(0,0.1,0.5,0.9,0.99,1))) levelplot(volcano, at = levs, colorkey = list(labels = list(at = levs, labels = levs) )) HTH, Dennis On Tue, Nov 15, 2011 at 1:12 PM, Andy Bunn andy.b...@wwu.edu wrote: Given the example: R (levs - quantile(volcano,c(0,0.1,0.5,0.9,0.99,1))) 0% 10% 50% 90% 99% 100% 94 100 124 170 189 195 R levelplot(volcano,at=levs) How can I make the key categorical with the size of the divisions equally spaced in the key? E.g., five equal size rectangles with labels at levs c(100,124,170,189,195)? Apologies if this is obvious. -A R version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 14.0 year 2011 month 10 day 31 svn rev 57496 language R version.string R version 2.14.0 (2011-10-31) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Convert full matrix back to lower triangular matrix
Hi: (1) Here is why your e-mails look mangled on this list: [[alternative HTML version deleted]] R-help is a text-only list, so please change your mailer's settings to send ASCII text rather than HTML. (2) The print method you see displayed in dd1 is equivalent to the following: ddm * lower.tri(ddm) Note that lower.tri(ddm) produces a logical matrix; in R, when you multiply a numeric object by a logical, the latter is converted to 0's and 1's. In this case, the lower triangular elements are retained while the other elements are coerced to 0. HTH, Dennis On Tue, Nov 15, 2011 at 2:55 PM, Juliet Ndukum jpnts...@yahoo.com wrote: Given a vector; ab = seq(0.5,1, by=0.1) ab[1] 0.5 0.6 0.7 0.8 0.9 1.0 The euclidean distance between the vector elements is given by the lower triangular matrix dd1 = dist(ab,euclidean) dd1 1 2 3 4 52 0.1 3 0.2 0.1 4 0.3 0.2 0.1 5 0.4 0.3 0.2 0.1 6 0.5 0.4 0.3 0.2 0.1 Convert the lower triangular matrix to a full matrix ddm = as.matrix(dd1) ddm 1 2 3 4 5 61 0.0 0.1 0.2 0.3 0.4 0.52 0.1 0.0 0.1 0.2 0.3 0.43 0.2 0.1 0.0 0.1 0.2 0.34 0.3 0.2 0.1 0.0 0.1 0.25 0.4 0.3 0.2 0.1 0.0 0.16 0.5 0.4 0.3 0.2 0.1 0.0 I would be grateful if someone could provide me with a code to convert ddm to the lower triangular matrix as before i.e. dd1 above. Your help will be greatly appreciated.JN NB: I apologize for the poor output of the previous email. Thanks for your understanding. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] global optimisation with inequality constraints
(1) Please do not hijack another thread to ask an off-topic question - start a new one instead. (2) Your question refers to a topic called 'restricted least squares'. Search the R-help archives on that subject and you should get a number of answers. Someone answered a similar question here within the past two weeks. (3) Weighted least squares is not restricted least squares. They have different purposes. Good luck. Dennis On Tue, Nov 15, 2011 at 10:24 PM, geeknick geekn...@rocketmail.com wrote: Please advice on the package I should use to run a linear regression model (weighted least squared) with linear equality constraint. I initially tried constrOptim but it turned out that it only supported http://solve-graph-linear-inequalities.blogspot.com/2011/11/blog-post.html linear inequality constraint. Thank you very much in advance. Cheers, NICK -- View this message in context: http://r.789695.n4.nabble.com/global-optimisation-with-inequality-constraints-tp3799258p4075455.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] max min values within dataframe
Groupwise data summarization is a very common task, and it is worth learning the various ways to do it in R. Josh showed you one way to use aggregate() from the base package and Michael showed you one way of using the plyr package to do the same; another way would be ddply(df, .(Patient, Region), summarise, max = max(Score), min = min(Score)) to save on writing an explicit function. Similarly, if you have a version of R = 2.11.0, the aggregate() function now has a nice formula interface, so Josh's code could also be written as aggregate(Score ~ Patient + Region, data = df, FUN = range) with a subsequent renaming of the variables as shown. Other packages that could perform this task with ease include the doBy package, the data.table package, the remix package, the Hmisc package and, if you are comfortable with SQL, the sqldf package. For relative novices, the doBy package is a very nice place to start because it comes with a well written vignette and the function names correspond well with the tasks they perform (e.g., summaryBy(), transformBy()). The plyr and data.table packages are more general and more powerful in terms of the types of tasks to which each is suited. Unlike aggregate() and doBy:::summaryBy(), these packages can process multivariable functions. As noted above, if you have an SQL background, sqldf operates on R data objects as though they were SQL tables, which is advantageous in complex data extraction tasks. Package remix is useful if you want to organize results into a tabular form that is reminiscent of SAS. HTH, Dennis On Mon, Nov 14, 2011 at 8:10 AM, B Laura gm.spam2...@gmail.com wrote: dear R-team I need to find the min, max values for each patient from dataset and keep the output of it as a dataframe with the following columns - Patient nr - Region (remains same per patient) - Min score - Max score Patient Region Score Time 1 1 X 19 28 2 1 X 20 126 3 1 X 22 100 4 1 X 25 191 5 2 Y 12 1 6 2 Y 12 2 7 2 Y 25 4 8 2 Y 26 7 9 3 X 6 1 10 3 X 6 4 11 3 X 21 31 12 3 X 22 68 13 3 X 23 31 14 3 X 24 38 15 3 X 21 15 16 3 X 22 24 17 3 X 23 15 18 3 X 24 243 19 3 X 25 77 20 4 Y 6 5 21 4 Y 22 28 22 4 Y 23 75 23 4 Y 24 19 24 5 Y 23 3 25 5 Y 24 1 26 5 Y 23 33 27 5 Y 24 13 28 5 Y 25 42 29 5 Y 26 21 30 5 Y 27 4 31 6 Y 24 4 32 6 Y 32 8 So far I could find the min and max values for each patient, but the output of it is not (yet) what I need. Patient.nr = unique(Patient) aggregate(Score, list(Patient), max) Group.1 x 1 1 25 2 2 26 3 3 25 4 4 24 5 5 27 6 6 32 aggregate(Score, list(Patient), min) Group.1 x 1 1 19 2 2 12 3 3 6 4 4 6 5 5 23 6 6 24 I would like to do same but writing this new information (min, max values) in a dataframe with following columns - Patient nr - Region (remains same per patient) - Min score - Max score Can anybody help me with this? Thanks Laura [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding units to levelplot's colorkey
You don't show code or a reproducible example, so I guess you want a general answer. Use the draw.colorkey() function inside the levelplot() call. It takes an argument key =, which accepts a list of arguments, including space, col, at, labels, tick.number, width and height (see p. 155 of the Lattice book). More specifically, the labels argument also accepts a list of values, with components at, labels, cex, col, font, fontface and fontfamily. You could attach the units as labels with a combination of at = and labels = inside the (outer) labels argument, something like myAt - c(1, 3, 5, 7, 9) myLab - paste(myAt, 'cm') levelplot( ... draw.colorkey(key = list(labels = list(at = myAt, labels = myLab)), ...) If that doesn't work, then please provide a reproducible example. HTH, Dennis On Mon, Nov 14, 2011 at 12:34 PM, Carlisle Thacker carlisle.thac...@noaa.gov wrote: How to add units (e.g. cm) to the color key of a lattice levelplot? The plots looks fantastic, but it would be nice to indicate somewhere near the end of the color key that the values associated with its colors are in centimeters or some other physical units. The only thing I find is the possibility to specify the labels so that one explicitly includes the units. That leaves little flexibility for positioning where this information appears. Is there a better way? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Generate the distribution
Google is an amazing resource for getting information. Try Googling 'simulation in R' - I got several useful hits on the first page. HTH, Dennis On Sun, Nov 13, 2011 at 7:41 AM, Anban nino.z...@gmail.com wrote: Hi everyone, i really need some help with one task. I simply cant understand what i really have to do. The task is: Generate the distribution of maximum on samples of size 200 from beta with shape parameters 5 and 5 distribution. Plot a histogram of simulated values and overlay at least one distribution curve that you think might be suitable. Im rookie with simulations, so i need yours help. Tnx -- View this message in context: http://r.789695.n4.nabble.com/Generate-the-distribution-tp4036755p4036755.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2^k*r (with replications) experimental design question
I'm guessing you have nine replicates of a 2^5 factorial design with a couple of missing values. If so, define a variable to designate the replicates and use it as a blocking factor in the ANOVA. If you want to treat the replicates as a random rather than a fixed factor, then look into the nlme or lme4 packages. HTH, Dennis On Sun, Nov 13, 2011 at 4:33 PM, Giovanni Azua brave...@gmail.com wrote: Hello, I have one replication (r=1 of the 2^k*r) of a 2^k experimental design in the context of performance analysis i.e. my response variables are Throughput and Response Time. I use the aov function and the results look ok: str(throughput) 'data.frame': 286 obs. of 7 variables: $ Time : int 6 7 8 9 10 11 12 13 14 15 ... $ Throughput : int 42 44 33 41 43 40 37 40 42 37 ... $ No_databases : Factor w/ 2 levels 1,4: 1 1 1 1 1 1 1 1 1 1 ... $ Partitioning : Factor w/ 2 levels sharding,replication: 1 1 1 1 1 1 1 1 1 1 ... $ No_middlewares: Factor w/ 2 levels 2,4: 1 1 1 1 1 1 1 1 1 1 ... $ Queue_size : Factor w/ 2 levels 40,100: 1 1 1 1 1 1 1 1 1 1 ... $ No_clients : Factor w/ 1 level 128: 1 1 1 1 1 1 1 1 1 1 ... head(throughput) Time Throughput No_databases Partitioning No_middlewares Queue_size 1 6 42 1 sharding 2 40 2 7 44 1 sharding 2 40 3 8 33 1 sharding 2 40 4 9 41 1 sharding 2 40 5 10 43 1 sharding 2 40 6 11 40 1 sharding 2 40 throughput.aov - aov(Throughput~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput) summary(throughput.aov) Df Sum Sq Mean Sq F value Pr(F) No_databases 1 28488651 28488651 53.4981 2.713e-12 *** Partitioning 1 71687 71687 0.1346 0.713966 No_middlewares 1 5624454 5624454 10.5620 0.001295 ** Queue_size 1 50892 50892 0.0956 0.757443 Residuals 281 149637226 532517 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 This is somehow what I expected and I am happy, it is saying that the Throughput is significatively affected firstly by the number of database instances and secondly by the number of middleware instances. The problem is that I need to integrate multiple replications of this same 2^k so I can also account for experimental error i.e. the _r_ of 2^k*r but I can't see how to integrate the _r_ term into the data and into the aov function parameters. Can anyone advice? TIA, Best regards, Giovanni __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] beanplot without log scale?
Hi: Try the log = argument to beanplot(). Here's an example: library('beanplot') v - exp(runif(1000, 0, 10)) beanplot(v) beanplot(v, log = ') HTH, Dennis On Thu, Nov 10, 2011 at 1:27 PM, Twila Moon twi...@uw.edu wrote: Is it possible to force beanplot not to use a log scale? I want to be able to create multiple beanplots of different data on the same specified y-axis. When I try to force a ylim, I get the following... beanplot(reg5vel,ylim=(c(0,12000))) log=y selected Warning message: In plot.window(xlim = c(0.5, 7.5), ylim = c(0, 12000), log = y) : nonfinite axis limits [GScale(-inf,4.07918,2, .); log=1] Thanks, Twila [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] multivariate modeling codes
Hi: R-Bloggers picked up on this web site that contains several interesting posts re usage of R, including this one on using a Bayesian approach to multivariate mixed effects models using the MCMCglmm package: http://www.quantumforest.com/2011/11/coming-out-of-the-bayesian-closet-multivariate-version/ With multiple responses in a longitudinal study, this might be of interest to you. A similar question re multivariate response was asked a few days ago on R-sig-mixed-models, where Kevin Wright posted the above link. I'd suggest that future questions on this subject be moved to that group. HTH, Dennis On Fri, Nov 11, 2011 at 12:25 PM, yurirouge lilysh...@msn.com wrote: HI, I am relatively new to R and would appreciate some help or directions for this. I am trying to model 3 longitudinal outcomes jointly and to identify some predictors for these 3 joint outcomes (all continuous). I am trying to find some codes that I may modify to do this but cannot seem to find anything. -- View this message in context: http://r.789695.n4.nabble.com/multivariate-modeling-codes-tp4032624p4032624.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Combining Overlapping Data
Hi: This doesn't sort the data by strain level, but I think it does what you're after. It helps if strain is either a factor or character vector in each data frame. h - function(x, y) { tbx - table(x$strain) tby - table(y$strain) # Select the strains who have more than one member # in each data frame mgrps - intersect(names(tbx[tbx 0]), names(tby[tby 0])) # concatenate the data with common strains rbind(subset(x, gp %in% mgrps), subset(y, gp %in% mgrps)) } # Result: dc - h(x, y) HTH, Dennis On Fri, Nov 11, 2011 at 1:07 PM, kickout kyle.ko...@gmail.com wrote: I've scoured the archives but have found no concrete answer to my question. Problem: Two data sets 1st data set(x) = 20,000 rows 2nd data set(y) = 5,000 rows Both have the same column names, the column of interest to me is a variable called strain. For example, a strain named Chab1405 appears in x 150 times and in y 25 times... strain Chab1999 only appears 200 times in x and none in y (so i dont want that retained). I want to create a new data frame that has all 175 measurements for Chab1405 and any other 'strain' that appears in both the two data sets.. but not strains that appear in only one data set...So i want the intersection of two data sets (maybe?). I've tried x %in% y, but that only gives TRUE/FALSE -- View this message in context: http://r.789695.n4.nabble.com/Combining-Overlapping-Data-tp4032719p4032719.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2 - regression statistics how to display on plot
Hi: Here's an example of how one might do this in a specific example using geom_text(). # Some fake data: df - data.frame(x = 1:10, y = 0.5 + (1:10) + rnorm(10)) # Fit a linear model to the data and save the model object: mod - lm(y ~ x, data = df) # Create a list of character strings - the first component # produces the fitted model, the second produces a # string to compute R^2, but in plotmath syntax. rout - list(paste('Fitted model: ', round(coef(mod)[1], 3), ' + ', round(coef(mod)[2], 3), ' x', sep = ''), paste('R^2 == ', round(summary(mod)[['r.squared']], 3), sep = '') ) # This looks ugly, but I'm using the round() function to make the # equations look more sane. coef(mod) extracts the model # coefficients (intercept, then slope), and # summary(mod)[['r.squared']] extracts R^2. # See what they look like: rout # Notice that the first component of rout is simply a text string # that can be passed as is, but the second string needs to be # wrapped inside an expression, which is what parse = TRUE # in geom_text() does. # Now construct the plot; given the (x, y) extent of the data, the # coordinates make sense, but you have to adapt them to # your data. ggplot(df, aes(x, y)) + geom_smooth(method = 'lm') + geom_text(aes(x = 2, y = 10, label = rout[[1]]), hjust = 0) + geom_text(aes(x = 2, y = 9.5, label = rout[[2]]), hjust = 0, parse = TRUE ) hjust = 0 makes the text strings flush left, parse = TRUE in the second call converts the second string in rout into an expression. This is fine if your plot is a one-off or two-off deal; the code above gives you a sense of the programming required. OTOH, if you need to do this a lot, then you're better off with a function that can automate the process, which Bryan was kind enough to provide. I might add that there is a dedicated ggplot2 listserv on google: ggpl...@googlegroups.com, for which questions like these are well suited. HTH, Dennis On Thu, Nov 10, 2011 at 5:45 AM, Durant, James T. (ATSDR/DTEM/PRMSB) h...@cdc.gov wrote: Hello - So I am trying to use ggplot2 to show a linear regression between two variables, but I want to also show the fit of the line on the graph as well. I am using ggplot2 for other graphics in what I am working on, so even though this would be a fairly easy thing to do in Excel, I would prefer to do it in R to keep my look and feel, and I think ggplot2 is just cooler. Here is a sample script of what I am trying to accomplish: df-NULL df$x-rnorm(100) df$y-rnorm(100) df-data.frame(df) ggplot(df, aes(x=x,y=y))+geom_point()+geom_smooth(method=lm) # would like to be able to showr squared and slope/intercept of lm VR Jim James T. Durant, MSPH CIH Emergency Response Coordinator US Agency for Toxic Substances and Disease Registry Atlanta, GA 30341 770-378-1695 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] newbie's question : xyplot legend with a white background
Hi: Add a background = 'color' argument to key(): library('lattice') xyplot(1~1, panel = function(x,y, ...) { panel.xyplot(x,y) panel.abline(v=seq(0,1.4,by=0.1)) panel.abline(h=seq(0,1.4,by=0.1)) } ,key = list (x=0.2, y=0.88, points = list(col=c(red,green),pch=20, cex=2), text=list(c(a,z)), background = 'white')) Dennis On Thu, Nov 10, 2011 at 8:09 AM, PtitBleu ptit_b...@yahoo.fr wrote: Hello, Sorry in advance for adding a silly question on this forum but I haven't found the right keywords to find a solution to this basic problem. I'm just looking a way to have a white background behind the legend to hide the grid. Thanks in advance. The silly example for my silly question: xyplot(1~1, panel = function(x,y, ...) { panel.xyplot(x,y) panel.abline(v=seq(0,1.4,by=0.1)) panel.abline(h=seq(0,1.4,by=0.1)) } ,key=list(x=0.2, y=0.88, points=list(col=c(red,green),pch=20, cex=2), text=list(c(a,z))) ) -- View this message in context: http://r.789695.n4.nabble.com/newbie-s-question-xyplot-legend-with-a-white-background-tp4024152p4024152.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] counting columns that match criteria
Hi: Here's a toy example: # Default var names are V1-V20: u - as.data.frame(matrix(rpois(100, 3), ncol = 20)) u - transform(u, ngt1 = apply(u[, c('V1', 'V4', 'V9', 'V15')], 1, function(x) sum(x 1)) ) u HTH, Dennis On Thu, Nov 10, 2011 at 7:24 AM, JL Villanueva jlpost...@gmail.com wrote: Hi, I am a little new in R but I'm finding it extremely useful :) Here's my tiny question: I've got a table with a lot of columns. What I am interested now is to evaluate how many of 4 columns have a value greater than 1. I think it can be done with subset() but it will take a very long condition and become unfeasible if I want to compare more than 4 columns. I put here a small example Col1 Col2 Col3 Col 4 1 1 1 1 -0 columns greater than 1 2 1 1 1 -1 column greater than 1 4 1 4 1 -2 columns greater than 1 3 3 3 3 -3 columns greater than 1 Then I want to filter by that number, my idea is to create a new column storing the number calculated and subset() by it. Any hints? Thanks in advance JL [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting a function with given formula in ggplot2
Hi: Borrowing from this thread, courtesy of Brian Diggs: http://groups.google.com/group/ggplot2/browse_thread/thread/478f9e61d41b4678/ed323c497db61156?lnk=gstq=stat_function#ed323c497db61156 ...here's a small reproducible example: ddf - data.frame(x = 1:10, y = 0.4 + 0.6 * (1:10) + rnorm(10)) # Find the linear model coefficients lmc - coef(lm(y ~ x, data = ddf)) # Create a function to produce the fitted line lmeq - function(x) lmc[1] + lmc[2] * x # Construct the ggplot() and use stat_function(): ggplot(ddf, aes(x = x, y = y)) + geom_point() + stat_function(fun = lmeq, colour = 'red', size = 1) HTH, Dennis On Thu, Nov 10, 2011 at 10:47 AM, Curiouslearn curiousle...@gmail.com wrote: Hi All, I have a scatter plot produced using ggplot2 and I want to add the regression line to this scatter plot. I suppose I can use geom_smooth() to do this, but for the sake of learning ( I am new both to R and ggplot2), I want to try and add it as a function (something that curve() does in the standard R plotting). I did some search and found that stat_function() can be used for this. But somehow it is not working. The following is my code. Can you please tell me where I am going wrong and what the correct code would be: reg1 - lm(kid_score ~ mom_hs, data=iqdata) scatter - ggplot() + geom_point(data=iqdata, aes(x=as.factor(mom_hs), y=kid_score) ) + xlab(Mother's High School Status) + ylab(Children's Test Score) + stat_function(fun=function(x) coef(reg1)[1] + coef(reg1)[2] * x) I understand that you probably do not have the data I am using (I am trying this out from Gelman Hill's book on Multilevel models). But I was hoping you may be able to point out the error without the data. Thanks for your help. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Listing tables together from random samples from a generated population?
Hi: There are two ways you could go about this: lists or arrays. It's pretty easy to generate an array, a little more work to get the list. I'm assuming the objective is to extract a chi-square statistic from each table, so I'll show a couple of ways to do that, too. library('plyr') ## Start with the data: y-data.frame(gender=sample(c('Male', 'Female'), size=10, replace=TRUE, c(0.5, 0.5)), tea=sample(c('Yes', 'No'), size=10, replace=TRUE, c(0.5, 0.5))) ## Function to produce a table: tabfun - function(d) table(d[sample(seq_len(nrow(d)), 100), ]) x2stat - function(m) chisq.test(m)$statistic ## Array version: tbarr - replicate(100, tabfun(y)) # X^2 statistics using apply() from base R and # aaply() from plyr: u1 - apply(tablist, 3, x2stat) u2 - aaply(tablist, 3, x2stat) ## List version: tblst - vector('list', 100) for(i in seq_along(tblst)) tblst[[i]] - tabfun(y) v1 - unname(do.call(c, lapply(tblst, x2stat))) v2 - laply(tblst, x2stat) From here, it's easy to do the histogram :) HTH, Dennis On Thu, Nov 10, 2011 at 12:48 PM, Simon Kiss sjk...@gmail.com wrote: . HI there, I'd like to show demonstrate how the chi-squared distribution works, so I've come up with a sample data frame of two categorical variables y-data.frame(gender=sample(c('Male', 'Female'), size=10, replace=TRUE, c(0.5, 0.5)), tea=sample(c('Yes', 'No'), size=10, replace=TRUE, c(0.5, 0.5))) And I'd like to create a list of 100 different samples of those two variables and the resulting 2X2 contingency tables table(.y[sample(nrow(.y), 100), ]) How would I combine these 100 tables into a list? I'd like to be able to go in and find some of the extreme values to show how the sampling distribution of the chi-square values. I can already get a histogram of 100 different chi-squared values that shows the distribution nicely (see below), but I'd like to actually show the underlying tables, for demonstration's sake. .z-vector() for (i in 1:100) { .z-c(.z, chisq.test(table(.y[sample(nrow(.y), 200), ]))$statistic) } hist(.z, xlab='Chi-Square Value', main=Chi-Squared Values From 100 different samples asking\nabout gender and tea/coffee drinking) abline(v=3.84, lty=2) Thank you in advance, Simon Kiss * Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] why NA coefficients
The cell mean mu_{12} is non-estimable because it has no data in the cell. How can you estimate something that's not there (at least without imputation :)? Every parametric function that involves mu_{12} will also be non-estimable - in particular, the interaction term and the population marginal means . That's why you get the NA estimates and the warning. All this follows from the linear model theory described in, for example, Milliken and Johnson (1992), Analysis of Messy Data, vol. 1, ch. 13. Here's an example from Milliken and Johnson (1992) to illustrate: B1 B2 B3 T1 2, 6 8, 6 T23 14 12, 9 T36 9 Assume a cell means model E(Y_{ijk}) = \mu_{ij}, where the cell means are estimated by the cell averages. From M J (p. 173, whence this example is taken): Whenever treatment combinations are missing, certain hypotheses cannot be tested without making some additional assumptions about the parameters in the model. Hypotheses involving parameters corresponding to the missing cells generally cannot be tested. For example, for the data [above] it is not possible to estimate any linear combinations (or to test any hypotheses) that involve parameters \mu_{12} and \mu_{33} unless one is willing to make some assumptions about them. They continue: One common assumption is that there is no interactions between the levels of T and the levels of B. In our opinion, this assumption should not be made without some supporting experimental evidence. In other words, removing the interaction term makes the non-estimability problem disappear, but it's a copout unless there is some tangible scientific justification for an additive rather than an interaction model. For the above data, M J note that it is not possible to estimate all of the expected marginal means - in particular, one cannot estimate the population marginal means $\bar{\mu}_{1.}$, $\bar{\mu}_{3.}$, $\bar{\mu}_{.2}$ or $\bar{\mu}_{.3}$. OTOH, $\bar{\mu}_{2.}$ and $\bar{\mu}_{.1}$ since these functions of the parameters involve terms associated with the means of the missing cells. Moreover, any hypotheses involving parametric functions that contain non-estimable cell means are not testable. In this example, the test of equal row population marginal means is not testable because $\bar{\mu}_{1.}$ and $\bar{\mu}_{3.}$ are not estimable. [Aside: if the term parametric function is not familiar, in this context it refers to linear combinations of model parameters. In the M J example, $\bar{\mu}_{1.} = \mu_{11} + \mu_{12} + \mu_{13}$ is a parametric function.] Hopefully this sheds some light on the situation. Dennis On Mon, Nov 7, 2011 at 10:17 PM, array chip arrayprof...@yahoo.com wrote: Hi Dennis, The cell mean mu_12 from the model involves the intercept and factor 2: Coefficients: (Intercept) factor(treat)2 factor(treat)3 0.429244 0.499982 0.352971 factor(treat)4 factor(treat)5 factor(treat)6 -0.204752 0.142042 0.044155 factor(treat)7 factor(group)2 factor(treat)2:factor(group)2 -0.007775 -0.337907 -0.208734 factor(treat)3:factor(group)2 factor(treat)4:factor(group)2 factor(treat)5:factor(group)2 -0.195138 0.800029 0.227514 factor(treat)6:factor(group)2 factor(treat)7:factor(group)2 0.331548 NA So mu_12 = 0.429244-0.337907 = 0.091337. This can be verified by: predict(fit,data.frame(list(treat=1,group=2))) 1 0.09133691 Warning message: In predict.lm(fit, data.frame(list(treat = 1, group = 2))) : prediction from a rank-deficient fit may be misleading But as you can see, it gave a warning about rank-deficient fit... why this is a rank-deficient fit? Because treat 1_group 2 has no cases, so why it is still estimable while on the contrary, treat 7_group 2 which has 2 cases is not? Thanks John From: Dennis Murphy djmu...@gmail.com To: array chip arrayprof...@yahoo.com Sent: Monday, November 7, 2011 9:29 PM Subject: Re: [R] why NA coefficients Hi John: What is the estimate of the cell mean \mu_{12}? Which model effects involve that cell mean? With this data arrangement, the expected population marginal means of treatment 1 and group 2 are not estimable either, unless you're willing to assume a no-interaction model. Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data (vol. 1) cover this topic in some detail, but it assumes you're familiar with the matrix form of a linear statistical model. Both chapters cover the two-way model with interaction - Ch.13 from the cell means model approach and Ch. 14 from the model effects approach. Because this was written
Re: [R] ggplot2 reorder factors for faceting
Hi: (1) Here is one way to reorganize the levels of a factor: plotData[['infection']] - factor(plotData[['infection']], levels = c('InfA', 'InfC', 'InfB', 'InfD')) Do this ahead of the call to ggplot(), preferably after plotData is defined. relevel() resets the baseline category of a factor, but here you want to make multiple changes. (2) You probably want a better title for the legend. Assuming you want 'Scale' as the title, you can add the following to labs: labs(..., fill = 'Scale') HTH, Dennis On Tue, Nov 8, 2011 at 3:51 AM, Iain Gallagher iaingallag...@btopenworld.com wrote: Dear List I am trying to draw a heatmap using ggplot2. In this heatmap I have faceted my data by 'infection' of which I have four. These four infections break down into two types and I would like to reorder the 'infection' column of my data to reflect this. Toy example below: library(ggplot2) # test data for ggplot reordering genes - (rep (c(rep('a',4), rep('b',4), rep('c',4), rep('d',4), rep('e',4), rep('f',4)) ,4)) fcData - rnorm(96) times - rep(rep(c(2,6,24,48),6),4) infection - c(rep('InfA', 24), rep('InfB', 24), rep('InfC', 24), rep('InfD', 24)) infType - c(rep('M', 24), rep('D',24), rep('M', 24), rep('D', 24)) # data is long format for ggplot2 plotData - as.data.frame(cbind(genes, as.numeric(fcData), as.numeric(times), infection, infType)) hp2 - ggplot(plotData, aes(factor(times), genes)) + geom_tile(aes(fill = scale(as.numeric(fcData + facet_wrap(~infection, ncol=4) # set scale hp2 - hp2 + scale_fill_gradient2(name=NULL, low=#0571B0, mid=#F7F7F7, high=#CA0020, midpoint=0, breaks=NULL, labels=NULL, limits=NULL, trans=identity) # set up text (size, colour etc etc) hp2 - hp2 + labs(x = Time, y = ) + scale_y_discrete(expand = c(0, 0)) + opts(axis.ticks = theme_blank(), axis.text.x = theme_text(size = 10, angle = 360, hjust = 0, colour = grey25), axis.text.y = theme_text(size=10, colour = 'gray25')) hp2 - hp2 + theme_bw() In the resulting plot I would like infections infA and infC plotted next to each other and likewise for infB and infD. I have a column in the data - infType - which I could use to reorder the infection column but so far I have no luck getting this to work. Could someone give me a pointer to the best way to reorder the infection factor and accompanying data into the order I would like? Best iain sessionInfo() R version 2.13.2 (2011-09-30) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8 [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8 [7] LC_PAPER=en_GB.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C attached base packages: [1] grid stats graphics grDevices utils datasets methods [8] base other attached packages: [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6 loaded via a namespace (and not attached): [1] digest_0.5.0 tools_2.13.2 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lapply to list of variables
Hi: Here's another way of doing this on the simplified version of your example: L - vector('list', 3) # initialize a list of three components ## populate it for(i in seq_along(L)) L[[i]] - rnorm(20, 10, 3) ## name the components names(L) - c('Monday', 'Tuesday', 'Wednesday') ## replace values = 10 with NA lapply(L, function(x) replace(x, x = 10, NA) If you have a large number of atomic objects, you could create a vector of the object names, make a list out of them (e.g., L - list(objnames)) and then mimic the lapply() as above. replace() only works on vectors, though - Uwe's solution is more general. HTH, Dennis On Tue, Nov 8, 2011 at 8:59 AM, Ana rrast...@gmail.com wrote: Hi Can someone help me with this? How can I apply a function to a list of variables. something like this listvar=list(Monday,Tuesday,Wednesday) func=function(x){x[which(x=10)]=NA} lapply(listvar, func) were Monday=[213,56,345,33,34,678,444] Tuesday=[213,56,345,33,34,678,444] ... in my case I have a neverending list of vectors. Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sampling with conditions
In addition to Dan's quite valid concern, the final sample is not truly 'random' - the first k - 1 elements are randomly chosen, but the last is determined so that the constraint is met. Dennis On Tue, Nov 8, 2011 at 9:59 AM, Nordlund, Dan (DSHS/RDA) nord...@dshs.wa.gov wrote: -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of SarahJoyes Sent: Tuesday, November 08, 2011 5:57 AM To: r-help@r-project.org Subject: Re: [R] Sampling with conditions That is exactly what I want, and it's so simple! Thanks so much! Sarah, I want to point out that my post was qualified by something like. I am not sure it is exactly what you want. Since you didn't quote my post, let me show my suggestion and then express my concern. n - matrix(0,nrow=5, ncol=10) repeat{ c1 - sample(0:10, 4, replace=TRUE) if(sum(c1) = 10) break } n[,1] - c(c1,10-sum(c1)) n This nominally meets your criteria, but it will tend to result in larger digits being under-represented. For example, you unlikely to get a result like c(0,8,0,0,2) or (9,0,0,1,0). That may be OK for your purposes, but I wanted to point it out. You could use something like n - matrix(0,nrow=5, ncol=10) c1 - rep(0,4) for(i in 1:4){ upper - 10-sum(c1) c1[i] - sample(0:upper, 1, replace=TRUE) if(sum(c1) == 10) break } n[,1] - c(c1,10-sum(c1)) n if that would suit your purposes better. Good luck, Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nesting scale_manual caracteristics in ggplot
Hi: The problem is that you use different sets of labels in each scale_*_manual. To get all of the scales to merge into one, you need to have the same title, same breaks and same labels for the legend. This gets you a single legend: ggplot(data = df, aes(x = year, y = total, colour = treatment, linetype=treatment)) + geom_point(aes(shape = treatment)) + facet_wrap(~country) + scale_colour_manual(breaks=c('A','B','C','D','E','F','G'), values=c('A'='black','B'='black', 'C'='grey', 'D'='grey','E'='red','F'='grey', 'G'='red'), labels=c('A: Line 1','B: Line 2','C: Line3','D: Line 4', 'E: Line 5','F: Line 6','G: Line 7')) + scale_linetype_manual(breaks=c('A','B','C','D','E','F','G'), values=c('A'='solid','B'='dotted', 'C'='solid','D'='dashed', 'E'='dashed','F'='dotted', 'G'='dotted'), labels=c('A: Line 1','B: Line 2','C: Line3','D: Line 4', 'E: Line 5','F: Line 6','G: Line 7')) + scale_shape_manual(breaks=c('A','B','C','D','E','F','G'), labels=c('A: Line 1','B: Line 2','C: Line3','D: Line 4', 'E: Line 5','F: Line 6','G: Line 7'), values = c(0, 1, 2, 3, 4, 5, 6)) + theme_bw() + geom_abline(aes(intercept = Intercept, slope = Slope, colour = treatment, linetype=treatment), data = lines) In your original, you had different line numbers as labels in scale_colour_manual() and scale_shape_manual(). If you want to maintain that, then you'll have to have separate legends for color and shape. HTH, Dennis On Tue, Nov 8, 2011 at 11:22 AM, Sigrid s.stene...@gmail.com wrote: Hi there, I am having a little problem with combining three scale_manual commands in a facet plot. I am not able to combine the three different characteristics, instead ending up with three different descriptions next to the graph for the same geom. I would like to see two separate labels (not three); one describing lines 1-7 and the other 8-14. For each of the treatments (A-B) I want a combination of color, line type and symbol. How do I do this? Here are my codes (Feel free to modify the example to make it easier to work with. I was not able to do this while keeping the problem I wanted help with) df -structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), treatment = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c(A, B, C, D, E, F, G), class = factor), total = c(135L, 118L, 121L, 64L, 53L, 49L, 178L, 123L, 128L, 127L, 62L, 129L, 126L, 99L, 183L, 45L, 57L, 45L, 72L, 30L, 71L, 123L, 89L, 102L, 60L, 44L, 59L, 124L, 145L, 126L, 103L, 67L, 97L, 66L, 76L, 108L, 36L, 48L, 41L, 69L, 47L, 57L, 167L, 136L, 176L, 85L, 36L, 82L, 222L, 149L, 171L, 145L, 122L, 192L, 136L, 164L, 154L, 46L, 57L, 57L, 70L, 55L, 102L, 111L, 152L, 204L, 41L, 46L, 103L, 156L, 148L, 155L, 103L, 124L, 176L, 111L, 142L, 187L, 43L, 52L, 75L, 64L, 91L, 78L, 196L, 314L, 265L, 44L, 39L, 98L, 197L, 273L, 274L, 89L, 91L, 74L, 91L, 112L, 98L, 140L, 90L, 121L, 120L, 161L, 83L, 230L, 266L, 282L, 35L, 53L, 57L, 315L, 332L, 202L, 90L, 79L, 89L, 67L, 116L, 109L, 44L, 68L, 75L, 29L, 52L, 52L, 253L, 203L, 87L, 105L, 234L, 152L, 247L, 243L, 144L, 167L, 165L, 95L, 300L, 128L, 125L, 84L, 183L, 88L, 153L, 185L, 175L, 226L, 216L, 118L, 118L, 94L, 224L, 259L, 176L, 175L, 147L, 197L, 141L, 176L, 187L, 87L, 92L, 148L, 86L, 139L, 122L), country = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
Re: [R] window?
The ets() function in the forecast package requires either a numeric vector or a Time-Series object (produced from ts()). The frequency argument in ts() refers to the time duration between observations; e.g., frequency = 7 means that the data are weekly; frequency = 12 means that the data are monthly; frequency = 4 means that the data are quarterly. You can see this from the examples on the help page of ts: ?ts at the R prompt. The example associated with the forecast::ets() function uses the USAccDeaths data: data(USAccDeaths) USAccDeaths ## monthly data for six years # Simulate the same structure with ts: u - ts(rnorm(72), start = c(1973, 1), frequency = 12) u # Evidently you want to produce a multivariate series; # here's one way with monthly frequency: v - ts(matrix(rnorm(106), ncol = 2), start = c(2001, 1), frequency = 12) v Is that more or less what you were after? Dennis On Tue, Nov 8, 2011 at 2:04 PM, Kevin Burton rkevinbur...@charter.net wrote: I expect the frequency to be set to what I set it at and the window to return all of the data in the window from the original time series. The error is not because it is prime. I can generate a time series with just 52 values (or 10) and it still occurs. I am building these objects for use with the 'forecast' packages and one of the methods 'ets' cannot handle a frequency above 24 so I set it (or try to) to 1. Will 'window' take z zoo or xts object? Can I convert from zoo or xts to ts? -Original Message- From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] Sent: Tuesday, November 08, 2011 2:28 PM To: Kevin Burton Cc: r-help@r-project.org Subject: Re: [R] window? I'm not entirely sure that your request makes sense: what do you expect the frequency to be? It makes sense to me as is...Might your troubles be because 53 is prime? More generally, most people don't like working with the raw ts class and prefer the zoo or xts packages because they are much more pleasant for most time series work. You might want to take a look into those. Michael On Tue, Nov 8, 2011 at 3:18 PM, Kevin Burton rkevinbur...@charter.net wrote: This doesn't seem to work: d - rnorm(2*53) ds - ts(d, frequency=53, start=c(2000,10)) dswin - window(ds, start=c(2001,1), end=c(2001,10), frequency=1) dswin Time Series: Start = 2001 End = 2001 Frequency = 1 [1] 1.779409 dswin - window(ds, start=c(2001,1), end=c(2001,10)) dswin Time Series: Start = c(2001, 1) End = c(2001, 10) Frequency = 53 [1] 1.7794090 0.6916779 -0.6641813 -0.7426889 -0.5584049 -0.2312959 [7] -0.0183454 -1.0026301 0.4534920 0.6058198 The problem is that when the frequency is specified only one value shows up in the window. When no frequency is specified I get all 10 values but now the time series has a frequency that I don't want. Comments? Kevin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nesting scale_manual caracteristics in ggplot
As I mentioned earlier, if you use different labels/breaks/titles for the legends, you'll get separate legends in a ggplot. I'm assuming you want to keep your original labels in scale_shape_manual(); there's no problem if you do that. The code I provided showed you how to align all three legends; if you want one to be different, change its labels and a second legend will appear. Couldn't be simpler ;) Dennis On Tue, Nov 8, 2011 at 1:20 PM, Sigrid s.stene...@gmail.com wrote: Hi Dennis, Thank you, it looks much better now the three characteristics are combined in the label. However, I would like to keep two different sets of labels on the side as I want to describe lines for each facet (high and low). Do you think this is possible, without breaking up the characteristics again? Cheers -- View this message in context: http://r.789695.n4.nabble.com/nesting-scale-manual-caracteristics-in-ggplot-tp4017171p4017523.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sorting Panel Data by Time
Here's another approach using the plyr and data.table packages, where df is the name I gave to your example data: # plyr library('plyr') ddply(df, .(TIME), mutate, L1 = sort(S1)) # Another way with the data.table package: library('data.table') dt - data.table(df, key = 'TIME') dt[, list(X1, S1, L1 = sort(S1)), by = 'TIME'] HTH, Dennis On Tue, Nov 8, 2011 at 11:58 AM, economicurtis curtisesj...@gmail.com wrote: I have panel data in the following form: TIME X1 S1 1 1 0.99 1 2 0.50 1 3 0.01 2 3 0.99 2 1 0.99 2 2 0.25 3 3 0.75 3 2 0.50 3 1 0.25 ... ... .. And desire a new vector of observations in which one column (S1 above) is sorted for each second from least to largest. That is, a new vector (L1 below) of the form: TIME X1 S1 L1 1 1 0.99 0.01 1 2 0.50 0.50 1 3 0.01 0.99 2 3 0.99 0.25 2 1 0.99 0.99 2 2 0.25 0.99 3 3 0.75 0.25 3 2 0.50 0.50 3 1 0.25 0.75 ... ... .. . Sorry for the NOOB question, but any help would be great. Curtis Kephart -- View this message in context: http://r.789695.n4.nabble.com/Sorting-Panel-Data-by-Time-tp4017271p4017271.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot separate groups with plotmeans()
Hi: Here are a couple of ways using the plyr and ggplot2 packages: library('plyr') library('ggplot2') # Create a summary data frame that computes the mean # and standard deviation at each time/group combination xsumm - ddply(x, .(Time, Group), summarise, m = mean(Score), s = sd(Score)) # Create a new time variable to appose the two groups at # each time (in case that was the intent): xsumm2 - mutate(xsumm, Time2 = rep(1:5, each = 2)) # Plot 1: Plot all 10 times in one panel, using color to #separate the two groups ggplot(xsumm, aes(x = Time, y = m)) + geom_errorbar(aes(ymin = m - s, ymax = m + s, colour = Group), size = 1) + geom_point(size = 3, colour = 'red') + geom_line(aes(group = 1), colour = 'red', size = 1) + scale_colour_manual(breaks = c('A', 'B'), values = c('blue', 'orange')) + scale_x_continuous(breaks = 1L:10L) + ylab('Score') # Plot 2: Separate the two groups into different panels ggplot(xsumm2, aes(x = Time2, y = m)) + geom_errorbar(aes(ymin = m - s, ymax = m + s), colour = 'blue', size = 1, width = 0.4) + geom_point(size = 3, colour = 'red') + geom_line(aes(group = 1), colour = 'red', size = 1) + labs(x = 'Time', y = 'Score') + facet_wrap(~ Group, ncol = 1) # Plot 3: Put the groups side by side at each time # This one is a little trickier... ggplot(xsumm2, aes(x = Time2, y = m)) + geom_errorbar(aes(ymin = m - s, ymax = m + s, colour = Group), position = position_dodge(width = 0.4), size = 1, width = 0.4) + geom_point(size = 3, colour = 'red', position = position_dodge(width = 0.4)) + geom_line(aes(group = Group), colour = 'red', size = 1, position = position_dodge(width = 0.4)) + labs(x = 'Time', y = 'Score') + scale_colour_manual(breaks = c('A', 'B'), values = c('blue', 'orange')) You might have to be careful about saving the last plot, especially if you want to shrink the size or change the aspect ratio - it might upset the alignment of the bars, lines and points. If you save it at the standard size (7'' x 7'') you should be fine. See the ggsave() function in the ggplot2 package for details (?ggplot2::ggsave once the package is installed). If this is new to you, go to http://had.co.nz/ggplot2/ for the package's web page; the on-line help files (with many examples) start as you scroll toward the bottom of the page. Click on the book's website (linked from the above site) to access the chapter on qplots, the entry level chapter of the ggplot2 book. This would be harder to do in the lattice package because it doesn't have a built-in error bar panel function (by design, I think :). HTH, Dennis On Tue, Nov 8, 2011 at 4:19 PM, Jon Zadra jr...@virginia.edu wrote: Hi, I often use plotmeans() from the gplots package to quickly visualize a pattern of change. I would like to be able to plot separate lines for different groups, but the function gives an error when a grouping variable is included in the formula argument. For instance, require(gplots) x - data.frame(Score=rnorm(100), Time=rep(1:10, 10), Group=factor(rep(c(A,B),50))) plotmeans(Score ~ Time, x) #works fine plotmeans(Score ~ Time * Group, x) Error in .subset2(x, i, exact = exact) : attempt to select more than one element plotmeans(Score ~ Time | Group, x) Error in ns - 1 : non-numeric argument to binary operator In addition: Warning message: In Ops.factor(Time, Group) : | not meaningful for factors The help for plotmeans() states that it accepts a formula object including a grouping variable. There is no other grouping argument listed. Any help is appreciated, including pointing to a more robust function for plotting means. Thanks, Jon -- Jon Zadra Department of Psychology University of Virginia P.O. Box 400400 Charlottesville VA 22904 (434) 982-4744 email: za...@virginia.edu http://www.google.com/calendar/embed?src=jzadra%40gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] logistric regression: model revision
Since you didn't provide a reproducible example, here are a couple of possibilities to check, but I have utterly no idea if they're applicable to your problem or not: * does costdis1 consist of 0's and 1's? * is costdis1 a factor? In the first model, you treat costdis1 as a pure quadratic and in the second model, it is a linear term. The two models are not nested. Modeling a term as a pure quadratic is a very strong assumption - the more usual practice is to fit both a linear and quadratic term in costdis1 to allow more flexibility in the fitted surface, but that would require costdis1 to be numeric. HTH, Dennis On Mon, Nov 7, 2011 at 7:58 AM, Sally Ann Sims sallys...@earthlink.net wrote: Hello, I am working on fitting a logistic regression model to my dataset. I removed the squared term in the second version of the model, but my model output is exactly the same. Model version 1: GRP_GLM-glm(HB_NHB~elev+costdis1^2,data=glm_1,family=binomial(link=logit)) summary(GRP_GLM) Model version 2: QM_1-glm(HB_NHB~elev+costdis1,data=glm_2,family=binomial(link=logit)) summary(QM_1) The call in version 2 has changed: Call: glm(formula = HB_NHB ~ elev + costdis1, family = binomial(link = logit), data = glm_2) But I’m getting the exact same results as I did in the model where costdis1 is squared. Any ideas what I might do to correct this? Thank you. Sally [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Correction in error
Hi: In your function call, x[1, 1] = theta = 0. In the first line of the loop, your rbinom() call works out to be x[2, 1] - rbinom(x[1, 3], 1, x[1, 1]) = rbinom(10, 1, 0) That likely accounts for the error message: Error in x[t, 1] - rbinom(x[t - 1, 3], 1, x[t - 1, 1]) : replacement has length zero HTH, Dennis On Mon, Nov 7, 2011 at 12:10 PM, Gyanendra Pokharel gyanendra.pokha...@gmail.com wrote: Hello R community, following is my code and it shows error, can some one fix this error and explain why this occurs? gibbs -function(m,n, theta = 0, lambda = 1){ alpha - 1.5 beta - 1.5 gamma - 1.5 x- array(0,c(m+1, 3)) x[1,1] - theta x[1,2] - lambda x[1,3]- n for(t in 2:m+1){ x[t,1] - rbinom(x[t-1,3], 1, x[t-1,1]) x[t,2]-rbeta(m, x[t-1,1] + alpha, tx[t-1,3] - x[t-1,1] + beta) x[t,3] - rpois(x[t-1,3] - x[t-1,1],(1 - x[t-1,2])*gamma) } x } gibbs(100, 10) Gyn [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Correlation between matrices
Hi: On Sat, Nov 5, 2011 at 11:06 PM, Kaiyin Zhong kindlych...@gmail.com wrote: Thank you Dennis, your tips are really helpful. I don't quite understand the lm(y~mouse) part; my intention was -- in pseudo code -- lm(y(Enzyme) ~ y(each elem)). As I said in my first response, I didn't quite understand what you were trying to regress so I used the mouse as a way of showing you how the code works. I think I understand what you want now, though. I'll create a data set in two ways: the first assumes you have the data as constructed in your original post and the second generates random numbers after erecting a 'scaffold' data frame. The game is to separate the enzyme data from the element data and put them into the final data frame as separate columns. Then the regression is easy if that's what you need to do. # Method 1: Generate the data as you did into separate data frames elem0 - c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme') regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') # Creates five 5 x 5 data frames with names V1-V5: for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) { assign(n, as.data.frame(replicate(5, rnorm(5 } # Stack the chemical element data using melt() from # the reshape2 package: library('reshape2') d1 - rbind(melt(Cu), melt(Zn), melt(Fe), melt(Ca)) # Relabel V1 - V5 with brain region names, add a factor # to distinguish individual elements and tack on the melted # Enzyme data so that it repeats in each element block d1 - within(d1, { variable - factor(d1$variable, labels = regions) elem - factor(rep(elem0[1:4], each = 25)) Enzyme - melt(Enzyme)[, 2] } ) # Plot the data using lattice and latticeExtra: library('lattice') library('latticeExtra') p - xyplot(Enzyme ~ value | variable + elem, data = d1, type = c('p', 'r')) useOuterStrips(p) ### ## Method 2: Generate the random data after setting ## up the element/region/mouse combinations ## # Generate a data frame from the combinations of # mice, regions and elem: library('ggplot2') mice - paste('mouse', 1:5, sep = '') regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') elem - elem0[1:4] d0 - data.frame(expand.grid(mice = mice, regions = regions, elem = elem)) d0 - within(d0, { value - rnorm(100) # generate element values Enzyme - rnorm(25) # generate enzyme values } ) # the Enzyme values are recycled through all element blocks. # You can either adapt the lattice code above to plot d0, or you # can do the following to get an analogous plot in ggplot2. # It's easier to compute the slopes and intercepts and put # them into a data frame that ggplot() can import, so that's # what we'll do first. # A function to return regression coefficients from a # generic data frame. Since this function goes into ddply(), # the argument df is a (generic) data frame and the output # will be converted to a one-line data frame. coefun - function(df) coef(lm(Enzyme ~ value, data = df)) # Apply the function to all regions * elem combinations. # Output is a data frame of coefficients corresponding to # each region/element combination coefs - ddply(d0, .(regions, elem), coefun) # Rename the columns names(coefs) - c('regions', 'elem', 'b0', 'b1') # Generate the plot using package ggplot2: ggplot(d0, aes(x = val, y = Enzyme)) + geom_point(size = 2.5) + geom_abline(data = coefs, aes(intercept = b0, slope = b1), size = 1) + xlab() + facet_grid(elem ~ regions) In addition, attach(d) seems necessary before using lm(y~mouse), and since d$mouse has a length 125, while each elem for each region has a length 5, it generates the following error: You should never need to use attach() - use the data = argument in lm() instead, where the value of data is the name of a data frame. It's always easier to use the modeling functions in R having formula interfaces with data frames. coefs = ddply(d, .(regions, elem), coefun) Error in model.frame.default(formula = y ~ mouse, drop.unused.levels = TRUE) : variable lengths differ (found for 'mouse') You're clearly doing something here that's messing up the structure of the data. Study what the code (and its output) above are telling you, particularly if you're not familiar with plyr, lattice and/or ggplot2. Writing functions to insert into a **ply() function in plyr can be tricky. If you continue to have problems, please provide a reproducible example as you did here. HTH, Dennis On Sun, Nov 6, 2011 at 12:53 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: I don't think you want to keep these objects separate; it's better to combine everything into a data frame. Here's a variation of your example - the x variable ends up being a mouse, but you may have another variable that's more appropriate to plot so take
Re: [R] Correlation between matrices
Hi: I don't think you want to keep these objects separate; it's better to combine everything into a data frame. Here's a variation of your example - the x variable ends up being a mouse, but you may have another variable that's more appropriate to plot so take this as a starting point. One plot uses the ggplot2 package, the other uses the lattice and latticeExtra packages. library('ggplot2') regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') mice = paste('mouse', 1:5, sep='') elem - c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme') # Generate a data frame from the combinations of # mice, regions and elem: d - data.frame(expand.grid(mice = mice, regions = regions, elem = elem), y = rnorm(125)) # Create a numeric version of mice d$mouse - as.numeric(d$mice) # A function to return regression coefficients coefun - function(df) coef(lm(y ~ mouse), data = df) # Apply to all regions * elem combinations coefs - ddply(d, .(regions, elem), coefun) names(coefs) - c('regions', 'elem', 'b0', 'b1') # Generate the plot using package ggplot2: ggplot(d, aes(x = mouse, y = y)) + geom_point(size = 2.5) + geom_abline(data = coefs, aes(intercept = b0, slope = b1), size = 1) + facet_grid(elem ~ regions) # Same plot in lattice: library('lattice') library('latticeExtra') p - xyplot(y ~ mouse | elem + regions, data = d, type = c('p', 'r'), layout = c(5, 5)) HTH, Dennis On Sat, Nov 5, 2011 at 10:49 AM, Kaiyin Zhong kindlych...@gmail.com wrote: regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') mice = paste('mouse', 1:5, sep='') for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) { + assign(n, as.data.frame(replicate(5, rnorm(5 + } names(Cu) = names(Zn) = names(Fe) = names(Ca) = names(Enzyme) = regions row.names(Cu) = row.names(Zn) = row.names(Fe) = row.names(Ca) = row.names(Enzyme) = mice Cu cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.5436573 -0.31486713 0.1039148 -0.3908665 -1.0849112 mouse2 1.4559136 1.75731752 -2.1195118 -0.9894767 0.3609033 mouse3 -0.6735427 -0.04666507 0.9641000 0.4683339 0.7419944 mouse4 0.6926557 -0.47820023 1.3560802 0.9967562 -1.3727874 mouse5 0.2371585 0.20031393 -1.4978517 0.7535148 0.5632443 Zn cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.66424043 0.6664478 1.1983546 0.0319403 0.41955740 mouse2 -1.14510448 1.5612235 0.3210821 0.4094753 1.01637466 mouse3 -0.85954416 2.8275458 -0.6922565 -0.8182307 -0.06961242 mouse4 0.03606034 -0.7177256 0.7067217 0.2036655 -0.25542524 mouse5 0.67427572 0.6171704 0.1044267 -1.8636174 -0.07654666 Fe cortex hippocampus brain_stem mid_brain cerebellum mouse1 1.8337008 2.0884261 0.29730413 -1.6884804 0.8336137 mouse2 -0.2734139 -0.5728439 0.63791556 -0.6232828 -1.1352224 mouse3 -0.4795082 0.1627235 0.21775206 1.0751584 -0.5581422 mouse4 1.7125147 -0.5830600 1.40597896 -0.2815305 0.3776360 mouse5 -0.3469067 -0.4813120 -0.09606797 1.0970077 -1.1234038 Ca cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.7663354 0.8595091 1.33803798 -1.17651576 0.8299963 mouse2 -0.7132260 -0.2626811 0.08025079 -2.40924271 0.7883005 mouse3 -0.7988904 -0.1144639 -0.65901136 0.42462227 0.7068755 mouse4 0.3880393 0.5570068 -0.49969135 0.06633009 -1.3497228 mouse5 1.0077684 0.6023264 -0.57387762 0.25919461 -0.9337281 Enzyme cortex hippocampus brain_stem mid_brain cerebellum mouse1 1.3430936 0.5335819 -0.56992947 1.3565803 -0.8323391 mouse2 1.0520850 -1.0201124 0.8965 1.4719880 1.0854768 mouse3 -0.2802482 0.6863323 -1.37483570 -0.7790174 0.2446761 mouse4 -0.1916415 -0.4566571 1.93365932 1.3493848 0.2130424 mouse5 -1.0349593 -0.1940268 -0.07216321 -0.2968288 1.7406905 In each anatomic region, I would like to calculate the correlation between Enzyme activity and each of the concentrations of Cu, Zn, Fe, and Ca, and do a scatter plot with a tendency line, organizing those plots into a grid. See the image below for the desired effect: http://postimage.org/image/62brra6jn/ How can I achieve this? Thank you in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is it possible to vectorize/accelerate this?
Hi: You're doing the right thing in R by pre-allocating memory for the result, but ifelse() is a vectorized function and your loop is operating elementwise, so if-else is more appropriate. Try for (i in 2:100){ b_vec[i] - if(abs(b_vec[i-1] + a_vec[i]) 1) a_vec[i] else b_vec[i-1] + a_vec[i] } If speed is an issue, then I echo Michael's suggestion to write a C(++) function and call it within R. The inline package is good for this kind of thing. HTH, Dennis On Thu, Nov 3, 2011 at 12:10 PM, hihi v.p.m...@freemail.hu wrote: Dear Members, I work on a simulaton experiment but it has an bottleneck. It's quite fast because of R and vectorizing, but it has a very slow for loop. The adjacent element of a vector (in terms of index number) depends conditionally on the former value of itself. Like a simple cumulating function (eg. cumsum) but with condition. Let's show me an example: a_vec = rnorm(100) b_vec = rep(0, 100) b_vec[1]=a_vec[1] for (i in 2:100){b_vec[i]=ifelse(abs(b_vec[i-1]+a_vec[i])1, a_vec[i], b_vec[i-1]+a_vec[i])} print(b_vec) (The behaviour is like cumsum's, but when the value would excess 1.0 then it has another value from a_vec.) Is it possible to make this faster? I experienced that my way is even slower than in Excel! Programming in C would my last try... Any suggestions? Than you, Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] round up a number to 10^4
Works in the newly released 2.14.0: X = c(60593.23, 71631.17, 75320.1) round(X, -4) [1] 6 7 8 Dennis On Tue, Nov 1, 2011 at 10:07 AM, Wendy wendy2.q...@gmail.com wrote: Hi all, I have a list of numbers, e.g., X = c(60593.23, 71631.17, 75320.1), and want to round them so the output is Y = c(6, 8, 8). I tried Y-round(X,-4), but it gives me Y = c(6, 7, 8). Do anybody know how to round up a number to 10^4? Thank you in advance. Wendy -- View this message in context: http://r.789695.n4.nabble.com/round-up-a-number-to-10-4-tp3964394p3964394.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Removal/selecting specific rows in a dataframe conditional on 2 columns
Does this work? library('plyr') # Function to return a data frame if it has one row, else return NULL: f - function(d) if(nrow(d) == 1L) d else NULL ddply(RV09, .(set, month), f) record.t trip set month stratum NAFO unit.area time dur.set distance 15 913 110 351 3O R31 1044 179 25 913 310 340 3O Q31 1800 189 35 913 410 340 3O Q32 2142 179 ddply() is an apply-like function that takes a data frame as input and a data frame as output (hence the dd). The first argument is the data frame name, the second argument the set of grouping variables and the third is the function to be called (in this application). HTH, Dennis On Tue, Nov 1, 2011 at 10:16 AM, Aurelie Cosandey Godin god...@dal.ca wrote: Dear list, After reading different mails, blogs, and tried a few different codes without any success, I am asking your help! I have the following data frame where each row represent a survey unit with the following variables: names(RV09) [1] record.t trip set month stratum NAFO [7] unit.area time dur.set distance operation mean.d [13] min.d max.d temp.d slat slong spp [19] number weight elat elong Each survey unit generates one set record, denoted by a 5 in column record.t. Each species identified in this particular survey unit generates an additional set record, denoted by a 6. unique(RV09$record.t) [1] 5 6 Each survey unit are identified by a specific trip and set number, so if there is a 5 record type with no associated 6 records, it means that no species were observed in that survey unit. I would like to be able to select all and only these survey units, which represent my zeros. So as an exemple, in this trip number 913, set 1, 3, and 4 would be part of my zeros data.frame as they appear with no record.t 6, such that no species were observed in this survey unit. head(RV09) record.t trip set month stratum NAFO unit.area time dur.set distance 585 5 913 1 10 351 3O R31 1044 17 9 586 5 913 2 10 351 3O R31 1440 17 9 587 6 913 2 10 351 3O R31 1440 17 9 588 5 913 3 10 340 3O Q31 1800 18 9 589 5 913 4 10 340 3O Q32 2142 17 9 Any tips on how extract this zero data.frame in R? Thank you very much in advance! Best, ~Aurelie Aurelie Cosandey-Godin Ph.D. student, Department of Biology Industrial Graduate Fellow, WWF-Canada Dalhousie University | Email: god...@dal.ca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting entries to create a new table
Hi: After cleaning up your data, here's one way using the plyr and reshape packages: d - read.csv(textConnection( Individual, A, B, C, D Day1, 1,1,1,1 Day2, 1,3,4,2 Day3, 3,,6,4), header = TRUE) closeAllConnections() d library('plyr') library('reshape') # Stack the variables dm - melt(d, id = 'Individual') # Convert the new value column to factor as follows: dm$value - factor(dm$value, levels = c(1:7, NA), exclude = NULL) # Use ddply() in conjunction with tabulate(): ddply(dm, .(variable), function(d) tabulate(d$value, nbins = 8)) variable V1 V2 V3 V4 V5 V6 V7 V8 1A 2 0 1 0 0 0 0 0 2B 1 0 1 0 0 0 0 1 3C 1 0 0 1 0 1 0 0 4D 1 1 0 1 0 0 0 0 This returns a data frame. If you want a matrix instead, use the daply() function rather than ddply() and leave everything else the same. HTH, Dennis On Tue, Nov 1, 2011 at 1:05 PM, Empty Empty phytophthor...@yahoo.com wrote: Hi, I am an R novice and I am trying to do something that it seems should be fairly simple, but I can't quite figure it out and I must not be using the right words when I search for answers. I have a dataset with a number of individuals and observations for each day (7 possible codes plus missing data) So it looks something like this Individual A, B, C, D Day1 1,1,1,1 Day 2 1,3,4,2 Day3 3,,6,4 (I've also tried transposing it so that individuals are rows and days are columns) I want to summarize the total observation codes by individual so that I end up with a table something like this: , 1, 2 ,3 ,4, 5, 6,7, missing A 2,0,1,0,0,0,0,0 B 1,0,1,0,0,0,0,1 C 1,0,0,1,0,1,0,0 D 1,1,0,1,0,0,0,0 If I can get that I'll be happy! Part two is subsetting which days I include in the counts so that I could, say look at days 1-30 and 30-60 separately - create two different tables from the same original table. (Or I can do it manually and start with different subsets of the data). Thanks so much for any help. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] predict for a cv.glmnet returns an error
Hi: Here's what I got when I ran your code: library('glmnet') x=matrix(rnorm(100*20),100,20) y=rnorm(100) cv.fit=cv.glmnet(x,y) predict(cv.fit,newx=x[1:5,]) 1 [1,] 0.1213114 [2,] 0.1213114 [3,] 0.1213114 [4,] 0.1213114 [5,] 0.1213114 coef(cv.fit) 21 x 1 sparse Matrix of class dgCMatrix 1 (Intercept) 0.1213114 V1 0.000 V2 0.000 V3 0.000 V4 0.000 V5 0.000 V6 0.000 V7 0.000 V8 0.000 V9 0.000 V10 0.000 V11 0.000 V12 0.000 V13 0.000 V14 0.000 V15 0.000 V16 0.000 V17 0.000 V18 0.000 V19 0.000 V20 0.000 ### Check against the versions of the packages listed below: sessionInfo() R version 2.14.0 (2011-10-31) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] grid stats graphics grDevices utils datasets methods [8] base other attached packages: [1] glmnet_1.7.1 Matrix_1.0-1 lattice_0.20-0 ggplot2_0.8.9 proto_0.3-9.2 [6] reshape_0.8.4 plyr_1.6 loaded via a namespace (and not attached): [1] tools_2.14.0 Dennis On Tue, Nov 1, 2011 at 3:34 PM, asafw assafweinst...@gmail.com wrote: Hi there, I am trying to use predict() with an object returned by cv.glmnet(), and get the following error: no applicable method for 'predict' applied to an object of class cv.glmnet What's wrong? my code: x=matrix(rnorm(100*20),100,20) y=rnorm(100) cv.fit=cv.glmnet(x,y) predict(cv.fit,newx=x[1:5,]) coef(cv.fit) Thanks so much, Asaf -- View this message in context: http://r.789695.n4.nabble.com/predict-for-a-cv-glmnet-returns-an-error-tp3965744p3965744.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] building a subscript programatically
Here's a hack, but perhaps you might want to rethink what type of output you want. # Function: g - function(arr, lastSubscript = 1) { n - length(dim(arr)) commas - paste(rep(',', n - 1), collapse = '') .call - paste('arr[', commas, lastSubscript, ']', sep = '') eval(parse(text = .call)) } # Examples: a1 - array(1:8, c(2, 2, 2)) a2 - array(1:16, c(2, 2, 2, 2)) a3 - array(1:32, c(2, 2, 2, 2, 2)) g(a1, 2) g(a2, 2) g(a3, 2) Notice the subscripting in the last two examples - if you only want one submatrix returned, then try this: h - function(arr, lastSubscript = c(1)) { n - length(dim(arr)) subs - if(length(lastSubscript) 1) paste(lastSubscript, collapse = ',') else lastSubscript .call - paste('arr[,,', subs, ']', sep = '') eval(parse(text = .call)) } h(a2, c(1, 1)) h(a3, c(2, 1, 1)) These functions have some ugly code, but I think it does what you were looking for. Hopefully someone can devise a more elegant solution. Dennis HTH 2011/11/1 Ernest Adrogué nfdi...@gmail.com: Hi, On ocasion, you need to subscript an array that has an arbitrary (ie. not known in advance) number of dimensions. How do you deal with these situations? It appears that it is not possible use a list as an index, for instance this fails: x - array(NA, c(2,2,2)) x[list(TRUE,TRUE,2)] Error in x[list(TRUE, TRUE, 2)] : invalid subscript type 'list' The only way I know is using do.call() but it's rather ugly. There must be a better way!! do.call('[', c(list(x), TRUE, TRUE, 2)) [,1] [,2] [1,] NA NA [2,] NA NA Any idea? Regards, Ernest __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] jarquebera_test_results
Hi: I'd double check the form of the upper bound of the sequence. In the first case (i = 1), it uses the subvector loghozamok[1:220]; when i = 2, it uses the subvector loghozamok[21:260], etc. In the last case, it uses loghozamok[1181:1420]. If instead you want to split the vector into 60 groups of size 20, then this is easier (assuming that loghozamok has length 1200): gp - rep(1:60, each = 20) # Returns a vector do.call(c, lapply(split(loghozamok, gp), function(z) unname(tseries::jarque.bera.test(z)$p.value))) This works when gp and loghozamok have the same length. # Small test: x - rnorm(100) gp - rep(1:5, each = 20) do.call(c, lapply(split(x, gp), function(z) unname(tseries::jarque.bera.test(z)$p.value))) 1 2 3 4 5 0.5849843 0.6745735 0.9453412 0.5978477 0.7207138 HTH, Dennis On Sun, Oct 30, 2011 at 10:23 AM, Szűcs Ákos szucsa...@t-online.hu wrote: Hi! I got a loop where i print out the results of Jarque Bera tests, but I have to put, the p-values in a vector. Can you help me how to do it in an effective way and not just typing in the results to a vector? Thanks a lot, here is the code: for(i in 1:60){ print(jarque.bera.test(loghozamok[((20*(i-1))+1):(20*(i+11))]))} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Normality tests on groups of rows in a data frame, grouped based on content in other columns
Hi: Here are a few ways (untested, so caveat emptor): # plyr package library('plyr') ddply(df, .(Plant, Tissue, Gene), summarise, ntest = shapiro.test(ExpressionLevel)) # data.table package library('data.table') dt - data.table(df, key = 'Plant, Tissue, Gene') dt[, list(ntest = shapiro.test(ExpressionLevel)), by = key(dt)] # aggregate() function aggregate(ExpressionLevel ~ Plant + Tissue + Gene, data = df, FUN = shapiro.test) # doBy package: summaryBy(ExpressionLevel ~ Plant + Tissue + Gene, data = df, FUN = shapiro.test) There are others, too... HTH, Dennis 2011/10/30 Joel Fürstenberg-Hägg jo...@life.ku.dk: Dear R users, I have a data frame in the form below, on which I would like to make normality tests on the values in the ExpressionLevel column. head(df) ID Plant Tissue Gene ExpressionLevel 1 1 p1 t1 g1 366.53 2 2 p1 t1 g2 0.57 3 3 p1 t1 g3 11.81 4 4 p1 t2 g1 498.43 5 5 p1 t2 g2 2.14 6 6 p1 t2 g3 7.85 I would like to make the tests on every group according to the content of the Plant, Tissue and Gene columns. My first problem is how to run a function for all these sub groups. I first thought of making subsets: group1 - subset(df, Plant==p1 Tissue==t1 Gene==g1) group2 - subset(df, Plant==p1 Tissue==t1 Gene==g2) group3 - subset(df, Plant==p1 Tissue==t1 Gene==g3) group4 - subset(df, Plant==p1 Tissue==t2 Gene==g1) group5 - subset(df, Plant==p1 Tissue==t2 Gene==g2) group6 - subset(df, Plant==p1 Tissue==t2 Gene==g3) etc... But that would be very time consuming and I would like to be able to use the code for other data frames... I have also tried to store these in a list, which I am looping through, running the tests, something like this: alist=list(group1, group2, group3, group4, group5, group6) for(i in alist) { print(shapiro.test(i$ExpressionLevel)) print(pearson.test(i$ExpressionLevel)) print(pearson.test(i$ExpressionLevel, adjust=FALSE)) } But, there must be an easier and more elegant way of doing this... I found the example below at http://stackoverflow.com/questions/4716152/why-do-r-objects-not-print-in-a-function-or-a-for-loop. I think might be used for the printing of the results, but I do not know how to adjust for my data frame, since the functions are applied on several columns instead of certain rows in one column. DF - data.frame(A = rnorm(100), B = rlnorm(100)) obj2 - lapply(DF, shapiro.test) tab2 - lapply(obj, function(x) c(W = unname(x$statistic), p.value = x$p.value)) tab2 - data.frame(do.call(rbind, tab2)) printCoefmat(tab2, has.Pvalue = TRUE) Finally, I have found several different functions for testing for normality, but which one(s) should I choose? As far as I can see in the help files they only differ in the minimum number of samples required. Thanks in advance! Kind regards, Joel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Faster process for creating a matrix based on matrix element comparison
dat [,1] [,2] [,3] [1,]001 [2,]010 [3,]011 [4,] NA11 1 - is.na(dat) [,1] [,2] [,3] [1,]111 [2,]111 [3,]111 [4,]011 D. On Fri, Oct 28, 2011 at 11:40 AM, Evgenia ev...@aueb.gr wrote: I have matrix data data-matrix(cbind(c(0,0,0),c(NA,0,1),c(1,1,1),c(0,1,1)),ncol=3) and I want to create a new matrix by checking each element of the data and put value 0 if i have NA and 1 otherwise. For this reason i made the function below pdata-matrix(NA,ncol=ncol(data),nrow=nrow(data)) pdata-sapply(1:nrow(data),function(i) sapply (1:ncol(data),function(j) if (is.na(data[i,j])) {pdata[i,j]-0} else {pdata[i,j]-1} )) pdata-matrix(t(pdata),ncol=ncol(data),nrow=nrow(data)) with pdata pdata [,1] [,2] [,3] [1,] 1 1 1 [2,] 1 1 1 [3,] 1 1 1 [4,] 0 1 1 But in my case I have a matrix with 5 rows and 10 colums and the creation of pdata takes alot of time. Could anyone have any suggestion to make pdata faster? Thanks -- View this message in context: http://r.789695.n4.nabble.com/Faster-process-for-creating-a-matrix-based-on-matrix-element-comparison-tp3948841p3948841.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] show.ci='FALSE' ignored in simple.lm
Hi: A trip to package sos reveals that simple.lm() is in the UsingR package. Looking at the code of this function, it plots the (x, y) pairs and the fitted least squares line without an option to suppress the plot. Here's a slight hack of the function; it adds a new argument plot with default action TRUE; after the linear model is fit, an if statement tests to see if the plot is desired; if FALSE, it simply returns the (print method of the) fitted object. simple.lm2 - function (x, y, plot = TRUE, show.residuals = FALSE, show.ci = FALSE, conf.level = 0.95, pred = FALSE) { op - par() ord - order(x) x - x[ord] y - y[ord] tmp.lm - lm(y ~ x) if(!plot) return(tmp.lm) else { if (show.residuals) { par(mfrow = c(2, 2)) } plot(x, y) abline(tmp.lm) if (show.ci) { xvals - seq(min(x), max(y), length = 15) curve(predict(tmp.lm, data.frame(x = x), level = conf.level, interval = confidence)[, 3], add = TRUE) curve(predict(tmp.lm, data.frame(x = x), level = conf.level, interval = confidence)[, 2], add = TRUE) curve(predict(tmp.lm, data.frame(x = x), level = conf.level, interval = prediction)[, 3], lty = 3, add = TRUE) curve(predict(tmp.lm, data.frame(x = x), level = conf.level, interval = prediction)[, 2], lty = 3, add = TRUE) } coeffs - floor(tmp.lm$coeff * 100 + 0.5)/100 plusorminus - c(+) if (coeffs[1] 0) plusorminus - c() title(paste(y = , coeffs[2], x , plusorminus, coeffs[1])) if (show.residuals) { Fitted - fitted.values(tmp.lm) Residuals - residuals(tmp.lm) plot(Fitted, Residuals) abline(h = 0) title(Residuals vs. fitted) hist(Residuals, main = hist of residuals) qqnorm(Residuals, main = normal plot of residuals) qqline(Residuals) } if (pred) { print(predict(tmp.lm, data.frame(x = pred))) } } tmp.lm } # # Simple test: plot(x - 1:10) y - 5*x + rnorm(10,0,1) tmp1 - simple.lm2(x, y, plot = FALSE) summary(tmp1) # Produces the plot simple.lm2(x, y) HTH, Dennis On Thu, Oct 27, 2011 at 7:35 AM, David Winsemius dwinsem...@comcast.net wrote: On Oct 27, 2011, at 6:58 AM, Elinor Zeev wrote: Hi, I am using PostgreSQL 9.1 and want to run loglinear_simple-simple.lm(x$p_overPnot,x$logQuantity,show.ci =FALSE,conf.level=confidenceLevel) without showing the graph, however the show.ci=FALSE is ignored. My guess (and it is a guess since you did not specify a package for simple.lm) is that specifying a conf.level is over-riding the show.ci parameter. Why would one specify a value for conf.level if one did not want a confidence level? I have quite few packages loaded and I still get: ?simple.lm No documentation for 'simple.lm' in specified packages and libraries: you could try '??simple.lm' -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help with parallel processing code
Did you load the class package before calling lda()? Dennis On Thu, Oct 27, 2011 at 10:14 AM, 1Rnwb sbpuro...@gmail.com wrote: my modification gives me error rows- c(1:nrow(mat)) scores - c() labels -c() itr-1000 chnksz-ceiling(itr/getDoParWorkers()) smpopt=list(chunkSize=chnksz) foreach(icount(itr),.combine=cbind,.options.smp=smpopts)%dopar% + { + train - sample(rows, length(rows)-1) + label =0 ; if (mat$Self_T1D[-train] == N) label = 1 #need the value for this line, should it be 'N' or 'Y' + z - lda(mat[train,4] ~ mat[train,1:3]) + score = predict(z, mat[-train, ])$pos[1] + scores - c(scores, score) + labels- c(labels, label) + } Error in { : task 1 failed - could not find function lda -- View this message in context: http://r.789695.n4.nabble.com/help-with-parallel-processing-code-tp3944303p3945243.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Syntax Check: rshape2 melt()
Try this, based on your small example: tds.a - read.table(textConnection( + site sampdate param quant + 1UDS-O 2006-12-06 TDS 10800 + 4 STC-FS 1996-06-14 Cond 280 + 7UDS-O 2007-10-04Mg 1620 + 9UDS-O 2007-10-04 SO4 7580 + 19 JCM-10B 2007-06-21Ca79 + 20 JCM-10B 2007-06-21Cl 114), header = TRUE, stringsAsFactors = FALSE) closeAllConnections() # Define param so that all of its levels are represented: tds.a - within(tds.a, { param = factor(param, levels = c('TDS', 'Cond', 'Mg', 'Ca', 'Cl', 'Na', 'SO4')) sampdate = as.Date(sampdate) } ) library('reshape2') dcast(tds.a, site + sampdate ~ param, value_var = 'quant') # Result: site sampdate TDS Cond Mg Ca Cl SO4 1 JCM-10B 2007-06-21NA NA NA 79 114 NA 2 STC-FS 1996-06-14NA 280 NA NA NA NA 3 UDS-O 2006-12-06 10800 NA NA NA NA NA 4 UDS-O 2007-10-04NA NA 1620 NA NA 7580 HTH, Dennis On Thu, Oct 27, 2011 at 8:26 AM, Rich Shepard rshep...@appl-ecosys.com wrote: This is my first excursion into using reshape2 and I want to ensure that the melt() function call is syntactically correct. The unmodifed data frame is organized this way: head(tds.anal) site sampdate param quant 1 UDS-O 2006-12-06 TDS 10800 4 STC-FS 1996-06-14 Cond 280 7 UDS-O 2007-10-04 Mg 1620 9 UDS-O 2007-10-04 SO4 7580 19 JCM-10B 2007-06-21 Ca 79 20 JCM-10B 2007-06-21 Cl 114 What I want looks like this: site sampdate TDS Cond Mg Ca Cl Na SO4 UDS-O 2006-12-06 10800 NA 1620 NA NA NA 7580 with the actual data for each param, of course. I've read the reshape.pdf, reshape2.pdf, the ?melt help page, and the ?melt.data.frame help page. I'm still unclear on the differences among measure.vars, variable.name, and value.name. After several attempts I have what may be what the melted tds.anal should look like: m.tds.anal - melt(tds.anal, id.vars = c('site', 'sampdate', 'param'), \ measure.vars = 'quant', value.name = 'quant', na.rm = F) head(m.tds.anal) site sampdate param variable quant 1 UDS-O 2006-12-06 TDS quant 10800 2 STC-FS 1996-06-14 Cond quant 280 3 UDS-O 2007-10-04 Mg quant 1620 4 UDS-O 2007-10-04 SO4 quant 7580 5 JCM-10B 2007-06-21 Ca quant 79 6 JCM-10B 2007-06-21 Cl quant 114 Is the melt() function call correct? Should the melted result look like the unmelted (long form in Paul Dalgaard's book) data with the additional 'variable' column containing 'quant' for each row? Rich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with a scatter plot
Hi: The sort of thing you appear to want is fairly straightforward to in lattice and ggplot2, as both have ways to automate conditioning plots. Since you were looking at ggpot2, let's consider that problem. You don't really show enough data to provide a useful demonstration, but let's see if we can capture the essential structure. I want to create a plot where the colors of the hits represent the Product (A,B,C..), the character represents the color (X for yellow, box for green, etc..), the X axis is the price and the Y axis is the number (0-5) from the different Stores (A,B,C,D). I've thought either to create a matrix of 4 plots ( for the 4 stores) or in some creative way combine them into one plot? The first step is to melt the data so that Store becomes a factor variable and its corresponding values are assigned to another variable. To that end, one can invoke the very useful melt() function in the reshape2 package: library('reshape2') mdata - melt(mydata, id = c('Product', 'Color', 'Price')) This creates a new data frame with variables Product, Color, Price, variable and value. variable contains StoreA, ... StoreD as factor levels and value is a numeric variable consisting of the corresponding values. For its structure, see str(mdata) If you want to change StoreA - StoreD to A - D, say, then you could optionally do mdata$variable - factor(mdata$variable, labels = LETTERS[1:4]) Assuming that you've done enough reading to understand what aesthetics are about, the problem is essentially this: x = Price y = value shape = Color faceting variable = variable (the stores) So a template of a ggplot2 graph for the melted data might look something like library('ggplot2') ggplot(mdata, aes(x = Price, y = value, shape = Color)) + geom_point() + facet_wrap( ~ variable, ncol = 2) + scale_shape_manual('Color', breaks = levels(mdata$Color), values = c(4, 0, 2, 8), labels = c('Blue', 'Green', 'Red', 'Yellow')) This assigns x, box, triangle and asterisk as shapes via their numeric codes (see Hadley Wickham's ggplot2 book, p. 197 for the reference). The labels = argument lets you change the letters B, G, R, Y (which would comprise the default labels) to something more evocative. If for some odd reason you wanted to add corresponding colors to the shapes, you could also do that, as follows: ggplot(mdata, aes(x = Price, y = value, shape = Color, colour = Color)) + geom_point() + facet_wrap( ~ variable, ncol = 2) + scale_shape_manual('Color', breaks = levels(mdata$Color), values = c(4, 0, 2, 8), labels = c('Blue', 'Green', 'Red', 'Yellow')) + scale_colour_manual('Color', breaks = levels(mdata$Color), values = c('blue', 'green', 'red', 'yellow'), labels = c('Blue', 'Green', 'Red', 'Yellow')) This should color the shapes in the graph and provide one (merged) legend with colored shapes as symbols. HTH, Dennis On Tue, Oct 25, 2011 at 11:48 PM, RanRL rnr...@gmail.com wrote: Hi everyone, I have some data about a market research which I want to arrange in one plot for easy viewing, the data looks something like: Product Color StoreA StoreB StoreC StoreD Price ProdA R NA 4.33 2 4.33 35 G NA 4.33 2 4.33 35 B NA 4.33 2 3.76 58 Y NA 3.72 3 5.33 23 ProdB B 5.44 NA 4.22 3.76 87 ProdC G 4.77 3.22 4.77 2.10 65 B ... ... ... ... .. And so on... I want to create a plot where the colors of the hits represent the Product (A,B,C..), the characther represent the color (X for yellow, box for green, etc..), the X axis is the price and the Y axis is the number (0-5) from the different Stores (A,B,C,D). I've thought either to create a matrix of 4 plots ( for the 4 stores) or in some creative way combine them into one plot? Please help me or point me in the right direction as to which functions to look into, I've been playing around with ggplot for a few days, but can't seem to wrap my head around it yet... Thanks -- View this message in context: http://r.789695.n4.nabble.com/Help-with-a-scatter-plot-tp3939585p3939585.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal,
Re: [R] dotPlot with diagonal
Let's see: there is a dotPlot() function in each of the following packages: BHH2, caret, mosaic, qualityTools Would you be kind enough to share which of these packages (if any) you are using? Dennis On Wed, Oct 26, 2011 at 4:25 AM, Jörg Reuter jo...@reuter.at wrote: Hi, I want draw a dotPlot. All works fine: (Seq - matrix(c(1, 1, 6, 1, 2, 2, 5, 4, 3, 3, 4, 3, 4, 4, 3, 2, 5, 5, 2, 5, 6, 6, 1, 6), ncol = 6)) dotPlot(Seq[1,], Seq[2,], main = Sequenz 1 und Sequenz 2, asp = 1) Is there a way to draw a small diagonal, begin at (0/0) to (6/6) (perhaps in red??) or must I use gimp? I have many dotPlots, so it is fine if R can do this. Thanks Joerg __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.