[R] overlay geom_contour to ggmap
Consider this reproducible example:

# set the origin of the grid
# in cartesian coordinates (EPSG 32632)
xmin<-742966
ymin<-5037923

# set x and y axis
x<-seq(xmin, xmin+25*39, by=25)
y<-seq(ymin, ymin+25*39, by=25)

# define a 40 x 40 grid
mygrid<-expand.grid(x = x, y = y)

# set the z value to be interpolated by the contour
set.seed(123)
mygrid$z<-rnorm(nrow(mygrid))

library(tidyverse)

# plot of contour is fine
ggplot(data=mygrid, aes(x=x,y=y,z=z))+
  geom_contour()

library(ggspatial)

# transform coordinates to WGS84 (EPSG 4326)
# (one of the many possible ways to do it)
mygrid_4326<-xy_transform(mygrid$x, mygrid$y, from = 32632, to = 4326)

# create new grid with lon and lat
# (geographical coordinates, EPSG 4326)
mygrid_4326<-mygrid_4326%>%
  mutate(z=mygrid$z)

# define the bounding box
my_bb<-c(min(mygrid_4326$x), min(mygrid_4326$y),
         max(mygrid_4326$x), max(mygrid_4326$y))
names(my_bb)<-c('left', 'bottom', 'right', 'top')

library(ggmap)

# get the background map (by a free provider)
mymap<-get_stamenmap(bbox = c(left = my_bb[['left']],
                              bottom = my_bb[['bottom']],
                              right = my_bb[['right']],
                              top = my_bb[['top']]),
                     zoom = 15,
                     maptype = 'toner-lite')

# plot of the map is fine
mymap%>%
  ggmap()

# overlaying the contour of z fails
mymap%>%
  ggmap()+
  #geom_contour(data=mygrid_4326, mapping=aes(x = x, y = y, z = z))
  stat_contour(data=mygrid_4326, mapping=aes(x = x, y = y, z = z))

Warning messages:
1: stat_contour(): Zero contours were generated
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

The problem is overlaying the contour plot made with ggplot on a base map made with ggmap.

Any help? Thanks

[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
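A possible explanation, offered as an assumption rather than a confirmed diagnosis: stat_contour() expects z values on a regular rectangular grid, and after reprojection from EPSG 32632 to EPSG 4326 the points no longer line up on one in lon/lat. One workaround is to compute the contour paths in the original projected coordinates and reproject only the path vertices afterwards (for instance with the same transformation routine used in the post) before drawing them as paths on the map. The base-R sketch below covers the first half only; the names zmat, cl and paths are hypothetical.

```r
# Rebuild the regular 40 x 40 grid from the post (projected coordinates)
xmin <- 742966
ymin <- 5037923
x <- seq(xmin, xmin + 25 * 39, by = 25)
y <- seq(ymin, ymin + 25 * 39, by = 25)
mygrid <- expand.grid(x = x, y = y)
set.seed(123)
mygrid$z <- rnorm(nrow(mygrid))

# expand.grid() varies x fastest, so z fills a matrix with rows = x, columns = y
zmat <- matrix(mygrid$z, nrow = length(x), ncol = length(y))

# Contour paths computed on the regular projected grid (grDevices)
cl <- contourLines(x = x, y = y, z = zmat, nlevels = 5)

# One data frame of path vertices, with an id per path and its contour level
paths <- do.call(rbind, lapply(seq_along(cl), function(k) {
  data.frame(id = k, level = cl[[k]]$level, x = cl[[k]]$x, y = cl[[k]]$y)
}))
head(paths)
```

The x and y columns of paths are still in EPSG 32632; once transformed to lon/lat they could be drawn with a path layer grouped by id.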
[R] interpretation of R output for exact permutation test
Given this reproducible example:

library(coin)
independence_test(asat ~ group, data = asat,
                  ## exact null distribution
                  distribution = "exact")

I'm wondering why the default output also reports the value Z, considering that this method is supposed to be "exact", i.e. it computes the probability directly:

pvalue(independence_test(asat ~ group, data = asat,
                         ## exact null distribution
                         distribution = "exact"))

My question is: what is the correct interpretation (if there is one at all) of the Z value printed by the plain function 'independence_test' when an 'exact' test is requested?

Am I completely off track? Sorry, but I'm missing the point somewhere, somehow...

Thank you for the feedback
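As far as I understand it, the printed Z is the standardized test statistic, while distribution = "exact" only changes how the p-value is obtained from the permutation distribution of that statistic. The base-R sketch below illustrates the idea on made-up toy data (not the asat data set) by enumerating every possible group assignment; it is not coin's implementation, and all object names are hypothetical.

```r
# Hypothetical toy data: 8 responses, two groups of 4
y <- c(12, 15, 9, 20, 7, 11, 18, 14)
g <- rep(c("a", "b"), each = 4)

# Linear statistic: sum of the responses assigned to group "a"
stat <- function(idx) sum(y[idx])
obs <- stat(which(g == "a"))

# Exact permutation distribution: every way of picking 4 of the 8 observations
perms <- combn(length(y), 4, FUN = stat)

# Standardize the observed statistic with the permutation mean and sd
mu <- mean(perms)
sigma <- sqrt(mean((perms - mu)^2))
z_obs <- (obs - mu) / sigma

# Exact two-sided p-value: share of assignments at least as extreme as observed
p_exact <- mean(abs(perms - mu) >= abs(obs - mu))

c(Z = z_obs, p.value = p_exact)
```

Here Z is only a convenient scale for the statistic; no normal approximation enters the p-value, which comes entirely from the enumerated assignments.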
Re: [R] extract and re-arrange components of data frame
Thank you for your reply.

Well, you are relying on an assumed ordering of i, which is not necessarily the case, and in fact is not in mine...

Consider this example, please:

d<-data.frame(i=c(8,12,3), s=c('97,918,19','103,1205', '418'), stringsAsFactors = FALSE)
d

From: "Bert Gunter"
To: "Massimo Bressan"
Cc: "r-help"
Sent: Tuesday, 12 June 2018 16:42:18
Subject: Re: [R] extract and re-arrange components of data frame

You mean like this?

> s.new <- with(d, as.numeric(unlist(strsplit(s,","))))
> s.new <- cut(s.new,breaks = c(0,100,110,200),lab = d$i)
> s.new
[1] 1 1 1 2 2 3
Levels: 1 2 3

(Obviously, this could be a one-liner)

See ?cut

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
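A more compact variant of the original rep()/strsplit() approach, which does not depend on any ordering of i, might look like the sketch below. It reuses the data frame from this message; the names parts and r_final are only illustrative.

```r
d <- data.frame(i = c(8, 12, 3),
                s = c("97,918,19", "103,1205", "418"),
                stringsAsFactors = FALSE)

# Split each string once; lengths() gives the number of pieces per row
parts <- strsplit(d$s, ",", fixed = TRUE)

# Repeat each i by its number of pieces and flatten the pieces to numbers
r_final <- data.frame(i = rep(d$i, lengths(parts)),
                      s = as.numeric(unlist(parts)))
r_final
```

Splitting once and reusing parts avoids calling strsplit() twice, as the original attempt did.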
[R] extract and re-arrange components of data frame
# considering this data.frame as a reproducible example
d<-data.frame(i=c(1,2,3), s=c('97,98,99','103,105', '118'), stringsAsFactors = FALSE)
d

#I need to get this final result
r<-data.frame(i=c(1,1,1,2,2,3), s=c(97, 98, 99, 103, 105, 118))
r

#this is my attempt

#number of components for each element (3) of the list
#returned by strsplit
n<-unlist(lapply(strsplit(d$s,','), length))

#extract components of all elements of the list
s<-cbind(unlist(strsplit(d$s,',')))

#replicate each element of i
#by the number of components of each element of the list
i<-rep(d$i, n)
i

#compose final result
r_final<-data.frame(i,s, stringsAsFactors = FALSE)
r_final

#I'm not much satisfied by the approach, it seems to me a bit clumsy...
#any help for improving it?

#thanks
#a novice
Re: [R] aggregate and list elements of variables in data.frame
#ok, finally this is my final "best and more compact" solution of the problem, obtained by merging different contributions (thanks to all indeed)

t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789))
l<-sapply(unique(t$A), function(x) t$id[which(t$A==x)])
r<-data.frame(unique_A= unique(t$A), list_id=unlist(lapply(l, paste, collapse = ", ")))
r
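For comparison, the same grouping can probably be written in one step with stats::aggregate() and a pasting function. This is only a sketch of an alternative, not a replacement for the solution above; the data frame is renamed dat here (a hypothetical name) to avoid masking base::t().

```r
dat <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                  A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# For each distinct value of A, collapse the matching ids into one string
r <- aggregate(id ~ A, data = dat,
               FUN = function(v) paste(v, collapse = ", "))
names(r) <- c("unique_A", "list_id")
r
```

Unlike unique(), aggregate() returns the groups sorted by A, which matters only if the order of first appearance has to be preserved.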
Re: [R] aggregate and list elements of variables in data.frame
Thank you for the help.

This is my solution, based on your valuable hint but without the need for a 'tibble':

x<-data.frame(id=LETTERS[1:10], A=c(123,345,123,678,345,123,789,345,123,789))
uA<-unique(x$A)
idx<-lapply(uA, function(v) which(x$A %in% v))
vals<- lapply(idx, function(index) x$id[index])
data.frame(unique_A = uA, list_vals=unlist(lapply(vals, paste, collapse = ", ")))

best

From: "Ben Tupper"
To: "Massimo Bressan"
Cc: "r-help"
Sent: Thursday, 7 June 2018 14:47:55
Subject: Re: [R] aggregate and list elements of variables in data.frame

Hi,

Does this do what you want? I had to change the id values to something more obvious. It uses tibbles, which allow each variable to be a list.

library(tibble)
library(dplyr)
x <- tibble(id=LETTERS[1:10], A=c(123,345,123,678,345,123,789,345,123,789))
uA <- unique(x$A)
idx <- lapply(uA, function(v) which(x$A %in% v))
vals <- lapply(idx, function(index) x$id[index])

r <- tibble(unique_A = uA, list_idx = idx, list_vals = vals)

> r
# A tibble: 4 x 3
  unique_A list_idx list_vals
1     123.
2     345.
3     678.
4     789.

> r$list_idx[1]
[[1]]
[1] 1 3 6 9

> r$list_vals[1]
[[1]]
[1] "A" "C" "F" "I"

Cheers,
ben
Re: [R] aggregate and list elements of variables in data.frame
Sorry, but on looking further at the example I realised that the posted solution is not quite what I need: I do not need to get back the row indices but the corresponding values of column "id".

#please consider this new example
t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789))
t

# I need to get this result
r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('18,20,27,4','91,54,15','68','26,97'))
r

# any help for this, please?

--
Massimo Bressan
ARPAV
Agenzia Regionale per la Prevenzione e Protezione Ambientale del Veneto
Dipartimento Provinciale di Treviso
Via Santa Barbara, 5/a
31100 Treviso, Italy
tel: +39 0422 558545
fax: +39 0422 558516
e-mail: massimo.bres...@arpa.veneto.it
Re: [R] aggregate and list elements of variables in data.frame
Thanks for the help.

I'm posting here the complete solution:

t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))
t$A <- factor(t$A)
l<-sapply(levels(t$A), function(x) which(t$A==x))
r<-data.frame(list_id=unlist(lapply(l, paste, collapse = ", ")))
r<-cbind(unique_A=row.names(r),r)
row.names(r)<-NULL
r

best
[R] aggregate and list elements of variables in data.frame
#given the following reproducible and simplified example
t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))
t

#I need to get the following result
r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('1,3,6,9','2,5,8','4','7,10'))
r

# i.e. aggregate over the variable "A" and list all elements of the variable "id" that share the same corresponding value of "A"
#any help for that?

#so far I've just managed to "aggregate" and "count", like:

library(sqldf)
sqldf('select count(*) as count_id, A as unique_A from t group by A')

library(dplyr)
t%>%group_by(unique_A=A) %>% summarise(count_id = n())

# thank you
Re: [R] assign NA to rows by test on multiple columns of a data frame
Yes, it works, even if I do not really get how and why the combination of logical results is working (could you provide some insights on that?)

Moreover, and most of all, I was hoping for a compact solution, because I need to deal with MANY columns (more than 40) in a data frame with the same basic structure as the simplified example I posted.

thanks
m

- Original message -
From: "Bert Gunter" <bgunter.4...@gmail.com>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>
Cc: "r-help" <r-help@r-project.org>
Sent: Wednesday, 22 November 2017 17:32:33
Subject: Re: [R] assign NA to rows by test on multiple columns of a data frame

Do you mean like this:

mydf <- within(mydf, {
   is.na(A)<- !A_flag
   is.na(B)<- !B_flag
   }
)

> mydf
   A A_flag  B B_flag
1  8     10  5     12
2 NA      0  6      9
3 10      1 NA      0
4 NA      0  1      5
5  5      2 NA      0

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
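On the "how and why": is.na(x) <- cond sets x to NA wherever cond is TRUE, and !A_flag turns the numeric flag into a logical that is TRUE exactly where the flag is 0. For many columns, one possible compact form pairs each value column with its flag column by name. The sketch below assumes every flag column is called "<value column>_flag", as in the example; value_cols and flag_cols are hypothetical helper names.

```r
mydf <- data.frame(A = c(8, 7, 10, 1, 5), A_flag = c(10, 0, 1, 0, 2),
                   B = c(5, 6, 2, 1, 0),  B_flag = c(12, 9, 0, 5, 0))

# Pair every "<name>_flag" column with its "<name>" value column
flag_cols  <- grep("_flag$", names(mydf), value = TRUE)
value_cols <- sub("_flag$", "", flag_cols)

# For each pair, blank out the values whose flag equals 0
mydf[value_cols] <- Map(function(v, f) replace(v, f == 0, NA),
                        mydf[value_cols], mydf[flag_cols])
mydf
```

The flag columns themselves are left untouched, so the same call works unchanged for 40 or more column pairs.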
Re: [R] assign NA to rows by test on multiple columns of a data frame
...well, I don't think this is exactly the expected result (see my post).

Note that the columns affected should be "A" and "B".

thanks for the help
max

- Original message -
From: "Rui Barradas" <ruipbarra...@sapo.pt>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>, "r-help" <r-help@r-project.org>
Sent: Wednesday, 22 November 2017 11:49:08
Subject: Re: [R] assign NA to rows by test on multiple columns of a data frame

Hello,

Try the following.

icol <- which(grepl("flag", names(mydf)))
mydf[icol] <- lapply(mydf[icol], function(x){
    is.na(x) <- x == 0
    x
})

mydf
#   A A_flag B B_flag
#1  8     10 5     12
#2  7     NA 6      9
#3 10      1 2     NA
#4  1     NA 1      5
#5  5      2 0     NA

Hope this helps,

Rui Barradas
[R] assign NA to rows by test on multiple columns of a data frame
Given this data frame (a simplified, essential reproducible example)

A<-c(8,7,10,1,5)
A_flag<-c(10,0,1,0,2)
B<-c(5,6,2,1,0)
B_flag<-c(12,9,0,5,0)

mydf<-data.frame(A, A_flag, B, B_flag)

# this is my initial df
mydf

I want to get to this final situation

i<-which(mydf$A_flag==0)
mydf$A[i]<-NA

ii<-which(mydf$B_flag==0)
mydf$B[ii]<-NA

# this is my final df
mydf

Considering that I have to perform this task on a data frame with many columns, I'm wondering if there is a compact and effective way to get the final result with just one ‘sweep’ of the data frame?

I was thinking of the functions apply or lapply, but I cannot quite work out how to…

any hint for that?
Re: [R] weighted average grouped by variables
Hi Thierry,

thanks for your reply

Yes, you are right, your solution is more straightforward.

best

From: "Thierry Onkelinx" <thierry.onkel...@inbo.be>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>
Cc: "r-help" <r-help@r-project.org>
Sent: Thursday, 9 November 2017 15:17:31
Subject: Re: [R] weighted average grouped by variables

Dear Massimo,

It seems straightforward to use weighted.mean() in a dplyr context:

library(dplyr)
mydf %>%
  group_by(date_time, type) %>%
  summarise(vel = weighted.mean(speed, n_vehicles))

Best regards,

ir. Thierry Onkelinx
Statistician
Government of Flanders
RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be
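For readers who prefer to stay in base R, the same grouped weighted mean can be sketched with split() and weighted.mean(). The data frame below is a simplified copy of the one in this thread: date_time is left out because it takes a single value in the example, and the names by_type and res are hypothetical.

```r
mydf <- data.frame(
  type = c("car", "light_duty", "heavy_duty", "motorcycle",
           "car", "light_duty", "heavy_duty"),
  speed = c(41.1029082774049, 40.3, 40.3157894736842, 36.0869565217391,
            33.4065155807365, 37.6, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# One sub-data-frame per vehicle type
by_type <- split(mydf, mydf$type)

# Weighted mean of speed (weights = vehicle counts) and total count per type
res <- data.frame(
  type = names(by_type),
  weighted_avg_speed = sapply(by_type, function(g) weighted.mean(g$speed, g$n_vehicles)),
  n_vehicles = sapply(by_type, function(g) sum(g$n_vehicles)),
  row.names = NULL
)
res
```

A type that appears in only one direction, such as "motorcycle", simply forms a one-row group, so it needs no special handling. To group by date_time as well, split() also accepts a list of grouping variables.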
Re: [R] weighted average grouped by variables
Hello,

an update about my question: I worked out the following solution (with the package "dplyr")

library(dplyr)

mydf %>%
  mutate(speed_vehicles = n_vehicles * speed) %>%
  group_by(date_time, type) %>%
  summarise(
    sum_n_times_speed = sum(speed_vehicles),
    n_vehicles = sum(n_vehicles),
    vel = sum(speed_vehicles) / sum(n_vehicles)
  )

In fact I was hoping to manage everything in one go, i.e. without the need to create the intermediate variable "speed_vehicles" and by using the function weighted.mean().

Any hints for a different approach are much appreciated.

thanks
[R] weighted average grouped by variables
hi all

I have this dataframe (created as a reproducible example)

mydf<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""),
direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
type = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"),
avg_speed = c(41.1029082774049, 40.3, 40.3157894736842, 36.0869565217391, 33.4065155807365, 37.6, 35.5),
n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)),
.Names = c("date_time", "direction", "type", "speed", "n_vehicles"),
row.names = c(NA, -7L), class = "data.frame")

mydf

and I need to get to this final result

mydf_final<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""),
type = structure(c(1L, 2L, 3L, 4L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"),
weighted_avg_speed = c(36.39029, 38.56521, 37.5, 36.08696),
n_vehicles = c(1153L,69L,45L,23L)),
.Names = c("date_time", "type", "weighted_avg_speed", "n_vehicles"),
row.names = c(NA, -4L), class = "data.frame")

mydf_final

my question:
how to compute a weighted mean, i.e. "weighted_avg_speed",
from "speed" (the values whose weighted mean is to be computed) and "n_vehicles" (the weights),
grouped by "date_time" and "type"?

Note the complication of the case "motorcycle" (not present in both directions).

any help for that?

thank you

max
Re: [R] average at specific hour "endpoints" of the day
Hi Jeff,

thank you for your code, there is a lot to think about...

In the meantime I've managed to work out a (sort of) solution, but I'm still not completely satisfied with it; I would like to keep it all more elegant and possibly more general.

Here it is, so far...

mydate<-seq(ISOdatetime(2017,1, 1, 0, 0, 0), by="hour", length.out = 48)
v1<-1:48
mydf<-data.frame(mydate,v1)

library(zoo)
z<-zoo(mydf[,-1], mydf[,1])

z8<-rollapply(z, width=8, FUN=mean, align="right")
iz8<-which(as.numeric(strftime(index(z8), '%H'))==6)
z8<-z8[iz8]

z16<-rollapply(z, width=16, FUN=mean, align="right")
iz16<-which(as.numeric(strftime(index(z16), '%H'))==22)
z16<-z16[iz16]

fortify.zoo(z16)
fortify.zoo(z8)

#and then any sort of manipulation with dataframes

bye

- Original message -
From: "Jeff Newmiller" <jdnew...@dcn.davis.ca.us>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>
Cc: "r-help" <r-help@r-project.org>
Sent: Thursday, 6 April 2017 18:19:29
Subject: Re: [R] average at specific hour "endpoints" of the day

On Thu, 6 Apr 2017, Massimo Bressan wrote:

> hello
>
> given my reproducible example
>
> #---
> date<-seq(ISOdate(2017,1, 1, 0), by="hour", length.out = 48)
> v1<-1:48
> df<-data.frame(date,v1)
>
> #--

"date" and "df" are functions in base R... best to avoid hiding them by re-using those names in the global environment.

ISOdate forces GMT, which many data sets that you might work with do NOT use. It is better to use ISOdatetime to avoid letting hidden code determine the timezone that is applied to (or compared with) your data.

>
> I need to calculate the average of variable v1 at specific hour "endpoints"
> of the day: i.e. at hours 6.00 and 22.00 respectively
>
> the desired result is
>
> date v1
> 01/01/17 22:00 15.5
> 02/01/17 06:00 27.5
> 02/01/17 22:00 39.5
>
> at hour 06:00 of each day the average is calculated by considering the 8
> previous records (hours from 23:00 to 6:00)
> at hour 22:00 of each day the average is calculated by considering the 16
> previous records (hours from 7:00 to 22:00)
>
> any hint please?
>
> I've been trying with some functions within the "xts" package but without
> much result...

I am not sure how I would do this with xts, but the below code is one fairly literal approach (implemented two ways) to translate your requirements that is also potentially extensible if the data or requirements change.

### Base R

Sys.setenv( TZ = "Etc/GMT+5" ) # selected arbitrarily here but not left to
                               # the system to decide
dta <- data.frame( datetime = seq( ISOdatetime( 2017,1, 1, 0, 0, 0 )
                                 , by="hour"
                                 , length.out = 48
                                 )
                 , v1 = 1:48
                 )
dta$nrec <- 1
dta$date <- as.POSIXct( trunc.POSIXt( dta$datetime, units="days" ) )
dta$tod <- as.numeric( dta$datetime - dta$date, units = "hours" )
dta$timeslot <- factor( ifelse( 6 < dta$tod & dta$tod <= 22
                              , "Day"
                              , "Night"
                              )
                      , levels = c( "Night", "Day" )
                      )
dta$slotdatetime <- dta$date + as.difftime( ifelse( "Day" == dta$timeslot
                                                  , 22
                                                  , ifelse( 22 < dta$tod
                                                          , 24+6
                                                          , 6
                                                          )
                                                  )
                                          , units="hours"
                                          )
dta2 <- aggregate( dta[ , c( "v1", "nrec" ) ]
                 , dta[ , c( "timeslot", "slotdatetime" ), drop=FALSE ]
                 , FUN = sum
                 )
dta2 <- subset( dta2, nrec == ifelse( "Day"==timeslot, 16, 8 ) )
dta2$v1mean <- dta2$v1 / dta2$nrec

or if you don't mind the tidyverse

library(dplyr) # wonderland of non-standard evaluation... beware, Alice!
Sys.setenv( TZ = "Etc/GMT+5" ) # selected arbitrarily here but not left to
                               # the system to decide
dta <- data.frame( datetime = seq( ISOdatetime( 2017,1, 1, 0, 0, 0 ) , by="hour" , length.out = 48 )
[R] average at specific hour "endpoints" of the day
hello

given my reproducible example

#---
date<-seq(ISOdate(2017,1, 1, 0), by="hour", length.out = 48)
v1<-1:48
df<-data.frame(date,v1)

#--

I need to calculate the average of variable v1 at specific hour "endpoints" of the day, i.e. at hours 6.00 and 22.00 respectively

the desired result is

date            v1
01/01/17 22:00  15.5
02/01/17 06:00  27.5
02/01/17 22:00  39.5

at hour 06:00 of each day the average is calculated by considering the 8 previous records (hours from 23:00 to 6:00)
at hour 22:00 of each day the average is calculated by considering the 16 previous records (hours from 7:00 to 22:00)

any hint please?

I've been trying with some functions within the "xts" package but without much result...

thanks for the help
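The two windows described above can be expressed by giving every hourly record the endpoint it belongs to and then averaging per endpoint, keeping only complete windows. The base-R sketch below is one way to do this, with the time zone fixed to UTC as an assumption; the column names window, endpoint and n are hypothetical.

```r
dta <- data.frame(
  datetime = seq(ISOdatetime(2017, 1, 1, 0, 0, 0, tz = "UTC"),
                 by = "hour", length.out = 48),
  v1 = 1:48
)

hr  <- as.integer(format(dta$datetime, "%H"))           # hour of day, 0..23
day <- as.POSIXct(trunc(dta$datetime, units = "days"))  # midnight of the same day

# Hours 7..22 belong to the 22:00 endpoint (16 records);
# hour 23 belongs to 06:00 of the next day, hours 0..6 to 06:00 of the same day (8 records)
dta$window   <- ifelse(hr >= 7 & hr <= 22, 16, 8)
offset_hours <- ifelse(hr >= 7 & hr <= 22, 22, ifelse(hr == 23, 30, 6))
dta$endpoint <- day + offset_hours * 3600
dta$n <- 1

# Sum values and record counts per endpoint, then keep complete windows only
res <- aggregate(cbind(v1, n) ~ endpoint + window, data = dta, FUN = sum)
res <- res[res$n == res$window, ]
res$v1 <- res$v1 / res$n
res <- res[order(res$endpoint), c("endpoint", "v1")]
res
```

The incomplete windows at the start and end of the series (06:00 on the first day and 06:00 on the third) are dropped by the record-count filter.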
Re: [R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)
thank you, what a nice compact solution with ave()
I learned something new about the subtleties of R

let me summarize here the alternative solutions, just in case someone else might be interested...

thanks, bye

#
# my user function (an example)
mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, na.rm=TRUE))}

# my dataframe to apply the formula by blocks
mydf <- data.frame(blocks=rep(c("a","b","c"),each=5), v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0))

# blocks (factors) to be used for splitting
b <- mydf$blocks

# 1 - split-lapply-unsplit with anonymous function to return a new df
s <- split(mydf, b)
l <- lapply(s, function(x) data.frame(x, v1mod=mynorm(x$v1)))
mydf_new <- unsplit(l, mydf$blocks)

# 2 - split-lapply-unsplit with function transform to return a new df
l <- split(mydf, b)
l <- lapply(l, transform, v1.mod = mynorm(v1))
mydf_new <- unsplit(l, b)

# 3 - ave() encapsulating split-lapply-unsplit approach
mydf_new <- transform(mydf, v1.mod = ave(v1, blocks, FUN=mynorm))
#

From: "William Dunlap" <wdun...@tibco.com>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>
Cc: "David L Carlson" <dcarl...@tamu.edu>, "r-help" <r-help@r-project.org>
Sent: Friday, 13 May 2016 19:22:21
Subject: Re: [R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

ave() encapsulates the split/lapply/unsplit stuff so
transform(mydf, v1.mod = ave(v1, blocks, FUN=mynorm))
also gives what you got above.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

--
Massimo Bressan
ARPAV Agenzia Regionale per la Prevenzione e Protezione Ambientale del Veneto
Dipartimento Provinciale di Treviso
Via Santa Barbara, 5/a
31100 Treviso, Italy
tel: +39 0422 558545
fax: +39 0422 558516
e-mail: massimo.bres...@arpa.veneto.it
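The variants summarized in this message can be checked against each other. The sketch below is only a sanity check, not part of the original posts; the fixed seed and the object names `by_split` and `by_ave` are assumptions added so that the comparison is repeatable.

```r
mynorm <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

set.seed(1)  # assumption: fixed seed, only to make the check repeatable
mydf <- data.frame(blocks = rep(c("a", "b", "c"), each = 5),
                   v1 = round(runif(15, 10, 25), 0),
                   v2 = round(rnorm(15, 30, 5), 0))

# split / lapply(transform) / unsplit
l <- lapply(split(mydf, mydf$blocks), transform, v1.mod = mynorm(v1))
by_split <- unsplit(l, mydf$blocks)

# ave() does the split, apply and reassembly in one call
by_ave <- transform(mydf, v1.mod = ave(v1, blocks, FUN = mynorm))

# both routes should give the same column, rescaled to [0, 1] within each block
all.equal(by_split$v1.mod, by_ave$v1.mod)
tapply(by_ave$v1.mod, by_ave$blocks, range)
```

Note that `mynorm()` returns `NaN` for a block whose values are all equal, because the denominator becomes zero; real data may need a guard for that case.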
Re: [R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)
yes, thanks
you pointed me in the right direction: split/unsplit was the trick
I had completely overlooked that possibility!

here the final version

mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, na.rm=TRUE))}
mydf <- data.frame(blocks=rep(c("a","b","c"),each=5), v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0))
g <- mydf$blocks
l <- split(mydf, g)
l <- lapply(l, transform, v1.mod = mynorm(v1))
mydf_new <- unsplit(l, g)

thanks again
massimo
[R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)
hi

I need to apply a user defined formula over some selected columns of a dataframe by subsetting groups of rows (blocks) and get back a new dataframe

I’ve managed to get the calculations right but I’m not satisfied at all by the form of the results

please refer to my reproducible example

##
# my user function (an example)
mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, na.rm=TRUE))}

# my dataframe to apply the formula by blocks
mydf <- data.frame(blocks=rep(c("a","b","c"),each=5), v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0))

# my attempts (not satisfied by final output)
tapply(mydf$v1, mydf$blocks, mynorm)
byf <- factor(mydf$blocks)
aggregate(mydf[2:3], list(byf), mynorm)
aggregate(mydf[2:3], list(mydf$blocks), mynorm, simplify = FALSE)
###

please can anyone give me some hints on how to properly proceed?
I need a dataframe with all variables as final result

sorry but I’m definitely stuck with this…

thanks
[R] 'split-lapply' vs. 'aggregate'
this might be a trivial question (sorry for that, if so!) but I definitely can not catch the problem here...

please consider the following reproducible example: why are the results different between 'split-lapply' and 'aggregate'?

I've also checked against different methods (e.g. data.table, dplyr) and the results were always consistent with 'split-lapply' but apparently not with 'aggregate'

I must certainly be wrong! could someone point me in the right direction?

thanks

##
s <- split(airquality, airquality$Month)
ls <- lapply(s, function(x) {colMeans(x[c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)})
do.call(rbind, ls)

# slightly different results with
aggregate(. ~ Month, airquality[-c(4,6)], mean, na.rm=TRUE)
##
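A likely explanation, offered here as a sketch rather than as the answer given on the list: the formula method of `aggregate()` has `na.action = na.omit` as its default, so every row with an `NA` in any of the selected columns is dropped before the means are computed, whereas `colMeans(..., na.rm = TRUE)` removes missing values column by column. Passing `na.action = na.pass` should bring the two results into agreement. The object names below (`vars`, `by_split`, `agg_default`, `agg_pass`) are illustrative only.

```r
vars <- c("Ozone", "Solar.R", "Wind")

# split / lapply: NAs are removed separately within each column
s <- split(airquality, airquality$Month)
by_split <- do.call(rbind, lapply(s, function(x) colMeans(x[vars], na.rm = TRUE)))

# formula interface, default na.action = na.omit: whole rows are dropped first
agg_default <- aggregate(. ~ Month, airquality[-c(4, 6)], mean, na.rm = TRUE)

# keep all rows and let mean(..., na.rm = TRUE) handle the NAs per column
agg_pass <- aggregate(. ~ Month, airquality[-c(4, 6)], mean, na.rm = TRUE,
                      na.action = na.pass)

all.equal(unname(as.matrix(agg_pass[vars])), unname(by_split))
```

The non-formula interface, `aggregate(airquality[vars], by = list(Month = airquality$Month), FUN = mean, na.rm = TRUE)`, does not apply `na.omit` to whole rows and is another way to get the per-column behaviour.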
[R] shift by one column given rows in a dataframe
by considering the following reproducible example:

v0 <- c("a","xxx","c",rep("xxx",2))
v1 <- c(1,"b",3,"d","e")
v2 <- c(6,2,8,4,5)
v3 <- c("xxx",7,"xxx",9,10)
df_start <- data.frame(v0,v1,v2,v3)
df_start

v0 <- letters[1:5]
v1 <- 1:5
v2 <- 6:10
df_end <- data.frame(v0,v1,v2)
df_end

I need to shift some given rows of the initial data frame df_start by one column, so as to get the final structure shown in df_end;
please consider that the value "xxx" in the rows of df_start can be anything, so that I necessarily need to work by row index position (in my reproducible example rows: 2, 4, 5);

I'm really stuck with that problem and I can not conceive any viable solution up to now

any hints?

best regards
m
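A minimal base R sketch of one way to do the shift by row index follows. It is an illustration, not a reply from the list. It assumes the rows to shift are known in advance; in the example data as posted, the rows whose first cell holds the placeholder are 2, 4 and 5, and the names `rows` and `m` are made up here.

```r
df_start <- data.frame(v0 = c("a", "xxx", "c", "xxx", "xxx"),
                       v1 = c("1", "b", "3", "d", "e"),
                       v2 = c(6, 2, 8, 4, 5),
                       v3 = c("xxx", "7", "xxx", "9", "10"),
                       stringsAsFactors = FALSE)

rows <- c(2, 4, 5)  # assumption: index of the rows to be shifted left by one column

# work on a character matrix so that cells can move freely between columns
m <- vapply(df_start, as.character, character(nrow(df_start)))

# for the selected rows, columns 2..4 overwrite columns 1..3
m[rows, 1:3] <- m[rows, 2:4]

# drop the last column and let R guess the column types again
df_end <- as.data.frame(m[, 1:3], stringsAsFactors = FALSE)
df_end[] <- lapply(df_end, type.convert, as.is = TRUE)
df_end
```

Going through a character matrix avoids the type clashes that a direct `df_start[rows, 1:3] <- df_start[rows, 2:4]` can run into when the columns involved have different classes.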
[R] aov and groups coding
please consider the following example:

#start code
set.seed(123)
level <- rnorm(18, 10, 3)
group1 <- rep(letters[1:3], each=6)
summary(aov(level~group1))

group2 <- rep(1:3, each=6)
str(group2)
summary(aov(level~group2))

#same result as for group1
summary(aov(level~factor(group2)))

#same result as for aov
anova(lm(level~group2))
#end code

what I would like to do is to perform an anova among groups (analysis of variance for three different groups); consider that groups are completely arbitrary: they are not intended to have any sort of scaling or ordinal meaning;

in my example the same groups are coded in two alternative ways: group1 as chr (factor) and group2 as num;

so, keeping in mind my purpose (is there any difference in the level among groups?), I would simply consider the result of aov() for group2 (num) as nonsense (with respect to my specific purpose)

is that a correct interpretation?

I hope I have not misinterpreted the indications of the following thread
http://r.789695.n4.nabble.com/Question-about-factor-that-is-numeric-in-aov-td2164393.html

thank you for any help

best regards
max
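The difference between the two codings shows up directly in the degrees of freedom. The sketch below is an illustration added to the example, with made-up object names (`tab_chr`, `tab_num`, `tab_fac`): with a numeric predictor `aov()` fits a single slope (1 degree of freedom, i.e. a linear trend across the codes 1, 2, 3), while a factor with three levels uses 2 degrees of freedom and compares the group means.

```r
set.seed(123)
level <- rnorm(18, 10, 3)
group1 <- rep(letters[1:3], each = 6)  # character, treated as a factor
group2 <- rep(1:3, each = 6)           # numeric codes for the same groups

tab_chr <- summary(aov(level ~ group1))[[1]]
tab_num <- summary(aov(level ~ group2))[[1]]
tab_fac <- summary(aov(level ~ factor(group2)))[[1]]

# degrees of freedom: term first, residuals second
tab_chr$Df
tab_num$Df
tab_fac$Df

# the factor coding reproduces the one-way anova obtained with group1
all.equal(tab_chr[["F value"]][1], tab_fac[["F value"]][1])
```

So for arbitrary, unordered groups the numeric coding answers a different question (is there a linear trend in the codes?), and wrapping the variable in `factor()` is the usual way to get the intended comparison among groups.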
Re: [R] group by and merge two dataframes
yes thanks, that's correct!

here a slight variation inspired by your solution: a cartesian product restricted to non duplicated records to get the logical vector i to be used in the next natural join

i <- !duplicated(merge(df1$id, df1$item, by=NULL))
merge(df1[i,], df2)

thanks

On 08/05/2014 18:43, arun wrote:

Hi,
May be:
indx <- !duplicated(as.character(interaction(df1[,-3])))
merge(df1[indx,], df2)
A.K.
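For comparison, a shorter base R variant is sketched below; it is not from the thread, and the seed and the names `first_rows`, `res` and `i_cart` are assumptions added for illustration. `duplicated()` accepts a data frame directly, so the first record of each (id, item) pair can be selected without building a cartesian product. Note that `merge(df1$id, df1$item, by = NULL)` returns all 6 x 6 combinations, so the logical vector derived from it has 36 elements rather than `nrow(df1)`; as far as I can tell it still gives the expected rows here only because the surplus index positions produce `NA` rows that find no match in the final merge.

```r
set.seed(42)  # assumption: fixed seed, only for a repeatable example
df1 <- data.frame(id = rep(1:3, each = 2),
                  item = rep(c("A", "B", "C"), each = 2),
                  v = rnorm(6))
df2 <- data.frame(id = 1:3, who = c("tizio", "caio", "sempronio"))

# keep the first record of every (id, item) combination, then join by id
first_rows <- df1[!duplicated(df1[c("id", "item")]), ]
res <- merge(first_rows, df2, by = "id")
res

# the cartesian-product index is longer than the data frame it is applied to
i_cart <- !duplicated(merge(df1$id, df1$item, by = NULL))
length(i_cart)
nrow(df1)
```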
[R] group by and merge two dataframes
given this bare bone example:

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2), rep("C",2)))
df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

I need to group the first dataframe df1 by id and then merge with the second dataframe df2 (again by id)

so far I've managed to accomplish the task by something like the following...

# start
require(sqldf)
tmp <- sqldf("select * from df1 group by id")
merge(tmp, df2)
#end

now I'm wondering if there is a more efficient and/or elegant way to perform it (also because in fact I'm dealing with much heavier dataframes);

maybe it is possible through a single sql statement? or by using functions from a different package (e.g. dplyr)?

my attempts towards these alternative approaches miserably failed ...

thanks
Re: [R] group by and merge two dataframes
yes, thank you for all your replies, they worked out correctly indeed...

...but, through my own fault, only when working on my real data did I fully realise that I should have mentioned something that changes (quite a lot, in fact) the terms of the problem...

please would you consider the following (consistent) variation?

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2), rep("C",2)), v=rnorm(6))
df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

and again I need to group the first dataframe df1 both by id and by the first record of v, and then merge with the second dataframe df2 (again by id)

now, how to do that?
(that's probably why I was pointing in my first post to the use of sqldf)

thanks

ps: I'm in doubt whether I must open another thread or keep going with this one (really sorry for any violation of the R-help netiquette)

On 08/05/2014 17:14, arun wrote:

Hi,
May be this helps:
merge(unique(df1), df2)
A.K.
Re: [R] sum of two POSIXct objects: date and hour
read in and convert the data yourself, or is this a source that you do not have any control over? If the former, then just use the correct conversion. As shown below, if you have hundredths of a second, that will be converted correctly and you don't need the extra column.

> x <- as.POSIXct("2014-04-29 12:00:00.345")  # decimal seconds that are converted
> x
[1] "2014-04-29 12:00:00 EDT"
> format(x, format = "%H:%M:%OS3")  # print with 3 decimals
[1] "12:00:00.345"

If you have the choice, start over again and do it correctly. If not, convert the various components to the correct character format for your timezone, combine back together and then use the conversion shown above.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
[R] sum of two POSIXct objects: date and hour
I have this dataframe:

df <- structure(list(date = structure(c(1395874800, 1395874800, 1395874800, 1395874800, 1395874800), class = c("POSIXct", "POSIXt"), tzone = ""), hour = structure(c(-2209121804, -2209121567, -2209121005, -2209118616, -2209116160), class = c("POSIXct", "POSIXt"), tzone = ""), s.100 = c(29L, 36L, 6L, 53L, 18L)), .Names = c("date", "hour", "s.100"), row.names = c(NA, -5L), class = "data.frame")

and I would like to sum the first two columns (date and hour) so as to end up with a new column, say date_hour, storing both the information about the date and the hour in one POSIXct object;

I have been reading that POSIXct objects are a measure of seconds from a given origin (1st Jan 1970), so that a possible solution is to transform the column hour into seconds and then add it to the column date;

but, is there a straightforward solution for accomplishing this task?

I've been trying to extract from the column hour the digits representing hours, minutes and seconds and transform everything into seconds but that seems to me quite a cumbersome approach...

and finally, one more question: is it possible to represent hundredths of a second, as given in the column s.100 of the given dataframe, within the same new POSIXct object date_hour?

thanks for the support
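One way to follow the advice given in the reply (format the components as text, paste them together, convert once) is sketched below. The helper `combine_date_hour()` and its arguments are hypothetical names introduced here, and the time zone is an explicit assumption: the result depends on the zone in which the two columns were originally meant to be read, which the post does not state.

```r
# hypothetical helper: builds one POSIXct from a date column, an hour column
# (whose own date part is ignored) and optional hundredths of a second
combine_date_hour <- function(date, hour, s100 = 0, tz = "UTC") {
  stamp <- paste(format(date, "%Y-%m-%d", tz = tz),
                 format(hour, "%H:%M:%S", tz = tz))
  as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = tz) + s100 / 100
}

# first two rows of the posted data, rebuilt from the raw second counts
date <- as.POSIXct(c(1395874800, 1395874800), origin = "1970-01-01", tz = "UTC")
hour <- as.POSIXct(c(-2209121804, -2209121567), origin = "1970-01-01", tz = "UTC")
s100 <- c(29L, 36L)

# read in UTC
out_utc <- combine_date_hour(date, hour, s100, tz = "UTC")
format(out_utc, "%Y-%m-%d %H:%M:%OS2", tz = "UTC")

# assumption: the data were recorded in Italian local time
out_rome <- combine_date_hour(date, hour, s100, tz = "Europe/Rome")
format(out_rome, "%Y-%m-%d %H:%M:%OS2", tz = "Europe/Rome")
```

Fractional seconds are stored in the POSIXct value itself; they only become visible when printing with a `%OSn` format or after setting `options(digits.secs = 2)`, and the last printed digit can be affected by floating-point truncation.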
Re: [R] print(cenfit object) to a data.frame
thanks rui, it helps indeed...

at first, I tried to turn the output of mean(mycenfit) into a data frame with the following:

my.df <- as.data.frame(do.call(rbind, mean(mycenfit)))

and it worked out correctly!

...but because I also needed the information about n and n.cen, which is not provided by mean(mycenfit), I had to switch to print(mycenfit);

...but unfortunately, the output of print(mycenfit) is not so easy (to me at least) to handle

now, I'm looking at different possible ways to extract the same information directly from the object mycenfit (S4), which turned out to be quite hard (to me again)

any other possible ideas?

cheers
[R] print(cenfit object) to a data.frame
given this reproducible example:

#start code
df <- structure(list(lq = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE), value = c(1, 3, 1, 2, 0.5, 2, 1, 2, 3), group = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("lq", "value", "group"), row.names = c(NA, -9L), class = "data.frame")

library(NADA)
mycenfit <- with(df, cenfit(value, lq, group))
print(mycenfit)
#end code

does anybody know how to convert the print() output of the cenfit object (S4) mycenfit to a data frame?

sorry, this might be a trivial question but for some reason I do not understand I got completely stuck on this...

I've seen similar questions raised on the mailing list, but for a survfit object, and they do not seem to properly apply to my specific case

any help much appreciated, thank you
[R] plot of a bagging tree
by considering this general example

##start code
library(ipred)
data(Ionosphere, package = "mlbench")
Ionosphere$V2 <- NULL  # constant within groups
iono <- bagging(Class ~ ., data=Ionosphere, coob=TRUE)
print(iono)
##end code

does anybody know of any possibility to plot the (average) tree of the bagging?
does it make any sense, at least for a visual presentation?

how to *visually* convey the information provided by the bagging model?

thank you for any feedback

best
max
Re: [R] interpretation of MDS plot in random forest
here is an amended (more general) version

library(randomForest)
library(RColorBrewer)  # for brewer.pal()
set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE, keep.forest=TRUE)
x <- MDSplot(iris.rf, iris$Species)
#add legend
legend("topleft", legend=levels(iris.rf$predicted), fill=brewer.pal(length(levels(iris.rf$predicted)), "Set1"))
#str(x)
# need to identify points?
text(x$points, labels=attr(x$points, "dimnames")[[1]], cex=0.5)

bye
m

On 03/12/2013 12:15, mbres...@arpa.veneto.it wrote:

sorry, in fact it was a trivial question!

by just peeping into the function I've worked out this simple solution:

MDSplot(iris.rf, iris$Species)
legend("topleft", legend=levels(iris$Species), fill=brewer.pal(3, "Set1"))

thank you

thanks andy

it's a real honour for me to get a reply from you;
I'm still a bit far away from a proper grasp of the purpose of the plot...

may I ask you about a more technical (trivial) issue?
is it possible to add a legend to the MDS plot?
my problem is to link the colour of the points in the chart to the factor that was used as response to train rf, how to?

best
max

Yes, that's part of the intention anyway. One can also use them to do clustering.

Best,
Andy
[R] interpretation of MDS plot in random forest
Given this general example:

library(randomForest)
set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE, keep.forest=TRUE)
#varImpPlot(iris.rf)
#varUsed(iris.rf)
MDSplot(iris.rf, iris$Species)

I’ve been reading the documentation about random forest (to the best of my, poor, knowledge) but I’m having trouble with the correct interpretation of the MDS plot and I hope someone can give me some clues

What is meant by “the scaling coordinates of the proximity matrix”?

I think I understand that the objective here is to present the distance among species in a parsimonious and visual way (of lower dimensionality)

Is this therefore parallel to what the principal components are in a classical PCA?

Are the scaling coordinates DIM 1 and DIM 2 the eigenvectors of the proximity matrix?

If that is correct, how would you find the eigenvalues for those eigenvectors? And what do the eigenvalues represent?

What do these two dimensions in the plot say about the different iris species? Their relative distance in terms of proximity within the space DIM 1 and DIM 2?

How to choose the k parameter (number of dimensions for the scaling coordinates)?

And finally, how would you explain the plot in simple terms?

Thank you for any feedback
Best regards
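Part of the question about eigenvectors and eigenvalues can be explored with classical multidimensional scaling directly. The sketch below rests on an assumption that should be checked against the randomForest sources: that `MDSplot()` applies `cmdscale()` to `1 - proximity`. To keep the example free of extra packages, a stand-in similarity matrix `prox` is built from Euclidean distances on iris; with a fitted forest one would use `iris.rf$proximity` in its place.

```r
# stand-in for iris.rf$proximity (assumption: any similarity scaled to [0, 1])
d <- as.matrix(dist(iris[, 1:4]))
prox <- 1 - d / max(d)

# classical MDS on the dissimilarity 1 - proximity, keeping k = 2 dimensions
mds <- cmdscale(1 - prox, k = 2, eig = TRUE)

head(mds$points)  # scaling coordinates, the Dim 1 / Dim 2 of the plot
mds$eig[1:4]      # leading eigenvalues
mds$GOF           # goodness of fit of the k-dimensional representation

# rough share of the structure captured by the first two dimensions
sum(mds$eig[1:2]) / sum(abs(mds$eig))
```

In classical MDS the coordinates are eigenvectors of the doubly centred matrix of squared dissimilarities, each scaled by the square root of its eigenvalue, so the analogy with principal components is reasonable: a larger eigenvalue means that dimension accounts for more of the dissimilarity structure, and the decay of the eigenvalues is one informal guide for choosing k.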
Re: [R] storing element number of a list in a column data frame
yes, I like this: a very elegant and neat solution (in my very humble opinion)

sometimes it is so difficult for me to think of a solution in such simple and effective terms: less is more!

thank you

max

On 03/10/2013 17:12, David Carlson wrote:

Try this

> i=which(!sapply(mytest, is.null))
> n=do.call(rbind, mytest[i])
> mydf <- data.frame(i, n)
> mydf
  i  n
1 1 45
2 3 18
3 5 99

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
[R] storing element number of a list in a column data frame
#let's suppose I have a list like this

mytest <- list(45, NULL, 18, NULL, 99)

#note that this is just a simplified example because in fact
#I'm dealing with a long list (more than 400 elements)
#with no evident pattern of the NULL values

#I want to end up with a data frame like the following

data.frame(i=c(1,3,5), n=c(45,18,99))

#i.e. a data frame storing in
#column i the position of the corresponding list element
#column n the unique component of that element

#I've been trying with

do.call(rbind, mytest)

#or

do.call(rbind.data.frame, mytest)

#but this approach does not properly achieve the desired result
#now I'm in trouble on how to store the position of each list element in the first column of the data frame

#any help for this?

#thanks
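A compact variant of the approach suggested in the replies is sketched here. It is an illustration only and assumes, as in the example, that every non-NULL element holds exactly one value.

```r
mytest <- list(45, NULL, 18, NULL, 99)

# positions of the elements that are not NULL (a NULL has length 0)
i <- which(lengths(mytest) > 0)

# unlist() drops nothing here because only non-NULL elements are selected
mydf <- data.frame(i = i, n = unlist(mytest[i]))
mydf
```

If the elements were themselves data frames or longer vectors, `do.call(rbind, mytest[i])` combined with `rep(i, lengths(...))` would be needed instead; that case is not covered by this sketch.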
Re: [R] storing element number of a list in a column data frame
the list I'm dealing with is the result of an lapply(), not a native data structure I set up for storing data originally; the list is a data structure I have to manage as a consequence of my previous operations, something like:

path <- "./"
files <- list.files(path, pattern="\\.csv$")
mylist <- lapply(files, function(f) na.omit(read.csv(file.path(path, f), header=TRUE)))

anyway, thank you for the good hint: is.null seems promising

cheers

m

On 03/10/2013 16:55, Bert Gunter wrote:

Have you read "An Introduction to R" (ships with R) or another of the many excellent R tutorials on the web? I ask, because you do not appear to be using a sensible data structure. As your list appears to be of a single type (probably numeric, maybe integer), it would be preferable to use a vector, like this:

y <- c(45, NA, 18, NA, 99)

(The NULLs must be converted to NAs to hold their places.)

There would then seem to be little need for the data frame structure, as it tends to slow things down in R. But if you insist,

which(is.na(y))

will give you the indices of the NAs. See also: ?is.na ?is.null.

Cheers,
Bert
Re: [R] triax.plot: control legend position and size of point labels
thanks jim,

I had to move the legend away from its default position because, for some reason, it appeared cut off on the left side (I'm not sure whether that was due to my own configuration, anyway...); I think that the possibility to control the size of the point labels would be a good new feature (even if not essential) for the triax.plot function

thank you for your valuable work

best

massmi

On 20/12/2012 11:50, Jim Lemon wrote:

On 12/19/2012 11:18 PM, maxbre wrote:

Given this example

library(plotrix)
a <- c(34,10,70)
b <- c(33,10,20)
c <- c(33,80,10)
test <- data.frame(A=a,B=b,C=c)
triax.plot(test, main="title", at=seq(0.25,0.75,by=0.25),
  tick.labels=list(l=seq(0.25,0.75,by=0.25),
    r=seq(0.25,0.75,by=0.25), b=seq(0.25,0.75,by=0.25)),
  align.labels=TRUE, show.grid=TRUE, cc.axes=TRUE,
  show.legend=TRUE, label.points=TRUE,
  point.labels=c("case 1","case 2","case 3"),
  col.symbols=c("red","blue","green"),
  cex.ticks=0.8, cex.axis=0.8, lty.grid=2, pch=17)

I would like to control the position of the legend (to be moved to a different place) and the size of point labels (to be reduced)

I've been trying to work out the solution with par() but without much success, any help for this?

Hi maxbre,

For the legend, I would suggest calling triax.plot with legend=FALSE and then adding the legend where you want it. To change the size of the point labels you would have to modify the function. Here is the triax.points function with the necessary modification:

triax.points <- function(x,show.legend=FALSE,label.points=FALSE,
 point.labels=NULL,col.symbols=par("fg"),pch=par("pch"),
 bg.symbols=par("bg"),cc.axes=FALSE,...) {

 if(dev.cur() == 1)
  stop("Cannot add points unless the triax.frame has been drawn")
 if(missing(x))
  stop("Usage: triax.points(x,...)\n\twhere x is a 3 column array of proportions or percentages")
 if(!is.matrix(x) && !is.data.frame(x))
  stop("x must be a matrix or data frame with at least 3 columns and one row.")
 if(any(x > 1) || any(x < 0)) {
  if(any(x < 0))
   stop("All proportions must be between zero and one.")
  if(any(x > 100))
   stop("All percentages must be between zero and 100.")
  # convert percentages to proportions
  x <- x/100
 }
 if(any(abs(rowSums(x)-1) > 0.01))
  warning("At least one set of proportions does not equal one.")
 sin60 <- sin(pi/3)
 if(cc.axes) {
  ypos <- x[,3]*sin60
  xpos <- x[,1]+x[,3]*0.5
 }
 else {
  ypos <- x[,3]*sin60
  xpos <- 1-(x[,1]+x[,3]*0.5)
 }
 nobs <- dim(x)[1]
 points(x=xpos,y=ypos,pch=pch,col=col.symbols,bg=bg.symbols,type="p",...)
 if(is.null(point.labels)) point.labels <- rownames(x)
 # the point labels now take their size from par("cex.axis")
 if(label.points)
  thigmophobe.labels(xpos,ypos,point.labels,cex=par("cex.axis"))
 if(show.legend) {
  legend(0.2,0.7,legend=point.labels,pch=pch,col=col.symbols,
   xjust=1,yjust=0)
 }
 invisible(list(x=xpos,y=ypos))
}

Note that some lines above may have been broken by the email client. If so, stitch them back together with a text editor before trying to source the file. I will probably add something like this to triax.plot in the next version of plotrix.

Jim
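Jim's suggestion (draw the plot without its built-in legend, then add the legend and the labels yourself) can be sketched with base graphics alone. This is not plotrix code: it only reuses the coordinate formulas from the cc.axes=FALSE branch of the triax.points listing, and the label size (cex = 0.6), the legend position and the null PDF device are illustrative assumptions.

```r
# Same three cases as in the thread, expressed as proportions (each row sums to 1).
test <- data.frame(A = c(0.34, 0.10, 0.70),
                   B = c(0.33, 0.10, 0.20),
                   C = c(0.33, 0.80, 0.10))

# Triangular coordinates, as computed in triax.points when cc.axes = FALSE.
sin60 <- sin(pi / 3)
ypos  <- test$C * sin60
xpos  <- 1 - (test$A + test$C * 0.5)

cols <- c("red", "blue", "green")
labs <- paste("case", 1:3)

pdf(NULL)                                    # null device: nothing is written to disk
plot(xpos, ypos, pch = 17, col = cols, asp = 1, axes = FALSE,
     xlim = c(0, 1), ylim = c(0, sin60), xlab = "", ylab = "")
polygon(c(0, 1, 0.5), c(0, 0, sin60))        # outline of the triangle
text(xpos, ypos, labels = labs, cex = 0.6, pos = 3)   # smaller point labels
legend("topright", legend = labs, pch = 17, col = cols,
       cex = 0.8, bty = "n")                 # legend placed by hand
invisible(dev.off())
```

With plotrix loaded, the same idea would presumably be a call to triax.plot(..., show.legend = FALSE) followed by legend() at the desired position; that variant is not tested here.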
Re: [R] lattice dotplot reorder contiguous levels
thank you all for your helpful replies

to bert: the problem with relation = "same" is that all categories (samp.time) are plotted along the y axis for all groups (sites); instead, I need each panel to show along the y axis only the categories that actually have an observation for that group

to danny: your solution with ggplot works like a charm; I will keep an eye on ggplot2 for other new charts, but for some reasons I must stick with lattice for now (which is also my favourite, by the way)

to richard: result3 is close to what I need but still not exactly what I'm aiming at; I need the y axis to be plotted as categories, not as a continuous scale... I need the result proposed by danny to be translated into lattice...

to deepayan: thank you so much, I'll carefully consider your hint and I'll try to make it work (but I'm still not sure I fully understand how to do it); as soon as I get to a viable solution I'll post it back

Just for your information, my effort up to now (NOT WORKING!) was pointing in this direction:

1- new character variable from samp.time, to be used later as label for the plot:
test$samp.lab <- as.character(test$samp.time)

2- new factor variable with as many levels as the observations:
test$samp.id <- gl(length(test$samp.time), 1)

3- new factor variable based on "samp.time" but with the order of levels based on "site", something like this (I think this is the crucial point where I'm likely to fail):
test$samp.time.site <- with(test, reorder(samp.time, as.numeric(site)))

4- new numeric level:
nl <- as.numeric(levels(test$samp.time.site))

5- plotting (wrong):
dotplot(samp.time~conc|site, data=test,
  ylim=test$samp.lab[nl],
  scales=list(x=list(log=10), y = list(relation = "free")),
  layout=c(1,5), strip=FALSE, strip.left=TRUE
)

On 21/09/2012 08:55, Deepayan Sarkar wrote:

On Thu, Sep 20, 2012 at 7:48 PM, maxbre mbres...@arpa.veneto.it wrote:

my reproducible example

test <- structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902, 0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442, 10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315, 30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30, 0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61, 3.39, 20, 4.59), samp.time = structure(c(2L, 4L, 4L, 4L, 4L, 4L, 5L, 4L, 8L, 8L, 8L, 8L, 8L, 9L, 8L, 7L, 8L, 8L, 8L, 8L, 3L, 3L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 6L, 4L, 8L, 4L, 8L, 4L, 3L, 8L, 4L, 8L, 4L, 8L, 4L, 9L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 1L), .Label = c("2", "4", "12", "24", "96", "135", "167", "168", "169"), class = "factor")), .Names = c("site", "conc", "samp.time"), row.names = c(NA, 52L), class = "data.frame")

dotplot(samp.time~conc|site, data=test,
  scales=list(x=list(log=10), y = list(relation = "free")),
  layout=c(1,5), strip=FALSE, strip.left=TRUE
)

my objective is to use "site" as conditioning variable but with "samp.time" correctly grouped by "site"; the problem here is to ensure that the levels of "samp.time" within each "site" are contiguous, as otherwise they would not be contiguous in the dot plot itself (i.e., avoid that sort of holes in between y axis categories -see dotplot-)

I've been trying with this but without much success

test$samp.time.new <- with(test, reorder(samp.time, as.numeric(site)))

dotplot(samp.time.new~conc|site, data=test,
  scales=list(x=list(log=10), y = list(relation = "free")),
  layout=c(1,5), strip=FALSE, strip.left=TRUE
)

I think (I hope) a possible different solution is to create for ylim a proper character vector of different length to pass to each panel of the dotplot (I'm not posting this attempt because it is too confused up to now)

can anyone point me in the right direction?

The problem here is that there is crossing between sites and samp.time. You can try some imaginative permutations of site, such as

test$samp.time.new <- with(test, reorder(samp.time, as.numeric(factor(site, levels = c("A", "C", "D", "B", "E")))))

which gets all but site B right. There may be another permutation that works for everything, but it would be much easier to make a nested factor, i.e.,

test$samp.time.new <- with(test, reorder(samp.time:site, as.numeric(site)))

That just leaves getting the y-labels right, which I will leave for you to figure out. (Hint: ylim = some_function_of(levels(test$samp.time.new)))

-Deepayan
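As a rough illustration of the nested-factor idea and of the hint about deriving the labels from the levels, here is a base-R sketch on made-up toy data. The data frame toy, the droplevels() call (used so that only observed combinations become levels) and the name y.labels are assumptions of this sketch, not part of the original replies; it stops before the lattice call.

```r
# Toy data: two sites that share some sampling times ("crossing").
toy <- data.frame(
  site      = factor(c("A", "A", "B", "B", "B")),
  samp.time = factor(c(2, 24, 24, 96, 2), levels = c(2, 24, 96)),
  conc      = c(2.3, 0.9, 20, 30, 0.7)
)

# Nested factor: one level per (samp.time, site) pair that actually occurs.
toy$samp.time.new <- with(toy, droplevels(samp.time:site))

# Reorder the levels by site, so the levels belonging to one site are adjacent.
toy$samp.time.new <- with(toy, reorder(samp.time.new, as.numeric(site)))

# Candidate y-axis labels: drop the ":site" part of each level.
y.labels <- sub(":.*$", "", levels(toy$samp.time.new))

# Site part of each level, to check that sites are contiguous.
level.sites <- sub("^.*:", "", levels(toy$samp.time.new))
print(data.frame(level = levels(toy$samp.time.new), label = y.labels, site = level.sites))
```

How the labels are then handed to each panel of dotplot() (for example through ylim together with relation = "free") is left open here, as it was in the thread.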
Re: [R] lattice dotplot reorder contiguous levels
deepayan, is that what you mean?

but still the problem persists: the labelling is neither correct nor contiguous!

I must probably reconsider everything from scratch: I'm a bit confused now...

test$samp.time.new <- with(test, reorder(samp.time:site, as.numeric(site)))

s <- strsplit(levels(test$samp.time.new), ":")
s1 <- sapply(s, '[', 1)

dotplot(samp.time~conc|site, data=test,
  ylim=s1,
  scales=list(x=list(log=10), y = list(relation = "free")),
  layout=c(1,5), strip=FALSE, strip.left=TRUE
)
Re: [R] Boxplot lattice vs standard graphics
ok, I see now!

here is the reproducible example along with the final code (also with the median drawn as a line instead of a point)

thank you all for the great help

max

# start code

library(lattice)

test <- structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902, 0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442, 10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315, 30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30, 0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61, 3.39, 20, 4.59)), .Names = c("site", "conc"), row.names = c(NA, 52L), class = "data.frame")

mystats <- function(x, ...){   # Here ...
  out <- boxplot.stats(10^x, ...)   # ...and here!!!
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf)   ## Omit if you don't want notches
  out$out <- log10(out$out)
  out   ## With the boxplot statistics converted to the log10 scale
}

dev.new()
bwplot(conc~site, data=test,
  pch="|",   # this is plotting a line instead of a point
  scales = list(y=list(log=10)),
  panel = function(...){
    panel.bwplot(..., stats = mystats)
  }
)

# end code

On 17/09/2012 20:26, Rui Barradas wrote:

Hello,

On 17-09-2012 18:50, David Winsemius wrote:

On Sep 17, 2012, at 4:18 AM, maxbre wrote:

here it is, I think (I hope) I'm getting a little closer with this, but still there is something to sort out...

error using packet 1 unused argument(s) (coef =1.5, do.out=TRUE)

by reading the help for panel.bwplot, at the argument stats it says: the function must accept arguments coef and do.out even if they do not use them (a ... argument is good enough).

I'm not sure how to cope with this... any help for this?

thanks

## start code

mystats <- function(x){
  out <- boxplot.stats(10^x)
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf)   ## Omit if you don't want notches
  out$out <- log10(out$out)
  out$coef <- 1.5   #??
  out$do.out <- TRUE   #??
  out   ## With the boxplot statistics converted to the log10 scale
}

bwplot(conc~site, data=test,
  scales=list(y=list(log=10)),
  panel= function(x,y){
    panel.bwplot(x,y,stats=mystats)
  }
)

No example data, so no efforts at running code.

Actually there is, in the op.

?panel.bwplot # Notice the Usage at the top of the page. The ... is there for a reason.

# And notice that neither 'do.out' nor 'coef' are passed in the stats list
# The message was talking about what arguments your 'mystats' would accept, not what it would return.

It's another instance of your needing to understand what the ... formalism is doing.

?boxplot.stats # I would be making a concerted effort to return a list with exactly the components listed there.

And since I'm terrible at graphics I try to learn as much as possible on R-Help. Here it goes.

library(lattice)

test <- structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902, 0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442, 10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315, 30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30, 0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61, 3.39, 20, 4.59)), .Names = c("site", "conc"), row.names = c(NA, 52L), class = "data.frame")

#standard graphics
dev.new()
with(test,boxplot(conc~site, log="y"))

#lattice
mystats <- function(x, ...){   # Here ...
  out <- boxplot.stats(10^x, ...)   # ...and here!!!
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf)   ## Omit if you don't want notches
  out$out <- log10(out$out)
  out   ## With the boxplot statistics converted to the log10 scale
}

dev.new()
bwplot(conc~site, data=test,
  scales = list(y=list(log=10)),
  panel = function(...){
    panel.bwplot(..., stats = mystats)
  }
)

With a median _line_ it would be perfect. (Not a follow-up, it was already answered some time ago, use pch = "|" in panel.bwplot.)

Rui Barradas

## end code
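The difference discussed in this thread (statistics computed on the raw data versus on the log10-transformed data) can be checked without any plotting. The sketch below uses only the six site A concentrations from the example data; the variable names raw, logged and fixed are illustrative, and mystats is the version with the ... argument from the final code.

```r
# Concentrations for site A, taken from the example data in the thread.
x <- c(2.32, 0.902, 0.468, 5.51, 1.49, 0.532)

# Stats function as in the final code: compute on the raw scale, report on log10.
mystats <- function(x, ...) {
  out <- boxplot.stats(10^x, ...)
  out$stats <- log10(out$stats)
  out$conf  <- log10(out$conf)
  out$out   <- log10(out$out)
  out
}

raw    <- boxplot.stats(x)          # what boxplot(..., log = "y") works from
logged <- boxplot.stats(log10(x))   # what bwplot does by default with log = 10
fixed  <- mystats(log10(x))         # raw-scale statistics mapped to the log10 scale

# Compare which points each approach flags as outliers.
print(raw$out)
print(logged$out)
print(10^fixed$out)
```

Comparing the three out components shows whether a point flagged on the raw scale is still flagged after the log transformation, which is the discrepancy between the two plots that the thread is about.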
Re: [R] Boxplot lattice vs standard graphics
thank you for the help, bert

unfortunately, for reasons I cannot understand (yet), I cannot get it all to work (I'm always in trouble with the panel functions);

max

On 14/09/2012 18:38, Bert Gunter wrote:

Thanks for the example. Makes it easy to see what you mean.

Yes, if I understand you correctly, you are right: boxplot() (base) transforms the axes, so ?boxplot.stats, which is the function that essentially computes the boxplot, does so on the original data. bwplot (lattice) transforms the data first, as the documentation for the log component of the scales list makes clear, and **then** calls boxplot.stats.

Although I think the latter makes more sense than the former, I think the way to do it is to modify the stats function in an explicit call to panel.bwplot to something like (UNTESTED!)

mystats <- function(x){
  out <- boxplot.stats(10^x)
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf)   ## Omit if you don't want notches
  out$out <- log10(out$out)
  out   ## With the boxplot statistics converted to the log10 scale
}

I leave it to you to test and modify as necessary.

Cheers,
Bert

On Fri, Sep 14, 2012 at 2:37 AM, maxbre mbres...@arpa.veneto.it wrote:

Given my reproducible example

test <- structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902, 0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442, 10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315, 30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30, 0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61, 3.39, 20, 4.59)), .Names = c("site", "conc"), row.names = c(NA, 52L), class = "data.frame")

And the following code

#standard graphics
with(test,boxplot(conc~site, log="y"))

#lattice
bwplot(conc~site, data=test,
  scales=list(y=list(log=10))
)

There is an evident difference for sites A, B and D in the way some outliers are plotted when comparing the plot produced by lattice with the one from standard graphics.

I think this might be due to the different treatment of the data: i.e. the log transformation (before or after the plotting?)

Is it possible to achieve the same plotting result with both graphic facilities? I would like to show the outliers also in lattice...

Thank you

http://r.789695.n4.nabble.com/file/n4643121/standard.png
http://r.789695.n4.nabble.com/file/n4643121/lattice.png

-- View this message in context: http://r.789695.n4.nabble.com/Boxplot-lattice-vs-standard-graphics-tp4643121.html
Re: [R] how to change variable names in corrgram diagonal
yes, the labels argument is working fine!

It would be great if the docs were also updated to cover this already implemented feature

thank you for your valuable work

best

max

On 13/08/2012 15:09, Uwe Ligges wrote:

On 13.08.2012 12:12, maxbre wrote:

given this example

library(corrgram)
corrgram(mtcars[2:6], order=TRUE,
  upper.panel=panel.conf, lower.panel=panel.pie,
  diag.panel=panel.minmax, text.panel=panel.txt)

I'd just try the labels argument and pass the labels there - and then write to the maintainer that the docs need to be updated, since labels works rather than being "Not used".

Best,
Uwe Ligges

how can I change the variable names in the main diagonal? (so that I can put more informative names of variables)

I understand that this should be done by modifying the panel.txt function, but for some reasons I'm not able to put that into practice

any help for this

thank you

-- View this message in context: http://r.789695.n4.nabble.com/how-to-change-variable-names-in-corrgram-diagonal-tp4640156.html
Re: [R] merge multiple data frames
thanks don

I have enough here to study for a while

thank you for your help

max

- Original Message -
From: MacQueen, Don macque...@llnl.gov
To: Massimo Bressan mbres...@arpa.veneto.it; r-help@r-project.org
Sent: Monday, January 30, 2012 4:47 PM
Subject: Re: [R] merge multiple data frames

Does this example help? It doesn't handle the problem of common field names, but see below for another example.

df1 <- data.frame(jn=1:4, a1=letters[1:4], a2=LETTERS[1:4])
df2 <- data.frame(jn=2:6, b1=month.abb[2:6])
df3 <- data.frame(jn=3:7, x=rnorm(5), y=13:17)

dfn <- sqldf('select * from df1 left join df2 using (jn) left join df3 using (jn)')

In this example, you automatically get all fields from all three data frames, without having to name them in the SQL statement -- but you should not have common names.

To deal with common names, I myself would probably rename the variables in the data frames before trying to merge. A general method would be something like:

nms1 <- names(df1)
nms1[nms1 != 'date'] <- paste(nms1[nms1 != 'date'],'.1',sep='')
names(df1) <- nms1

Of course it has to be done for every data frame, but this can be put in a loop, if necessary.

However, here is an example where I have changed df1 and df2; they both have a field named 'aa', in addition to the matching field.

df1 <- data.frame(jn=1:4, aa=letters[1:4], a2=LETTERS[1:4])
df2 <- data.frame(jn=2:6, aa=month.abb[2:6])
df3 <- data.frame(jn=3:7, x=rnorm(5), y=13:17)

dfn <- sqldf('select jn, df1.aa aa1, df2.aa aa2, a2, x, y from df1 left join df2 using (jn) left join df3 using (jn)')

By the way, you can still select *, even with common names:

dfx <- sqldf('select * from df1 left join df2 using (jn) left join df3 using (jn)')

but you might not like the result. Try it and see!

It's my understanding that in the current SQL definition 'as' is no longer required when changing field names (though it is also still allowed in the databases I work with, Oracle and MySQL). Perhaps sqldf does not allow it. I don't know.

Hope this helps.
-Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062
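Don's remark that the renaming "can be put in a loop" might look like the following in base R. This is only a sketch on small made-up data: the list frames, the key name "date" and the use of the list names as suffixes are assumptions for illustration, not something taken from the replies.

```r
# Three small data frames sharing the key "date" and the same measurement names.
frames <- list(
  a = data.frame(date = c("2012-01-03", "2012-01-04"), so2 = c(0.8, 0.0), nox = c(111.7, 178.1)),
  b = data.frame(date = c("2012-01-03", "2012-01-04"), so2 = c(0.0, 0.0), nox = c(13.7, 105.8)),
  c = data.frame(date = c("2012-01-03", "2012-01-04"), so2 = c(2.6, 0.0), nox = c(308.9, 275.7))
)
key <- "date"

# Rename every non-key column by appending the name of its data frame.
for (nm in names(frames)) {
  cols <- names(frames[[nm]])
  cols[cols != key] <- paste(cols[cols != key], nm, sep = ".")
  names(frames[[nm]]) <- cols
}

# With unique column names, a chain of plain merges needs no suffix handling.
merged <- Reduce(function(x, y) merge(x, y, by = key, all = TRUE), frames)
print(names(merged))
```

Renaming first keeps the merge step itself trivial, at the cost of modifying the column names of every input.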
Re: [R] merge multiple data frames
hi don

I followed your advice about using the sqldf package, but the problem of labelling the fields persists; for some reasons I cannot properly handle the sql 'as' statement

a_b <- sqldf("select a.*, b.* from a left join b on a.date=b.date")
a_b_c <- sqldf("select a_b.*, c.* from a_b left join c on a_b.date=c.date")

bye

max

- Original Message -
From: MacQueen, Don macque...@llnl.gov
To: maxbre mbres...@arpa.veneto.it; r-help@r-project.org
Sent: Saturday, January 28, 2012 12:24 AM
Subject: Re: [R] merge multiple data frames

Not tested, but this might be a case for the sqldf package.

-Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062

On 1/26/12 9:29 AM, maxbre mbres...@arpa.veneto.it wrote:

This is my reproducible example (three data frames: a, b, c)

a <- structure(list(date = structure(1:6, .Label = c("2012-01-03", "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08", "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13", "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18", "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23" ), class = "factor"), so2 = c(0.799401398190476, 0, 0, 0.0100453950434783, 0.200154920565217, 0.473866969181818), nox = c(111.716109973913, 178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435, 110.425185027727), no = c(48.8543691516522, 88.7197448817391, 93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364 ), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783, 49.2986070321739, 46.5978461731818), co = c(0.618856168125, 0.99659347508, 0.66698741608, 0.38343731117, 0.281604928875, 0.155383408913043 ), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043, 26.9122200013043, 13.8421695947826, 12.3788847045455), ipa = c(167.541954974667, 252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667, 173.868599272609), ws = c(1.47191016429167, 0.765781205208333, 0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652 ), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214, 319.753674830936, 33.8713897347193, 228.368119533759), temp = c(7.9197282588, 3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722, 3.09864120704348), umr = c(86.11566638875, 94.5034087491667, 94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087 )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3", "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class = "data.frame")

b <- structure(list(date = structure(1:6, .Label = c("2012-01-03", "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08", "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13", "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18", "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23" ), class = "factor"), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511, 105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222 ), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972, 16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381, 28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917, 0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897, 9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971 ), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998, 66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995, 0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031, 221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465, 215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993, 0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736, 88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874 )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3", "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class = "data.frame")

c <- structure(list(date = structure(1:6, .Label = c("2012-01-03", "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08", "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13", "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18", "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23" ), class = "factor"), so2 = c(2.617839247, 0, 0, 0.231044086, 0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142, 178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498, 221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447, 59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816 ), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745, 0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989, 37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056, 432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128, 1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057 ), wd = c(58.15639877, 64.5657153143848, 39.9754269501381, 24.0739884380921, 55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029, 4.33258009,
Re: [R] merge multiple data frames
thanks michael

it's working like a charm: that's exactly what I was looking for

bye

max

- Original Message -
From: R. Michael Weylandt michael.weyla...@gmail.com
To: Massimo Bressan mbres...@arpa.veneto.it
Cc: r-help@r-project.org
Sent: Friday, January 27, 2012 4:16 PM
Subject: Re: [R] merge multiple data frames

Oh, sorry -- I assumed that was intentional since my code passed the identical() test with what you said you wanted. Perhaps this gets what you meant you wanted instead (though the treatment of the names is far from elegant)

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  dotNames <- lapply(dotArgs, names)
  repNames <- Reduce(intersect, dotNames)
  repNames <- repNames[repNames != by]
  for(i in seq_along(dotArgs)){
    wn <- which( (names(dotArgs[[i]]) %in% repNames) &
                 (names(dotArgs[[i]]) != by))
    names(dotArgs[[i]])[wn] <- paste(names(dotArgs[[i]])[wn],
                                     names(dotArgs)[[i]], sep = ".")
  }
  Reduce(function(x, y) merge(x, y, by = by, all = all), dotArgs)
}

print(str(mergeAll(a=a,b=b,c=c)))

Is that what you were going for?

Michael

On Fri, Jan 27, 2012 at 3:19 AM, Massimo Bressan mbres...@arpa.veneto.it wrote:

I tested your code: it's OK but there is still the problem of the suffixes for the last dataframe

thank you for the support

- Original Message -
From: R. Michael Weylandt michael.weyla...@gmail.com
To: maxbre mbres...@arpa.veneto.it
Cc: r-help@r-project.org
Sent: Thursday, January 26, 2012 8:19 PM
Subject: Re: [R] merge multiple data frames

I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y)
    merge(x, y, by = by, all = all,
          suffixes=paste(".", names(dotArgs), sep = "")),
    dotArgs)}

mergeAll(a = a, b = b, c = c)
str(.Last.value)

You also might be able to set it up to capture names without you having to put a = a etc. using substitute.
Re: [R] merge multiple data frames
I tested your code: it's OK, but there is still the problem of the suffixes for the last dataframe.

Thank you for the support.

----- Original Message -----
From: R. Michael Weylandt michael.weyla...@gmail.com
To: maxbre mbres...@arpa.veneto.it
Cc: r-help@r-project.org
Sent: Thursday, January 26, 2012 8:19 PM
Subject: Re: [R] merge multiple data frames

I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y)
    merge(x, y, by = by, all = all,
          suffixes = paste(".", names(dotArgs), sep = "")),
    dotArgs)}

mergeAll(a = a, b = b, c = c)
str(.Last.value)

You also might be able to set it up to capture names without you having to put a = a etc. using substitute.

On Thu, Jan 26, 2012 at 12:29 PM, maxbre mbres...@arpa.veneto.it wrote:

This is my reproducible example (three data frames: a, b, c)

a <- structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(0.799401398190476, 0, 0, 0.0100453950434783,
0.200154920565217, 0.473866969181818), nox = c(111.716109973913,
178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435,
110.425185027727), no = c(48.8543691516522, 88.7197448817391,
93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364
), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783,
49.2986070321739, 46.5978461731818), co = c(0.618856168125, 0.99659347508,
0.66698741608, 0.38343731117, 0.281604928875, 0.155383408913043
), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043,
26.9122200013043, 13.8421695947826, 12.3788847045455), ipa = c(167.541954974667,
252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667,
173.868599272609), ws = c(1.47191016429167, 0.765781205208333,
0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652
), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214,
319.753674830936, 33.8713897347193, 228.368119533759), temp = c(7.9197282588,
3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722,
3.09864120704348), umr = c(86.11566638875, 94.5034087491667,
94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087
)), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class = "data.frame")

b <- structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511,
105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222
), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972,
16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381,
28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917,
0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897,
9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971
), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998,
66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995,
0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031,
221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465,
215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993,
0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736,
88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874
)), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class = "data.frame")

c <- structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(2.617839247, 0, 0, 0.231044086,
0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142,
178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498,
221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447,
59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816
), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745,
0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989,
37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056,
432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128,
1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057
), wd = c(58.15639877, 64.5657153143848, 39.9754269501381, 24.0739884380921,
55.9453098437477, 56.7648829092446), temp = c(10.24740028,