from:"Massimo Bressan"

[R] overlay geom_contour to ggmap

2021-04-03 Thread massimo bressan

consider this reproducible example

# set the origin of the grid
# in cartesian coordinates (epsg 32632)
xmin<-742966
ymin<-5037923
# set x and y axis
x<-seq(xmin, xmin+25*39, by=25)
y<-seq(ymin, ymin+25*39, by =25)
# define a 40 x 40 grid
mygrid<-expand.grid(x = x, y = y)
# set the z value to be interpolated by the contour
set.seed(123)
mygrid$z<- rnorm(nrow(mygrid))

library(tidyverse)
# plot of contour is fine
ggplot(data=mygrid, aes(x=x,y=y,z=z))+
  geom_contour()

library(ggspatial)
# transform coordinates to wgs84 (epsg 4326)
# (one of the many possible ways to do it)
mygrid_4326<-xy_transform(mygrid$x, mygrid$y, from = 32632, to = 4326)
# create new grid with lon and lat
# (geographical coordinates epsg 4326)
mygrid_4326<-mygrid_4326%>%
  mutate(z=mygrid$z)
# define the bounding box
my_bb<-c(min(mygrid_4326$x), min(mygrid_4326$y),
         max(mygrid_4326$x), max(mygrid_4326$y))
names(my_bb)<-c('left', 'bottom', 'right', 'top')

library(ggmap)
# get the background map (by a free provider)
mymap<-get_stamenmap(bbox = c(left = my_bb[['left']],
  bottom = my_bb[['bottom']],
  right = my_bb[['right']],
  top = my_bb[['top']]),
 zoom = 15,
 maptype = 'toner-lite')
# plot of the map is fine
mymap%>%
  ggmap()
# overlay the contour of z is failing
mymap%>%
  ggmap()+
  #geom_contour(data=mygrid_4326, mapping=aes(x = x, y = y, z = z))
  stat_contour(data=mygrid_4326, mapping=aes(x = x, y = y, z = z))

Warning messages:
1: stat_contour(): Zero contours were generated
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf



the problem here is overlaying the contour plot made with ggplot onto a
base map made with ggmap

any help?

thanks
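
A hedged note on a likely cause (not confirmed in this thread): stat_contour()
expects z values on a regular x/y grid. After reprojecting from epsg 32632 to
epsg 4326 the points no longer line up on shared lon/lat values, which would
explain "Zero contours were generated". One possible workaround is to compute
the contour lines while the grid is still regular and to reproject only the
line vertices afterwards. The sketch below covers the first step in base R with
contourLines(); the names zmat, lines_df and piece are invented for
illustration.

```r
# regular grid in projected coordinates (epsg 32632), as in the post
xmin <- 742966
ymin <- 5037923
x <- seq(xmin, xmin + 25 * 39, by = 25)
y <- seq(ymin, ymin + 25 * 39, by = 25)
mygrid <- expand.grid(x = x, y = y)
set.seed(123)
mygrid$z <- rnorm(nrow(mygrid))

# expand.grid() varies x fastest, so z fills a length(x) by length(y) matrix
zmat <- matrix(mygrid$z, nrow = length(x), ncol = length(y))

# contour lines computed where the grid is still regular
cl <- contourLines(x, y, zmat, nlevels = 5)

# one row per vertex, with an id (piece) for each separate line
lines_df <- do.call(rbind, lapply(seq_along(cl), function(k) {
  data.frame(x = cl[[k]]$x, y = cl[[k]]$y,
             level = cl[[k]]$level, piece = k)
}))
head(lines_df)
```

The vertices in lines_df could then be reprojected, for example with
xy_transform(lines_df$x, lines_df$y, from = 32632, to = 4326), and drawn over
ggmap() with geom_path() grouped by piece. That last step is an untested
suggestion.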

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] interpretation of R output for exact permutation test

2019-06-07 Thread massimo bressan

given this reproducible example

library(coin)

independence_test(asat ~ group, data = asat,
                  ## exact null distribution
                  distribution = "exact")

I'm wondering why the default output also reports the value Z, given that
this method is supposed to be "exact", i.e. computing the probability
directly:

pvalue(independence_test(asat ~ group, data = asat,
                         ## exact null distribution
                         distribution = "exact"))

my question is: what is the correct interpretation (if there is one at all) of
the Z value printed by the 'plain' function 'independence_test' when an
'exact' test is requested?

am I completely off track?

sorry, but I'm missing the point here somewhere, somehow...

thank you for the feedback
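
A hedged illustration of one reading of the output: as far as I understand the
coin package, Z is the standardized test statistic, and distribution = "exact"
only changes the reference distribution used for the p-value (all permutations
instead of the normal approximation). The base R sketch below mimics this for a
two-group comparison; the data values and all variable names are made up and
this is not coin's actual code.

```r
# hypothetical measurements for two groups
x <- c(12, 15, 9, 20, 17)
y <- c(8, 11, 7, 10, 13)
v <- c(x, y)
n <- length(v)
m <- length(x)

# observed statistic: sum of the first group
obs <- sum(x)

# mean and variance of that sum under random relabelling of the groups
mu <- m * mean(v)
sigma2 <- m * (n - m) / (n * (n - 1)) * sum((v - mean(v))^2)

# standardized statistic, the analogue of the printed Z
Z <- (obs - mu) / sqrt(sigma2)

# exact reference distribution: every possible choice of m values out of n
perm_sums <- combn(v, m, FUN = sum)
perm_Z <- (perm_sums - mu) / sqrt(sigma2)

# two-sided p-values: exact versus normal approximation
p_exact <- mean(abs(perm_Z) >= abs(Z) - 1e-12)
p_normal <- 2 * pnorm(-abs(Z))
c(Z = Z, p_exact = p_exact, p_normal = p_normal)
```

In this reading Z is only a rescaled version of the observed statistic; what
makes the test "exact" is that p_exact is taken from the full permutation
distribution rather than from pnorm().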


Re: [R] extract and re-arrange components of data frame

2018-06-12 Thread Massimo Bressan

thank you for your reply 

well, you are relying on an assumed ordering of i, which is not necessarily the
case, and in fact is not in mine...

consider this example, please

d<-data.frame(i=c(8,12,3), s=c('97,918,19','103,1205', '418'), stringsAsFactors = FALSE)
d 

From: "Bert Gunter"
To: "Massimo Bressan"
Cc: "r-help"
Sent: Tuesday, 12 June 2018 16:42:18
Subject: Re: [R] extract and re-arrange components of data frame

You mean like this?

> s.new <-with(d, as.numeric(unlist(strsplit(s,","))))

> s.new <- cut(s.new,breaks = c(0,100,110,200),lab = d$i) 

> s.new 
[1] 1 1 1 2 2 3 
Levels: 1 2 3 

(Obviously, this could be a one-liner) 

See ?cut 

Cheers, 
Bert 

Bert Gunter 

"The trouble with having an open mind is that people keep coming along and 
sticking things into it." 
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) 


[R] extract and re-arrange components of data frame

2018-06-12 Thread Massimo Bressan

# considering this data.frame as a reproducible example 
d<-data.frame(i=c(1,2,3), s=c('97,98,99','103,105', '118'), stringsAsFactors = FALSE)
d 

#I need to get this final result 
r<-data.frame(i=c(1,1,1,2,2,3), s=c(97, 98, 99, 103, 105, 118)) 
r 

#this is my attempt 

#number of components for each element (3) of the list 
#returned by strsplit 
n<-unlist(lapply(strsplit(d$s,','), length)) 

#extract components of all elements of the list 
s<-cbind(unlist(strsplit(d$s,','))) 

#replicate each element of i 
#by the number of components of each element of the list 
i<-rep(d$i, n) 
i 

#compose final result 
r_final<-data.frame(i,s, stringsAsFactors = FALSE) 
r_final 

#I'm not very satisfied with this approach, it seems a bit clumsy to me...

#any help for improving it? 
#thanks 
#a novice 
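
One possible tidier variant of the attempt above, as a minimal base R sketch:
it calls strsplit() once, uses lengths() for the number of components, and
converts s to numeric so that it matches the desired result. The names parts
and r2 are made up here.

```r
# data frame from the post
d <- data.frame(i = c(1, 2, 3), s = c("97,98,99", "103,105", "118"),
                stringsAsFactors = FALSE)

# split each string once; the result is a list of character vectors
parts <- strsplit(d$s, ",", fixed = TRUE)

# repeat each i by the number of components and flatten the components
r2 <- data.frame(i = rep(d$i, lengths(parts)),
                 s = as.numeric(unlist(parts)))
r2
```

Since i is repeated by position, this does not depend on the values of i being
ordered or consecutive.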




Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Massimo Bressan

#ok, finally this is my final "best and most compact" solution of the problem,
#obtained by merging different contributions (thanks to all indeed)

t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789))

l<-sapply(unique(t$A), function(x) t$id[which(t$A==x)])
r<-data.frame(unique_A= unique(t$A), list_id=unlist(lapply(l, paste, collapse = ", ")))
r 
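
For comparison, a hedged alternative sketch using tapply(), which does the
grouping and the pasting in one call. The data frame is named df here only to
avoid masking base::t; df, ids and res are illustrative names.

```r
df <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                 A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# paste the id values within each distinct value of A
ids <- tapply(df$id, df$A, paste, collapse = ",")

# the names of the result hold the distinct values of A (sorted)
res <- data.frame(unique_A = as.numeric(names(ids)),
                  list_id  = as.vector(ids),
                  stringsAsFactors = FALSE)
res
```

Note that tapply() returns the groups sorted by A, whereas unique() keeps the
order of first appearance; for this example the two orders coincide.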



Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Massimo Bressan

thank you for the help 

this is my solution based on your valuable hint but without the need to pass 
through the use of a 'tibble' 

x<-data.frame(id=LETTERS[1:10], A=c(123,345,123,678,345,123,789,345,123,789)) 
uA<-unique(x$A) 
idx<-lapply(uA, function(v) which(x$A %in% v)) 
vals<- lapply(idx, function(index) x$id[index]) 
data.frame(unique_A = uA, list_vals=unlist(lapply(vals, paste, collapse = ", ")))

best 



From: "Ben Tupper"
To: "Massimo Bressan"
Cc: "r-help"
Sent: Thursday, 7 June 2018 14:47:55
Subject: Re: [R] aggregate and list elements of variables in data.frame

Hi, 

Does this do what you want? I had to change the id values to something more 
obvious. It uses tibbles which allow each variable to be a list. 

library(tibble) 
library(dplyr) 
x <- tibble(id=LETTERS[1:10], 
A=c(123,345,123,678,345,123,789,345,123,789)) 
uA <- unique(x$A) 
idx <- lapply(uA, function(v) which(x$A %in% v)) 
vals <- lapply(idx, function(index) x$id[index]) 

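r <- tibble(unique_A = uA, list_idx = idx, list_vals = vals)


> r
# A tibble: 4 x 3
  unique_A list_idx  list_vals
     <dbl> <list>    <list>
1     123. <int [4]> <chr [4]>
2     345. <int [3]> <chr [3]>
3     678. <int [1]> <chr [1]>
4     789. <int [2]> <chr [2]>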
> r$list_idx[1] 
[[1]] 
[1] 1 3 6 9 

> r$list_vals[1] 
[[1]] 
[1] "A" "C" "F" "I" 


Cheers, 
ben 



Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Massimo Bressan

sorry, but on looking further at the example I just realised that the posted
solution is not quite what I need: I do not need to get back the 'indices'
but rather the corresponding values of column 'id' for each value of column A

#please consider this new example 

t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789))
 
t 

# I need to get this result 
r<-data.frame(unique_A=c(123, 345, 678, 
789),list_id=c('18,20,27,4','91,54,15','68','26,97')) 
r 

# any help for this, please? 







-- 

 
Massimo Bressan 

ARPAV 
Agenzia Regionale per la Prevenzione e 
Protezione Ambientale del Veneto 

Dipartimento Provinciale di Treviso 
Via Santa Barbara, 5/a 
31100 Treviso, Italy 

tel: +39 0422 558545 
fax: +39 0422 558516 
e-mail: massimo.bres...@arpa.veneto.it 

Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Massimo Bressan

thanks for the help 

I'm posting here the complete solution 

t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789)) 
t$A <- factor(t$A) 
l<-sapply(levels(t$A), function(x) which(t$A==x)) 
r<-data.frame(list_id=unlist(lapply(l, paste, collapse = ", "))) 
r<-cbind(unique_A=row.names(r),r) 
row.names(r)<-NULL 
r 

best 




[R] aggregate and list elements of variables in data.frame

2018-06-06 Thread Massimo Bressan

#given the following reproducible and simplified example 

t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789)) 
t 

#I need to get the following result 

r<-data.frame(unique_A=c(123, 345, 678, 
789),list_id=c('1,3,6,9','2,5,8','4','7,10')) 
r 

# i.e. aggregate over the variable "A" and list all elements of the variable
# "id" that share the same corresponding value of "A"
#any help for that?

#so far I've just managed to "aggregate" and "count", like: 

library(sqldf) 
sqldf('select count(*) as count_id, A as unique_A from t group by A') 

library(dplyr) 
t%>%group_by(unique_A=A) %>% summarise(count_id = n()) 

# thank you 



Re: [R] assign NA to rows by test on multiple columns of a data frame

2017-11-22 Thread Massimo Bressan

yes, it works, even though I do not really get how and why the combination of
logical results works (could you provide some insight on that?)

moreover, and most of all, I was hoping for a compact solution because I need
to deal with MANY columns (more than 40) in a data frame with the same basic
structure as the simplified example I posted

thanks

m
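
A hedged sketch touching both points. On the logic: !A_flag coerces the numeric
flag to logical (0 is FALSE, anything else is TRUE) and negates it, so it is
TRUE exactly where the flag is 0, and is.na(A) <- index sets A to NA at those
positions. On compactness: assuming every value column X has a companion column
named X_flag (a naming convention taken from the example, not a general rule),
all pairs can be handled at once. flag_cols and value_cols are made-up names.

```r
mydf <- data.frame(A = c(8, 7, 10, 1, 5), A_flag = c(10, 0, 1, 0, 2),
                   B = c(5, 6, 2, 1, 0),  B_flag = c(12, 9, 0, 5, 0))

# how the logical index arises: TRUE where the flag equals 0
!mydf$A_flag

# pair each *_flag column with the value column of the same stem
flag_cols  <- grep("_flag$", names(mydf), value = TRUE)
value_cols <- sub("_flag$", "", flag_cols)

# one sweep: a logical matrix of "flag == 0" indexes the value columns
mydf[value_cols][mydf[flag_cols] == 0] <- NA
mydf
```

The same idea should carry over to 40 or more column pairs as long as the
naming pattern holds; otherwise flag_cols and value_cols would have to be
listed by hand.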


- Original message -
From: "Bert Gunter" <bgunter.4...@gmail.com>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>
Cc: "r-help" <r-help@r-project.org>
Sent: Wednesday, 22 November 2017 17:32:33
Subject: Re: [R] assign NA to rows by test on multiple columns of a data frame

Do you mean like this:

mydf <- within(mydf, {
  is.na(A)<- !A_flag
  is.na(B)<- !B_flag
  }
   )

> mydf
   A A_flag  B B_flag
1  8     10  5     12
2 NA      0  6      9
3 10      1 NA      0
4 NA      0  1      5
5  5      2 NA      0


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: [R] assign NA to rows by test on multiple columns of a data frame

2017-11-22 Thread Massimo Bressan

...well, I don't think this is exactly the expected result (see my post)

to be noted that the columns affected should be "A" and "B"

thanks for the help

max

- Original message -
From: "Rui Barradas" <ruipbarra...@sapo.pt>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>, "r-help"
<r-help@r-project.org>
Sent: Wednesday, 22 November 2017 11:49:08
Subject: Re: [R] assign NA to rows by test on multiple columns of a data frame

Hello,

Try the following.


icol <- which(grepl("flag", names(mydf)))
mydf[icol] <- lapply(mydf[icol], function(x){
 is.na(x) <- x == 0
 x
 })

mydf
#   A A_flag B B_flag
#1  8     10 5     12
#2  7     NA 6      9
#3 10      1 2     NA
#4  1     NA 1      5
#5  5      2 0     NA


Hope this helps,

Rui Barradas


[R] assign NA to rows by test on multiple columns of a data frame

2017-11-22 Thread Massimo Bressan



Given this data frame (a simplified, essential reproducible example)

A<-c(8,7,10,1,5)
A_flag<-c(10,0,1,0,2)
B<-c(5,6,2,1,0)
B_flag<-c(12,9,0,5,0)

mydf<-data.frame(A, A_flag, B, B_flag)

# this is my initial df
mydf

I want to get to this final situation

i<-which(mydf$A_flag==0)
mydf$A[i]<-NA

ii<-which(mydf$B_flag==0)
mydf$B[ii]<-NA

# this is my final df
mydf

Since I have to perform this task on a data frame with many columns, I'm
wondering if there is a compact and effective way to get the final result
with just one 'sweep' of the data frame.

I was thinking of the functions apply or lapply, but I cannot work out how to
use them here...

any hint for that?


Re: [R] weighted average grouped by variables

2017-11-09 Thread Massimo Bressan

hi thierry 

thanks for your reply 

yes, you are right, your solution is more straightforward 

best 


From: "Thierry Onkelinx" <thierry.onkel...@inbo.be>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>
Cc: "r-help" <r-help@r-project.org>
Sent: Thursday, 9 November 2017 15:17:31
Subject: Re: [R] weighted average grouped by variables

Dear Massimo, 

It seems straightforward to use weighted.mean() in a dplyr context 

library(dplyr)
mydf %>%
  group_by(date_time, type) %>%
  summarise(vel = weighted.mean(speed, n_vehicles))

Best regards, 



ir. Thierry Onkelinx 
Statisticus / Statistician 

Vlaamse Overheid / Government of Flanders 
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND 
FOREST 
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance 
thierry.onkel...@inbo.be 
Kliniekstraat 25, B-1070 Brussel 
www.inbo.be 


Re: [R] weighted average grouped by variables

2017-11-09 Thread Massimo Bressan

Hello 

an update about my question: I worked out the following solution (with the 
package "dplyr") 

library(dplyr) 

mydf %>%
  mutate(speed_vehicles = n_vehicles * speed) %>%
  group_by(date_time, type) %>%
  summarise(
    sum_n_times_speed = sum(speed_vehicles),
    n_vehicles = sum(n_vehicles),
    vel = sum(speed_vehicles) / sum(n_vehicles)
  )


In fact I was hoping to manage everything in a "one-go": i.e. without the need 
to create the "intermediate" variable called "speed_vehicles" and with the use 
of the function weighted.mean() 

any hints for a different approach much appreciated 

thanks 




[R] weighted average grouped by variables

2017-11-09 Thread Massimo Bressan

hi all 

I have this dataframe (created as a reproducible example) 

mydf<-structure(list(date_time = structure(c(1508238000, 1508238000, 
1508238000, 1508238000, 1508238000, 1508238000, 1508238000), class = 
c("POSIXct", "POSIXt"), tzone = ""), 
direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), 
class = "factor"), 
type = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("car", "light_duty", 
"heavy_duty", "motorcycle"), class = "factor"), 
avg_speed = c(41.1029082774049, 40.3, 40.3157894736842, 
36.0869565217391, 33.4065155807365, 37.6, 35.5), 
n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)), 
.Names = c("date_time", "direction", "type", "speed", "n_vehicles"), 
row.names = c(NA, -7L), 
class = "data.frame") 

mydf 

and I need to get to this final result 

mydf_final<-structure(list(date_time = structure(c(1508238000, 1508238000, 
1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""), 
type = structure(c(1L, 2L, 3L, 4L), .Label = c("car", "light_duty", 
"heavy_duty", "motorcycle"), class = "factor"), 
weighted_avg_speed = c(36.39029, 38.56521, 37.5, 36.08696), 
n_vehicles = c(1153L,69L,45L,23L)), 
.Names = c("date_time", "type", "weighted_avg_speed", "n_vehicles"), 
row.names = c(NA, -4L), 
class = "data.frame") 

mydf_final 


my question: 
how to compute a weighted mean i.e. "weighted_avg_speed" 
from "speed" (the values whose weighted mean is to be computed) and 
"n_vehicles" (the weights) 
grouped by "date_time" and "type"? 

to be noted the complication of the case "motorcycle" (not present in both 
directions) 

any help for that? 

thank you 

max 
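
A minimal base R sketch of one way to get a grouped weighted mean, using
split() and weighted.mean(). The data frame is rebuilt here in plain form from
the values in the post (date_time fixed to UTC for reproducibility, which is an
assumption); groups and res are illustrative names.

```r
mydf <- data.frame(
  date_time  = as.POSIXct(rep(1508238000, 7), origin = "1970-01-01", tz = "UTC"),
  direction  = c("A", "A", "A", "A", "B", "B", "B"),
  type       = c("car", "light_duty", "heavy_duty", "motorcycle",
                 "car", "light_duty", "heavy_duty"),
  speed      = c(41.1029082774049, 40.3, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L),
  stringsAsFactors = FALSE
)

# one sub-data-frame per (date_time, type) combination that actually occurs
groups <- split(mydf, list(mydf$date_time, mydf$type), drop = TRUE)

# weighted mean of speed with n_vehicles as weights, plus the total count
res <- do.call(rbind, lapply(groups, function(g) {
  data.frame(date_time          = g$date_time[1],
             type               = g$type[1],
             weighted_avg_speed = weighted.mean(g$speed, g$n_vehicles),
             n_vehicles         = sum(g$n_vehicles),
             stringsAsFactors   = FALSE)
}))
rownames(res) <- NULL
res
```

A type that occurs in only one direction, such as "motorcycle", simply forms a
group with a single row, so its weighted mean equals its own speed.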




Re: [R] average at specific hour "endpoints" of the day

2017-04-07 Thread Massimo Bressan

hi jeff

thank you for your code, there is lot to think about it...

In the meanwhile I've managed to work out a (sort of) solution but I'm still 
not completely satisfied with it

I would like to keep it all more elegant and possibly general

here it is, so far...



mydate<-seq(ISOdatetime(2017,1, 1, 0, 0, 0), by="hour", length.out = 48)
v1<-1:48
mydf<-data.frame(mydate,v1)

library(zoo)

z<-zoo(mydf[,-1], mydf[,1])

z8<-rollapply(z, width=8, FUN=mean, align="right")
iz8<-which(as.numeric(strftime(index(z8), '%H'))==6)
z8<-z8[iz8]

z16<-rollapply(z, width=16, FUN=mean, align="right")
iz16<-which(as.numeric(strftime(index(z16), '%H'))==22)
z16<-z16[iz16]

fortify.zoo(z16)
fortify.zoo(z8)

#and then any sort of manipulation with dataframes



bye

- Original message -
From: "Jeff Newmiller" <jdnew...@dcn.davis.ca.us>
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it>
Cc: "r-help" <r-help@r-project.org>
Sent: Thursday, 6 April 2017 18:19:29
Subject: Re: [R] average at specific hour "endpoints" of the day

On Thu, 6 Apr 2017, Massimo Bressan wrote:

> hello
>
> given my reproducible example
>
> #---
> date<-seq(ISOdate(2017,1, 1, 0), by="hour", length.out = 48)
> v1<-1:48
> df<-data.frame(date,v1)
>
> #--

"date" and "df" are functions in base R... best to avoid hiding them by 
re-using those names in the global environment

ISOdate forces GMT, which many data sets that you might work with do NOT 
use. It is better to use ISOdatetime to avoid letting hidden code 
determine the timezone that is applied to (or compared with) your data.

>
> I need to calculate the average of variable v1 at specific hour "endpoints" 
> of the day: i.e. at hours 6.00 and 22.00 respectively
>
> the desired result is
>
> date v1
> 01/01/17 22:00 15.5
> 02/01/17 06:00 27.5
> 02/01/17 22:00 39.5
>
> at hour 06:00 of each day the average is calculated by considering the 8 
> previous records (hours from 23:00 to 6:00)
> at hour 22:00 of each day the average is calculated by considering the 16 
> previous records (hours from 7:00 to 22:00)
>
> any hint please?
>
> I've been trying with some functions within the "xts" package but withouth 
> much result...

I am not sure how I would do this with xts, but the below code is one 
fairly literal approach (implemented two ways) to translate your 
requirements that is also potentially extensible if the data or 
requirements change.

### Base R

Sys.setenv( TZ = "Etc/GMT+5" ) # selected arbitrarily here but not left to
# the system to decide
dta <- data.frame( datetime = seq( ISOdatetime( 2017,1, 1, 0, 0, 0 )
                                 , by="hour"
                                 , length.out = 48
                                 )
                 , v1 = 1:48
                 )
dta$nrec <- 1
dta$date <- as.POSIXct( trunc.POSIXt( dta$datetime, units="days" ) )
dta$tod <- as.numeric( dta$datetime - dta$date, units = "hours" )
dta$timeslot <- factor( ifelse( 6 < dta$tod & dta$tod <= 22
                              , "Day"
                              , "Night"
                              )
                      , levels = c( "Night", "Day" )
                      )
dta$slotdatetime <- dta$date + as.difftime( ifelse( "Day" == dta$timeslot
                                                  , 22
                                                  , ifelse( 22 < dta$tod
                                                          , 24+6
                                                          , 6
                                                          )
                                                  )
                                          , units="hours"
                                          )
dta2 <- aggregate( dta[ , c( "v1", "nrec" ) ]
                 , dta[ , c( "timeslot", "slotdatetime" ), drop=FALSE ]
                 , FUN = sum
                 )
dta2 <- subset( dta2, nrec == ifelse( "Day"==timeslot, 16, 8 ) )
dta2$v1mean <- dta2$v1 / dta2$nrec

 or if you don't mind the tidyverse

library(dplyr) # wonderland of non-standard evaluation... beware, Alice!
Sys.setenv( TZ = "Etc/GMT+5" ) # selected arbitrarily here but not left to
# the system to decide
dta <- data.frame( datetime = seq( ISOdatetime( 2017,1, 1, 0, 0, 0 )
  , by="hour"
  , length.out = 48
  )

[R] average at specific hour "endpoints" of the day

2017-04-06 Thread Massimo Bressan

hello 

given my reproducible example 

#--- 
date<-seq(ISOdate(2017,1, 1, 0), by="hour", length.out = 48) 
v1<-1:48 
df<-data.frame(date,v1) 

#-- 

I need to calculate the average of variable v1 at specific hour "endpoints" of 
the day: i.e. at hours 6.00 and 22.00 respectively 

the desired result is 

date v1 
01/01/17 22:00 15.5 
02/01/17 06:00 27.5 
02/01/17 22:00 39.5 

at hour 06:00 of each day the average is calculated by considering the 8 
previous records (hours from 23:00 to 6:00) 
at hour 22:00 of each day the average is calculated by considering the 16 
previous records (hours from 7:00 to 22:00) 

any hint please? 

I've been trying with some functions within the "xts" package but without much 
result... 

thanks for the help 
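
For readers of the archive, the requirement above can be sketched compactly in base R. This is only one possible reading of the request, under my own assumptions: the data are in UTC (so no daylight-saving jumps), each window is labelled by its end time, and incomplete windows are dropped. The names shifted, day0, endpoint and res are illustrative.

```r
dta <- data.frame(
  datetime = seq(ISOdatetime(2017, 1, 1, 0, 0, 0, tz = "UTC"),
                 by = "hour", length.out = 48),
  v1 = 1:48
)

# shift the clock back 7 hours so that each "day" runs 07:00 -> 06:00
shifted <- dta$datetime - 7 * 3600
day0    <- as.POSIXct(trunc(shifted, units = "days"))
hr      <- as.POSIXlt(shifted)$hour

# shifted hours 0..15 are 07:00..22:00 (window ends at 22:00);
# shifted hours 16..23 are 23:00..06:00 (window ends at 06:00 next day)
endpoint <- day0 + ifelse(hr < 16, 22, 30) * 3600
key      <- format(endpoint, "%Y-%m-%d %H:%M")

m <- tapply(dta$v1, key, mean)
n <- tapply(dta$v1, key, length)

# keep only complete windows: 16 records for 22:00, 8 records for 06:00
expected <- ifelse(grepl("22:00$", names(n)), 16, 8)
res <- m[n == expected]
res
```

The three values kept are the averages requested in the post; the partial windows at the start and the end of the series are discarded.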




Re: [R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

2016-05-14 Thread Massimo Bressan

thank you, what a nice compact solution with ave() 

I learned something new about the subtleties of R 

let me here summarize the alternative solutions, just in case someone else might 
be interested... 

thanks, bye 

# 

# my user function (an example) 
mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, 
na.rm=TRUE))} 

# my dataframe to apply the formula by blocks 
mydf<-data.frame(blocks=rep(c("a","b","c"),each=5), 
v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0)) 

# blocks (factors) to be used for splitting 
b <- mydf$blocks 

# 1 - split-lapply-unsplit with anonymous function to return a new df 
s <- split(mydf, b) 
l <- lapply(s, function(x) data.frame(x, v1.mod=mynorm(x$v1))) 
mydf_new <- unsplit(l, b) 

# 2 - split-lapply-unsplit with function transform to return a new df 
l <- split(mydf, b) 
l <- lapply(l, transform, v1.mod = mynorm(v1)) 
mydf_new <- unsplit(l, b) 

# 3 - ave() encapsulating split-lapply-unsplit approach 
mydf_new<-transform(mydf, v1.mod = ave(v1, blocks, FUN=mynorm)) 

# 
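
A quick way to convince yourself that the approaches summarized above agree is to run them on the same seeded data and compare the new column. This is just a sketch; the seed and the names by_split and by_ave are my own additions.

```r
mynorm <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

set.seed(42)  # arbitrary seed, only to make the comparison reproducible
mydf <- data.frame(blocks = rep(c("a", "b", "c"), each = 5),
                   v1 = round(runif(15, 10, 25), 0),
                   v2 = round(rnorm(15, 30, 5), 0))

# split-lapply-unsplit, written with an explicit anonymous function
by_split <- unsplit(
  lapply(split(mydf, mydf$blocks),
         function(d) transform(d, v1.mod = mynorm(d$v1))),
  mydf$blocks
)

# ave() does the split/apply/recombine in one call
by_ave <- transform(mydf, v1.mod = ave(v1, blocks, FUN = mynorm))

all.equal(by_split$v1.mod, by_ave$v1.mod)
```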





From: "William Dunlap" <wdun...@tibco.com> 
To: "Massimo Bressan" <massimo.bres...@arpa.veneto.it> 
Cc: "David L Carlson" <dcarl...@tamu.edu>, "r-help" <r-help@r-project.org> 
Sent: Friday, 13 May 2016 19:22:21 
Subject: Re: [R] apply formula over columns by subset of rows in a dataframe 
(to get a new dataframe) 

ave() encapsulates the split/lapply/unsplit stuff so 
transform(mydf, v1.mod = ave(v1, blocks, FUN=mynorm)) 
also gives what you got above. 

Bill Dunlap 
TIBCO Software 
wdunlap tibco.com 

On Fri, May 13, 2016 at 7:44 AM, Massimo Bressan < 
massimo.bres...@arpa.veneto.it > wrote: 


yes, thanks 

you pointed me in the right direction: split/unsplit was the trick 

I had completely overlooked that possibility! 

here is the final version 

 

mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, 
na.rm=TRUE))} 

mydf<-data.frame(blocks=rep(c("a","b","c"),each=5), 
v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0)) 

g <- mydf$blocks 
l <- split(mydf, g) 
l <- lapply(l, transform, v1.mod = mynorm(v1)) 
mydf_new <- unsplit(l, g) 

 

thanks again 

massimo 






-- 

 
Massimo Bressan 

ARPAV 
Agenzia Regionale per la Prevenzione e 
Protezione Ambientale del Veneto 

Dipartimento Provinciale di Treviso 
Via Santa Barbara, 5/a 
31100 Treviso, Italy 

tel: +39 0422 558545 
fax: +39 0422 558516 
e-mail: massimo.bres...@arpa.veneto.it 
 


Re: [R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

2016-05-13 Thread Massimo Bressan

yes, thanks

you pointed me in the right direction: split/unsplit was the trick 

I had completely overlooked that possibility!

here is the final version



mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, 
na.rm=TRUE))} 

mydf<-data.frame(blocks=rep(c("a","b","c"),each=5), 
v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0)) 

g <- mydf$blocks
l <- split(mydf, g)
l <- lapply(l, transform, v1.mod = mynorm(v1))
mydf_new <- unsplit(l, g)



thanks again

massimo


[R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

2016-05-13 Thread Massimo Bressan

hi 

I need to apply a user-defined formula over some selected columns of a 
dataframe by subsetting groups of rows (blocks) and get back a new dataframe 

I’ve managed to get the calculations right but I’m not at all 
satisfied with the form of the results 

please refer to my reproducible example 

## 
# my user function (an example) 
mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, 
na.rm=TRUE))} 

# my dataframe to apply the formula by blocks 
mydf<-data.frame(blocks=rep(c("a","b","c"),each=5), 
v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0)) 


#my attempts (not satisfied by final output) 

tapply(mydf$v1, mydf$blocks, mynorm) 

byf<-factor(mydf$blocks) 
aggregate(mydf[2:3], list(byf), mynorm) 
aggregate(mydf[2:3], list(mydf$blocks), mynorm, simplify = FALSE) 

### 

please can anyone give me some hints on how to properly proceed? 

I need a dataframe with all variables as final result 
sorry but I’m sort of definitely stuck with this… 

thanks 



[R] 'split-lapply' vs. 'aggregate'

2016-03-27 Thread Massimo Bressan

this might be a trivial question (sorry if so!) but I really 
cannot see the problem here... 

please consider the following reproducible example: why do 'split-lapply' 
and 'aggregate' give different results? 
I've also checked against other methods (e.g. data.table, 
dplyr) and the results were always consistent with 'split-lapply' but 
apparently not with 'aggregate' 

I must certainly be doing something wrong! 
could someone point me in the right direction? 

thanks 

## 

s <- split(airquality, airquality$Month) 
ls <- lapply(s, function(x) {colMeans(x[c("Ozone", "Solar.R", "Wind")], na.rm = 
TRUE)}) 
do.call(rbind, ls) 

# slightly different results with 
aggregate(.~ Month, airquality[-c(4,6)], mean, na.rm=TRUE) 

## 
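
A likely explanation for the difference shown above: the formula method of aggregate() applies na.action = na.omit by default, so any row with an NA in any of the variables is dropped before mean() is called, whereas colMeans(..., na.rm = TRUE) drops NAs column by column. The sketch below (object names are mine) shows that passing na.action = na.pass brings the two results into agreement.

```r
s <- split(airquality, airquality$Month)
by_split <- do.call(rbind, lapply(s, function(x) {
  colMeans(x[c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)
}))

# default: rows with any NA are removed first (na.action = na.omit)
agg_default <- aggregate(. ~ Month, airquality[-c(4, 6)], mean, na.rm = TRUE)

# keep all rows and let mean(na.rm = TRUE) handle NAs per column
agg_pass <- aggregate(. ~ Month, airquality[-c(4, 6)], mean,
                      na.rm = TRUE, na.action = na.pass)

all.equal(unname(by_split), unname(as.matrix(agg_pass[-1])))
```

Note that with the default na.action even the Wind means differ, although Wind itself has no missing values, because rows are dropped whenever Ozone or Solar.R is NA.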


[R] shift by one column given rows in a dataframe

2015-07-23 Thread Massimo Bressan


by considering the following reproducible example:

v0 <- c("a", "xxx", "c", rep("xxx", 2))
v1 <- c(1, "b", 3, "d", "e")
v2 <- c(6, 2, 8, 4, 5)
v3 <- c("xxx", 7, "xxx", 9, 10)

df_start <- data.frame(v0, v1, v2, v3)
df_start

v0 <- letters[1:5]
v1 <- 1:5
v2 <- 6:10

df_end <- data.frame(v0, v1, v2)
df_end

I need to shift some given rows of the initial data frame df_start 
by one column, so as to get the final structure shown in df_end;
please consider that the value "xxx" in the rows of df_start can be 
anything, so I necessarily need to work by row index position (in my 
reproducible example, rows 2, 4 and 5);


I'm really stuck on this problem and I cannot think of any viable 
solution so far


any hints?
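
One possible sketch of the row-wise shift, working on a character matrix so that values can move freely between columns. It assumes the rows to shift are known by position (here the rows whose v0 holds the placeholder, i.e. 2, 4 and 5 in the example data); rows and m are illustrative names.

```r
v0 <- c("a", "xxx", "c", rep("xxx", 2))
v1 <- c(1, "b", 3, "d", "e")
v2 <- c(6, 2, 8, 4, 5)
v3 <- c("xxx", 7, "xxx", 9, 10)
df_start <- data.frame(v0, v1, v2, v3, stringsAsFactors = FALSE)

rows <- c(2, 4, 5)                  # rows to shift left by one column

m <- sapply(df_start, as.character) # character matrix, one column per variable
m[rows, 1:3] <- m[rows, 2:4]        # move columns 2..4 into 1..3 for those rows

df_end <- as.data.frame(m[, 1:3], stringsAsFactors = FALSE)
# restore sensible column types after the detour through character
df_end[] <- lapply(df_end, type.convert, as.is = TRUE)
df_end
```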

best regards

m


[R] aov and groups coding

2014-10-01 Thread Massimo Bressan


please consider the following example:

#start code

set.seed(123)
level <- rnorm(18, 10, 3)

group1 <- rep(letters[1:3], each=6)
summary(aov(level~group1))

group2 <- rep(1:3, each=6)
str(group2)
summary(aov(level~group2))

#same result as for group1
summary(aov(level~factor(group2)))

#same result as for aov(level~group2)
anova(lm(level~group2))

#end code

what I would like to do is to perform an anova among groups (analysis of 
variance for three different groups);
consider that groups are completely arbitrary: they are not intended to 
have any sort of scaling or ordinal meaning;


in my example the same groups are coded in two alternative ways: group1 as 
chr (factor) and group2 as num; so, keeping in mind my purpose (is 
there any difference in the level among groups?), I would simply consider 
the result of aov() for group2 (num) as nonsense (with respect to my 
specific purpose)

is that a correct interpretation?
I hope not having misinterpreted the indications of the following thread
http://r.789695.n4.nabble.com/Question-about-factor-that-is-numeric-in-aov-td2164393.html
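
The difference between the two codings can be read directly off the degrees of freedom. In the sketch below (fit_num and fit_fac are my names) the numeric coding is fitted as a single slope, i.e. a linear trend across 1, 2, 3 with 1 df, while the factor coding estimates one mean per group with 2 df, which is the comparison wanted for arbitrary, unordered groups.

```r
set.seed(123)
level  <- rnorm(18, 10, 3)
group2 <- rep(1:3, each = 6)

fit_num <- aov(level ~ group2)          # group2 treated as a numeric covariate
fit_fac <- aov(level ~ factor(group2))  # group2 treated as 3 unordered groups

df_num <- summary(fit_num)[[1]]$Df      # term df, residual df
df_fac <- summary(fit_fac)[[1]]$Df
df_num
df_fac
```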


thank you for any help

best regards

max


Re: [R] group by and merge two dataframes

2014-05-09 Thread Massimo Bressan


yes thanks, that's correct!

here a slight variation inspired by your solution: a cartesian product 
restricted to non duplicated records to get the logical vector i to be 
used in the next natural join


i <- !duplicated(merge(df1$id,df1$item, by=NULL))
merge(df1[i,],df2)

thanks

On 08/05/2014 18:43, arun wrote:

Hi,
May be:
indx <- !duplicated(as.character(interaction(df1[,-3])))
merge(df1[indx,],df2)
A.K.




On Thursday, May 8, 2014 12:34 PM, Massimo Bressan <mbres...@arpa.veneto.it> 
wrote:
yes, thank you for all your replies, they worked out correctly indeed...

...but because of my fault, by then working on my real data I fully
realised that I should have mentioned something that is changing (quite
a lot, in fact) the terms of the problem...

please would you consider the following (consistent) variation ?

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2),
rep("C",2)), v=rnorm(6))
df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

and again I need to group the first dataframe df1 both by id and by
the first record of v, and then merge with the second dataframe df2
(again by id)

now, how to do that?
(that's why probably I was pointing in my first post to the use of sqldf)

thanks

ps: I'm in doubt whether I must open another thread or keep going with
this one (really sorry for any violation of the R-help netiquette)


On 08/05/2014 17:14, arun wrote:

Hi,
May be this helps:
merge(unique(df1),df2)
A.K.





On Thursday, May 8, 2014 5:46 AM, Massimo Bressan <mbres...@arpa.veneto.it> 
wrote:
given this bare bone example:

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2),
rep("C",2)))
df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

I need to group the first dataframe df1 by id and then merge with
the second dataframe df2 (again by id)
so far I've managed to accomplish the task by something like the following...

# start

require(sqldf)
tmp <- sqldf("select * from df1 group by id")
merge(tmp, df2)

#end

now I'm wondering if there is a more efficient and/or elegant way to
perform it (also because in fact I'm dealing with much heavier
dataframes);

may be possible through a single sql statement?  or by using a different
package functions (e.g. dplyr)?
my attempts towards these alternative approaches miserably failed ...

thanks






[R] group by and merge two dataframes

2014-05-08 Thread Massimo Bressan


given this bare bone example:

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2), 
rep("C",2)))

df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

I need to group the first dataframe df1 by id and then merge with 
the second dataframe df2 (again by id)

so far I've managed to accomplish the task by something like the following...

# start

require(sqldf)
tmp <- sqldf("select * from df1 group by id")
merge(tmp, df2)

#end

now I'm wondering if there is a more efficient and/or elegant way to 
perform it (also because in fact I'm dealing with much heavier 
dataframes);


may be possible through a single sql statement?  or by using a different 
package functions (e.g. dplyr)?

my attempts towards these alternative approaches miserably failed ...

thanks


Re: [R] group by and merge two dataframes

2014-05-08 Thread Massimo Bressan


yes, thank you for all your replies, they worked out correctly indeed...

...but because of my fault, by then working on my real data I fully 
realised that I should have mentioned something that is changing (quite 
a lot, in fact) the terms of the problem...


please would you consider the following (consistent) variation ?

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2), 
rep("C",2)), v=rnorm(6))

df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

and again I need to group the first dataframe df1 both by id and by 
the first record of v, and then merge with the second dataframe df2 
(again by id)


now, how to do that?
(that's why probably I was pointing in my first post to the use of sqldf)
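
A base R sketch of one reading of this variation: keep the first record of each id (so the first value of v travels with it) and then do the natural join. The fixed values of v and the names first_per_id and res are my own, chosen only to make the result easy to check; this is not the sqldf route.

```r
df1 <- data.frame(id = rep(1:3, each = 2),
                  item = rep(c("A", "B", "C"), each = 2),
                  v = c(0.5, -1.2, 2.0, 0.3, -0.7, 1.1))  # fixed stand-in for rnorm(6)
df2 <- data.frame(id = c(1, 2, 3),
                  who = c("tizio", "caio", "sempronio"))

# first record per id: duplicated() flags every later occurrence of an id
first_per_id <- df1[!duplicated(df1$id), ]

# natural join on the common column
res <- merge(first_per_id, df2, by = "id")
res
```

If "first" should mean something other than row order (e.g. the smallest v), the data would need sorting before the duplicated() step.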

thanks

ps: I'm in doubt whether I must open another thread or keep going with 
this one (really sorry for any violation of the R-help netiquette)


On 08/05/2014 17:14, arun wrote:

Hi,
May be this helps:
  merge(unique(df1),df2)
A.K.





On Thursday, May 8, 2014 5:46 AM, Massimo Bressan <mbres...@arpa.veneto.it> 
wrote:
given this bare bone example:

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2),
rep("C",2)))
df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

I need to group the first dataframe df1 by id and then merge with
the second dataframe df2 (again by id)
so far I've managed to accomplish the task by something like the following...

# start

require(sqldf)
tmp <- sqldf("select * from df1 group by id")
merge(tmp, df2)

#end

now I'm wondering if there is a more efficient and/or elegant way to
perform it (also because in fact I'm dealing with much heavier
dataframes);

may be possible through a single sql statement?  or by using a different
package functions (e.g. dplyr)?
my attempts towards these alternative approaches miserably failed ...

thanks






Re: [R] sum of two POSIXct objects: date and hour

2014-04-30 Thread Massimo Bressan

 read in and convert the data yourself, or is this a source that
 you do not have any control over?  If the former, then just use the
 correct conversion.  As shown below, if you have hundredths of a second,
 that will be converted correctly and you don't need the extra column.

 x <- as.POSIXct("2014-04-29 12:00:00.345")  # decimal seconds that are converted

 x
 [1] "2014-04-29 12:00:00 EDT"
 format(x, format = "%H:%M:%OS3")  # print with 3 decimals
 [1] "12:00:00.345"

 If you have the choice, start over again and do it correctly.
 If not, convert the various components to the correct character format
 for your timezone, combine back together and then use the conversion
 shown above.
 
 
  Jim Holtman
  Data Munger Guru
 
  What is the problem that you are trying to solve?
  Tell me what you want to do, not how you want to do it.
 
 
  On Tue, Apr 29, 2014 at 9:06 AM, Massimo Bressan
  <mbres...@arpa.veneto.it> wrote:

  I have this dataframe:

  df <- structure(list(date = structure(c(1395874800, 1395874800,
  1395874800, 1395874800, 1395874800), class = c("POSIXct", "POSIXt"),
  tzone = ""),
   hour = structure(c(-2209121804, -2209121567, -2209121005,
   -2209118616, -2209116160), class = c("POSIXct", "POSIXt"), tzone = ""),
   s.100 = c(29L, 36L, 6L, 53L, 18L)), .Names = c("date", "hour",
  "s.100"), row.names = c(NA, -5L), class = "data.frame")


  and I would like to sum the first two columns (date and hour) so as to
  end up with a new column, say date_hour, storing both the information
  about the date and the hour in one POSIXct object;

  I have been reading that POSIXct objects are a measure of seconds from a
  given origin (1st Jan 1970), so that a possible solution is to transform
  the column hour into seconds and then add it to the column date;

  but, is there a straightforward solution for accomplishing this task?
  I've been trying to extract from the column hour the digits representing
  hours, minutes and seconds and transform everything into seconds but
  that seems to me quite a cumbersome approach...

  and finally, one more question: is it possible to represent hundredths of
  a second as given in the column s.100 of the given dataframe within the
  same new POSIXct object date_hour?


  thanks for the support
 
 
 
 
 
 






[R] sum of two POSIXct objects: date and hour

2014-04-29 Thread Massimo Bressan

I have this dataframe:

df <- structure(list(date = structure(c(1395874800, 1395874800, 1395874800,
1395874800, 1395874800), class = c("POSIXct", "POSIXt"), tzone = ""),
 hour = structure(c(-2209121804, -2209121567, -2209121005,
 -2209118616, -2209116160), class = c("POSIXct", "POSIXt"), tzone = ""),
 s.100 = c(29L, 36L, 6L, 53L, 18L)), .Names = c("date", "hour",
"s.100"), row.names = c(NA, -5L), class = "data.frame")


and I would like to sum the first two columns (date and hour) so as to end up 
with a new column, say date_hour, storing both the information about the 
date and the hour in one POSIXct object;

I have been reading that POSIXct objects are a measure of seconds from a given 
origin (1st Jan 1970), so that a possible solution is to transform the column 
hour into seconds and then add it to the column date;

but, is there a straightforward solution for accomplishing this task?
I've been trying to extract from the column hour the digits representing 
hours, minutes and seconds and transform everything into seconds but that seems 
to me quite a cumbersome approach...

and finally, one more question: is it possible to represent hundredths of a second 
as given in the column s.100 of the given dataframe within the same new 
POSIXct object date_hour?
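
A rough sketch of the "time of day in seconds" idea described above. Assumptions of mine, not stated in the post: the values are meant to be read in the "Europe/Rome" zone (the post has tzone = "", i.e. the local zone of the machine), the hour column carries only a time of day on a dummy date, and only the first two rows are used. The names tz, lt and secs are illustrative.

```r
tz <- "Europe/Rome"  # assumption: the zone in which the data were recorded

date  <- as.POSIXct(c(1395874800, 1395874800), origin = "1970-01-01", tz = tz)
hour  <- as.POSIXct(c(-2209121804, -2209121567), origin = "1970-01-01", tz = tz)
s.100 <- c(29L, 36L)

# seconds elapsed since local midnight for each value of hour
lt   <- as.POSIXlt(hour, tz = tz)
secs <- lt$hour * 3600 + lt$min * 60 + lt$sec

# POSIXct is a count of seconds, so the pieces can simply be added;
# hundredths of a second become a fractional part
date_hour <- date + secs + s.100 / 100

# fractional seconds are only shown if asked for (and are truncated, not rounded)
format(date_hour, "%Y-%m-%d %H:%M:%OS2")
```

Because the result depends on the zone used to read the time of day, the zone is worth stating explicitly rather than leaving it to the system.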


thanks for the support

  



Re: [R] print(cenfit object) to a data.frame

2014-04-11 Thread Massimo Bressan


thanks rui, it helps indeed..

at first, I've been trying to data.frame the output of mean(mycenfit) 
by the following:

my.df <- as.data.frame(do.call(rbind, mean(mycenfit)))
and it worked out correctly!

...but because I also needed the information about n and n.cen, 
which are not provided by mean(mycenfit), I had to switch to 
print(mycenfit);
...but unfortunately, print(mycenfit) is not so easy (to me at least) to 
handle


now, I'm looking at other possible ways to extract the same 
information directly from the object mycenfit (S4), which turned out 
to be quite hard (to me again)


any other possible ideas?

cheers


[R] print(cenfit object) to a data.frame

2014-04-10 Thread Massimo Bressan

given this reproducible example:

#start code

df <- structure(list(lq = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, 
FALSE, FALSE), value = c(1, 3, 1, 2, 0.5, 2, 1, 2, 3), group = 
structure(c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L), .Label = c("A", "B", 
"C"), class = "factor")), .Names = c("lq", "value", "group"), row.names 
= c(NA, -9L), class = "data.frame")

library(NADA)

mycenfit <- with(df, cenfit(value,lq,group))

print(mycenfit)

#end code

does anybody know how to convert the print() of the cenfit object (S4) 
mycenfit to a data frame?

sorry, this might be a trivial question but for some reason I do not 
understand I got completely stuck on this...
I've seen similar questions pointed out in the mailing list but for a 
survfit object, which do not seem to properly apply in my specific case

any help much appreciated, thank you




[R] plot of a bagging tree

2013-12-05 Thread Massimo Bressan


by considering this general example

##start code

library(ipred)
data(Ionosphere, package = "mlbench")
Ionosphere$V2 <- NULL # constant within groups
iono <- bagging(Class ~ ., data=Ionosphere, coob=TRUE)
print(iono)

##end code

does anybody know of any way to draw an (average) plot of the 
bagging?

does it make any sense at least for a visual presentation?
how to *visually* convey the information provided by the bagging model?

thank you for any feedback

best

max


Re: [R] interpretation of MDS plot in random forest

2013-12-03 Thread Massimo Bressan


here it is an amended (more general) version

library(randomForest)
library(RColorBrewer) # needed for brewer.pal()
set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE, keep.forest=TRUE)

x <- MDSplot(iris.rf, iris$Species)
#add legend
legend("topleft", legend=levels(iris.rf$predicted), 
fill=brewer.pal(length(levels(iris.rf$predicted)), "Set1"))

#str(x)
# need to identify points?
text(x$points, labels=attr(x$points,"dimnames")[[1]], cex=0.5)

bye

m


On 03/12/2013 12:15, mbres...@arpa.veneto.it wrote:

sorry, in fact it was a trivial question!

by just peeping into the function I've worked out this simple solution:

MDSplot(iris.rf, iris$Species)
legend("topleft", legend=levels(iris$Species), fill=brewer.pal(3, "Set1"))

thank you


thanks andy

it's a real honour for me to get a reply from you;
I'm still a bit far away from a proper grasp of the purpose of the plot...

may I ask you for a more technical (trivial) issue?
is it possible to add a legend in the MDS plot?
my problem is to link the color points in the chart to the factor that was
used as response to train rf, how to?

best

max


Yes, that's part of the intention anyway.  One can also use them to do
clustering.

Best,
Andy

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Massimo Bressan
Sent: Monday, December 02, 2013 6:34 AM
To: r-help@r-project.org
Subject: [R] interpretation of MDS plot in random forest

Given this general example:

set.seed(1)

data(iris)

iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE,
keep.forest=TRUE)

#varImpPlot(iris.rf)

#varUsed(iris.rf)

MDSplot(iris.rf, iris$Species)

I’ve been reading the documentation about random forest (at best of my -
poor - knowledge) but I’m in trouble with the correct interpretation of
the MDS plot and I hope someone can give me some clues

What is intended for “the scaling coordinates of the proximity matrix”?


I think to understand that the objective is here to present the distance
among species in a parsimonious and visual way (of lower dimensionality)

Is therefore a parallelism to what are intended the principal components
in a classical PCA?

Are the scaling coordinates DIM 1 and DIM2 the eigenvectors of the
proximity matrix?

If that is correct, how would you find the eigenvalues for those
eigenvectors? And what do the eigenvalues represent?


What are saying these two dimensions in the plot about the different
iris species? Their relative distance in terms of proximity within the
space DIM1 and DIM2?

How to choose for the k parameter (number of dimensions for the scaling
coordinates)?

And finally how would you explain the plot in simple terms?

Thank you for any feedback
Best regards










[R] interpretation of MDS plot in random forest

2013-12-02 Thread Massimo Bressan


Given this general example:

set.seed(1)

data(iris)

iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE, keep.forest=TRUE)

#varImpPlot(iris.rf)

#varUsed(iris.rf)

MDSplot(iris.rf, iris$Species)

I’ve been reading the documentation about random forest (at best of my - 
poor - knowledge) but I’m in trouble with the correct interpretation of 
the MDS plot and I hope someone can give me some clues


What is intended for “the scaling coordinates of the proximity matrix”?


I think to understand that the objective is here to present the distance 
among species in a parsimonious and visual way (of lower dimensionality)


Is therefore a parallelism to what are intended the principal components 
in a classical PCA?


Are the scaling coordinates DIM 1 and DIM2 the eigenvectors of the 
proximity matrix?


If that is correct, how would you find the eigenvalues for those 
eigenvectors? And what do the eigenvalues represent?



What are saying these two dimensions in the plot about the different 
iris species? Their relative distance in terms of proximity within the 
space DIM1 and DIM2?


How to choose for the k parameter (number of dimensions for the scaling 
coordinates)?


And finally how would you explain the plot in simple terms?
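
On the eigenvector question, a sketch that may help readers of the archive. As far as I can tell, MDSplot() passes 1 - proximity to cmdscale(), i.e. classical (metric) multidimensional scaling; please check the function source to confirm. The example below does not use a random forest at all: as an assumption for illustration it takes plain Euclidean distances on the iris measurements, and shows that the scaling coordinates are eigenvectors of the double-centred squared distance matrix, each scaled by the square root of its eigenvalue.

```r
d   <- dist(iris[, 1:4])              # stand-in for a dissimilarity matrix
mds <- cmdscale(d, k = 2, eig = TRUE) # classical MDS, keeping the eigenvalues

# the same thing by hand
D2 <- as.matrix(d)^2
n  <- nrow(D2)
J  <- diag(n) - matrix(1 / n, n, n)   # centring matrix
B  <- -0.5 * J %*% D2 %*% J           # double-centred squared distances
e  <- eigen(B, symmetric = TRUE)

# coordinates = eigenvectors scaled by sqrt(eigenvalue)
pts <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

# eigenvectors are only defined up to sign, so compare absolute values
all.equal(abs(unname(mds$points)), abs(pts))
mds$eig[1:2]                          # "variance" carried by Dim 1 and Dim 2
```

The eigenvalues play a role similar to the variances of principal components: their relative sizes indicate how much of the dissimilarity structure each dimension accounts for, which is one way to judge how many dimensions k are worth plotting.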

Thank you for any feedback
Best regards


Re: [R] storing element number of a list in a column data frame

2013-10-04 Thread Massimo Bressan

yes, I like this: a very elegant and neat solution (in my very humble 
opinion)
sometimes it is so difficult for me to think of a solution in such simple 
and effective terms: less is more!

thank you
max


On 03/10/2013 17:12, David Carlson wrote:

Try this


i=which(!sapply(mytest, is.null))
n=do.call(rbind, mytest[i])
mydf <- data.frame(i, n)
mydf

  i  n
1 1 45
2 3 18
3 5 99

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Massimo
Bressan
Sent: Thursday, October 3, 2013 9:42 AM
To: r-help@r-project.org
Subject: [R] storing element number of a list in a column data
frame

#let's suppose I have a list like this

mytest <- list(45, NULL, 18, NULL, 99)

#to note that this is just an amended example because in fact

#I'm dealing with a long list (more than 400 elements)

#with no evident pattern of the NULL values

#I want to end up with a data frame like the following

data.frame(i=c(1,3,5), n=c(45,18,99))

#i.e. a data frame storing in

#column i the number of corresponding element list

#column n the unique component of that element

#I've been trying with

do.call(rbind, mytest)

#or

do.call(rbind.data.frame, mytest)

#but this approach is not properly achieving the desired result

#now I'm in trouble on how to store each element number of the
#list in the first column of the data frame

#any help for this?

#thanks







[R] storing element number of a list in a column data frame

2013-10-03 Thread Massimo Bressan

# let's suppose I have a list like this
mytest <- list(45, NULL, 18, NULL, 99)

# note that this is just a simplified example: in fact I'm dealing with
# a long list (more than 400 elements) with no evident pattern in the NULL values

# I want to end up with a data frame like the following
data.frame(i=c(1,3,5), n=c(45,18,99))

# i.e. a data frame storing in
# column i: the position of the corresponding list element
# column n: the single value held by that element

# I've been trying with
do.call(rbind, mytest)
# or
do.call(rbind.data.frame, mytest)
# but this approach does not achieve the desired result

# now I'm stuck on how to store the position of each list element
# in the first column of the data frame

# any help with this?
# thanks



Re: [R] storing element number of a list in a column data frame

2013-10-03 Thread Massimo Bressan

the list I'm dealing with is the result of an lapply() call, not a data
structure I set up myself to store the data originally;
it is a data structure I have to manage as a consequence of my
previous operations, something like:

path <- "./"
files <- list.files(path, pattern=".csv")
mylist <- lapply(files, function(files)
na.omit(read.csv(paste0(path, "/", files), header=TRUE)))
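
A self-contained sketch of this kind of workflow is given here. The two small CSV files, the directory name csv_demo and the column v are invented for the example; naming the list elements after the files is an extra step, not something done in the original code.

```r
# Sketch only: two throwaway CSV files stand in for the real data directory.
path <- file.path(tempdir(), "csv_demo")
dir.create(path, showWarnings = FALSE)
write.csv(data.frame(v = c(1, NA, 3)), file.path(path, "a.csv"), row.names = FALSE)
write.csv(data.frame(v = c(4, 5)), file.path(path, "b.csv"), row.names = FALSE)

# Anchor the pattern so that only names ending in ".csv" match.
files <- list.files(path, pattern = "\\.csv$")

# Read each file, drop incomplete rows, and name each element after its file.
mylist <- lapply(files, function(f) na.omit(read.csv(file.path(path, f), header = TRUE)))
names(mylist) <- files
```

Keeping the file names on the list makes it easier to trace each element back to its source later on.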


anyway, thank you for the good hint: is.null seems promising
cheers
m


On 03/10/2013 16:55, Bert Gunter wrote:

Have you read An Introduction to R (ships with R) or another of the
many excellent R tutorials on the web? I ask, because you do not
appear to be using a sensible data structure. As your list appears to
be of a single type (probably numeric, maybe integer), it would be
preferable to use a vector, like this:

y <- c(45, NA, 18, NA, 99)

(The NULLS must be converted to NA's to hold their places).

There would then seem to be little need for the data frame structure,
as it tends to slow things down in R. But if you insist,

which(is.na(y))

will give you the indices of the NA's.

See also: ?is.na  ?is.null.
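
As a rough illustration of the conversion described above, assuming that each non-NULL element of the list is a single number:

```r
mytest <- list(45, NULL, 18, NULL, 99)

# Replace each NULL with NA so that every element keeps its position.
y <- vapply(mytest, function(e) if (is.null(e)) NA_real_ else e, numeric(1))

which(is.na(y))   # positions that were NULL
which(!is.na(y))  # positions that hold a value
```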


Cheers,
Bert




Re: [R] triax.plot: control legend position and size of point labels

2012-12-20 Thread Massimo Bressan

thanks jim,
I had to move the legend away from its default position because,
for some reason, it appeared cut off on the left side (but I'm
not sure whether that was due to my own configuration, anyway...); I think
that the possibility to control the size of the point labels would be a
good new feature (even if not essential) in the triax.plot function
thank you for your valuable work
best

massimo



On 20/12/2012 11:50, Jim Lemon wrote:
 On 12/19/2012 11:18 PM, maxbre wrote:
 Given this example

 library(plotrix)

 a<-c(34,10,70)
 b<-c(33,10,20)
 c<-c(33,80,10)

 test<-data.frame(A=a,B=b,C=c)

 triax.plot(test,
 main ="title",
 at=seq(0.25,0.75,by=0.25),
 tick.labels=list(l=seq(0.25,0.75,by=0.25),
 r=seq(0.25,0.75,by=0.25),
 b=seq(0.25,0.75,by=0.25)),
 align.labels=TRUE,
 show.grid=TRUE,
 cc.axes=TRUE,
 show.legend=TRUE,
 label.points=TRUE,
 point.labels=c("case 1","case 2", "case 3"),
 col.symbols=c("red","blue","green"),
 cex.ticks=0.8,
 cex.axis=0.8,
 lty.grid=2,
 pch=17
 )

 I would like to control the position of the legend (to be moved to a
 different place) and the size of point labels (to be reduced)

 I've been trying to work out the solution with par() but without much
 success, any help with this?

 Hi maxbre,
 For the legend, I would suggest calling triax.plot with show.legend=FALSE
 and then adding the legend where you want it. To change the size of
 the point labels you would have to modify the function. Here is the
 triax.points function with the necessary modification:

 triax.points<-function(x,show.legend=FALSE,label.points=FALSE,
  point.labels=NULL,col.symbols=par("fg"),pch=par("pch"),
  bg.symbols=par("bg"),cc.axes=FALSE,...) {

  if(dev.cur() == 1)
   stop("Cannot add points unless the triax.frame has been drawn")
  if(missing(x))
   stop("Usage: triax.points(x,...)\n\twhere x is a 3 column array of proportions or percentages")
  if(!is.matrix(x) && !is.data.frame(x))
   stop("x must be a matrix or data frame with at least 3 columns and one row.")
  if(any(x > 1) || any(x < 0)) {
   if(any(x < 0))
    stop("All proportions must be between zero and one.")
   if(any(x > 100))
    stop("All percentages must be between zero and 100.")
   # convert percentages to proportions
   x<-x/100
  }
  if(any(abs(rowSums(x)-1) > 0.01))
   warning("At least one set of proportions does not equal one.")
  sin60<-sin(pi/3)
  if(cc.axes) {
   ypos<-x[,3]*sin60
   xpos<-x[,1]+x[,3]*0.5
  }
  else {
   ypos<-x[,3]*sin60
   xpos<-1-(x[,1]+x[,3]*0.5)
  }
  nobs<-dim(x)[1]
  points(x=xpos,y=ypos,pch=pch,col=col.symbols,bg=bg.symbols,type="p",...)
  if(is.null(point.labels)) point.labels<-rownames(x)
  # modified line: the label size now follows par("cex.axis")
  if(label.points)
   thigmophobe.labels(xpos,ypos,point.labels,cex=par("cex.axis"))
  if(show.legend) {
   legend(0.2,0.7,legend=point.labels,pch=pch,col=col.symbols,
    xjust=1,yjust=0)
  }
  invisible(list(x=xpos,y=ypos))
 }

 Note that some lines above may have been broken by the email client. 
 If so, stitch them back together with a text editor before trying to 
 source the file. I will probably add something like this to 
 triax.plot in the next version of plotrix.

 Jim





Re: [R] lattice dotplot reorder contiguous levels

2012-09-21 Thread Massimo Bressan

thank you all for your helpful replies

to bert
the problem with relation = "same" is that all categories (samp.time) are
plotted along the y axis for all groups (sites); instead, I need to
plot along the y axis only those categories of each group that actually
have a corresponding observation

to danny
your solution with ggplot works like a charm; I will keep an eye on
ggplot2 for other new charts, but for some reasons I must stick with
lattice for now (which is also my favourite, by the way)

to richard
result3 is close to what I need but still not exactly what I'm aiming for;
I need the y axis to be plotted as categories, not as a continuous scale...
I need the result proposed by danny to be translated into lattice...

to deepayan
thank you so much, I'll carefully consider your hint and I'll try to
make it work (but I'm still not sure I fully understand how to do it);
as soon as I get to a viable solution I'll post it back


Just for your information, my effort up to now (NOT WORKING!) was
heading in this direction:
1- new character variable from samp.time, to be used later as label for
the plot:
test$samp.lab<-as.character(test$samp.time)

2- new factor variable with as many levels as there are observations:
test$samp.id <- gl(length(test$samp.time), 1)

3- new factor variable based on "samp.time" but with the order of levels
based on "site", something like this (I think this is the crucial point
where I'm likely to fail):
test$samp.time.site <- with(test, reorder(samp.time, as.numeric(site)))

4- new numeric vector from the levels:
nl <- as.numeric(levels(test$samp.time.site))

5- plotting (wrong):
dotplot(samp.time~conc|site, data=test,
ylim=test$samp.lab[nl],
scales=list(x=list(log=10), y = list(relation = "free")),
layout=c(1,5), strip=FALSE, strip.left=TRUE
)

On 21/09/2012 08:55, Deepayan Sarkar wrote:
 On Thu, Sep 20, 2012 at 7:48 PM, maxbrembres...@arpa.veneto.it  wrote:
 my reproducible example

 test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A",
 "B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902,
 0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442,
 10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315,
 30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30,
 0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61,
 3.39, 20, 4.59), samp.time = structure(c(2L, 4L, 4L, 4L, 4L,
 4L, 5L, 4L, 8L, 8L, 8L, 8L, 8L, 9L, 8L, 7L, 8L, 8L, 8L, 8L, 3L,
 3L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 6L, 4L, 8L, 4L, 8L, 4L, 3L,
 8L, 4L, 8L, 4L, 8L, 4L, 9L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 1L), .Label = c("2",
 "4", "12", "24", "96", "135", "167", "168", "169"), class = "factor")),
 .Names = c("site", "conc", "samp.time"), row.names = c(NA, 52L),
 class = "data.frame")



 dotplot(samp.time~conc|site, data=test,
  scales=list(x=list(log=10), y = list(relation = "free")),
  layout=c(1,5), strip=FALSE, strip.left=TRUE
  )


 my objective is to use "site" as conditioning variable but with "samp.time"
 correctly grouped by "site"; the problem here is to ensure that the levels of
 "samp.time" within each "site" are contiguous, as otherwise they would not be
 contiguous in the dot plot itself (i.e., to avoid those holes in between
 y axis categories -see dotplot-)


 I've been trying with this but without much success

 test$samp.time.new<-
  with(test,reorder(samp.time,as.numeric(site)))


 dotplot(samp.time.new~conc|site, data=test,
  scales=list(x=list(log=10), y = list(relation = "free")),
  layout=c(1,5), strip=FALSE, strip.left=TRUE
  )

 I think (I hope) a possible different solution is to create for ylim a
 proper character vector of different length to pass to each panel of the
 dotplot (I'm not posting this attempt because it is still too confused)

 can anyone point me in the right direction?

 The problem here is that there is crossing between sites and
 samp.time. You can try some imaginative permutations of site, such as

 test$samp.time.new <- with(test, reorder(samp.time,
 as.numeric(factor(site, levels = c("A", "C", "D", "B", "E")))))

 which gets all but site B right. There may be another permutation that
 works for everything, but it would be much easier to make a nested
 factor, i.e.,

 test$samp.time.new <- with(test, reorder(samp.time:site, as.numeric(site)))

 That just leaves getting the y-labels right, which I will leave for
 you to figure out.

 (Hint: ylim = some_function_of(levels(test$samp.time.new)))
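
One possible reading of the hint is sketched here in base R only. The toy data frame and the names nested, time_labels, site_of_level and labels_by_site are assumptions made for the illustration; droplevels() is an addition that discards the time:site pairs that never occur, and how the resulting labels are best passed to dotplot() is left open.

```r
# Toy data (an assumption, much smaller than the data set in the thread).
toy <- data.frame(
  site = factor(c("A", "A", "B", "B", "C")),
  samp.time = factor(c("4", "24", "4", "168", "24"), levels = c("4", "24", "168"))
)

# Nested factor: one level per (samp.time, site) pair that actually occurs,
# ordered so that all levels belonging to one site sit next to each other.
nested <- with(toy, reorder(droplevels(samp.time:site), as.numeric(site)))

# Recover the two halves of each "time:site" level name.
time_labels <- sub(":.*$", "", levels(nested))
site_of_level <- sub("^.*:", "", levels(nested))

# One character vector of time labels per site, a possible starting point
# for per-panel axis labels.
labels_by_site <- split(time_labels, site_of_level)
```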

 -Deepayan





Re: [R] lattice dotplot reorder contiguous levels

2012-09-21 Thread Massimo Bressan


deepayan, is that what you mean?
but the problem still persists: the labelling is neither correct nor contiguous!
I must probably reconsider everything from scratch: I'm a bit confused now...

test$samp.time.new <- with(test, reorder(samp.time:site, as.numeric(site)))

s<-strsplit(levels(test$samp.time.new), ":")
s1<- sapply(s, '[', 1)

dotplot(samp.time~conc|site, data=test,
ylim=s1,
scales=list(x=list(log=10), y = list(relation = "free")),
layout=c(1,5), strip=FALSE, strip.left=TRUE
)


Re: [R] Boxplot lattice vs standard graphics

2012-09-18 Thread Massimo Bressan


ok, I see now!
here is the reproducible example along with the final code (also with
the median drawn as a line instead of a point)


thank you all for the great help

max

# start code

library(lattice)

test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902,
0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442,
10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315,
30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30,
0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61,
3.39, 20, 4.59)), .Names = c("site", "conc"), row.names = c(NA,
52L), class = "data.frame")

mystats <- function(x, ...){ # Here ...
  out <- boxplot.stats(10^x, ...)  # ...and here!!!
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf) ## Omit if you don't want notches
  out$out <- log10(out$out)
  out ## With the boxplot statistics converted to the log10 scale
}

dev.new()
bwplot(conc~site, data=test,
   pch="|",  # this is plotting a line instead of a point
   scales = list(y=list(log=10)),
   panel = function(...){
     panel.bwplot(..., stats = mystats)
   }
)

# end code

On 17/09/2012 20:26, Rui Barradas wrote:

Hello,

On 17-09-2012 18:50, David Winsemius wrote:

On Sep 17, 2012, at 4:18 AM, maxbre wrote:

here it is, I think (I hope) I'm getting a little closer with this, but
there is still something to sort out...

error using packet 1
unused argument(s)  (coef =1.5, do.out=TRUE)

the help for panel.bwplot, at the argument stats, says: the
function must accept arguments coef and do.out even if they do not use them
(a ... argument is good enough).
I'm not sure how to cope with this...

any help with this?

thanks


## start code


mystats <- function(x){
  out <- boxplot.stats(10^x)
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf) ## Omit if you don't want notches
  out$out <- log10(out$out)
  out$coef<-1.5 #??
  out$do.out<-TRUE #??
  out ## With the boxplot statistics converted to the log10 scale
}

bwplot(conc~site, data=test,
   scales=list(y=list(log=10)),
   panel= function(x,y){
     panel.bwplot(x,y,stats=mystats)
   }
   )

No example data, so no efforts at running code.


Actually there is, in the op.



?panel.bwplot

# Notice the Usage at the top of the page. The ... is there for a 
reason.


# And notice that neither 'do.out' nor 'coef' are passed in the 
stats list


# The message was talking about what arguments your 'mystats' would 
accept,  not what it would return. It's another instance of your 
needing to understand what the ... formalism is doing.
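
A small sketch of the point about "..." follows. The function names stats_no_dots and stats_with_dots and the input values are invented for the illustration; only boxplot.stats() and its coef and do.out arguments come from the discussion above.

```r
# Without "...", extra arguments such as coef and do.out cause an error.
stats_no_dots <- function(x) boxplot.stats(10^x)

# With "...", they are accepted and passed on to boxplot.stats().
stats_with_dots <- function(x, ...) boxplot.stats(10^x, ...)

lx <- log10(c(1, 2, 3, 4, 5, 6, 16))

# The first call fails, so the error message is captured instead of a result.
res_no_dots <- tryCatch(stats_no_dots(lx, coef = 1.5, do.out = TRUE),
                        error = function(e) conditionMessage(e))

# The second call succeeds and returns the usual list of boxplot statistics.
res_with_dots <- stats_with_dots(lx, coef = 1.5, do.out = TRUE)
```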


?boxplot.stats

# I would be making a concerted effort to return a list with exactly 
the components listed there.


And since I'm terrible at graphics I try to learn as much as possible 
on R-Help. Here it goes.



library(lattice)

test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902,
0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442,
10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315,
30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30,
0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61,
3.39, 20, 4.59)), .Names = c("site", "conc"), row.names = c(NA,
52L), class = "data.frame")


#standard graphics
dev.new()
with(test,boxplot(conc~site, log="y"))

#lattice
mystats <- function(x, ...){ # Here ...
  out <- boxplot.stats(10^x, ...)  # ...and here!!!
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf) ## Omit if you don't want notches
  out$out <- log10(out$out)
  out ## With the boxplot statistics converted to the log10 scale
}

dev.new()
bwplot(conc~site, data=test,
   scales = list(y=list(log=10)),
   panel = function(...){
     panel.bwplot(..., stats = mystats)
   }
)

With a median _line_ it would be perfect.
(Not a follow-up, it was already answered some time ago: use pch = "|"
in panel.bwplot.)


Rui Barradas



## end code








Re: [R] Boxplot lattice vs standard graphics

2012-09-17 Thread Massimo Bressan


thank you for the help, bert

unfortunately, for reasons I cannot understand (yet), I cannot make
it all work

(I'm always in trouble with the panel functions);

max

On 14/09/2012 18:38, Bert Gunter wrote:

Thanks for the example. Makes it easy to see what you mean.

Yes, if I understand you correctly, you are right:
boxplot() (base) transforms the axes, so ?boxplot.stats, which is the
function that essentially computes the boxplot, does so on the
original data.
bwplot(lattice) transforms the data first, as the documentation for
the log component of the scales list makes clear, and **then** calls
boxplot.stats.

Although I think the latter makes more sense than the former, I think
the way to do it is to modify the stats function in an explicit call
to panel.bwplot to something like (UNTESTED!)
mystats <- function(x){
  out <- boxplot.stats(10^x)
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf) ## Omit if you don't want notches
  out$out <- log10(out$out)
  out ## With the boxplot statistics converted to the log10 scale
}

I leave it to you to test and modify as necessary.
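
To make the difference concrete, here is a small base-R sketch with made-up numbers (not the data from this thread). It compares which points boxplot.stats() flags as outliers on the original scale and on the log10 scale; the mystats() shown here is a variant of the function above with "..." added.

```r
x <- c(1, 2, 3, 4, 5, 6, 16)  # made-up values, not the data from the thread

# Statistics computed on the original scale (the boxplot(..., log = "y") case).
out_raw <- boxplot.stats(x)$out

# Statistics computed after the log10 transformation (the default bwplot case).
out_log <- boxplot.stats(log10(x))$out

# Original-scale statistics mapped back onto the log10 axis.
mystats <- function(x, ...) {
  out <- boxplot.stats(10^x, ...)
  out$stats <- log10(out$stats)
  out$conf <- log10(out$conf)
  out$out <- log10(out$out)
  out
}
out_mapped <- mystats(log10(x))$out
```

With these values the largest point is flagged on the original scale but not on the log10 scale, which is the kind of discrepancy described in the original question.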

Cheers,
Bert

On Fri, Sep 14, 2012 at 2:37 AM, maxbre mbres...@arpa.veneto.it wrote:

Given my reproducible example

test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902,
0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442,
10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315,
30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30,
0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61,
3.39, 20, 4.59)), .Names = c("site", "conc"), row.names = c(NA,
52L), class = "data.frame")



And the following code

#standard graphics
with(test,boxplot(conc~site, log="y"))

#lattice
bwplot(conc~site, data=test,
scales=list(y=list(log=10))
)

There is an evident difference for sites A, B and D in the way some outliers are
plotted when comparing the plot produced by lattice with the one from standard graphics

I think this might be due to the different treatment of the data:
i.e. the log transformation (before or after the plotting?)

Is it possible to achieve the same plotting result with both graphic
facilities?
I would like to show the outliers in lattice as well...

Thank you

http://r.789695.n4.nabble.com/file/n4643121/standard.png

http://r.789695.n4.nabble.com/file/n4643121/lattice.png






Re: [R] how to change variable names in corrgram diagonal

2012-08-14 Thread Massimo Bressan


yes, the labels argument works fine!

It would be great if the docs were also updated to cover this already
implemented feature


thank you for your valuable work

best
max

On 13/08/2012 15:09, Uwe Ligges wrote:



On 13.08.2012 12:12, maxbre wrote:

given this example

library(corrgram)

corrgram(mtcars[2:6], order=TRUE, upper.panel=panel.conf,
  lower.panel=panel.pie,
  diag.panel=panel.minmax,
  text.panel=panel.txt)




I'd just try the labels argument and pass the labels there, and
then write to the maintainer that the docs need to be updated, since
labels works rather than being "Not used".


Best,
Uwe Ligges






how can I change the variable names in the main diagonal?
(so that I can use more informative variable names)

I understand that this should be done by modifying the panel.txt
function, but for some reason I'm not able to put that into practice
any help with this?

thank you




Re: [R] merge multiple data frames

2012-01-31 Thread Massimo Bressan


thanks don
I have here enough to study for a while
thank you for your help
max
- Original Message - 
From: MacQueen, Don macque...@llnl.gov

To: Massimo Bressan mbres...@arpa.veneto.it; r-help@r-project.org
Sent: Monday, January 30, 2012 4:47 PM
Subject: Re: [R] merge multiple data frames



Does this example help? It doesn't handle the problem of common field
names, but see below for another example.

df1 <- data.frame(jn=1:4, a1=letters[1:4], a2=LETTERS[1:4])
df2 <- data.frame(jn=2:6, b1=month.abb[2:6])
df3 <- data.frame(jn=3:7, x=rnorm(5), y=13:17)

dfn <- sqldf('select * from df1 left join df2 using (jn) left join df3
using (jn)')

In this example, you automatically get all fields from all three data
frames, without having to name them in the SQL statement -- but you should
not have common names.


To deal with common names, I myself would probably rename the variables in
the data frames before trying to merge.

A general method would be something like:
 nms1 <- names(df1)
 nms1[nms1 != 'date'] <- paste(nms1[nms1 != 'date'],'.1',sep='')
 names(df1) <- nms1
Of course it has to be done for every data frame, but this can be put in a
loop, if necessary.


However, here is an example where I have changed df1 and df2; they both
have a field named 'aa', in addition to the matching field.

df1 <- data.frame(jn=1:4, aa=letters[1:4], a2=LETTERS[1:4])
df2 <- data.frame(jn=2:6, aa=month.abb[2:6])
df3 <- data.frame(jn=3:7, x=rnorm(5), y=13:17)

dfn <- sqldf('select jn, df1.aa aa1, df2.aa aa2,
 a2, x, y
  from df1 left join df2 using (jn) left join df3 using (jn)')

By the way, you can still select *, even with common names:


 dfx <- sqldf('select *   from df1 left join df2 using (jn) left join df3
using (jn)')

but you might not like the result. Try it and see!




It's my understanding that in the current SQL definition 'as' is no longer
required when changing field names (though it is also still allowed in the
databases I work with, Oracle and MySQL). Perhaps sqldf does not allow it.
I don't know.

Hope this helps.

-Don



--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062






Re: [R] merge multiple data frames

2012-01-30 Thread Massimo Bressan


hi don

I followed your advice about using the sqldf package, but the problem of
labelling the fields persists;

for some reason I cannot properly handle the SQL 'as' statement

a_b<-sqldf("select a.*, b.* from a left join b on a.date=b.date")
a_b_c<-sqldf("select a_b.*, c.* from a_b left join c on a_b.date=c.date")

bye

max





- Original Message - 
From: MacQueen, Don macque...@llnl.gov

To: maxbre mbres...@arpa.veneto.it; r-help@r-project.org
Sent: Saturday, January 28, 2012 12:24 AM
Subject: Re: [R] merge multiple data frames


Not tested, but this might be a case for the sqldf package.

-Don

--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 1/26/12 9:29 AM, maxbre mbres...@arpa.veneto.it wrote:


This is my reproducible example (three data frames: a, b, c)

a<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(0.799401398190476, 0, 0,
0.0100453950434783,
0.200154920565217, 0.473866969181818), nox = c(111.716109973913,
178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435,
110.425185027727), no = c(48.8543691516522, 88.7197448817391,
93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364
), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783,
49.2986070321739, 46.5978461731818), co = c(0.618856168125,
0.99659347508,
0.66698741608, 0.38343731117, 0.281604928875, 0.155383408913043
), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043,
26.9122200013043, 13.8421695947826, 12.3788847045455), ipa =
c(167.541954974667,
252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667,
173.868599272609), ws = c(1.47191016429167, 0.765781205208333,
0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652
), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214,
319.753674830936, 33.8713897347193, 228.368119533759), temp =
c(7.9197282588,
3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722,
3.09864120704348), umr = c(86.11566638875, 94.5034087491667,
94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087
)), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class =
"data.frame")


b <- structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511,
105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222
), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972,
16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381,
28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917,
0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897,
9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971
), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998,
66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995,
0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031,
221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465,
215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993,
0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736,
88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874
)), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class =
"data.frame")


c <- structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(2.617839247, 0, 0, 0.231044086,
0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142,
178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498,
221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447,
59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816
), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745,
0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989,
37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056,
432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128,
1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057
), wd = c(58.15639877, 64.5657153143848, 39.9754269501381,
24.0739884380921,
55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029,
4.33258009,

Re: [R] merge multiple data frames

2012-01-30 Thread Massimo Bressan


thanks michael

it's working like a charm: that's exactly what I was looking for

bye

max

- Original Message - 
From: R. Michael Weylandt michael.weyla...@gmail.com

To: Massimo Bressan mbres...@arpa.veneto.it
Cc: r-help@r-project.org
Sent: Friday, January 27, 2012 4:16 PM
Subject: Re: [R] merge multiple data frames


Oh, sorry -- I assumed that was intentional since my code passed the
identical() test with what you said you wanted.

Perhaps this gets what you meant you wanted instead (though the
treatment of the names is far from elegant)

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  dotNames <- lapply(dotArgs, names)
  repNames <- Reduce(intersect, dotNames)
  repNames <- repNames[repNames != by]
  for(i in seq_along(dotArgs)){
    wn <- which( (names(dotArgs[[i]]) %in% repNames) &
                 (names(dotArgs[[i]]) != by))
    names(dotArgs[[i]])[wn] <- paste(names(dotArgs[[i]])[wn],
                                     names(dotArgs)[[i]], sep = ".")
  }
  Reduce(function(x, y) merge(x, y, by = by, all = all), dotArgs)
}

print(str(mergeAll(a=a,b=b,c=c)))

Is that what you were going for?
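
For readers who want to see the renaming idea on something smaller, here is a minimal sketch in base R. It is a simplified variant, not the function above: it tags every non-key column (not only the names shared by all inputs), and the toy data frames and their values are made up for illustration.

```r
# Toy stand-ins for the thread's data frames (values are invented)
dfs <- list(
  a = data.frame(date = c("2012-01-03", "2012-01-04"), so2 = c(0.8, 0.0)),
  b = data.frame(date = c("2012-01-03", "2012-01-04"), so2 = c(0.0, 0.0)),
  c = data.frame(date = c("2012-01-03", "2012-01-04"), so2 = c(2.6, 0.0))
)

# Append ".<list name>" to every column except the merge key
renamed <- Map(function(df, nm) {
  is_key <- names(df) == "date"
  names(df)[!is_key] <- paste(names(df)[!is_key], nm, sep = ".")
  df
}, dfs, names(dfs))

# With the names already unique, merge() never needs its own suffixes
merged <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE), renamed)
print(names(merged))
```

Renaming before merging means no column depends on the suffixes argument of merge() at all, which is why every data frame, including the last one, ends up labelled.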

Michael

On Fri, Jan 27, 2012 at 3:19 AM, Massimo Bressan
mbres...@arpa.veneto.it wrote:

I tested your code: it's OK but there is still the problem of the suffixes
for the last dataframe
thank you for the support


- Original Message - From: R. Michael Weylandt
michael.weyla...@gmail.com
To: maxbre mbres...@arpa.veneto.it
Cc: r-help@r-project.org
Sent: Thursday, January 26, 2012 8:19 PM
Subject: Re: [R] merge multiple data frames


I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y)
    merge(x, y, by = by, all = all, suffixes=paste(".", names(dotArgs),
                                                   sep = "")),
    dotArgs)}

mergeAll(a = a, b = b, c = c)

str(.Last.value)

You also might be able to set it up to capture names without you
having to put a = a etc. using substitute.


Re: [R] merge multiple data frames

2012-01-27 Thread Massimo Bressan

I tested your code: it's OK but there is still the problem of the suffixes 
for the last dataframe

thank you for the support
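
A likely explanation for the missing suffixes, sketched on made-up toy data: merge() only appends suffixes to non-key columns whose names clash between its two inputs. After the first merge the columns are already suffixed, so in the second merge nothing clashes and the last data frame keeps its bare names. The data frames and the suffix strings below are assumptions chosen for illustration.

```r
# Three toy data frames sharing the key "date" and one value column "v"
a  <- data.frame(date = c("d1", "d2"), v = c(1, 2))
b  <- data.frame(date = c("d1", "d2"), v = c(3, 4))
c3 <- data.frame(date = c("d1", "d2"), v = c(5, 6))

# First merge: "v" clashes, so both copies get a suffix
ab <- merge(a, b, by = "date", all = TRUE, suffixes = c(".a", ".b"))
print(names(ab))

# Second merge: "v.a" and "v.b" do not clash with "v",
# so the suffixes are never used and "v" stays unlabelled
abc <- merge(ab, c3, by = "date", all = TRUE, suffixes = c(".ab", ".c"))
print(names(abc))
```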


- Original Message - 
From: R. Michael Weylandt michael.weyla...@gmail.com

To: maxbre mbres...@arpa.veneto.it
Cc: r-help@r-project.org
Sent: Thursday, January 26, 2012 8:19 PM
Subject: Re: [R] merge multiple data frames


I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y)
    merge(x, y, by = by, all = all, suffixes=paste(".", names(dotArgs),
                                                   sep = "")),
    dotArgs)}

mergeAll(a = a, b = b, c = c)

str(.Last.value)

You also might be able to set it up to capture names without you
having to put a = a etc. using substitute.
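
A rough sketch of the substitute() idea, under the assumption that the arguments are passed as plain variable names. The function name mergeNamed and the toy data frames are hypothetical, and the renaming step tags every non-key column, which is a design choice of this sketch, not something taken from the code above.

```r
mergeNamed <- function(..., by = "date", all = TRUE) {
  dfs <- list(...)
  # Recover the unevaluated argument expressions as strings,
  # e.g. mergeNamed(x1, x2) gives c("x1", "x2")
  lbl <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  for (i in seq_along(dfs)) {
    other <- names(dfs[[i]]) != by
    names(dfs[[i]])[other] <- paste(names(dfs[[i]])[other], lbl[i], sep = ".")
  }
  Reduce(function(x, y) merge(x, y, by = by, all = all), dfs)
}

# Toy data (made up); note the call needs no "x1 = x1" style naming
x1 <- data.frame(date = c("d1", "d2"), so2 = c(1, 2))
x2 <- data.frame(date = c("d2", "d3"), so2 = c(3, 4))
res <- mergeNamed(x1, x2)
print(names(res))
```

This only behaves well when each argument is a simple symbol; a long inline expression would deparse to an unwieldy label.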

