Re: [R] Plotting bar charts by Month

2017-05-09 Thread Jeff Newmiller
For this kind of plot I usually use day-of-month for the x-axis instead of
a date or timestamp.
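
A minimal sketch of that suggestion, assuming a data frame with a Date column
called date and a numeric column called height (the data below are made up
for illustration; only the column names come from the original post). The
day-of-month column is derived with base R, and ggplot2 is used only if it
is installed:

```r
# Hypothetical data: one row per day for January and February 2017
df <- data.frame(date = seq(as.Date("2017-01-01"), as.Date("2017-02-28"), by = "day"))
df$height <- seq_len(nrow(df)) %% 7 + 1

# Derive month and day-of-month from the date (locale independent)
df$month <- factor(month.abb[as.integer(format(df$date, "%m"))], levels = month.abb)
df$day   <- as.integer(format(df$date, "%d"))

# Plot day-of-month on the x-axis, one panel per month
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = day, y = height)) +
    geom_col() +
    facet_wrap(~ month, nrow = 3)
  print(p)
}
```

If the date axis has to be kept, facet_wrap(~ month, scales = "free_x") may be
an alternative worth trying, since it lets each panel use its own x range.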
-- 
Sent from my phone. Please excuse my brevity.

On May 9, 2017 6:55:27 PM PDT, Jeff Reichman  wrote:
>r-help
>
> 
>
>I am trying to figure out how to plot bar charts by month. The following
>code plots each month's portion on a yearly x-scale. So either I create 12
>individual month plots, or maybe there is some sort of "break" to tell R
>to separate by month and use that month's dates as the x-scale, so that
>Jan's scale is 1 - 31 Jan, Feb's scale is 1 - 28 Feb, etc. As it is now,
>the Jan values are plotted with a 1 Jan to 31 Dec x-scale, the Feb values
>are plotted on a 1 Jan to 31 Dec x-scale, etc.
>
> 
>
>ggplot(data = df, aes(x = date, y = height)) +
>  geom_bar(stat = "identity") +
>  geom_bar(aes(x = action, y = height), color = "red", stat = "identity") +
>  facet_wrap(~month, nrow = 3)
>
> 
>
>Jeff
>
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



[R] basic query relating to GLMM model design

2017-05-09 Thread Sharada Ramadass
Hello,
  I am new to R and GLMMs and am having a difficult time
understanding the model design that best captures my test scenario.

I am interested in the following question:
1. whether the average values of a variable explain a given response
less well than the individual values do.

1.1. For this, I have a single response, say y.
1.2. I have a bunch of fixed predictors, say x1, x2, x3 and I can
derive my models for those.
1.3 I have two kinds of random effects - a site (r1) and a species
(r2), within the site. My average values of some of the fixed
predictors is based on the species (r2).
I am not especially interested in looking at site level variations,
but I did build it into the model, all the same.

So, I was able to develop a set of models with the individual values like so:

y ~ x1+ x2 + x3 + (1|r1/r2)
I was able to get some output in terms of significance for certain
parameter estimates. So far, so good.

Now I want to test whether the average values of x1 and x2, taken over
r2, predict y less well (with weaker estimates). My question is whether,
in that case, r2 should be removed from the random effects, since I now
have a single average value of x1 and x2 for each value of r2.
Basically, is the model below, with average values, logically wrong?
y ~ x1avg + x2avg + x3 + (1|r1/r2)

my averages for x1 and x2 are over each value of r2.
Should r2 move to a fixed effect or be removed totally from the model?
Any inputs would be appreciated.
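
For readers following along, here is a small sketch of how such averages
could be built, assuming the averages are taken per species within site. The
data frame d and the column name x1dev are made up for illustration; r1, r2,
x1 and x1avg follow the names used above. It only shows the data
preparation and does not settle whether r2 should stay in the random part:

```r
# Hypothetical data: 2 sites, 3 species per site, 2 observations per species
set.seed(1)
d <- data.frame(
  r1 = rep(c("siteA", "siteB"), each = 6),
  r2 = rep(c("sp1", "sp2", "sp3"), times = 4),
  x1 = rnorm(12)
)

# Average of x1 for each species within each site (ave() defaults to the mean)
d$x1avg <- ave(d$x1, d$r1, d$r2)

# Deviation of each observation from its species-level average
d$x1dev <- d$x1 - d$x1avg

# x1avg is constant within each site:species cell, so it behaves as a
# group-level predictor. With lme4 installed, a model using both pieces
# could be written as (not run here; y and x3 are not in this toy data):
#   lme4::lmer(y ~ x1avg + x1dev + x3 + (1 | r1/r2), data = d)
```

Splitting x1 into an average and a deviation is one common way to compare
what the group-level and the individual-level parts each contribute.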

Thanks and Regards,
Sharada



[R] Plotting bar charts by Month

2017-05-09 Thread Jeff Reichman
r-help,

I am trying to figure out how to plot bar charts by month. The following code
plots each month's portion on a yearly x-scale. So either I create 12
individual month plots, or maybe there is some sort of "break" to tell R to
separate by month and use that month's dates as the x-scale, so that Jan's
scale is 1 - 31 Jan, Feb's scale is 1 - 28 Feb, etc. As it is now, the Jan
values are plotted with a 1 Jan to 31 Dec x-scale, the Feb values are plotted
on a 1 Jan to 31 Dec x-scale, etc.

 

ggplot(data = df, aes(x = date, y = height)) +
  geom_bar(stat = "identity") +
  geom_bar(aes(x = action, y = height), color = "red", stat = "identity") +
  facet_wrap(~month, nrow = 3)

 

Jeff




Re: [R] How to replace missing values by mean of subgroup of a group

2017-05-09 Thread Boris Steipe
Of course, and I neglected to point this out:

"The other thread" would be how to properly impute missing values,
"The other site" could be https://stats.stackexchange.com/
... there is _lots_ of information available if you search for it.

An applicable R package is mice, and you can find an introduction here:
  https://datascienceplus.com/imputing-missing-data-with-r-mice-package/

(Besides the usual documentation.)


B.



> On May 9, 2017, at 8:44 PM, Bert Gunter  wrote:
> 
> Of course, statistically, one should not do this. But that's another thread 
> on another site.
> 
> Cheers,
> Bert
> 
> 
> 
> On May 9, 2017 3:36 PM, "Boris Steipe"  wrote:
> Great.
> 
> I am CC'ing the list - this is important so that others who may come across 
> this thread in the archives know that this question has been resolved.
> 
> Cheers,
> 
> B.
> 
> 
> > On May 9, 2017, at 5:18 PM, Olu Ola  wrote:
> >
> > Thank you!!! It worked.
> >
> > Regards,
> > Olu
> >
> >
> > On Tuesday, May 9, 2017 4:20 PM, Boris Steipe  
> > wrote:
> >
> >
> > Pedestrian code, so you can analyze this easily. However, it is entirely
> > untested since I have no ambition to recreate your input data as a data
> > frame. This code assumes:
> > - your data _is_ a data frame
> > - the desired column is called food.price, not "food price" (cf.
> >   ?make.names )
> >
> > # define a function that imputes NA values in the same city, for the
> > # same food
> > imputeFoodPrice <- function(DF, i) {
> >   sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
> >   imputed <- mean(DF$food.price[sel], na.rm = TRUE)
> >   if (is.nan(imputed)) { # careful, there might be no other match
> >     imputed <- NA
> >   }
> >   return(imputed)
> > }
> >
> >
> > # apply the function to replace NA values
> > for (iMissing in which(is.na(myDF$food.price))) {
> >   myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
> > }
> >
> >
> > B.
> >
> >
> >
> > > On May 9, 2017, at 3:14 PM, Olu Ola via R-help  
> > > wrote:
> > >
> > > Hello, I have the following food data with some NA values in the food
> > > prices. I would like to replace the NA values in the food price column
> > > for each food item by the mean price of that food item in each city.
> > > For example, the price of beans for the household with hhid 102 in the
> > > data set is missing. I would like to replace the missing value with the
> > > mean price of beans for the households living in Paxton city (that is,
> > > households 101 and 103). The data set is presented below. Any help will
> > > be greatly appreciated.
> > >
> > > | hhid | city | food | food price |
> > > | 101 | Paxton | rice | 10 |
> > > | 101 | Paxton | beans | 30 |
> > > | 101 | Paxton | flour | NA |
> > > | 101 | Paxton | eggs | 20 |
> > > | 102 | Paxton | rice | NA |
> > > | 102 | Paxton | beans | NA |
> > > | 102 | Paxton | flour | 34 |
> > > | 102 | Paxton | eggs | 21 |
> > > | 103 | Paxton | rice | 15 |
> > > | 103 | Paxton | beans | 28 |
> > > | 103 | Paxton | flour | 32 |
> > > | 103 | Paxton | eggs | NA |
> > > | 104 | Hull | rice | NA |
> > > | 104 | Hull | beans | 34 |
> > > | 104 | Hull | flour | NA |
> > > | 104 | Hull | eggs | 24 |
> > > | 105 | Hull | rice | 18 |
> > > | 105 | Hull | beans | 38 |
> > > | 105 | Hull | flour | 36 |
> > > | 105 | Hull | eggs | 26 |
> > > | 106 | Hull | rice | NA |
> > > | 106 | Hull | beans | NA |
> > > | 106 | Hull | flour | 40 |
> > > | 106 | Hull | eggs | NA |
> > >
> > >
> >
> >
> 
> 



Re: [R] How to replace missing values by mean of subgroup of a group

2017-05-09 Thread Bert Gunter
Of course, statistically, one should not do this. But that's another thread
on another site.

Cheers,
Bert



On May 9, 2017 3:36 PM, "Boris Steipe"  wrote:

Great.

I am CC'ing the list - this is important so that others who may come across
this thread in the archives know that this question has been resolved.

Cheers,

B.


> On May 9, 2017, at 5:18 PM, Olu Ola  wrote:
>
> Thank you!!! It worked.
>
> Regards,
> Olu
>
>
> On Tuesday, May 9, 2017 4:20 PM, Boris Steipe wrote:
>
>
> Pedestrian code, so you can analyze this easily. However, it is entirely
> untested since I have no ambition to recreate your input data as a data
> frame. This code assumes:
> - your data _is_ a data frame
> - the desired column is called food.price, not "food price" (cf.
>   ?make.names )
>
> # define a function that imputes NA values in the same city, for the
> # same food
> imputeFoodPrice <- function(DF, i) {
>   sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
>   imputed <- mean(DF$food.price[sel], na.rm = TRUE)
>   if (is.nan(imputed)) { # careful, there might be no other match
>     imputed <- NA
>   }
>   return(imputed)
> }
>
>
> # apply the function to replace NA values
> for (iMissing in which(is.na(myDF$food.price))) {
>   myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
> }
>
>
> B.
>
>
>
> > On May 9, 2017, at 3:14 PM, Olu Ola via R-help wrote:
> >
> > Hello, I have the following food data with some NA values in the food
> > prices. I would like to replace the NA values in the food price column
> > for each food item by the mean price of that food item in each city.
> > For example, the price of beans for the household with hhid 102 in the
> > data set is missing. I would like to replace the missing value with the
> > mean price of beans for the households living in Paxton city (that is,
> > households 101 and 103). The data set is presented below. Any help will
> > be greatly appreciated.
> >
> > | hhid | city | food | food price |
> > | 101 | Paxton | rice | 10 |
> > | 101 | Paxton | beans | 30 |
> > | 101 | Paxton | flour | NA |
> > | 101 | Paxton | eggs | 20 |
> > | 102 | Paxton | rice | NA |
> > | 102 | Paxton | beans | NA |
> > | 102 | Paxton | flour | 34 |
> > | 102 | Paxton | eggs | 21 |
> > | 103 | Paxton | rice | 15 |
> > | 103 | Paxton | beans | 28 |
> > | 103 | Paxton | flour | 32 |
> > | 103 | Paxton | eggs | NA |
> > | 104 | Hull | rice | NA |
> > | 104 | Hull | beans | 34 |
> > | 104 | Hull | flour | NA |
> > | 104 | Hull | eggs | 24 |
> > | 105 | Hull | rice | 18 |
> > | 105 | Hull | beans | 38 |
> > | 105 | Hull | flour | 36 |
> > | 105 | Hull | eggs | 26 |
> > | 106 | Hull | rice | NA |
> > | 106 | Hull | beans | NA |
> > | 106 | Hull | flour | 40 |
> > | 106 | Hull | eggs | NA |
> >
> >
>
>



[R] Estimating cluster standard errors in Diff-in-Diff panel models with plm

2017-05-09 Thread Renzo Giudice
Hi,
I want to estimate the cluster SE of a differences-in-differences
panel model with 100 groups, 6,156 individuals and 15 years. Some of
the individuals are repeated (4,201 unique) because they are part of a
matched sample obtained with a one-to-one, with replacement, matching
method.
I have been using plm to estimate the model coefficients, after
transforming my matched sample into a pdata.frame by using individuals
and years as indexes. I have also been able to estimate the cluster
standard errors at the individual level by using the vcovHC function.
However, these individuals are clustered within the groups, and
therefore I want to cluster at this higher level of aggregation rather
than at the individual level. Unfortunately, it is not clear to me how
to proceed. Of course, if I replace the individuals with groups in the
index I get repeated row.names and then I can't estimate the panel
model with plm. I get the following error message:

Error in `row.names<-.data.frame`(`*tmp*`, value = c("1-1", "1-1",
"1-1",  : duplicate 'row.names' are not allowed

For simplicity, I illustrate my case with the following example (copied
from: http://www.richard-bluhm.com/clustered-ses-in-r-and-stata-2/):
# load packages
library(plm)
library(lmtest)
# get data and load as pdata.frame
url <- "http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt"
p.df <- read.table(url)
names(p.df) <- c("firmid", "year", "x", "y")
# introduce group (State) id
p.df$State <- rep(1:100, each = 50)
p.df2 <- pdata.frame(p.df, index = c("State", "year"), drop.index = FALSE,
                     row.names = TRUE)
# fit model with plm
pm1 <- plm(y ~ x, data = p.df2, model = "within")  # this is where the error occurs

So is there any way I could cluster SE at the group level using plm?
Any other comments would be highly appreciated.
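
To make concrete what "clustering at the group level" computes, here is a
hand-rolled base R sketch. It is not plm's API and not a definitive answer:
it uses simulated data (all names below are made up), keeps the individual as
the fixed-effect unit, and then sums the scores within the higher-level group
to form the cluster-robust sandwich. It assumes individuals are nested within
groups and applies no small-sample correction:

```r
# Hypothetical simulated panel: 20 groups x 5 individuals x 6 years
set.seed(1)
n_group <- 20; n_per <- 5; n_year <- 6
d <- expand.grid(year = seq_len(n_year), id = seq_len(n_group * n_per))
d$group <- (d$id - 1) %/% n_per + 1

# Group-by-year shocks make x and the error correlated within groups
idx     <- cbind(d$group, d$year)
shock_x <- matrix(rnorm(n_group * n_year), n_group, n_year)[idx]
shock_u <- matrix(rnorm(n_group * n_year), n_group, n_year)[idx]
d$x <- rnorm(nrow(d)) + shock_x
d$y <- 1 + 0.5 * d$x + rnorm(n_group * n_per)[d$id] + shock_u + rnorm(nrow(d))

# Within transformation: demean y and x by individual
d$y_w <- d$y - ave(d$y, d$id)
d$x_w <- d$x - ave(d$x, d$id)

# OLS on the demeaned data gives the within (fixed-effects) slope
X    <- cbind(x = d$x_w)
beta <- solve(crossprod(X), crossprod(X, d$y_w))
u    <- d$y_w - drop(X %*% beta)

# Cluster-robust sandwich: sum the scores x * u within each group
scores   <- rowsum(X * u, d$group)     # one row per group
bread    <- solve(crossprod(X))
V_group  <- bread %*% crossprod(scores) %*% bread
se_group <- sqrt(diag(V_group))

# Same construction with the individual as the cluster, for comparison
scores_id <- rowsum(X * u, d$id)
se_id     <- sqrt(diag(bread %*% crossprod(scores_id) %*% bread))
```

Comparing se_group with se_id shows how much the choice of clustering level
matters for a given data set.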

Thanks in advance!
Renzo
Center for Development Research
University of Bonn


Re: [R] How to replace missing values by mean of subgroup of a group

2017-05-09 Thread Boris Steipe
Great. 

I am CC'ing the list - this is important so that others who may come across 
this thread in the archives know that this question has been resolved.

Cheers,

B.


> On May 9, 2017, at 5:18 PM, Olu Ola  wrote:
> 
> Thank you!!! It worked.
> 
> Regards,
> Olu
> 
> 
> On Tuesday, May 9, 2017 4:20 PM, Boris Steipe  
> wrote:
> 
> 
> Pedestrian code, so you can analyze this easily. However, it is entirely
> untested since I have no ambition to recreate your input data as a data
> frame. This code assumes:
> - your data _is_ a data frame
> - the desired column is called food.price, not "food price" (cf. ?make.names )
> 
> # define a function that imputes NA values in the same city, for the same food
> imputeFoodPrice <- function(DF, i) {
>   sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
>   imputed <- mean(DF$food.price[sel], na.rm = TRUE)
>   if (is.nan(imputed)) { # careful, there might be no other match
>     imputed <- NA
>   }
>   return(imputed)
> }
> 
> 
> # apply the function to replace NA values
> for (iMissing in which(is.na(myDF$food.price))) {
>   myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
> }
> 
> 
> B.
> 
> 
> 
> > On May 9, 2017, at 3:14 PM, Olu Ola via R-help  wrote:
> > 
> > Hello, I have the following food data with some NA values in the food
> > prices. I would like to replace the NA values in the food price column
> > for each food item by the mean price of that food item in each city.
> > For example, the price of beans for the household with hhid 102 in the
> > data set is missing. I would like to replace the missing value with the
> > mean price of beans for the households living in Paxton city (that is,
> > households 101 and 103). The data set is presented below. Any help will
> > be greatly appreciated.
> > 
> > | hhid | city | food | food price |
> > | 101 | Paxton | rice | 10 |
> > | 101 | Paxton | beans | 30 |
> > | 101 | Paxton | flour | NA |
> > | 101 | Paxton | eggs | 20 |
> > | 102 | Paxton | rice | NA |
> > | 102 | Paxton | beans | NA |
> > | 102 | Paxton | flour | 34 |
> > | 102 | Paxton | eggs | 21 |
> > | 103 | Paxton | rice | 15 |
> > | 103 | Paxton | beans | 28 |
> > | 103 | Paxton | flour | 32 |
> > | 103 | Paxton | eggs | NA |
> > | 104 | Hull | rice | NA |
> > | 104 | Hull | beans | 34 |
> > | 104 | Hull | flour | NA |
> > | 104 | Hull | eggs | 24 |
> > | 105 | Hull | rice | 18 |
> > | 105 | Hull | beans | 38 |
> > | 105 | Hull | flour | 36 |
> > | 105 | Hull | eggs | 26 |
> > | 106 | Hull | rice | NA |
> > | 106 | Hull | beans | NA |
> > | 106 | Hull | flour | 40 |
> > | 106 | Hull | eggs | NA |
> > 
> > 
> 
> 



Re: [R] Factors and Alternatives

2017-05-09 Thread peter dalgaard
Inline...

> On 9 May 2017, at 12:12 , g.maub...@weinwolf.de wrote:
> 
> Hi All,
> 
> I am using factors in a study for the social sciences.
> 
> I discovered the following:
> 
> -- cut --
> 
> library(dplyr)
> 
> test1 <- c(rep(1, 4), rep(0, 6))
> d_test1 <- data.frame(test1)
> 
> test2 <- factor(test1)
> d_test2 <- data.frame(test2)
> 
> test3 <- factor(test1, 
>levels = c(0, 1),
>labels = c("WITHOUT Contact", "WITH Contact"))
> d_test3 <- data.frame(test3)
> 
> d_test1 %>% filter(test1 == 0)  # works OK
> d_test2 %>% filter(test2 == 0)  # works OK
> d_test3 %>% filter(test3 == 0)  # does not work, why?
> 



test3 does not have a level 0. You want

test3 == "WITHOUT Contact"


Notice that once test3 is created, the input levels are lost, and thus
"test3 == 0" becomes meaningless.
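
A short base R illustration of this point, reusing the values from the
question. The lookup vector code_of is a made-up helper, shown as one possible
way to get back to the original codes if you keep the mapping yourself:

```r
codes <- c(rep(1, 4), rep(0, 6))
f <- factor(codes, levels = c(0, 1),
            labels = c("WITHOUT Contact", "WITH Contact"))

# Comparisons are made against the labels, so 0 matches nothing
n_zero    <- sum(f == 0)
n_without <- sum(f == "WITHOUT Contact")

# Keep your own label -> code lookup if the original values are needed later
code_of   <- c("WITHOUT Contact" = 0, "WITH Contact" = 1)
recovered <- unname(code_of[as.character(f)])
```

Here n_zero is 0 and n_without is 6, and recovered reproduces codes. Keeping
the numeric column next to the factor in the data frame is another simple
option.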

-pd


> myf <- function(ds) {
>  print(levels(ds$test3))
>  print(labels(ds$test3))
>  print(as.numeric(ds$test3))
>  print(as.character(ds$test3))
> }
> 
> # This shows that it is not possible to access the original
> # values which were the basis to build the factor:
> myf(d_test3)
> 
> -- cut --
> 
> Why is it not possible to use a factor with labels for filtering with the 
> original values?
> Is there a data structure that works like a factor but gives also access 
> to the original values?
> 
> Kind regards
> 
> Georg
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Generating samples from truncated multivariate Student-t distribution

2017-05-09 Thread David Winsemius

> On May 9, 2017, at 2:33 PM, David Winsemius  wrote:
> 
> 
>> On May 9, 2017, at 2:05 PM, Czarek Kowalski  wrote:
>> 
>> I have already posted that in attachement - pdf file.
> 
> I see that now. I failed to scroll to the 3rd page.
> 
>> I am posting
>> plain text here:
>> 
>>> library(tmvtnorm)
>>> meann = c(55, 40, 50, 35, 45, 30)
>>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
>>   1, 16, -6, -6, -2, 12,
>>   0, -6, 4, 2, -2, -5,
>>   2, -6, 2, 25, 0, -17,
>>  -1, -2, -2, 0, 9, -5,
>>  -1, 12, -5, -17, -5, 36), 6, 6)
>> df = 4
>> lower = c(20, 20, 20, 20, 20, 20)
>> upper = c(60, 60, 60, 60, 60, 60)
>> X1 <- rtmvt(n=10, meann, covv, df, lower, upper)
>> 
>> 
>>> sum(X1[,1]) / 10
>> [1] 54.98258
>> sum(X1[,2]) / 10
>> [1] 40.36153
>> sum(X1[,3]) / 10
>> [1] 49.83571
>> sum(X1[,4]) / 10
>> [1] 34.69571  # "4th element of mean vector"
>> sum(X1[,5]) / 10
>> [1] 44.81081
>> sum(X1[,6]) / 10
>> [1] 31.10834
>> 
>> And corresponding results received using equation (3) from pdf file:
>> [54.97,
>> 40,
>> 49.95,
>> 35.31, #  "4th element of mean vector"
>> 44.94,
>> 31.32]
>> 
> 
> I get similar results for the output from your code, 
> 
> My 100-fold run of your calculations were:
> 
> meansBig <- replicate(100, {Xbig <- rtmvt(n=10, meann, covv, df, lower, 
> upper)
> colMeans(Xbig)} )
> 
> describe(meansBig[4,])  # describe is from Hmisc package
> 
> meansBig[4, ] 
>        n  missing distinct     Info     Mean      Gmd      .05      .10
>      100        0      100        1     34.7  0.01954    34.68    34.68
>      .25      .50      .75      .90      .95
>    34.69    34.70    34.72    34.72    34.73
> 
> lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
> highest: 34.72939 34.73012 34.73051 34.73742 34.74441
> 
> 
> So I agree, 35.31 is outside the plausible range of an RV formed with that 
> package, but I don't have any code relating to your calculations from theory.

Further investigation:

covDiag <- covv*( row(covv)==col(covv) )  # keep only the diagonal (the variances)

Repeat with all zero covariances:

> meansDiag <- replicate(100, {Xbig <- rtmvt(n=10, meann, covDiag, df, 
> lower, upper)
+ colMeans(Xbig)} )
> describe(meansDiag[4,])
meansDiag[4, ] 
       n  missing distinct     Info     Mean      Gmd      .05      .10
     100        0      100        1    35.23  0.02074    35.21    35.21
     .25      .50      .75      .90      .95
   35.22    35.23    35.25    35.26    35.26

lowest : 35.18360 35.19756 35.20098 35.20179 35.20622
highest: 35.26367 35.26635 35.26791 35.27251 35.27302

So failing to account for the covariances in your theoretical calculations
mostly explains the apparent discrepancy, although your value of 35.31 would
still be at the far end of the simulated distribution. I wonder about some
sort of error in your theoretical calculation, which didn't appear to take
the covariance matrix into account.
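
The dependence of a truncated mean on the off-diagonal terms can also be seen
without tmvtnorm. The following is a plain rejection sampler in base R, a
sketch only and not the algorithm rtmvt uses; the function name rtmvt_reject
and the two-dimensional example values are made up for illustration (they
reuse the first and fourth means and scales from the posted inputs):

```r
# Draw from a multivariate t with location mu and scale matrix sigma,
# keeping only draws that fall inside the box [lower, upper]
rtmvt_reject <- function(n, mu, sigma, df, lower, upper) {
  p   <- length(mu)
  R   <- chol(sigma)                          # sigma = t(R) %*% R
  out <- matrix(NA_real_, 0, p)
  while (nrow(out) < n) {
    z <- matrix(rnorm(n * p), n, p) %*% R     # correlated normal draws
    w <- sqrt(rchisq(n, df) / df)             # one chi-square mixing draw per row
    x <- sweep(z / w, 2, mu, "+")             # multivariate t draws
    inside <- rowSums(x >= rep(lower, each = n) &
                      x <= rep(upper, each = n)) == p
    out <- rbind(out, x[inside, , drop = FALSE])
  }
  out[seq_len(n), , drop = FALSE]
}

set.seed(42)
mu    <- c(55, 35)
lower <- c(20, 20)
upper <- c(60, 60)
sigma_full <- matrix(c(1, 2, 2, 25), 2, 2)    # with covariance
sigma_diag <- diag(diag(sigma_full))          # variances only

x_full <- rtmvt_reject(20000, mu, sigma_full, df = 4, lower, upper)
x_diag <- rtmvt_reject(20000, mu, sigma_diag, df = 4, lower, upper)

means_full <- colMeans(x_full)
means_diag <- colMeans(x_diag)
```

Comparing means_full with means_diag gives a rough check on whether a
theoretical formula should involve the covariances.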

Best;
David.



> 
> Best;
> David.
> 
> 
>> On 9 May 2017 at 22:17, David Winsemius wrote:
>>> 
>>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski wrote:
>>>> 
>>>> Of course I have expected the difference between theory and a sample
>>>> of realizations of RV's and result mean should still be a random
>>>> variable. But, for example for 4th element of mean vector: 35.31 -
>>>> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
>>>> expected that the difference would be smaller because of law of large
>>>> numbers (for 10mln samples the difference is quite similar).
>>> 
>>> I for one have no idea what is meant by a "4th element of mean vector". So 
>>> I have no idea what to consider "big". I have found that my intuitions 
>>> about multivariate distributions, especially those where the covariate 
>>> structure is as complex as you have assumed, are often far from simulated 
>>> results.
>>> 
>>> I suggest you post some code and results.
>>> 
>>> --
>>> David.
>>> 
>>> 
>>>> 
>>>> On 9 May 2017 at 21:40, David Winsemius wrote:
>>>>> 
>>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski wrote:
>>>>>> 
>>>>>> Dear Members,
>>>>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>>>>> from truncated multivariate Student-t distribution using rtmvt
>>>>>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>>>>>> using equation (3) from attached pdf. The problem is, that after
>>>>>> summing all elements in one column of rtmvt result (and dividing by
>>>>>> 100 000) I do not receive the same result as using (3) equation. Could
>>>>>> You tell me, what is incorrect, why there is a difference?
>>>>> 
>>>>> I guess the question is why you would NOT expect a difference between 
>>>>> theory and a 

Re: [R] Generating samples from truncated multivariate Student-t distribution

2017-05-09 Thread David Winsemius

> On May 9, 2017, at 2:05 PM, Czarek Kowalski  wrote:
> 
> I have already posted that in attachement - pdf file.

I see that now. I failed to scroll to the 3rd page.

> I am posting
> plain text here:
> 
>> library(tmvtnorm)
>> meann = c(55, 40, 50, 35, 45, 30)
>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
>1, 16, -6, -6, -2, 12,
>0, -6, 4, 2, -2, -5,
>2, -6, 2, 25, 0, -17,
>   -1, -2, -2, 0, 9, -5,
>   -1, 12, -5, -17, -5, 36), 6, 6)
> df = 4
> lower = c(20, 20, 20, 20, 20, 20)
> upper = c(60, 60, 60, 60, 60, 60)
> X1 <- rtmvt(n=10, meann, covv, df, lower, upper)
> 
> 
>> sum(X1[,1]) / 10
> [1] 54.98258
> sum(X1[,2]) / 10
> [1] 40.36153
> sum(X1[,3]) / 10
> [1] 49.83571
> sum(X1[,4]) / 10
> [1] 34.69571  # "4th element of mean vector"
> sum(X1[,5]) / 10
> [1] 44.81081
> sum(X1[,6]) / 10
> [1] 31.10834
> 
> And corresponding results received using equation (3) from pdf file:
> [54.97,
> 40,
> 49.95,
> 35.31, #  "4th element of mean vector"
> 44.94,
> 31.32]
> 

I get similar results for the output from your code, 

My 100-fold run of your calculations were:

meansBig <- replicate(100, {Xbig <- rtmvt(n=10, meann, covv, df, lower, 
upper)
colMeans(Xbig)} )

describe(meansBig[4,])  # describe is from Hmisc package

meansBig[4, ] 
       n  missing distinct     Info     Mean      Gmd      .05      .10
     100        0      100        1     34.7  0.01954    34.68    34.68
     .25      .50      .75      .90      .95
   34.69    34.70    34.72    34.72    34.73

lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
highest: 34.72939 34.73012 34.73051 34.73742 34.74441


So I agree, 35.31 is outside the plausible range of an RV formed with that 
package, but I don't have any code relating to your calculations from theory.

Best;
David.


> On 9 May 2017 at 22:17, David Winsemius  wrote:
>> 
>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski  wrote:
>>> 
>>> Of course I have expected the difference between theory and a sample
>>> of realizations of RV's and result mean should still be a random
>>> variable. But, for example for 4th element of mean vector: 35.31 -
>>> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
>>> expected that the difference would be smaller because of law of large
>>> numbers (for 10mln samples the difference is quite similar).
>> 
>> I for one have no idea what is meant by a "4th element of mean vector". So I 
>> have no idea what to consider "big". I have found that my intuitions about 
>> multivariate distributions, especially those where the covariate structure 
>> is as complex as you have assumed, are often far from simulated results.
>> 
>> I suggest you post some code and results.
>> 
>> --
>> David.
>> 
>> 
>>> 
>>> On 9 May 2017 at 21:40, David Winsemius wrote:
>>>> 
>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski wrote:
>>>>> 
>>>>> Dear Members,
>>>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>>>> from truncated multivariate Student-t distribution using rtmvt
>>>>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>>>>> using equation (3) from attached pdf. The problem is, that after
>>>>> summing all elements in one column of rtmvt result (and dividing by
>>>>> 100 000) I do not receive the same result as using (3) equation. Could
>>>>> You tell me, what is incorrect, why there is a difference?
>>>> 
>>>> I guess the question is why you would NOT expect a difference between 
>>>> theory and a sample of realizations of RV's? The result mean should still 
>>>> be a random variable, nicht wahr?
>>>> 
>>>> 
>>>>> Yours faithfully
>>>>> Czarek Kowalski
>>>> 
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 

David Winsemius
Alameda, CA, USA


Re: [R] Generating samples from truncated multivariate Student-t distribution

2017-05-09 Thread Czarek Kowalski
I have already posted that in attachement - pdf file. I am posting
plain text here:

> library(tmvtnorm)
> meann = c(55, 40, 50, 35, 45, 30)
> covv = matrix(c(  1, 1, 0, 2, -1, -1,
+   1, 16, -6, -6, -2, 12,
+   0, -6, 4, 2, -2, -5,
+   2, -6, 2, 25, 0, -17,
+  -1, -2, -2, 0, 9, -5,
+  -1, 12, -5, -17, -5, 36), 6, 6)
> df = 4
> lower = c(20, 20, 20, 20, 20, 20)
> upper = c(60, 60, 60, 60, 60, 60)
> X1 <- rtmvt(n=10, meann, covv, df, lower, upper)

> sum(X1[,1]) / 10
[1] 54.98258
> sum(X1[,2]) / 10
[1] 40.36153
> sum(X1[,3]) / 10
[1] 49.83571
> sum(X1[,4]) / 10
[1] 34.69571  # "4th element of mean vector"
> sum(X1[,5]) / 10
[1] 44.81081
> sum(X1[,6]) / 10
[1] 31.10834




And corresponding results received using equation (3) from pdf file:
[54.97,
40,
49.95,
35.31, #  "4th element of mean vector"
44.94,
31.32]

On 9 May 2017 at 22:17, David Winsemius  wrote:
>
>> On May 9, 2017, at 1:11 PM, Czarek Kowalski  wrote:
>>
>> Of course I have expected the difference between theory and a sample
>> of realizations of RV's and result mean should still be a random
>> variable. But, for example for 4th element of mean vector: 35.31 -
>> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
>> expected that the difference would be smaller because of law of large
>> numbers (for 10mln samples the difference is quite similar).
>
> I for one have no idea what is meant by a "4th element of mean vector". So I 
> have no idea what to consider "big". I have found that my intuitions about 
> multivariate distributions, especially those where the covariate structure is 
> as complex as you have assumed, are often far from simulated results.
>
> I suggest you post some code and results.
>
> --
> David.
>
>
>>
>> On 9 May 2017 at 21:40, David Winsemius wrote:
>>>
>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski wrote:
>>>>
>>>> Dear Members,
>>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>>> from truncated multivariate Student-t distribution using rtmvt
>>>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>>>> using equation (3) from attached pdf. The problem is, that after
>>>> summing all elements in one column of rtmvt result (and dividing by
>>>> 100 000) I do not receive the same result as using (3) equation. Could
>>>> You tell me, what is incorrect, why there is a difference?
>>>
>>> I guess the question is why you would NOT expect a difference between 
>>> theory and a sample of realizations of RV's? The result mean should still 
>>> be a random variable, nicht wahr?
>>>
>>>
>>>> Yours faithfully
>>>> Czarek Kowalski
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>
> David Winsemius
> Alameda, CA, USA
>


Re: [R] How to replace missing values by mean of subgroup of a group

2017-05-09 Thread Boris Steipe
Pedestrian code, so you can analyze this easily. However, it is entirely 
untested since I have no ambition to recreate your input data as a data frame. 
This code assumes:
 - your data _is_ a data frame
 - the desired column is called food.price, not "food price" (cf. ?make.names )

# define a function that imputes NA values in the same city, for the same food
imputeFoodPrice <- function(DF, i) {
  sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
  imputed <- mean(DF$food.price[sel], na.rm = TRUE)
  if (is.nan(imputed)) { # careful, there might be no other match
    imputed <- NA
  }
  return(imputed)
}


# apply the function to replace NA values
for (iMissing in which(is.na(myDF$food.price))) {
  myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
}
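
A vectorised alternative to the loop above can be sketched with ave(), which 
computes a statistic within groups and returns a vector aligned with the input 
rows. This is only a sketch under the same assumptions (a data frame with 
columns city, food and food.price); the small myDF built here is a stand-in 
that uses the Paxton rice and beans rows from the question.

```r
# Stand-in data: Paxton households 101-103, rice and beans only.
myDF <- data.frame(
  hhid = rep(101:103, each = 2),
  city = "Paxton",
  food = rep(c("rice", "beans"), times = 3),
  food.price = c(10, 30, NA, NA, 15, 28),
  stringsAsFactors = FALSE
)

# Mean price within each city/food group, ignoring missing values.
group_mean <- ave(myDF$food.price, myDF$city, myDF$food,
                  FUN = function(p) mean(p, na.rm = TRUE))

# Fill only the missing entries with their group mean.
missing <- is.na(myDF$food.price)
myDF$food.price[missing] <- group_mean[missing]

# A group with no observed price at all gives NaN; turn that back into NA.
myDF$food.price[is.nan(myDF$food.price)] <- NA
```

With these rows, the missing rice price becomes the mean of 10 and 15 and the 
missing beans price the mean of 30 and 28.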


B.



> On May 9, 2017, at 3:14 PM, Olu Ola via R-help  wrote:
> 
> Hello, I have the following food data with some NA values in the food prices. 
> I would like to replace the NA values in the food price column for each food 
> item with the mean price of that food item in the same city. For example, 
> the price of beans for the household with hhid 102 is missing. I would like 
> to replace the missing value with the mean price of beans for the households 
> living in Paxton (that is, households 101 and 103). The data set is 
> presented below. Any help will be greatly appreciated.
> 
> | hhid | city | food | food price |
> | 101 | Paxton | rice | 10 |
> | 101 | Paxton | beans | 30 |
> | 101 | Paxton | flour | NA |
> | 101 | Paxton | eggs | 20 |
> | 102 | Paxton | rice | NA |
> | 102 | Paxton | beans | NA |
> | 102 | Paxton | flour | 34 |
> | 102 | Paxton | eggs | 21 |
> | 103 | Paxton | rice | 15 |
> | 103 | Paxton | beans | 28 |
> | 103 | Paxton | flour | 32 |
> | 103 | Paxton | eggs | NA |
> | 104 | Hull | rice | NA |
> | 104 | Hull | beans | 34 |
> | 104 | Hull | flour | NA |
> | 104 | Hull | eggs | 24 |
> | 105 | Hull | rice | 18 |
> | 105 | Hull | beans | 38 |
> | 105 | Hull | flour | 36 |
> | 105 | Hull | eggs | 26 |
> | 106 | Hull | rice | NA |
> | 106 | Hull | beans | NA |
> | 106 | Hull | flour | 40 |
> | 106 | Hull | eggs | NA |
> 
> 



Re: [R] Generating samples from truncated multivariate Student-t distribution

2017-05-09 Thread David Winsemius

> On May 9, 2017, at 1:11 PM, Czarek Kowalski  wrote:
> 
> Of course I have expected the difference between theory and a sample
> of realizations of RV's and result mean should still be a random
> variable. But, for example for 4th element of mean vector: 35.31 -
> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
> expected that the difference would be smaller because of law of large
> numbers (for 10mln samples the difference is quite similar).

I for one have no idea what is meant by a "4th element of mean vector". So I 
have no idea what to consider "big". I have found that my intuitions about 
multivariate distributions, especially those where the covariate structure is 
as complex as you have assumed, are often far from simulated results.

I suggest you post some code and results.

-- 
David.


> 
> On 9 May 2017 at 21:40, David Winsemius  wrote:
>> 
>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski  wrote:
>>> 
>>> Dear Members,
>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>> from truncated multivariate Student-t distribution using rtmvt
>>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>>> using equation (3) from attached pdf. The problem is, that after
>>> summing all elements in one column of rtmvt result (and dividing by
>>> 100 000) I do not receive the same result as using (3) equation. Could
>>> You tell me, what is incorrect, why there is a difference?
>> 
>> I guess the question is why you would NOT expect a difference between theory 
>> and a sample of realizations of RV's? The result mean should still be a 
>> random variable, nicht wahr?
>> 
>> 
>>> Yours faithfully
>>> Czarek Kowalski
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 

David Winsemius
Alameda, CA, USA


Re: [R] Generating samples from truncated multivariate Student-t distribution

2017-05-09 Thread Czarek Kowalski
Of course I expected a difference between theory and a sample of
realizations of RVs, and the resulting mean should still be a random
variable. But, for example, for the 4th element of the mean vector: 35.31 -
34.69571 = 0.61429. It is quite a big difference, nieprawdaż? I
expected the difference to be smaller because of the law of large
numbers (for 10 million samples the difference is quite similar).
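
One way to judge whether such a gap is "big" is to compare it with the Monte 
Carlo standard error of the sample mean. The sketch below is a univariate 
stand-in, not the 6-dimensional problem: the location, scale and the use of 
plain rejection sampling are assumptions made for illustration, and only the 
degrees of freedom, the bounds [20, 60] and the sample size come from the 
message.

```r
set.seed(1)

# Hypothetical one-dimensional stand-in for a single coordinate.
mu <- 35; s <- 10          # illustrative location and scale (assumptions)
nu <- 4                    # degrees of freedom, as in the message
lo <- 20; hi <- 60         # truncation bounds, as in the message
n  <- 1e5                  # number of retained draws

# Rejection sampling: draw from the untruncated t, keep values in [lo, hi].
draws <- mu + s * rt(3 * n, df = nu)
x <- draws[draws >= lo & draws <= hi][seq_len(n)]
stopifnot(!anyNA(x))       # enough draws were accepted

# Theoretical truncated mean by numerical integration on the standard scale.
a <- (lo - mu) / s
b <- (hi - mu) / s
theory <- mu + s * integrate(function(t) t * dt(t, nu), a, b)$value /
  (pt(b, nu) - pt(a, nu))

mc_se <- sd(x) / sqrt(n)                # Monte Carlo standard error of the mean
z     <- (mean(x) - theory) / mc_se     # gap measured in standard errors
```

Because every draw lies in [20, 60], the standard deviation cannot exceed 20, 
so with 100 000 independent draws the standard error is at most about 0.063. 
A gap of 0.61 would then be more than nine standard errors, which, if the 
draws are roughly independent, suggests looking beyond sampling noise.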

On 9 May 2017 at 21:40, David Winsemius  wrote:
>
>> On May 9, 2017, at 10:09 AM, Czarek Kowalski  wrote:
>>
>> Dear Members,
>> I am working with 6-dimensional Student-t distribution with 4 degrees
>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>> from truncated multivariate Student-t distribution using rtmvt
>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>> using equation (3) from attached pdf. The problem is, that after
>> summing all elements in one column of rtmvt result (and dividing by
>> 100 000) I do not receive the same result as using (3) equation. Could
>> You tell me, what is incorrect, why there is a difference?
>
> I guess the question is why you would NOT expect a difference between theory 
> and a sample of realizations of RV's? The result mean should still be a 
> random variable, nicht wahr?
>
>
>> Yours faithfully
>> Czarek Kowalski
>
> David Winsemius
> Alameda, CA, USA
>


Re: [R] Generating samples from truncated multivariate Student-t distribution

2017-05-09 Thread David Winsemius

> On May 9, 2017, at 10:09 AM, Czarek Kowalski  wrote:
> 
> Dear Members,
> I am working with 6-dimensional Student-t distribution with 4 degrees
> of freedom truncated to [20; 60]. I have generated 100 000 samples
> from truncated multivariate Student-t distribution using rtmvt
> function from package ‘tmvtnorm’. I have also calculated  mean vector
> using equation (3) from attached pdf. The problem is, that after
> summing all elements in one column of rtmvt result (and dividing by
> 100 000) I do not receive the same result as using (3) equation. Could
> You tell me, what is incorrect, why there is a difference?

I guess the question is why you would NOT expect a difference between theory 
and a sample of realizations of RV's? The result mean should still be a random 
variable, nicht wahr?


> Yours faithfully
> Czarek Kowalski

David Winsemius
Alameda, CA, USA


Re: [R-es] Representing missing values

2017-05-09 Thread Carlos Ortega
Hello,

1. For the scatterplot: you can first use the na.omit() function to get a
data.frame without NAs, and then draw the scatterplot from that "clean"
data.frame.
2. For the boxplot, look at the options you can set, because one of them,
na.action, accepts na.omit. As for the "bar at the bottom", I don't know
what you are referring to...
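
The two suggestions above can be sketched in base R as follows. The data frame 
dat and its columns x and y are made up for illustration. Reading the "bar at 
the bottom" as marginal tick marks for cases whose y value is missing is only 
a guess at what was meant; rug() is one base-graphics way to draw such marks.

```r
# Made-up data: one case lacks y, another lacks x.
dat <- data.frame(x = c(1.2, 2.5, NA, 4.1, 5.0),
                  y = c(2.0, NA, 3.3, 4.8, 5.1))

complete     <- na.omit(dat)                          # rows with any NA dropped
only_x_known <- dat[!is.na(dat$x) & is.na(dat$y), ]   # x observed, y missing

draw_plots <- function() {
  # Scatterplot of the complete cases only.
  plot(complete$x, complete$y, xlab = "x", ylab = "y")
  # Tick marks on the bottom axis for cases where only x is known.
  rug(only_x_known$x, col = "red")
  # boxplot() ignores NAs within each variable, so every observed value counts.
  boxplot(dat$x, dat$y, names = c("x", "y"))
}
```

Calling draw_plots() draws the scatterplot first and the boxplots second.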

Thanks,
Carlos Ortega
www.qualityexcellence.es

On 9 May 2017 at 20:25, Joan Giménez Verdugo wrote:

> Hello,
>
> yesterday I received the following link:
>
> https://www.r-bloggers.com/graphical-presentation-of-missing-data-vim-package/
>
> I would like to do something similar, but without imputing the data. That
> is, I have two variables, and for some cases the value of one variable is
> missing but not the other. For the complete cases I want to plot both
> values in the scatterplot, but the cases that lack one variable I want to
> show in the bar at the bottom and have the corresponding boxplot take
> them into account.
>
> Can anyone give me an idea of how to do this in R?
>
> Many thanks.
>
> --
> *Joan Giménez Verdugo*
> *PhD Student* *Severo Ochoa*
> Estación Biológica de Doñana (EBD-CSIC)
> Department of Conservation Biology
> Americo Vespucio Ave, s/n
> 41092 Sevilla (Spain)
> www.ebd.csic.es
> ---
> Research Gate: Joan Giménez
> 
> Phone: +34 619 176 849
>
> ___
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>



-- 
Regards,
Carlos Ortega
www.qualityexcellence.es



Re: [R] Joining tables with different order and matched values

2017-05-09 Thread Boris Steipe
myDf1 <- data.frame(drugs = c("Ibuprofen", "Simvastatin", "Losartan"),
                    indications = c("pain", "hyperlipidemia", "hypertension"),
                    stringsAsFactors = FALSE)

myDf2 <- data.frame(drugs = c("Simvastatin", "Losartan", "Ibuprofen",
                              "Metformin"),
                    stringsAsFactors = FALSE)

myDf3 <- merge(myDf2, myDf1, all = TRUE, sort = FALSE)


R > myDf3
        drugs    indications
1 Simvastatin hyperlipidemia
2    Losartan   hypertension
3   Ibuprofen           pain
4   Metformin           <NA>


R > str(myDf3)
'data.frame':   4 obs. of  2 variables:
 $ drugs      : chr  "Simvastatin" "Losartan" "Ibuprofen" "Metformin"
 $ indications: chr  "hyperlipidemia" "hypertension" "pain" NA
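
The help page for merge() describes the row order with sort = FALSE as 
unspecified, so it may be safer not to rely on it when the order of the second 
table matters. A minimal sketch of an order-preserving lookup with match(), 
reusing the column names from the example above and deliberately putting the 
unmatched drug first:

```r
myDf1 <- data.frame(drugs = c("Ibuprofen", "Simvastatin", "Losartan"),
                    indications = c("pain", "hyperlipidemia", "hypertension"),
                    stringsAsFactors = FALSE)

# The unmatched drug comes first here to show that its position is kept.
myDf2 <- data.frame(drugs = c("Metformin", "Simvastatin", "Losartan",
                              "Ibuprofen"),
                    stringsAsFactors = FALSE)

# match() returns, for each drug in myDf2, its row in myDf1 (NA if absent);
# indexing with that vector looks the indications up without reordering myDf2.
myDf2$indications <- myDf1$indications[match(myDf2$drugs, myDf1$drugs)]
```

The rows of myDf2 are never moved, and a drug absent from myDf1 simply gets NA.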



-

Minimum working example!
Don't post in HTML!
... you should know better by now.



> On May 9, 2017, at 1:21 PM, abo dalash  wrote:
> 
> I'm repeating my question and hope to find someone to help.
> 
> I have been trying for hours without results. I have tried the previous 
> suggestions but am still struggling.
> 
> I believe the join functions in dplyr will do the work, but I'm confused 
> about the correct syntax.
> 
> I have 2 tables and I'm trying to have some information from the 1st table 
> appear in the 2nd table.
> 
> Let's say this is my 1st table:
> 
>> df1
> Drug name     indications
> Ibuprofen     Pain
> Simvastatin   hyperlipidemia
> losartan      hypertension
> 
> My 2nd table contains the same list of drugs in the first column BUT in a 
> different order:
> 
>> df2
> Drug name     indications
> Simvastatin
> losartan
> Ibuprofen
> Metformin
> 
> Simply, I want to produce a table like df1 but in the order of the 1st column 
> of my df2.
> 
> It would look like this:
> 
>> joined tables
> Drug name     indications
> Simvastatin   hyperlipidemia
> losartan      hypertension
> Ibuprofen     pain
> Metformin     N/A
> 
> Please note that it is important to keep the order of drugs in df2 as it is 
> and to see the appropriate indication of each drug (which is taken from df1) 
> next to it under the "indications" column.
> 
> 
> 
> 
> From: Ulrik Stervbo 
> Sent: 09 May 2017 06:31 PM
> To: abo dalash
> Subject: Re: [R] Joining tables with different order and matched values
> 
> Hi Abo,
> 
> Please keep the list in cc - 1) the comments are accessible to everyone, 2) 
> there is a chance that someone else might reply.
> 
> If the merge does what you intend, but you are unhappy with the order, you 
> can arrange the resulting data.frame:
> 
> df <- data.frame(x = c(5, 4,2,3,6, 1), y = letters[1:6])
> 
> df
> df[order(df$x), ]
> 
> HTH
> Ulrik
> 
> 
> 
> On Tue, 9 May 2017 at 16:17 abo dalash 
> > wrote:
> 
> 
> I still cannot produce the table I wish. I tried the following with the same 
> results.
> 
> 
> A <-merge(dt1, dt2, by = "Drug name", all.x = TRUE)
> 
> 
> A <-join_query(dt1, dt2, by = "Drug name")
> 
> This returns a table showing results with changing the order of drugs in the 
> 2nd data frame. I want to see the results under
> "indications" column without changing the order of drugs in my 2nd data 
> frame. I have been trying for many hours, so please
> help me to know what is the mistake I have done and what is the correct 
> syntax.
> 
> 
> Regards
> 
> From: Ulrik Stervbo >
> Sent: 09 May 2017 12:22 PM
> To: abo dalash; R-help
> 
> Subject: Re: [R] Joining tables with different order and matched values
> Hi Abo,
> 
> Please keep the list in cc.
> 
> I think the function documentation is pretty straight forward - two 
> data.frames are required, and if you wish to keep elements that are not 
> present in both data.frames, you set the flag all = TRUE. You also have the 
> option to specify which columns to join by.
> 
> If you need more assistance with joining two data.frames, you should provide 
> a reproducible example, and if you have trouble with a function you should 
> provide an example of what you have tried so far.
> 
> Best wishes,
> Ulrik
> 
> 
> 
> On Tue, 9 May 2017 at 10:00 abo dalash 
> > wrote:
> Could you please teach me about the correct formation of the syntax?. I hav
> n but wasn't able to formulate the correct syntax.
> 
> 
> Sent from my Samsung device
> 
> 
>  Original message 
> From: Ulrik Stervbo >
> Date: 09/05/2017 7:42 a.m. (GMT+00:00)
> To: abo dalash >, 
> "r-help@R-project.org" >
> Subject: Re: [R] Joining tables with different order and matched values
> 
> Hi Abo,
> 
> ?merge
> 
> or 

[R] How to replace missing values by mean of subgroup of a group

2017-05-09 Thread Olu Ola via R-help
Hello, I have the following food data with some NA values in the food prices. 
I would like to replace the NA values in the food price column for each food 
item with the mean price of that food item in the same city. For example, the 
price of beans for the household with hhid 102 is missing. I would like to 
replace the missing value with the mean price of beans for the households 
living in Paxton (that is, households 101 and 103). The data set is presented 
below. Any help will be greatly appreciated.

| hhid | city | food | food price |
| 101 | Paxton | rice | 10 |
| 101 | Paxton | beans | 30 |
| 101 | Paxton | flour | NA |
| 101 | Paxton | eggs | 20 |
| 102 | Paxton | rice | NA |
| 102 | Paxton | beans | NA |
| 102 | Paxton | flour | 34 |
| 102 | Paxton | eggs | 21 |
| 103 | Paxton | rice | 15 |
| 103 | Paxton | beans | 28 |
| 103 | Paxton | flour | 32 |
| 103 | Paxton | eggs | NA |
| 104 | Hull | rice | NA |
| 104 | Hull | beans | 34 |
| 104 | Hull | flour | NA |
| 104 | Hull | eggs | 24 |
| 105 | Hull | rice | 18 |
| 105 | Hull | beans | 38 |
| 105 | Hull | flour | 36 |
| 105 | Hull | eggs | 26 |
| 106 | Hull | rice | NA |
| 106 | Hull | beans | NA |
| 106 | Hull | flour | 40 |
| 106 | Hull | eggs | NA |



Re: [R] passing arguments to simple plotting program.

2017-05-09 Thread Gerard Smits
Seems so simple when you explain it.  Thanks very much.  Gerard


> On May 9, 2017, at 9:40 AM, Ulrik Stervbo  wrote:
> 
> Hi Gerard,
> Quotation marks are used for strings. In your function body you try to use the 
> strings "indata" and "fig_descrip" (the latter will work but is not what you 
> want).
> 
> In your current function call you pass the variable Figure as the value to 
> the argument fig_descrip, followed by a lot of other stuff your function 
> doesn't know what to do with.
> 
> Remove the quotation marks around indata and fig_descrip in the function 
> body, call your function with:
> 
> plot_f1(indata=v5, n1=114, n2=119, n3=116, fig_descrip="Figure 2a\nChange in 
> Composite Score at Visit 5 (Day 31)\nPer Protocol Population")
> 
> and you should be fine.
> 
> HTH
> 
> Ulrik
> 
> Gerard Smits wrote on Tue, 9 May 2017, 18:27:
> Hi Ulrik,
> 
> If I can trouble you with one more question.
> 
> Now trying to send a string to the main= .  I was able to pass the data name 
> in data=in_data, but same logic is not working in passing the main string.
> 
> 
> plot_f1 <-function(indata,n1,n2,n3,fig_descrip) {
>   par(oma=c(2,2,2,2))
>   boxplot(formula = d_comp ~ rx_grp,
>   data="indata”,# <- worked fine here.
>   main="fig_descrip",
>   ylim=c(-10,5),
>   names=c(paste0("Placebo(N=", n1,  ")"),
> paste0("Low Dose(N=", n2, ")"),
> paste0("High Dose(N=", n3,")")),
>   ylab='Change from Baseline')
>   abline(h=c(0), col="lightgray")
> }
> 
> plot_f1(indata=v5, n1=114, n2=119, n3=116, fig_descrip=Figure 2a\nChange in 
> Composite Score at Visit 5 (Day 31)\nPer Protocol Population)
> 
> Error Message: Error: unexpected numeric constant in "plot_f1(indata=v5, 
> n1=114, n2=119, n3=116, fig_descrip=Figure 2”
> 
> Even this call gives the same error:  plot_f1(indata=v5, n1=114, n2=119, 
> n3=116, fig_descrip=Figure)
> 
> 
> Thanks, 
> 
> Gerard
> 
> 
> 
> 
> 
> 
>> On May 8, 2017, at 11:40 PM, Ulrik Stervbo > > wrote:
>> 
> 
>> HI Gerard,
>> 
>> You get the literals because the variables are not implicitly expanded - 
>> 'Placebo(N=n1)  ' is just a string indicating the N = n1. 
>> 
>> What you want is to use paste() or paste0(): 
>> c(paste0("Placebo(N=", n1, ")"), paste0("Low Dose (N=", n2, ")"), 
>> paste0("High Dose (N=", n3, ")"))
>> should do it.
>> 
>> I was taught long ago that attach() should be avoided to avoid name 
>> conflicts. Also, it makes it difficult to figure out which data is actually 
>> being used.
>> 
>> HTH
>> Ulrik
>> 
>> On Tue, 9 May 2017 at 06:44 Gerard Smits > > wrote:
>> Hi All,
>> 
>> I thought I’d try to get a function working instead of block copying code 
>> and editing. My background is more SAS, so using a SAS Macro would be easy, 
>> but not so lucky with R functions.
>> 
>> 
>> R being used on Mac Sierra 10.12.4:
>> 
>> R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
>> Copyright (C) 2016 The R Foundation for Statistical Computing
>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>> 
>> 
>> resp<-read.csv("//users//gerard//gs//r_work//xyz.csv", header = TRUE)
>> 
>> v5  <-subset(resp, subset=visit==5 & pp==1)
>> 
>> plot_f1 <-function(n1,n2,n3) {
>>   attach(v8)
>>   par(oma=c(2,2,2,2))
>>   boxplot(formula = d_comp ~ rx_grp,
>>   main="Figure 2\nChange in Composite Score at Visit 5 (Day 31)\nPer 
>> Protocol Population",
>>   ylim=c(-10,5),
>>   names=c('Placebo(N=n1)  ',
>>   'Low Dose(N=n2) ',
>>   'High Dose(N=n3)'),
>>   ylab='Change from Baseline')
>>   abline(h=c(0), col="lightgray")
>> }
>> 
>> plot_f1(n1=114, n2=119, n3=116)
>> 
>> The above is a simplified example where I am trying to pass 3 arguments, 
>> n1-n3, to be shown in the x-axis tables,  Instead of the numbers, I get the 
>> literal n1, n2, n3.
>> 
>> Any help appreciated.
>> 
>> Thanks,
>> 
>> Gerard
>> 
>> 
>> 
>> 



Re: [R] passing arguments to simple plotting program.

2017-05-09 Thread Gerard Smits
Hi Ulrik,

If I can trouble you with one more question.

Now trying to send a string to the main= .  I was able to pass the data name in 
data=in_data, but same logic is not working in passing the main string.


plot_f1 <-function(indata,n1,n2,n3,fig_descrip) {
  par(oma=c(2,2,2,2))
  boxplot(formula = d_comp ~ rx_grp,
  data="indata”,# <- worked fine here.
  main="fig_descrip",
  ylim=c(-10,5),
  names=c(paste0("Placebo(N=", n1,  ")"),
  paste0("Low Dose(N=", n2, ")"),
  paste0("High Dose(N=", n3,")")),
  ylab='Change from Baseline')
  abline(h=c(0), col="lightgray")
}

plot_f1(indata=v5, n1=114, n2=119, n3=116, fig_descrip=Figure 2a\nChange in 
Composite Score at Visit 5 (Day 31)\nPer Protocol Population)

Error Message: Error: unexpected numeric constant in "plot_f1(indata=v5, 
n1=114, n2=119, n3=116, fig_descrip=Figure 2”

Even this call gives the same error:  plot_f1(indata=v5, n1=114, n2=119, 
n3=116, fig_descrip=Figure)
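
For comparison, a sketch of how the function might look once the arguments are 
used as variables rather than quoted strings, and the title is passed as a 
quoted string in the call. The toy v5 below is an invented stand-in for the 
real data; only the column names d_comp and rx_grp are taken from the code 
above.

```r
plot_f1 <- function(indata, n1, n2, n3, fig_descrip) {
  op <- par(oma = c(2, 2, 2, 2))
  on.exit(par(op))                    # restore graphics settings afterwards
  boxplot(d_comp ~ rx_grp,
          data = indata,              # the object itself, not the string "indata"
          main = fig_descrip,         # likewise unquoted
          ylim = c(-10, 5),
          names = c(paste0("Placebo (N=", n1, ")"),
                    paste0("Low Dose (N=", n2, ")"),
                    paste0("High Dose (N=", n3, ")")),
          ylab = "Change from Baseline")
  abline(h = 0, col = "lightgray")
}

# Invented stand-in for v5, only so that the sketch can run on its own.
set.seed(42)
v5 <- data.frame(
  rx_grp = factor(rep(c("placebo", "low", "high"), each = 10),
                  levels = c("placebo", "low", "high")),
  d_comp = rnorm(30, mean = -2, sd = 2)
)

# The title is a string, so it needs quotation marks in the call.
fig_title <- "Figure 2a\nChange in Composite Score at Visit 5 (Day 31)\nPer Protocol Population"
```

The figure would then be drawn with plot_f1(indata = v5, n1 = 114, n2 = 119, 
n3 = 116, fig_descrip = fig_title).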


Thanks, 

Gerard






> On May 8, 2017, at 11:40 PM, Ulrik Stervbo  wrote:
> 
> HI Gerard,
> 
> You get the literals because the variables are not implicitly expanded - 
> 'Placebo(N=n1)  ' is just a string indicating the N = n1. 
> 
> What you want is to use paste() or paste0(): 
> c(paste0("Placebo(N=", n1, ")"), paste0("Low Dose (N=", n2, ")"), 
> paste0("High Dose (N=", n3, ")"))
> should do it.
> 
> I was taught long ago that attach() should be avoided to avoid name 
> conflicts. Also, it makes it difficult to figure out which data is actually 
> being used.
> 
> HTH
> Ulrik
> 
> On Tue, 9 May 2017 at 06:44 Gerard Smits  > wrote:
> Hi All,
> 
> I thought I’d try to get a function working instead of block copying code and 
> editing. My background is more SAS, so using a SAS Macro would be easy, but 
> not so lucky with R functions.
> 
> 
> R being used on Mac Sierra 10.12.4:
> 
> R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> Copyright (C) 2016 The R Foundation for Statistical Computing
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> 
> 
> resp<-read.csv("//users//gerard//gs//r_work//xyz.csv", header = TRUE)
> 
> v5  <-subset(resp, subset=visit==5 & pp==1)
> 
> plot_f1 <-function(n1,n2,n3) {
>   attach(v8)
>   par(oma=c(2,2,2,2))
>   boxplot(formula = d_comp ~ rx_grp,
>   main="Figure 2\nChange in Composite Score at Visit 5 (Day 31)\nPer 
> Protocol Population",
>   ylim=c(-10,5),
>   names=c('Placebo(N=n1)  ',
>   'Low Dose(N=n2) ',
>   'High Dose(N=n3)'),
>   ylab='Change from Baseline')
>   abline(h=c(0), col="lightgray")
> }
> 
> plot_f1(n1=114, n2=119, n3=116)
> 
> The above is a simplified example where I am trying to pass 3 arguments, 
> n1-n3, to be shown in the x-axis tables,  Instead of the numbers, I get the 
> literal n1, n2, n3.
> 
> Any help appreciated.
> 
> Thanks,
> 
> Gerard
> 
> 
> 
> 



Re: [R] passing arguments to simple plotting program.

2017-05-09 Thread Gerard Smits
Hi Ulrik,

That worked perfectly.  Thanks for your help. Much appreciated.

Gerard


> On May 8, 2017, at 11:40 PM, Ulrik Stervbo  wrote:
> 
> HI Gerard,
> 
> You get the literals because the variables are not implicitly expanded - 
> 'Placebo(N=n1)  ' is just a string indicating the N = n1. 
> 
> What you want is to use paste() or paste0(): 
> c(paste0("Placebo(N=", n1, ")"), paste0("Low Dose (N=", n2, ")"), 
> paste0("High Dose (N=", n3, ")"))
> should do it.
> 
> I was taught long ago that attach() should be avoided to avoid name 
> conflicts. Also, it makes it difficult to figure out which data is actually 
> being used.
> 
> HTH
> Ulrik
> 
> On Tue, 9 May 2017 at 06:44 Gerard Smits  > wrote:
> Hi All,
> 
> I thought I’d try to get a function working instead of block copying code and 
> editing. My background is more SAS, so using a SAS Macro would be easy, but 
> not so lucky with R functions.
> 
> 
> R being used on Mac Sierra 10.12.4:
> 
> R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> Copyright (C) 2016 The R Foundation for Statistical Computing
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> 
> 
> resp<-read.csv("//users//gerard//gs//r_work//xyz.csv", header = TRUE)
> 
> v5  <-subset(resp, subset=visit==5 & pp==1)
> 
> plot_f1 <-function(n1,n2,n3) {
>   attach(v8)
>   par(oma=c(2,2,2,2))
>   boxplot(formula = d_comp ~ rx_grp,
>   main="Figure 2\nChange in Composite Score at Visit 5 (Day 31)\nPer 
> Protocol Population",
>   ylim=c(-10,5),
>   names=c('Placebo(N=n1)  ',
>   'Low Dose(N=n2) ',
>   'High Dose(N=n3)'),
>   ylab='Change from Baseline')
>   abline(h=c(0), col="lightgray")
> }
> 
> plot_f1(n1=114, n2=119, n3=116)
> 
> The above is a simplified example where I am trying to pass 3 arguments, 
> n1-n3, to be shown in the x-axis tables,  Instead of the numbers, I get the 
> literal n1, n2, n3.
> 
> Any help appreciated.
> 
> Thanks,
> 
> Gerard
> 
> 
> 
> 



Re: [R] Joining tables with different order and matched values

2017-05-09 Thread abo dalash
I'm repeating my question and hope to find someone to help.


I have been trying for hours without results. I have tried the previous 
suggestions but am still struggling.


I believe the join functions in dplyr will do the work, but I'm confused about 
the correct syntax.


I have 2 tables and I'm trying to have some information from the 1st table 
appear in the 2nd table.


Let's say this is my 1st table:


>df1
Drug name     indications
Ibuprofen     Pain
Simvastatin   hyperlipidemia
losartan      hypertension


My 2nd table contains the same list of drugs in the first column BUT in a 
different order:

>df2
Drug name     indications
Simvastatin
losartan
Ibuprofen
Metformin

Simply, I want to produce a table like df1 but in the order of the 1st column 
of my df2.

It would look like this:

>joined tables
Drug name     indications
Simvastatin   hyperlipidemia
losartan      hypertension
Ibuprofen     pain
Metformin     N/A


Please note that it is important to keep the order of drugs in df2 as it is and 
to see the appropriate indication of each drug (which is taken from df1) next 
to it under the "indications" column.
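
One base R way to get this result is to record the original row position 
before merging and sort on it afterwards. This is a sketch only: it assumes 
the drug column has been given a syntactic name (drug and row_id are 
hypothetical names), and the data frames are retyped from the tables above.

```r
df1 <- data.frame(drug = c("Ibuprofen", "Simvastatin", "losartan"),
                  indications = c("Pain", "hyperlipidemia", "hypertension"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(drug = c("Simvastatin", "losartan", "Ibuprofen", "Metformin"),
                  stringsAsFactors = FALSE)

# Remember where each row of df2 started out.
df2$row_id <- seq_len(nrow(df2))

# all.x = TRUE keeps every drug in df2, with NA where df1 has no entry.
joined <- merge(df2, df1, by = "drug", all.x = TRUE)

# Restore the original order of df2 and drop the helper column.
joined <- joined[order(joined$row_id), c("drug", "indications")]
rownames(joined) <- NULL
```

Metformin stays in fourth place with NA as its indication.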




From: Ulrik Stervbo 
Sent: 09 May 2017 06:31 PM
To: abo dalash
Subject: Re: [R] Joining tables with different order and matched values

Hi Abo,

Please keep the list in cc - 1) the comments are accessible to everyone, 2) 
there is a chance that someone else might reply.

If the merge does what you intend, but you are unhappy with the order, you can 
arrange the resulting data.frame:

df <- data.frame(x = c(5, 4,2,3,6, 1), y = letters[1:6])

df
df[order(df$x), ]

HTH
Ulrik



On Tue, 9 May 2017 at 16:17 abo dalash 
> wrote:


I still cannot produce the table I wish. I tried the following with the same 
results.


A <-merge(dt1, dt2, by = "Drug name", all.x = TRUE)


A <-join_query(dt1, dt2, by = "Drug name")

This returns a table showing results with changing the order of drugs in the 
2nd data frame. I want to see the results under
"indications" column without changing the order of drugs in my 2nd data frame. 
I have been trying for many hours, so please
help me to know what is the mistake I have done and what is the correct syntax.


Regards

From: Ulrik Stervbo >
Sent: 09 May 2017 12:22 PM
To: abo dalash; R-help

Subject: Re: [R] Joining tables with different order and matched values
Hi Abo,

Please keep the list in cc.

I think the function documentation is pretty straight forward - two data.frames 
are required, and if you wish to keep elements that are not present in both 
data.frames, you set the flag all = TRUE. You also have the option to specify 
which columns to join by.

If you need more assistance with joining two data.frames, you should provide a 
reproducible example, and if you have trouble with a function you should 
provide an example of what you have tried so far.

Best wishes,
Ulrik



On Tue, 9 May 2017 at 10:00 abo dalash 
> wrote:
Could you please teach me about the correct formation of the syntax?. I hav
n but wasn't able to formulate the correct syntax.


Sent from my Samsung device


 Original message 
From: Ulrik Stervbo >
Date: 09/05/2017 7:42 a.m. (GMT+00:00)
To: abo dalash >, 
"r-help@R-project.org" >
Subject: Re: [R] Joining tables with different order and matched values

Hi Abo,

?merge

or the join functions from dplyr.

HTH
Ulrik

On Tue, 9 May 2017 at 06:44 abo dalash 
> wrote:
Hi All ..,


I have 2 tables and I'm trying to have some information from the 1st table to 
appear in the second table with different order.


For Example, let's say this is my 1st table :-



Drug name     indications
Ibuprofen     Pain
Simvastatin   hyperlipidemia
losartan      hypertension



my 2nd table is in different order for the 1st column :-


Drug name     indications
Simvastatin
losartan
Ibuprofen
Metformin


I wish to see the indication of each drug in my 2nd table subsisted from the 
information in my 1st table so the final table

would be like this


Drug name     indications
Simvastatin   hyperlipidemia
losartan      hypertension
Ibuprofen     pain
Metformin     N/A


I have been trying to use Sqldf package and right join function but not able to 
formulate the correct syntax.


I'm also trying to identify rows that contain at least one shared value in a 
dataset called 'Values':

> Values

A       B
1,2,5   3,8,7
2,4,6   7,6,3



Columns A & B in the first row 

Re: [R] loading edited functions already in saved workspace automatically

2017-05-09 Thread Ralf Goertz
On Tue, 09 May 2017 10:00:17 -0700
Jeff Newmiller wrote:

> This boils down to the fact that some "my ways" are more effective in
> the long run than others.. but I really want to address the complaint
> 
> "... sometimes tedious to rebuild my environment by reexecuting
> commands in the history"
> 
> by asserting that letting R re-run a script that loads my functions
> and packages (though perhaps not the data analysis steps) is always
> very fast and convenient to do explicitly. I almost never use the
> history file feature, because I type nearly every R instruction I use
> into a script file and execute/edit it until it does what I want. I
> keep functions in a separate file or package, and steps dealing with
> a particular data set in their own file (that uses source() to load
> the functions) even when I am executing the lines interactively. My
> goal is to regularly re-execute the whole script so that
> tomorrow/next year/whenever someone notices something was wrong then
> I can re-execute the sequence without following the dead ends I went
> down the first time (as using a history file does) and I don't have a
> separate clean-up-the-history-file step to go through to create it.
> When I have confirmed that the script still works as it did before
> then I can find where the analysis/data problem went wrong and fix it. 

My usual work with R is probably a bit different from yours. As I said
before I work on many projects (often simultaneously) but I do routine
work. For that I have my super function, the one I want to reload every
time R starts, at the moment about 250 lines of code. This is always
work in progress. In almost every project there is something that makes
me edit this function. But in order to apply my function I need to
prepare the data, e.g. getting them from a database or csv files,
renaming the columns of data.frames etc. This is all tedious and not
worth putting in scripts because these steps are very specific to the
project and are rarely needed more than once. Sometimes one or two data
records in a project on which I worked a few days before turn out to be
wrong and need to be changed. That's why I want to keep the data because
changing the data.frame directly is much easier than starting from
scratch. Meanwhile my function has evolved. But the .RData file still
holds the old version, which is bad.

However, I found a solution! .Last() gets executed before saving here,
too. I simply had forgotten that I need to use rm() with pos=1, i.e.
rm(myfun,pos=1) because otherwise rm wants to delete myfun from within
the context of the function .Last() where it doesn't live. I changed my
.Rprofile to:

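# runs at startup, after the saved workspace has been restored
.First=function(){
  # read the current definition of myfun into the global environment
  assign("myfun",eval(parse(file=("~/R/myfun.R"))),pos=1)
}
# runs at exit, before the workspace is saved
.Last=function(){
  # pos=1 targets the global environment, not the frame of .Last()
  rm(.First,pos=1)
  rm(myfun,pos=1)
  rm(.Last,pos=1)
}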

and everything works as I want it. So no design flaw but still way too
complicated in my opinion. Thanks to everybody who came up with
suggestions.

Ralf
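
For readers wondering why pos=1 matters: a small sketch of the scoping issue described above, outside of any startup file. The names `demo_fun`, `cleanup_wrong` and `cleanup_right` are hypothetical and only serve the illustration.

```r
# Hypothetical object placed in the global environment for this demo
assign("demo_fun", function() "old version", envir = globalenv())

cleanup_wrong <- function() {
  # Without pos/envir, rm() looks only in the frame of cleanup_wrong(),
  # finds nothing there and merely warns
  suppressWarnings(rm(demo_fun))
}

cleanup_right <- function() {
  # pos = 1 is the global environment (the first entry on the search path)
  rm(demo_fun, pos = 1)
}

cleanup_wrong()
still_there <- exists("demo_fun", envir = globalenv(), inherits = FALSE)

cleanup_right()
gone_now <- !exists("demo_fun", envir = globalenv(), inherits = FALSE)
```

Writing `envir = globalenv()` instead of `pos = 1` should behave the same and may read more clearly.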

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loading edited functions already in saved workspace automatically

2017-05-09 Thread Jeff Newmiller
This boils down to the fact that some "my ways" are more effective in the long 
run than others.. but I really want to address the complaint

"... sometimes tedious to rebuild my environment by reexecuting commands in the 
history"

by asserting that letting R re-run a script that loads my functions and 
packages (though perhaps not the data analysis steps) is always very fast and 
convenient to do explicitly. I almost never use the history file feature, 
because I type nearly every R instruction I use into a script file and 
execute/edit it until it does what I want. I keep functions in a separate file 
or package, and steps dealing with a particular data set in their own file (that 
uses source() to load the functions) even when I am executing the lines 
interactively. My goal is to regularly re-execute the whole script so that 
tomorrow/next year/whenever someone notices something was wrong then I can 
re-execute the sequence without following the dead ends I went down the first 
time (as using a history file does) and I don't have a separate 
clean-up-the-history-file step to go through to create it. When I have 
confirmed that the script still works as it did before then I can find where 
the analysis/data problem went wrong and fix it. 

This does not mean I never use RData files to reduce how often I re-do slow 
calculations... but it does mean that I always have my script that loads the 
necessary packages and functions rather than using versions of functions in the 
RData file. It is useful to avoid becoming dependent on saved intermediate 
data files so you don't continue to encounter the effects of script 
errors that you have already fixed earlier in the analysis. 
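
A rough sketch of that habit, assuming a hypothetical helper file `my_functions.R` and a hypothetical cache file `slow_result.rds` (both written to a temporary directory here so the example is self-contained). Functions are always re-read from the script; only the slow intermediate result is cached.

```r
# Hypothetical helper file, written to a temporary directory for this demo
fun_file <- file.path(tempdir(), "my_functions.R")
writeLines("col_range <- function(x) max(x) - min(x)", fun_file)

# Always (re)load function definitions from the script, never from .RData
source(fun_file)

# Cache only the slow intermediate result
cache_file <- file.path(tempdir(), "slow_result.rds")
if (file.exists(cache_file)) {
  slow_result <- readRDS(cache_file)
} else {
  slow_result <- col_range(c(3, 9, 4))   # stand-in for a slow calculation
  saveRDS(slow_result, cache_file)
}
```

Deleting the cache file forces a clean recalculation, which is one way to make sure that fixes made earlier in the script actually reach the cached result.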

It becomes more convenient to minimize dependency on automatic startup behavior 
when you want to share your script with someone else or run it yourself on a 
different computer (say, a powerful server). If you have the script habit then 
these hiccups with moving around are non-issues, and you can perform more and 
more complex analyses over time because you don't have to remember all the 
individual steps nor do you have to sort through the dead ends in your history 
file over and over. 

The editor you use can make a huge difference in making this work... get one 
that has a hot key that lets you execute one line at a time straight from the 
editor rather than requiring an explicit copy/paste. RStudio, Notepad++/NppToR, 
IntelliJ IDEA, vim-r, and ESS are a few options I am aware of. RStudio also 
supports full screen debugging of R so you can more easily reproduce the exact 
conditions where things go wrong inside functions as well. 

-- 
Sent from my phone. Please excuse my brevity.

On May 9, 2017 8:49:22 AM PDT, Michael Friendly  wrote:
>Ralf:
>
>You are afflicted with several mind bugs:
>* the "my-way mind bug" -- "I want to do it MY WAY, because that's sort
>
>of what
>I know" and also,
>* the "my-square-peg-should-fit-into-this-round-hole mind bug" -- "R 
>should be able to
>do it MY WAY, but it puts obstacles in my path," perhaps a subsidiary, 
>but more technical:
>* the "loading-a-function-or-data-is-the-same mind bug"
>As in many things R, you can't always get to MY WAY from there, at
>least 
>not without a tortuous journey.
>
>You think you should be able to do everything you want in .Rprofile,
>but 
>then you posed two separate problems:
>(a) save/reload history
>(b) save/reload functions and data
>
>If you recognize them as two separate problems, there is an easier
>path:
>(a) use .Rprofile only for making your history persistent, as I
>described
>(b) Put your functions & data you always want available in a package; 
>you can load it from .Rprofile
>
>I originally defined a bunch of handy functions (e.g., cd(), a setwd() 
>replacement, that works more like `cd` on unix, in that `cd()` returns 
>to the previous directory; it also changes the Windows title to
>`RGui:`  abbreviation of getwd() )
>
>I moved them all out of .Rprofile, made a package `myutil` and now load
>
>them from there with
>
>  #==
>  # load default packages
>  #==
>  if (!require(myutil)) warning("myutil functions not available")
>
>hope this helps,
>-Michael
>
>On 5/9/2017 10:20 AM, Ralf Goertz wrote:
>> On Sat, 6 May 2017 11:17:42 -0400
>> Michael Friendly wrote:
>>
>>> On 5/5/2017 10:23 AM, Ralf Goertz wrote:
>>>> On Fri, 05 May 2017 07:14:36 -0700
>>>> Jeff Newmiller wrote:
>>>>
>>>>> R normally prompts you to save .RData, but it just automatically
>>>>> saves .Rhistory... the two are unrelated.
>>>> Not here. If I say "n" to the prompted question "Save workspace
>>>> image? [y/n/c]: " my history doesn't get saved.
>>>>
>>>> Version:
>>>>
>>>> R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
>>>> Copyright (C) 2016 The R Foundation for Statistical Computing
>>>> Platform: x86_64-suse-linux-gnu (64-bit)
>>>>
>>> On 

Re: [R] Problem with choose.files(default=..., multi=FALSE)

2017-05-09 Thread Duncan Murdoch

On 09/05/2017 12:06 PM, Keith Jewell wrote:

I'm very hesitant to suggest that there's a bug in such a venerable R
function, but I can't see what I'm doing wrong. Any comments are welcome


Yes, it looks like a bug.  One other thing I find a little strange: the 
starting directory seems wrong when I have the pathlong default.  Did 
you see that?  (I'm in Windows 10, not the same version as you.)


Duncan Murdoch



When using choose.files() where:
 default = something
 multi = FALSE
 selected file path is shorter than the default
... then the returned value is at least as long as the default,
characters from default appearing (wrongly) at the end of the returned
value.

Example, in which all but the first choose.files() select
"M:\\test\\target.dat". Note the last result.

 > pathlong <- choose.files(caption = "long")
 > pathlong # long file name to use as default for short selection
[1]
"M:\\test\\Averyveryveryveryverylongfoldername\\Averyveryveryveryverylongfoldername\\Averyveryveryveryverylongfoldername\\target.dat"
 > choose.files(caption = "short")  # no default without multi works
[1] "M:\\test\\target.dat"
 > choose.files(default=pathlong, caption = "short") # default without
multi= works
[1] "M:\\test\\target.dat"
 > choose.files(caption = "short", multi = FALSE) # multi = FALSE
without default works
[1] "M:\\test\\target.dat"
 > choose.files(default=pathlong, caption = "short", multi = TRUE) #
multi = TRUE with default works
[1] "M:\\test\\target.dat"
 > choose.files(default=pathlong, caption = "short", multi = FALSE) #
multi = FALSE with default fails
[1]
"M:\\test\\target.dat\\ryveryverylongfoldername\\Averyveryveryveryverylongfoldername\\Averyveryveryveryverylongfoldername\\target.dat"

 > # in case it's relevant
 > sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C

[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] graphics  grDevices datasets  stats tcltk utils tools
   methods
[9] base

other attached packages:
  [1] CBRIutils_1.0   stringr_1.2.0   svSocket_0.9-57 TinnR_1.0-5
R2HTML_2.3.2
  [6] Hmisc_4.0-3 ggplot2_2.2.1   Formula_1.2-1   survival_2.41-3
lattice_0.20-35

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-2  htmlTable_1.9   digest_0.6.12
htmltools_0.3.6
  [5] splines_3.4.0   scales_0.4.1grid_3.4.0
checkmate_1.8.2
  [9] devtools_1.12.0 knitr_1.15.1munsell_0.4.3
compiler_3.4.0
[13] tibble_1.3.0nnet_7.3-12 acepack_1.4.1
Matrix_1.2-10
[17] svMisc_0.9-70   plyr_1.8.4  base64enc_0.1-3
data.table_1.10.4
[21] stringi_1.1.5   magrittr_1.5gtable_0.2.0
colorspace_1.3-2
[25] foreign_0.8-68  cluster_2.0.6   gridExtra_2.2.1
htmlwidgets_0.8
[29] withr_1.0.2 lazyeval_0.2.0  backports_1.0.5
memoise_1.1.0
[33] rpart_4.1-11Rcpp_0.12.10latticeExtra_0.6-28
 >



Re: [R] passing arguments to simple plotting program.

2017-05-09 Thread Ulrik Stervbo
Hi Gerard,

Quotation marks are used for strings. In your function body you try to use
the strings "indata" and "fig_descrip" (the latter will work but is not
what you want).

In your current function call you pass the variable Figure as the value to
the argument fig_descrip, followed by a lot of other stuff your function
doesn't know what to do with.

Remove the quotation marks around indata and fig_descrip in the function
body, call your function with:

plot_f1(indata=v5, n1=114, n2=119, n3=116, fig_descrip="Figure 2a\nChange
in Composite Score at Visit 5 (Day 31)\nPer Protocol Population")

and you should be fine.

HTH

Ulrik
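
Put together, the corrected function might look like the sketch below. The data frame is simulated, since the poster's xyz.csv is not available, and the group labels "placebo", "low" and "high" are assumptions. The pdf(NULL) call is only there so the example also runs non-interactively without writing a file.

```r
# Simulated stand-in for the poster's data (the real xyz.csv is not available)
set.seed(1)
v5 <- data.frame(
  rx_grp = factor(rep(c("placebo", "low", "high"), times = c(114, 119, 116)),
                  levels = c("placebo", "low", "high")),
  d_comp = rnorm(349, mean = -2, sd = 2)
)

plot_f1 <- function(indata, n1, n2, n3, fig_descrip) {
  par(oma = c(2, 2, 2, 2))
  res <- boxplot(d_comp ~ rx_grp,
                 data = indata,        # the object itself, not the string "indata"
                 main = fig_descrip,   # likewise unquoted
                 ylim = c(-10, 5),
                 names = c(paste0("Placebo(N=", n1, ")"),
                           paste0("Low Dose(N=", n2, ")"),
                           paste0("High Dose(N=", n3, ")")),
                 ylab = "Change from Baseline")
  abline(h = 0, col = "lightgray")
  invisible(res)   # return the boxplot statistics for inspection
}

pdf(NULL)  # null device: nothing is written to disk
res <- plot_f1(indata = v5, n1 = 114, n2 = 119, n3 = 116,
               fig_descrip = "Figure 2a\nChange in Composite Score at Visit 5 (Day 31)\nPer Protocol Population")
dev.off()
```

The group sizes could also be computed inside the function with table(indata$rx_grp) instead of being passed in by hand, which would avoid labels drifting out of sync with the data.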
Gerard Smits wrote on Tue, 9 May 2017, 18:27:

> Hi Ulrik,
>
> If I can trouble you with one more question.
>
> Now trying to send a string to the main= .  I was able to pass the data
> name in data=in_data, but same logic is not working in passing the main
> string.
>
>
> plot_f1 <-function(indata,n1,n2,n3,fig_descrip) {
>   par(oma=c(2,2,2,2))
>   boxplot(formula = d_comp ~ rx_grp,
>   data="indata”,# <- worked fine here.
>   main="fig_descrip",
>   ylim=c(-10,5),
>   names=c(paste0("Placebo(N=", n1,  ")"),
>  paste0("Low Dose(N=", n2, ")"),
>  paste0("High Dose(N=", n3,")")),
>   ylab='Change from Baseline')
>   abline(h=c(0), col="lightgray")
> }
>
> plot_f1(indata=v5, n1=114, n2=119, n3=116, fig_descrip=Figure 2a\nChange
> in Composite Score at Visit 5 (Day 31)\nPer Protocol Population)
>
> Error Message: Error: unexpected numeric constant in "plot_f1(indata=v5,
> n1=114, n2=119, n3=116, fig_descrip=Figure 2”
>
> Even this call gives the same error:  plot_f1(indata=v5, n1=114, n2=119,
> n3=116, fig_descrip=Figure)
>
>
> Thanks,
>
> Gerard
>
>
>
>
>
>
> On May 8, 2017, at 11:40 PM, Ulrik Stervbo 
> wrote:
>
> HI Gerard,
>
> You get the literals because the variables are not implicitly expanded -
> 'Placebo(N=n1)  ' is just a string indicating the N = n1.
>
> What you want is to use paste() or paste0():
> c(paste0("Placebo(N=", n1, ")"), paste0("Low Dose (N=", n2, ")"),
> paste0("High Dose (N=", n3, ")"))
> should do it.
>
> I was taught long ago that attach() should be avoided to avoid name
> conflicts. Also, it makes it difficult to figure out which data is actually
> being used.
>
> HTH
> Ulrik
>
> On Tue, 9 May 2017 at 06:44 Gerard Smits  wrote:
>
>> Hi All,
>>
>> I thought I’d try to get a function working instead of block copying code
>> and editing. My background is more SAS, so using a SAS Macro would be easy,
>> but not so lucky with R functions.
>>
>>
>> R being used on Mac Sierra 10.12.4:
>>
>> R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
>> Copyright (C) 2016 The R Foundation for Statistical Computing
>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>
>>
>> resp<-read.csv("//users//gerard//gs//r_work//xyz.csv", header = TRUE)
>>
>> v5  <-subset(resp, subset=visit==5 & pp==1)
>>
>> plot_f1 <-function(n1,n2,n3) {
>>   attach(v8)
>>   par(oma=c(2,2,2,2))
>>   boxplot(formula = d_comp ~ rx_grp,
>>   main="Figure 2\nChange in Composite Score at Visit 5 (Day
>> 31)\nPer Protocol Population",
>>   ylim=c(-10,5),
>>   names=c('Placebo(N=n1)  ',
>>   'Low Dose(N=n2) ',
>>   'High Dose(N=n3)'),
>>   ylab='Change from Baseline')
>>   abline(h=c(0), col="lightgray")
>> }
>>
>> plot_f1(n1=114, n2=119, n3=116)
>>
>> The above is a simplified example where I am trying to pass 3 arguments,
>> n1-n3, to be shown in the x-axis tables,  Instead of the numbers, I get the
>> literal n1, n2, n3.
>>
>> Any help appreciated.
>>
>> Thanks,
>>
>> Gerard
>>
>>
>>
>>
>
>


Re: [R] About calculating average values from several matrices

2017-05-09 Thread William Michels via R-help
Dear Lily,

Harold is telling you to type "?round" at the R command prompt to pull
up the "round" help page.

>?round
>help("round")

AFAIK, the above two commands are equivalent, in general.

Best, Bill.

W. Michels, Ph.D.
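
For the archive, a small sketch that combines the Reduce() suggestion quoted below with round(). The two matrices hold only the jan/feb values of app1 and app2 from the posted file1 and file2; the object names are otherwise arbitrary.

```r
month_names <- c("jan", "feb")
app_names   <- c("app1", "app2")

# Subset of the posted values (columns jan and feb, rows app1 and app2)
file1 <- matrix(c(1.1, 3.1, 1.2, 3.2), nrow = 2,
                dimnames = list(app_names, month_names))
file2 <- matrix(c(1.9, 3.5, 1.5, 3.7), nrow = 2,
                dimnames = list(app_names, month_names))
mylist <- list(file1, file2)

# Element-wise mean: add the matrices cell by cell, divide by their number
cell_means <- Reduce(`+`, mylist) / length(mylist)

# Round every cell to one decimal digit; dimensions and names are kept
cell_means_1dp <- round(cell_means, digits = 1)
```

Note that round() changes the stored values. If only the display should be shortened, format() or print(cell_means, digits = 2) leaves the underlying numbers alone.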



On Tue, May 9, 2017 at 8:11 AM, Doran, Harold  wrote:
> ?round
>
>
> From: lily li [mailto:chocol...@gmail.com]
> Sent: Tuesday, May 09, 2017 11:10 AM
> To: Charles Determan 
> Cc: Doran, Harold ; R mailing list 
> Subject: Re: [R] About calculating average values from several matrices
>
> Thanks very much, it works. But how to round the values to have only 1 
> decimal digit or 2 decimal digits? I think by dividing, the values are double 
> type now. Thanks again.
>
>
> On Tue, May 9, 2017 at 9:04 AM, Charles Determan 
> > wrote:
> If you want the mean of each element across your list of matrices, the 
> following should provide what you are looking for: Reduce sums all your 
> matrix elements across matrices, and the sum is then simply divided by the 
> number of matrices for the element-wise mean.
>
> Reduce(`+`, mylist)/length(mylist)
> Regards,
> Charles
>
> On Tue, May 9, 2017 at 9:52 AM, lily li 
> > wrote:
> I meant for each cell, it takes the average from other dataframes at the
> same cell. I don't know how to deal with row names and col names though, so
> it has the error message.
>
> On Tue, May 9, 2017 at 8:50 AM, Doran, Harold 
> > wrote:
>
>> It’s not clear to me what your actual structure is. Can you provide
>> str(object)? Assuming it is a list, and you want the mean over all cells or
>> columns, you might want something like this:
>>
>>
>>
>> myData <- vector("list", 3)
>>
>>
>>
>> for(i in 1:3){
>>
>> myData[[i]] <- matrix(rnorm(100), 10, 10)
>>
>> }
>>
>>
>>
>> ### mean over all cells
>>
>> sapply(myData, function(x) mean(x))
>>
>>
>>
>> ### mean over all columns
>>
>> sapply(myData, function(x) colMeans(x))
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* lily li [mailto:chocol...@gmail.com]
>> *Sent:* Tuesday, May 09, 2017 10:44 AM
>> *To:* Doran, Harold >
>> *Cc:* R mailing list >
>> *Subject:* Re: [R] About calculating average values from several matrices
>
>>
>>
>>
>> I'm trying to get a new dataframe or whatever to call, which has the same
>> structure with each file as listed above. For each cell in the new
>> dataframe or the new file, it is the average value from former dataframes
>> at the same location. Thanks.
>>
>>
>>
>> On Tue, May 9, 2017 at 8:41 AM, Doran, Harold 
>> > wrote:
>>
>> Are you trying to take the mean over all cells, or over rows/columns
>> within each dataframe. Also, are these different dataframes stored within a
>> list or are they standalone?
>>
>>
>>
>>
>> -Original Message-
>> From: R-help 
>> [mailto:r-help-boun...@r-project.org] 
>> On Behalf Of lily li
>> Sent: Tuesday, May 09, 2017 10:39 AM
>> To: R mailing list >
>> Subject: [R] About calculating average values from several matrices
>>
>> Hi R users,
>>
>> I have a question about manipulating the data.
>> For example, there are several such data frames or matrices, and I want to
>> calculate the average value from all the data frames or matrices. How to do
>> it? Also, should I convert them to data frame or matrix first? Right now,
>> when I use typeof() function, each one is a list.
>>
>> file1
>>        jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
>> app1   1.1   1.2   0.8   0.9   1.3   1.5   2.2   3.2   3.0   1.2   1.1
>> app2   3.1   3.2   2.8   2.5   2.3   2.5   3.2   3.0   2.9   1.8   1.8
>> app3   5.1   5.2   3.8   4.9   5.3   5.5   5.2   4.2   5.0   4.2   4.1
>>
>> file2
>>        jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
>> app1   1.9   1.5   0.5   0.9   1.2   1.8   2.5   3.7   3.2   1.5   1.6
>> app2   3.5   3.7   2.3   2.2   2.5   2.0   3.6   3.2   2.8   1.2   1.4
>> app3   5.5   5.0   3.5   4.4   5.4   5.6   5.3   4.4   5.2   4.3   4.2
>>
>> file3 has the similar structure and values...
>>
>> There are eight such files, and when I use the function mean(file1, file2,
>> file3, ..., file8), it returns the error below. Thanks for your help.
>>
>> Warning message:
>> In mean.default(file1, file2, file3, file4, file5, file6, file7,  :
>>   argument is not numeric or logical: returning NA
>>

[R] Problem with choose.files(default=..., multi=FALSE)

2017-05-09 Thread Keith Jewell
I'm very hesitant to suggest that there's a bug in such a venerable R 
function, but I can't see what I'm doing wrong. Any comments are welcome


When using choose.files() where:
default = something
multi = FALSE
selected file path is shorter than the default
... then the returned value is at least as long as the default, 
characters from default appearing (wrongly) at the end of the returned 
value.


Example, in which all but the first choose.files() select 
"M:\\test\\target.dat". Note the last result.


> pathlong <- choose.files(caption = "long")
> pathlong # long file name to use as default for short selection
[1] 
"M:\\test\\Averyveryveryveryverylongfoldername\\Averyveryveryveryverylongfoldername\\Averyveryveryveryverylongfoldername\\target.dat"

> choose.files(caption = "short")  # no default without multi works
[1] "M:\\test\\target.dat"
> choose.files(default=pathlong, caption = "short") # default without 
multi= works

[1] "M:\\test\\target.dat"
> choose.files(caption = "short", multi = FALSE) # multi = FALSE 
without default works

[1] "M:\\test\\target.dat"
> choose.files(default=pathlong, caption = "short", multi = TRUE) # 
multi = TRUE with default works

[1] "M:\\test\\target.dat"
> choose.files(default=pathlong, caption = "short", multi = FALSE) # 
multi = FALSE with default fails
[1] 
"M:\\test\\target.dat\\ryveryverylongfoldername\\Averyveryveryveryverylongfoldername\\Averyveryveryveryverylongfoldername\\target.dat"


> # in case it's relevant
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United 
Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C 


[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] graphics  grDevices datasets  stats tcltk utils tools 
  methods

[9] base

other attached packages:
 [1] CBRIutils_1.0   stringr_1.2.0   svSocket_0.9-57 TinnR_1.0-5 
R2HTML_2.3.2
 [6] Hmisc_4.0-3 ggplot2_2.2.1   Formula_1.2-1   survival_2.41-3 
lattice_0.20-35


loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2  htmlTable_1.9   digest_0.6.12 
htmltools_0.3.6
 [5] splines_3.4.0   scales_0.4.1grid_3.4.0 
checkmate_1.8.2
 [9] devtools_1.12.0 knitr_1.15.1munsell_0.4.3 
compiler_3.4.0
[13] tibble_1.3.0nnet_7.3-12 acepack_1.4.1 
Matrix_1.2-10
[17] svMisc_0.9-70   plyr_1.8.4  base64enc_0.1-3 
data.table_1.10.4
[21] stringi_1.1.5   magrittr_1.5gtable_0.2.0 
colorspace_1.3-2
[25] foreign_0.8-68  cluster_2.0.6   gridExtra_2.2.1 
htmlwidgets_0.8
[29] withr_1.0.2 lazyeval_0.2.0  backports_1.0.5 
memoise_1.1.0

[33] rpart_4.1-11Rcpp_0.12.10latticeExtra_0.6-28
>



Re: [R] loading edited functions already in saved workspace automatically

2017-05-09 Thread Michael Friendly

Ralf:

You are afflicted with several mind bugs:
* the "my-way mind bug" -- "I want to do it MY WAY, because that's sort of
  what I know" and also,
* the "my-square-peg-should-fit-into-this-round-hole mind bug" -- "R should be
  able to do it MY WAY, but it puts obstacles in my path," perhaps a
  subsidiary, but more technical:
* the "loading-a-function-or-data-is-the-same mind bug"

As in many things R, you can't always get to MY WAY from there, at least 
not without a tortuous journey.


You think you should be able to do everything you want in .Rprofile, but 
then you posed two separate problems:

(a) save/reload history
(b) save/reload functions and data

If you recognize them as two separate problems, there is an easier path:
(a) use .Rprofile only for making your history persistent, as I described
(b) Put your functions & data you always want available in a package; 
you can load it from .Rprofile


I originally defined a bunch of handy functions (e.g., cd(), a setwd() 
replacement, that works more like `cd` on unix, in that `cd()` returns 
to the previous directory; it also changes the Windows title to
`RGui:`  abbreviation of getwd() )

I moved them all out of .Rprofile, made a package `myutil` and now load 
them from there with


 #==
 # load default packages
 #==
 if (!require(myutil)) warning("myutil functions not available")

hope this helps,
-Michael
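
As an illustration of the kind of helper described above, here is a guess at how a cd() that returns to the previous directory could be written. This is not the actual function from the `myutil` package; the `.cd_state` environment is a made-up device for remembering the last directory, and the window-title part is left out.

```r
# Hypothetical storage for the directory we came from
.cd_state <- new.env()

cd <- function(dir = NULL) {
  previous <- getwd()
  # With no argument, go back to the remembered directory
  target <- if (is.null(dir)) .cd_state$last else dir
  if (is.null(target)) stop("no previous directory recorded")
  setwd(target)
  .cd_state$last <- previous   # remember where we were
  invisible(getwd())
}
```

Used as cd("some/dir") followed by cd(), the second call would switch back to the starting directory. Keeping such a function in a small package rather than in .Rprofile means it is never saved into, or restored from, the workspace.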

On 5/9/2017 10:20 AM, Ralf Goertz wrote:

On Sat, 6 May 2017 11:17:42 -0400
Michael Friendly wrote:


On 5/5/2017 10:23 AM, Ralf Goertz wrote:

On Fri, 05 May 2017 07:14:36 -0700
Jeff Newmiller wrote:
  

R normally prompts you to save .RData, but it just automatically
saves .Rhistory... the two are unrelated.

Not here. If I say "n" to the prompted question "Save workspace
image? [y/n/c]: " my history doesn't get saved.

Version:

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)
  

On Windoze, here's what I use in my .Rprofile, which runs every time
I start an RGUI console.  The key is .First & .Last to load/save
history automagically.

Hi Michael,

thanks. This helps with saving the history without saving the data. But
actually I'd really like to save both and still be able to load
functions automatically from .Rprofile. Not saving the data as Jeff
suggested is not a good option because it is sometimes tedious to
rebuild my environment by reexecuting commands in the history. And I
explained in my OP why I can't use .First() to achieve my goal.

But let me try again to explain the problem because I think not
everybody understood what I was trying to say. For simplicity I use the
plain variable "a" instead of a function. Start a fresh session and
remove all variables, define one variable and quit with saving:


rm(list=ls())
a=17
quit(save="yes")

Now, before opening a new session edit .Rprofile such that it contains
just the two lines:

print("Hello from .Rprofile")
a=42

Start a new session where your saved environment will be loaded.
Observe that you see the line

[1] "Hello from .Rprofile"

proving that the commands in .Rprofile have been executed. Now look at
"a":


a

[1] 17


You would expect to see this because *after* your "Hello" line you find

[Previously saved workspace restored]

So you have set "a" to 42 in .Rprofile but it gets overwritten from the
previously saved and now restored workspace. On the other hand, .First()
gets executed after the restoring of the workspace. Therefore, I could
edit .Rprofile to read

.First=function(){ assign("a",42,pos=1) }

Now, after starting I see that "a" is indeed 42. But then it turns out
that from now on I need "a" to be 11. After editing .Rprofile
accordingly, I am quite hopeful but after starting a new session I see
that "a" is still 42. Why is that? Because .First() was saved and when I
started a new session it got a new function body (setting "a" to 11) but
before it could be executed it was again overwritten by the old value
(setting "a" to 42) and I am chasing my own tail. Sigh.
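
The overwriting can be imitated inside a single session. The sketch below does not touch .Rprofile or .RData; it uses two scratch environments and a temporary file (all hypothetical names) to show that load() silently replaces an object that was defined beforehand, which is what the workspace restore does to values set in .Rprofile.

```r
ws_file <- file.path(tempdir(), "demo_workspace.RData")

# "Previous session": a workspace containing a = 17 is saved
saved <- new.env()
assign("a", 17, envir = saved)
save(list = "a", envir = saved, file = ws_file)

# "New session": .Rprofile runs first and sets a = 42 ...
session <- new.env()
assign("a", 42, envir = session)

# ... then the saved workspace is restored on top of it
load(ws_file, envir = session)

a_after_restore <- get("a", envir = session)
```

Here a_after_restore holds the saved value, not the one assigned just before load(). The same mechanism applies to a saved .First().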

.Last() doesn't help. Apparently (at least on my linux system) it is
executed *after* saving the environment so too late to remove anything
you don't want saved. In that regard linux doesn't seem to be typical,
since in "?.Last" the reverse order is described as typical:

  Exactly what happens at termination of an R session depends on the
  platform and GUI interface in use.  A typical sequence is to run
  ‘.Last()’ and ‘.Last.sys()’ (unless ‘runLast’ is false), to save
  the workspace if requested (and in most cases also to save the
  session history: see ‘savehistory’), then run any finalizers (see
  ‘reg.finalizer’) that have been set to be run on exit, close all
  open graphics devices, remove the session temporary directory and
  print any remaining warnings (e.g., from 

Re: [R] About calculating average values from several matrices

2017-05-09 Thread lily li
yes, I just tried for the dataframe and it works, so there is no problem on
this side.
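
One more variant for the archive, in case some cells are missing: stack the matrices into a 3-d array and let rowMeans() average over the third dimension with na.rm = TRUE. This is only a sketch; the NA below is invented for the demonstration (the posted tables contain none), and the values are again the jan/feb cells of app1 and app2.

```r
cell_names <- list(c("app1", "app2"), c("jan", "feb"))

m1 <- matrix(c(1.1, 3.1, 1.2, NA), nrow = 2, dimnames = cell_names)  # NA is made up
m2 <- matrix(c(1.9, 3.5, 1.5, 3.7), nrow = 2, dimnames = cell_names)
mats <- list(m1, m2)

# Stack into a rows x columns x matrices array
stacked <- array(unlist(mats),
                 dim = c(dim(mats[[1]]), length(mats)),
                 dimnames = c(dimnames(mats[[1]]), list(NULL)))

# dims = 2 keeps the first two dimensions and averages over the third
cell_means <- rowMeans(stacked, dims = 2, na.rm = TRUE)
```

With Reduce(`+`, ...) a single NA would make that cell NA in the result, so this form may be preferable when the inputs are incomplete.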

On Tue, May 9, 2017 at 9:14 AM, Doran, Harold  wrote:

> I'm not sure if you’re asking a question or confirming that it works for
> you. But, obviously, the code below behaves as expected
>
>
>
> *From:* lily li [mailto:chocol...@gmail.com]
> *Sent:* Tuesday, May 09, 2017 11:13 AM
> *To:* Doran, Harold 
> *Cc:* Charles Determan ; R mailing list <
> r-help@r-project.org>
>
> *Subject:* Re: [R] About calculating average values from several matrices
>
>
>
> Yes, that means to control decimal numbers. For example, use round(2.3122,
> digits=1), it gets 2.3
>
>
>
> On Tue, May 9, 2017 at 9:11 AM, Doran, Harold  wrote:
>
> ?round
>
>
>
>
>
> *From:* lily li [mailto:chocol...@gmail.com]
> *Sent:* Tuesday, May 09, 2017 11:10 AM
> *To:* Charles Determan 
> *Cc:* Doran, Harold ; R mailing list  >
> *Subject:* Re: [R] About calculating average values from several matrices
>
>
>
> Thanks very much, it works. But how to round the values to have only 1
> decimal digit or 2 decimal digits? I think by dividing, the values are
> double type now. Thanks again.
>
>
>
>
>
> On Tue, May 9, 2017 at 9:04 AM, Charles Determan 
> wrote:
>
> If you want the mean of each element across your list of matrices, the
> following should provide what you are looking for: Reduce sums all
> your matrix elements across matrices, and the sum is then simply divided by
> the number of matrices for the element-wise mean.
>
> Reduce(`+`, mylist)/length(mylist)
>
> Regards,
>
> Charles
>
>
>
> On Tue, May 9, 2017 at 9:52 AM, lily li  wrote:
>
> I meant for each cell, it takes the average from other dataframes at the
> same cell. I don't know how to deal with row names and col names though, so
> it has the error message.
>
> On Tue, May 9, 2017 at 8:50 AM, Doran, Harold  wrote:
>
> > It’s not clear to me what your actual structure is. Can you provide
> > str(object)? Assuming it is a list, and you want the mean over all cells or
> > columns, you might want something like this:
> >
> >
> >
> > myData <- vector("list", 3)
> >
> >
> >
> > for(i in 1:3){
> >
> > myData[[i]] <- matrix(rnorm(100), 10, 10)
> >
> > }
> >
> >
> >
> > ### mean over all cells
> >
> > sapply(myData, function(x) mean(x))
> >
> >
> >
> > ### mean over all columns
> >
> > sapply(myData, function(x) colMeans(x))
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > *From:* lily li [mailto:chocol...@gmail.com]
> > *Sent:* Tuesday, May 09, 2017 10:44 AM
> > *To:* Doran, Harold 
> > *Cc:* R mailing list 
> > *Subject:* Re: [R] About calculating average values from several matrices
>
>
> >
> >
> >
> > I'm trying to get a new dataframe or whatever to call, which has the same
> > structure with each file as listed above. For each cell in the new
> > dataframe or the new file, it is the average value from former dataframes
> > at the same location. Thanks.
> >
> >
> >
> > On Tue, May 9, 2017 at 8:41 AM, Doran, Harold  wrote:
> >
> > Are you trying to take the mean over all cells, or over rows/columns
> > within each dataframe. Also, are these different dataframes stored
> within a
> > list or are they standalone?
> >
> >
> >
> >
> > -Original Message-
> > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of lily li
> > Sent: Tuesday, May 09, 2017 10:39 AM
> > To: R mailing list 
>
> > Subject: [R] About calculating average values from several matrices
> >
> > Hi R users,
> >
> > I have a question about manipulating the data.
> > For example, there are several such data frames or matrices, and I want
> to
> > calculate the average value from all the data frames or matrices. How to
> do
> > it? Also, should I convert them to data frame or matrix first? Right now,
> > when I use typeof() function, each one is a list.
> >
> > file1
> > jan   feb   mar   apr   may   jun   jul   aug   sep   oct
>  nov
> >
> > app1   1.1   1.20.80.9   1.31.5   2.2   3.2   3.01.2
>  1.1
> > app2   3.1   3.22.82.5   2.32.5   3.2   3.0   2.91.8
>  1.8
> > app3   5.1   5.23.84.9   5.35.5   5.2   4.2   5.04.2
>  4.1
> >
> > file2
> > jan   feb   mar   apr   may   jun   jul   aug   sep   oct
>  nov
> >
> > app1   1.9   1.50.50.9   1.21.8   2.5   3.7   3.21.5
>  1.6
> > app2   3.5   3.72.32.2   2.52.0   3.6   3.2   2.81.2
>  1.4
> > app3   5.5   5.03.54.4   5.45.6   5.3   4.4   5.24.3
>  4.2
> >
> > file3 has the similar structure and values...
> >
> > There are eight such files, and when I use the function mean(file1,
> file2,
> > file3, ..., file8), it returns the error below. Thanks for your help.
> >
> > Warning message:

Re: [R] About calculating average values from several matrices

2017-05-09 Thread Doran, Harold
I’m not sure whether you’re asking a question or confirming that it works for you. 
But the code below behaves as expected.

From: lily li [mailto:chocol...@gmail.com]
Sent: Tuesday, May 09, 2017 11:13 AM
To: Doran, Harold 
Cc: Charles Determan ; R mailing list 

Subject: Re: [R] About calculating average values from several matrices

Yes, that means to control decimal numbers. For example, use round(2.3122, 
digits=1), it gets 2.3


Re: [R] About calculating average values from several matrices

2017-05-09 Thread lily li
Yes, that means to control decimal numbers. For example, use round(2.3122,
digits=1), it gets 2.3

On Tue, May 9, 2017 at 9:11 AM, Doran, Harold  wrote:

> ?round

Re: [R] About calculating average values from several matrices

2017-05-09 Thread Doran, Harold
?round


From: lily li [mailto:chocol...@gmail.com]
Sent: Tuesday, May 09, 2017 11:10 AM
To: Charles Determan 
Cc: Doran, Harold ; R mailing list 
Subject: Re: [R] About calculating average values from several matrices

Thanks very much, it works. But how to round the values to have only 1 decimal 
digit or 2 decimal digits? I think by dividing, the values are double type now. 
Thanks again.



Re: [R] About calculating average values from several matrices

2017-05-09 Thread Charles Determan
Then just call round() on your result with the desired number of digits.
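
A minimal sketch of that rounding step, assuming the result comes from an element-wise mean over a list of numeric matrices. The list `mylist` and its values here are hypothetical stand-ins, not the poster's data. Note that rounding does not change the type: the values remain doubles, and round() only changes how many decimal places they carry.

```r
# Hypothetical stand-in for the list of matrices discussed in the thread.
mylist <- list(matrix(c(1.1, 3.1, 1.2, 3.2), nrow = 2),
               matrix(c(1.9, 3.5, 1.5, 3.7), nrow = 2),
               matrix(c(1.4, 3.0, 1.0, 3.3), nrow = 2))

# Element-wise mean across the list.
avg <- Reduce(`+`, mylist) / length(mylist)

# round() is vectorised, so it works on the whole matrix at once.
avg1 <- round(avg, digits = 1)
avg2 <- round(avg, digits = 2)

# The rounded values are still doubles; format() returns text with a
# fixed number of decimals if that is wanted for display only.
format(avg1, nsmall = 1)
```

If the goal is only a tidier printout, format() or print(avg, digits = 2) may be enough without altering the stored values.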

On Tue, May 9, 2017 at 10:09 AM, lily li  wrote:

> Thanks very much, it works. But how to round the values to have only 1
> decimal digit or 2 decimal digits? I think by dividing, the values are
> double type now. Thanks again.

Re: [R] About calculating average values from several matrices

2017-05-09 Thread lily li
Thanks very much, it works. But how to round the values to have only 1
decimal digit or 2 decimal digits? I think by dividing, the values are
double type now. Thanks again.


On Tue, May 9, 2017 at 9:04 AM, Charles Determan 
wrote:

> If you want the mean of each element across your list of matrices, the
> following should provide what you are looking for: Reduce sums all
> your matrix elements across matrices, and the sum is then simply divided by
> the number of matrices to give the element-wise mean.
>
> Reduce(`+`, mylist)/length(mylist)
>
> Regards,
> Charles


Re: [R] About calculating average values from several matrices

2017-05-09 Thread Charles Determan
If you want the mean of each element across your list of matrices, the
following should provide what you are looking for: Reduce sums all
your matrix elements across matrices, and the sum is then simply divided by
the number of matrices to give the element-wise mean.

Reduce(`+`, mylist)/length(mylist)
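
A small worked sketch of the same idea, using only the first three months of file1 and file2 from the original post so that it stays short. The object names `file1`, `file2` and `avg` are chosen for illustration, and the sketch assumes the inputs are numeric matrices with identical dimensions and dimnames.

```r
# First three months of file1 and file2 from the original post.
# matrix() fills column by column, so each row of numbers below is one month.
file1 <- matrix(c(1.1, 3.1, 5.1,
                  1.2, 3.2, 5.2,
                  0.8, 2.8, 3.8),
                nrow = 3,
                dimnames = list(c("app1", "app2", "app3"),
                                c("jan", "feb", "mar")))
file2 <- matrix(c(1.9, 3.5, 5.5,
                  1.5, 3.7, 5.0,
                  0.5, 2.3, 3.5),
                nrow = 3,
                dimnames = dimnames(file1))

mylist <- list(file1, file2)

# Reduce() adds the matrices pairwise, cell by cell; dividing by the
# number of matrices turns that sum into an element-wise mean.
avg <- Reduce(`+`, mylist) / length(mylist)

# The row and column names are carried over to the result.
print(avg)
```

If the objects are data frames rather than matrices, converting each one first, for example with lapply(mylist, as.matrix), is probably the safest route, provided every column is numeric.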

Regards,
Charles

On Tue, May 9, 2017 at 9:52 AM, lily li  wrote:

> I meant that each cell should hold the average of the same cell across the
> data frames. I don't know how to deal with the row names and column names
> though, so I get the error message.

Re: [R] About calculating average values from several matrices

2017-05-09 Thread lily li
I meant that each cell should hold the average of the same cell across the
data frames. I don't know how to deal with the row names and column names
though, so I get the error message.

On Tue, May 9, 2017 at 8:50 AM, Doran, Harold  wrote:

> It’s not clear to me what your actual structure is. Can you provide
> str(object)? Assuming it is a list, and you want the mean over all cells or
> columns, you might want something like this:
>
> # list of three 10 x 10 matrices filled with random normal values
> myData <- vector("list", 3)
>
> for (i in 1:3) {
>   myData[[i]] <- matrix(rnorm(100), 10, 10)
> }
>
> ### mean over all cells of each matrix
> sapply(myData, mean)
>
> ### column means of each matrix
> sapply(myData, colMeans)
>
> *From:* lily li [mailto:chocol...@gmail.com]
> *Sent:* Tuesday, May 09, 2017 10:44 AM
> *To:* Doran, Harold 
> *Cc:* R mailing list 
> *Subject:* Re: [R] About calculating average values from several matrices
>
>
>
> I'm trying to get a new data frame (or whatever it should be called) that has
> the same structure as each file listed above. Each cell in the new data
> frame or file is the average of the values at the same location in the
> original data frames. Thanks.
>
>
>
> On Tue, May 9, 2017 at 8:41 AM, Doran, Harold  wrote:
>
> Are you trying to take the mean over all cells, or over rows/columns
> within each data frame? Also, are these different data frames stored within a
> list, or are they standalone?
>
>
>
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of lily li
> Sent: Tuesday, May 09, 2017 10:39 AM
> To: R mailing list 
> Subject: [R] About calculating average values from several matrices
>
> Hi R users,
>
> I have a question about manipulating the data.
> For example, there are several such data frames or matrices, and I want to
> calculate the average value across all of them. How can I do this? Also,
> should I convert them to data frames or matrices first? Right now,
> when I use the typeof() function, each one is a list.
>
> file1
>        jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
> app1   1.1   1.2   0.8   0.9   1.3   1.5   2.2   3.2   3.0   1.2   1.1
> app2   3.1   3.2   2.8   2.5   2.3   2.5   3.2   3.0   2.9   1.8   1.8
> app3   5.1   5.2   3.8   4.9   5.3   5.5   5.2   4.2   5.0   4.2   4.1
>
> file2
>        jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
> app1   1.9   1.5   0.5   0.9   1.2   1.8   2.5   3.7   3.2   1.5   1.6
> app2   3.5   3.7   2.3   2.2   2.5   2.0   3.6   3.2   2.8   1.2   1.4
> app3   5.5   5.0   3.5   4.4   5.4   5.6   5.3   4.4   5.2   4.3   4.2
>
> file3 has the similar structure and values...
>
> There are eight such files, and when I use the function mean(file1, file2,
> file3, ..., file8), it returns the error below. Thanks for your help.
>
> Warning message:
> In mean.default(file1, file2, file3, file4, file5, file6, file7,  :
>   argument is not numeric or logical: returning NA
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>


Re: [R] About calculating average values from several matrices

2017-05-09 Thread Doran, Harold
It’s not clear to me what your actual structure is. Can you provide the output of 
str(object)? Assuming it is a list, and you want the mean over all cells or 
columns, you might want something like this:

myData <- vector("list", 3)

for(i in 1:3){
myData[[i]] <- matrix(rnorm(100), 10, 10)
}

### mean over all cells
sapply(myData, function(x) mean(x))

### mean over all columns
sapply(myData, function(x) colMeans(x))





From: lily li [mailto:chocol...@gmail.com]
Sent: Tuesday, May 09, 2017 10:44 AM
To: Doran, Harold 
Cc: R mailing list 
Subject: Re: [R] About calculating average values from several matrices

I'm trying to get a new dataframe or whatever to call, which has the same 
structure with each file as listed above. For each cell in the new dataframe or 
the new file, it is the average value from former dataframes at the same 
location. Thanks.

On Tue, May 9, 2017 at 8:41 AM, Doran, Harold wrote:
Are you trying to take the mean over all cells, or over rows/columns within 
each dataframe. Also, are these different dataframes stored within a list or 
are they standalone?



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of lily li
Sent: Tuesday, May 09, 2017 10:39 AM
To: R mailing list
Subject: [R] About calculating average values from several matrices

Hi R users,

I have a question about manipulating the data.
For example, there are several such data frames or matrices, and I want to 
calculate the average value from all the data frames or matrices. How to do it? 
Also, should I convert them to data frame or matrix first? Right now, when I 
use typeof() function, each one is a list.

file1
       jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
app1   1.1   1.2   0.8   0.9   1.3   1.5   2.2   3.2   3.0   1.2   1.1
app2   3.1   3.2   2.8   2.5   2.3   2.5   3.2   3.0   2.9   1.8   1.8
app3   5.1   5.2   3.8   4.9   5.3   5.5   5.2   4.2   5.0   4.2   4.1

file2
       jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
app1   1.9   1.5   0.5   0.9   1.2   1.8   2.5   3.7   3.2   1.5   1.6
app2   3.5   3.7   2.3   2.2   2.5   2.0   3.6   3.2   2.8   1.2   1.4
app3   5.5   5.0   3.5   4.4   5.4   5.6   5.3   4.4   5.2   4.3   4.2

file3 has the similar structure and values...

There are eight such files, and when I use the function mean(file1, file2, 
file3, ..., file8), it returns the error below. Thanks for your help.

Warning message:
In mean.default(file1, file2, file3, file4, file5, file6, file7,  :
  argument is not numeric or logical: returning NA

Re: [R] About calculating average values from several matrices

2017-05-09 Thread lily li
I'm trying to get a new data frame (or whatever it should be called) that
has the same structure as each file listed above. Each cell in the new
data frame should be the average of the values at the same location in the
original data frames. Thanks.
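
The cell-by-cell average described above could be sketched roughly as follows. This is only a minimal sketch, assuming every table has the same shape and contains only numeric columns; the names `files`, `mats` and `cell_mean` are made up for the illustration, and only two of the eight tables are typed in:

```r
# Two of the tables from the post, typed in by hand (illustration only).
months <- c("jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov")
apps <- c("app1", "app2", "app3")

file1 <- as.data.frame(matrix(c(
  1.1, 1.2, 0.8, 0.9, 1.3, 1.5, 2.2, 3.2, 3.0, 1.2, 1.1,
  3.1, 3.2, 2.8, 2.5, 2.3, 2.5, 3.2, 3.0, 2.9, 1.8, 1.8,
  5.1, 5.2, 3.8, 4.9, 5.3, 5.5, 5.2, 4.2, 5.0, 4.2, 4.1),
  nrow = 3, byrow = TRUE, dimnames = list(apps, months)))

file2 <- as.data.frame(matrix(c(
  1.9, 1.5, 0.5, 0.9, 1.2, 1.8, 2.5, 3.7, 3.2, 1.5, 1.6,
  3.5, 3.7, 2.3, 2.2, 2.5, 2.0, 3.6, 3.2, 2.8, 1.2, 1.4,
  5.5, 5.0, 3.5, 4.4, 5.4, 5.6, 5.3, 4.4, 5.2, 4.3, 4.2),
  nrow = 3, byrow = TRUE, dimnames = list(apps, months)))

# Collect the tables in a list, convert each to a numeric matrix,
# add them element by element and divide by the number of tables.
files <- list(file1, file2)
mats <- lapply(files, as.matrix)
cell_mean <- Reduce(`+`, mats) / length(mats)

# Back to a data frame with the same row and column names.
cell_mean_df <- as.data.frame(cell_mean)
print(cell_mean_df)
```

With all eight tables the only change would be the contents of `files`. Note that this simple sum does not skip missing values, so an NA in any table gives NA in that cell.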

On Tue, May 9, 2017 at 8:41 AM, Doran, Harold  wrote:

> Are you trying to take the mean over all cells, or over rows/columns
> within each dataframe. Also, are these different dataframes stored within a
> list or are they standalone?
>
>
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of lily li
> Sent: Tuesday, May 09, 2017 10:39 AM
> To: R mailing list 
> Subject: [R] About calculating average values from several matrices
>
> Hi R users,
>
> I have a question about manipulating the data.
> For example, there are several such data frames or matrices, and I want to
> calculate the average value from all the data frames or matrices. How to do
> it? Also, should I convert them to data frame or matrix first? Right now,
> when I use typeof() function, each one is a list.
>
> file1
>        jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
> app1   1.1   1.2   0.8   0.9   1.3   1.5   2.2   3.2   3.0   1.2   1.1
> app2   3.1   3.2   2.8   2.5   2.3   2.5   3.2   3.0   2.9   1.8   1.8
> app3   5.1   5.2   3.8   4.9   5.3   5.5   5.2   4.2   5.0   4.2   4.1
>
> file2
>        jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
> app1   1.9   1.5   0.5   0.9   1.2   1.8   2.5   3.7   3.2   1.5   1.6
> app2   3.5   3.7   2.3   2.2   2.5   2.0   3.6   3.2   2.8   1.2   1.4
> app3   5.5   5.0   3.5   4.4   5.4   5.6   5.3   4.4   5.2   4.3   4.2
>
> file3 has the similar structure and values...
>
> There are eight such files, and when I use the function mean(file1, file2,
> file3, ..., file8), it returns the error below. Thanks for your help.
>
> Warning message:
> In mean.default(file1, file2, file3, file4, file5, file6, file7,  :
>   argument is not numeric or logical: returning NA
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] About calculating average values from several matrices

2017-05-09 Thread Doran, Harold
Are you trying to take the mean over all cells, or over rows/columns within 
each dataframe? Also, are these different dataframes stored within a list or 
are they standalone?



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of lily li
Sent: Tuesday, May 09, 2017 10:39 AM
To: R mailing list 
Subject: [R] About calculating average values from several matrices

Hi R users,

I have a question about manipulating the data.
For example, there are several such data frames or matrices, and I want to 
calculate the average value from all the data frames or matrices. How to do it? 
Also, should I convert them to data frame or matrix first? Right now, when I 
use typeof() function, each one is a list.

file1
       jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
app1   1.1   1.2   0.8   0.9   1.3   1.5   2.2   3.2   3.0   1.2   1.1
app2   3.1   3.2   2.8   2.5   2.3   2.5   3.2   3.0   2.9   1.8   1.8
app3   5.1   5.2   3.8   4.9   5.3   5.5   5.2   4.2   5.0   4.2   4.1

file2
       jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
app1   1.9   1.5   0.5   0.9   1.2   1.8   2.5   3.7   3.2   1.5   1.6
app2   3.5   3.7   2.3   2.2   2.5   2.0   3.6   3.2   2.8   1.2   1.4
app3   5.5   5.0   3.5   4.4   5.4   5.6   5.3   4.4   5.2   4.3   4.2

file3 has the similar structure and values...

There are eight such files, and when I use the function mean(file1, file2, 
file3, ..., file8), it returns the error below. Thanks for your help.

Warning message:
In mean.default(file1, file2, file3, file4, file5, file6, file7,  :
  argument is not numeric or logical: returning NA



[R] About calculating average values from several matrices

2017-05-09 Thread lily li
Hi R users,

I have a question about manipulating the data.
For example, there are several such data frames or matrices, and I want to
calculate the average value from all the data frames or matrices. How to do
it? Also, should I convert them to data frame or matrix first? Right now,
when I use typeof() function, each one is a list.

file1
       jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
app1   1.1   1.2   0.8   0.9   1.3   1.5   2.2   3.2   3.0   1.2   1.1
app2   3.1   3.2   2.8   2.5   2.3   2.5   3.2   3.0   2.9   1.8   1.8
app3   5.1   5.2   3.8   4.9   5.3   5.5   5.2   4.2   5.0   4.2   4.1

file2
       jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov
app1   1.9   1.5   0.5   0.9   1.2   1.8   2.5   3.7   3.2   1.5   1.6
app2   3.5   3.7   2.3   2.2   2.5   2.0   3.6   3.2   2.8   1.2   1.4
app3   5.5   5.0   3.5   4.4   5.4   5.6   5.3   4.4   5.2   4.3   4.2

file3 has the similar structure and values...

There are eight such files, and when I use the function mean(file1, file2,
file3, ..., file8), it returns the error below. Thanks for your help.

Warning message:
In mean.default(file1, file2, file3, file4, file5, file6, file7,  :
  argument is not numeric or logical: returning NA
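
The warning above can be reproduced on a small scale. The sketch below assumes the tables are data frames (which would explain why typeof() reports "list"); the object `df` is a made-up two-column stand-in, not one of the real files. mean() treats only its first argument as data, and a data frame as a whole is neither numeric nor logical, so mean.default() returns NA with this warning:

```r
# A tiny stand-in for one of the tables (hypothetical data).
df <- data.frame(jan = c(1.1, 3.1, 5.1), feb = c(1.2, 3.2, 5.2))

typeof(df)          # "list": a data frame is stored as a list of columns
is.data.frame(df)   # TRUE
is.numeric(df)      # FALSE, which is what triggers the warning

# mean() on the data frame itself gives NA (warning suppressed here).
res <- suppressWarnings(mean(df))
print(res)

# Converting to a numeric matrix first gives the mean over all cells.
all_cells <- mean(as.matrix(df))
print(all_cells)
```

So no conversion to a different class is needed just to hold the data; the conversion with as.matrix() is only needed at the point where a function expects a numeric object.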



Re: [R] loading edited functions already in saved workspace automatically

2017-05-09 Thread Ralf Goertz
On Sat, 6 May 2017 11:17:42 -0400,
Michael Friendly wrote:

> On 5/5/2017 10:23 AM, Ralf Goertz wrote:
> > On Fri, 05 May 2017 07:14:36 -0700,
> > Jeff Newmiller wrote:
> >  
> >> R normally prompts you to save .RData, but it just automatically
> >> saves .Rhistory... the two are unrelated.  
> >
> > Not here. If I say "n" to the prompted question "Save workspace
> > image? [y/n/c]: " my history doesn't get saved.
> >
> > Version:
> >
> > R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> > Copyright (C) 2016 The R Foundation for Statistical Computing
> > Platform: x86_64-suse-linux-gnu (64-bit)
> >  
> 
> On Windoze, here's what I use in my .Rprofile, which runs every time
> I start an RGUI console.  The key is .First & .Last to load/save
> history automagically.

Hi Michael,

thanks. This helps with saving the history without saving the data. But
actually I'd really like to save both and still be able to load
functions automatically from .Rprofile. Not saving the data as Jeff
suggested is not a good option because it is sometimes tedious to
rebuild my environment by reexecuting commands in the history. And I
explained in my OP why I can't use .First() to achieve my goal.

But let me try again to explain the problem because I think not
everybody understood what I was trying to say. For simplicity I use the
plain variable "a" instead of a function. Start a fresh session and
remove all variables, define one variable and quit with saving:

> rm(list=ls())
> a=17
> quit(save="yes")

Now, before opening a new session edit .Rprofile such that it contains
just the two lines:

print("Hello from .Rprofile")
a=42

Start a new session where your saved environment will be loaded.
Observe that you see the line 

[1] "Hello from .Rprofile"

proving that the commands in .Rprofile have been executed. Now look at
"a":

> a
[1] 17


You would expect to see this because *after* your "Hello" line you find

[Previously saved workspace restored]

So you have set "a" to 42 in .Rprofile but it gets overwritten from the
previously saved and now restored workspace. On the other hand, .First()
gets executed after the restoring of the workspace. Therefore, I could
edit .Rprofile to read

.First=function(){ assign("a",42,pos=1) }

Now, after starting I see that "a" is indeed 42. But then it turns out
that from now on I need "a" to be 11. After editing .Rprofile
accordingly, I am quite hopeful but after starting a new session I see
that "a" is still 42. Why is that? Because .First() was saved and when I
started a new session it got a new function body (setting "a" to 11) but
before it could be executed it was again overwritten by the old value
(setting "a" to 42) and I am chasing my own tail. Sigh.
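
The ordering described above can be imitated inside a single session. This is only a simulation under stated assumptions: a temporary file stands in for the saved workspace, a fresh environment `e` stands in for the global environment, and the three steps are run by hand in the order reported in this message (.Rprofile, then workspace restore, then .First()). It does not touch the real startup files.

```r
ws <- tempfile(fileext = ".RData")  # stand-in for the saved workspace
e <- new.env()                      # stand-in for the global environment

# Previous session: a is 17 and the workspace is saved on quit.
assign("a", 17, envir = e)
save(list = "a", file = ws, envir = e)

# New session, step 1: .Rprofile runs and sets a to 42.
assign("a", 42, envir = e)

# Step 2: the saved workspace is restored and overwrites a.
load(ws, envir = e)
after_restore <- get("a", envir = e)
print(after_restore)                # 17, the .Rprofile value is gone

# Step 3: .First() runs after the restore, so its assignment wins.
assign("a", 42, envir = e)
after_first <- get("a", envir = e)
print(after_first)                  # 42

unlink(ws)
```

The same overwrite would hit a .First defined in .Rprofile if an older .First had been saved with the workspace, which is the circular problem described above.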

.Last() doesn't help. Apparently (at least on my linux system) it is
executed *after* saving the environment so too late to remove anything
you don't want saved. In that regard linux doesn't seem to be typical,
since in "?.Last" the reverse order is described as typical:

 Exactly what happens at termination of an R session depends on the
 platform and GUI interface in use.  A typical sequence is to run
 ‘.Last()’ and ‘.Last.sys()’ (unless ‘runLast’ is false), to save
 the workspace if requested (and in most cases also to save the
 session history: see ‘savehistory’), then run any finalizers (see
 ‘reg.finalizer’) that have been set to be run on exit, close all
 open graphics devices, remove the session temporary directory and
 print any remaining warnings (e.g., from ‘.Last()’ and device
 closure).


IMHO this is a design flaw.

Ralf


[R] Antwort: RE: Antwort: Re: Factors and Alternatives (SOLVED)

2017-05-09 Thread G . Maubach
Hi David,
Hi Bob,

many thanks for your help.

Your solution, simply using all levels instead of just the ones found in 
the data, helped.

The original code looked like this:

-- cut --

c_v10_val_labs <- c(
  "1 = sehr gut",
  "2", "3", "4", "5",
  "6 = sehr schlecht"
)

# where c_v10_val_labs is handed over to my function as "val_labs".

  ds_results$value <- factor(ds_results$value,
                             levels = sort(unique(ds_results$value)),  # old code
                             labels = sort(unique(val_labs)))

-- cut --

Following your suggestion, I now write instead:

-- cut --

  ds_results$value <- factor(ds_results$value,
                             levels = seq_along(val_labs),  # new code 1st version
                             labels = sort(unique(val_labs)))

-- cut --

This version builds a factor with all factor levels, even if a value of the 
factor is not present (not NA, it just does not occur in the data, i.e. 
it was not stated by any respondent).

In Zumel's book "Practical Data Science with R" (
https://www.amazon.de/Practical-Data-Science-Nina-Zumel/dp/1617291560), 
Shelter Island: Manning, 2014, p. 23-24, Listing 2-5, a mapping using 
subscripts is described:

-- cut --

mapping <- list(
'A40'='car (new)',
'A41'='car (used)',
'A42'='furniture/equipment',
'A43'='radio/television',
'A44'='domestic appliances',
...
)

for(i in 1:(dim(d))[2]) {
if(class(d[,i])=='character') {
d[,i] <- as.factor(as.character(mapping[d[,i]]))
}
}

-- cut --

Simply stated, this would mean:

-- cut --

val_labs <- list(
  "1" = "1 = sehr gut",
  "2" = "2",
  "3" = "3",
  "4" = "4",
  "5" = "5",
  "6" = "6 = sehr schlecht"
)

set.seed(12345)
answers = c(sample(1:5, 10, replace = TRUE))

test <- factor(unlist(val_labs[answers]))

# or just

val_labs <- c(
  "1 = sehr gut",
  "2",
  "3",
  "4",
  "5",
  "6 = sehr schlecht"
)

set.seed(12345)
answers = c(sample(1:5, 10, replace = TRUE))

test <- val_labs[answers]

-- cut --

Adapting this to my code would give:

-- cut --

  ds_results$value <- factor(ds_results$value,
                             levels = sort(unique(ds_results$value)),
                             labels = val_labs[sort(unique(ds_results$value))])  # new code 2nd version

-- cut --

This results in a factor just as long as the vector of unique resulting 
values.

Both solutions work. Which version is best depends on the overall process 
and the purpose of the code. I document all this for use by readers who 
refer later to the list archives.
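
For readers of the archive, the difference between the two versions can be seen side by side in a small sketch. It assumes a made-up answer vector `answers` in which the value 3 never occurs, and it passes `val_labs` directly as labels (without sort(unique())), so that the result does not depend on the locale's sort order:

```r
val_labs <- c("1 = sehr gut", "2", "3", "4", "5", "6 = sehr schlecht")

# Hypothetical answers on the 1..6 scale; nobody chose 3.
answers <- c(1, 2, 2, 4, 5, 6, 6, 1)

# 1st version: one level per entry of the scale, used or not.
v1 <- factor(answers, levels = seq_along(val_labs), labels = val_labs)

# 2nd version: only the values that occur, with their labels looked up.
seen <- sort(unique(answers))
v2 <- factor(answers, levels = seen, labels = val_labs[seen])

nlevels(v1)   # 6, includes the unused "3"
nlevels(v2)   # 5, "3" is missing
print(table(v1))
print(table(v2))
```

Both factors carry the same labels for every observation; they differ only in whether the unused level is kept, which matters for tables and plot legends.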

Using your version and running my code reveals that ggplot runs into 
difficulties because the legend lacks values and the order and coloring 
of the legend are wrong. But that's another story.

Many thanks again for your help.

Kind regards

Georg




From:    David L Carlson
To:      "g.maub...@weinwolf.de", "Bob O'Hara"
Cc:      r-help
Date:    09.05.2017 14:37
Subject: RE: [R] Antwort: Re: Factors and Alternatives



I'm not sure I understand your question, but you can easily include all 
possible answers when you create the factor by using the levels= argument 
as Bob pointed out. Here is an example of values that range from 1 to 6, 
but value 3 is not represented. Notice that a factor level 3 is created 
even though it does not appear in the data:

> set.seed(42)
> x <- sample.int(6, 10, replace=TRUE)
> table(x)
x
1 2 4 5 6 
1 1 3 3 2 
> y <- factor(x, levels=1:6)
> y
 [1] 6 6 2 5 4 4 5 1 4 5
Levels: 1 2 3 4 5 6

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

From:    "Bob O'Hara"
To:      g.maub...@weinwolf.de
Cc:      r-help
Date:    09.05.2017 13:58
Subject: Re: Re: [R] Factors and Alternatives



For the problem you state, would it be enough to explicitly define your 
levels?

fac <- rep(c("a", "b", "d"), each=4)
fac.f <- factor(fac, levels=c("a", "b", "c", "d"))
table(fac.f)

# but be warned...
fac.f2 <- factor(fac.f)
table(fac.f2)

This has the advantage that the code explicitly documents what the
possible values are, so if something goes wrong down-stream, you know
it is a real problem (well, unless you have some type conversions
screwing things up). You might also want to do some defensive
programming, and put some checks in the code, to make sure your
factors have the right number of levels.

Bob

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of 
g.maub...@weinwolf.de
Sent: Tuesday, May 9, 2017 6:37 AM
To: Bob O'Hara 
Cc: r-help 
Subject: [R] Antwort: Re: Factors and Alternatives

Hi Bob,

many thanks for your reply.

I have read the documentation. In my current project I use "item 
batteries" for dimensions of touchpoints which are rated by our customers. 

I wrote functions to analyse them. If I create a factor before filtering 
and analysing I lose 

Re: [R] Antwort: Re: Factors and Alternatives

2017-05-09 Thread David L Carlson
I'm not sure I understand your question, but you can easily include all 
possible answers when you create the factor by using the levels= argument as 
Bob pointed out. Here is an example of values that range from 1 to 6, but value 
3 is not represented. Notice that a factor level 3 is created even though it 
does not appear in the data:

> set.seed(42)
> x <- sample.int(6, 10, replace=TRUE)
> table(x)
x
1 2 4 5 6 
1 1 3 3 2 
> y <- factor(x, levels=1:6)
> y
 [1] 6 6 2 5 4 4 5 1 4 5
Levels: 1 2 3 4 5 6
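
As a small follow-up sketch, the values printed above can be typed in directly (so the result does not depend on the random number generator) to show what the extra level does in a table, and how to drop it again if it is not wanted. The names `full` and `observed` are chosen for the illustration:

```r
# The ten values printed above, entered by hand; 3 never occurs.
x <- c(6, 6, 2, 5, 4, 4, 5, 1, 4, 5)
y <- factor(x, levels = 1:6)

# With explicit levels, table() reports the unused level with count 0.
full <- table(y)
print(full)

# droplevels() removes unused levels when only observed values are wanted.
observed <- table(droplevels(y))
print(observed)
```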

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of 
g.maub...@weinwolf.de
Sent: Tuesday, May 9, 2017 6:37 AM
To: Bob O'Hara 
Cc: r-help 
Subject: [R] Antwort: Re: Factors and Alternatives

Hi Bob,

many thanks for your reply.

I have read the documentation. In my current project I use "item 
batteries" for dimensions of touchpoints which are rated by our customers. 
I wrote functions to analyse them. If I create a factor before filtering 
and analysing I lose the original values of the variable. If I use the 
original variable for filtering and analysis I might happen that for some 
dimensions values were not selected. This means they are not NA but none 
of the respondents chose "4" for instance on a scale from 1 to 6. That 
means that creating a factor from the analysed data with the complete 
scale (1:6) fails due the different vector length (amount of remaining 
unique values in the analysis vs values in the scale). As I have a 
function doing the analysis I am looking for a way to make my function 
robust to such circumstances and be able to use it to analyse all "item 
batteries". Thus my question. I believe my findings are not odd. Maybe 
there is a way dealing with that kind of problems in R and I am eager to 
learn how it can be solved using R.

What would you suggest?

Kind regards

Georg




From:    "Bob O'Hara"
To:      g.maub...@weinwolf.de
Cc:      r-help
Date:    09.05.2017 12:26
Subject: Re: [R] Factors and Alternatives



That's easy! First
> str(test3)
 Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1

tells you that the internal values are 1 and 2, and the labels are
"WITHOUT Contact" and "WITH Contact". If you read the help page for
factor() you'll see this:

levels: an optional vector of the values (as character strings) that
  ‘x’ might have taken.  The default is the unique set of
  values taken by ‘as.character(x)’, sorted into increasing
  order _of ‘x’_.  Note that this set can be specified as
  smaller than ‘sort(unique(x))’.

  labels: _either_ an optional character vector of (unique) labels for
  the levels (in the same order as ‘levels’ after removing
  those in ‘exclude’), _or_ a character string of length 1.

So, when you create test3 you say that test1 can take values 0 and 1,
and these should be labelled as "WITHOUT Contact" and "WITH Contact".
So R internally codes "0" as 1 and "1" as 2 (internally R codes
factors as integers, which can be both useful and dangerous), and then
gives them labels "WITHOUT Contact" and "WITH Contact". It now doesn't
care that they were 1 and 0, because you've told it to change the
labels.

If you want to filter by the original values, then don't change the
labels (or at least not until after you've filtered by the original
labels), or convert the filter to the new labels. You're asking for a
data structure with two sets of labels, which sounds odd in general.

Bob

On 9 May 2017 at 12:12,   wrote:
> Hi All,
>
> I am using factors in a study for the social sciences.
>
> I discovered the following:
>
> -- cut --
>
> library(dplyr)
>
> test1 <- c(rep(1, 4), rep(0, 6))
> d_test1 <- data.frame(test1)
>
> test2 <- factor(test1)
> d_test2 <- data.frame(test2)
>
> test3 <- factor(test1,
> levels = c(0, 1),
> labels = c("WITHOUT Contact", "WITH Contact"))
> d_test3 <- data.frame(test3)
>
> d_test1 %>% filter(test1 == 0)  # works OK
> d_test2 %>% filter(test2 == 0)  # works OK
> d_test3 %>% filter(test3 == 0)  # does not work, why?
>
> myf <- function(ds) {
>   print(levels(ds$test3))
>   print(labels(ds$test3))
>   print(as.numeric(ds$test3))
>   print(as.character(ds$test3))
> }
>
> # This shows that it is not possible to access the original
> # values which were the basis to build the factor:
> myf(d_test3)
>
> -- cut --
>
> Why is it not possible to use a factor with labels for filtering with 
the
> original values?
> Is there a data structure that works like a factor but gives also access
> to the original values?
>
> Kind regards
>
> Georg
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> 

Re: [R] Factors and Alternatives

2017-05-09 Thread Bob O'Hara
For the problem you state, would it be enough to explicitly define your levels?

fac <- rep(c("a", "b", "d"), each=4)
fac.f <- factor(fac, levels=c("a", "b", "c", "d"))
table(fac.f)

# but be warned...
fac.f2 <- factor(fac.f)
table(fac.f2)

This has the advantage that the code explicitly documents what the
possible values are, so if something goes wrong down-stream, you know
it is a real problem (well, unless you have some type conversions
screwing things up). You might also want to do some defensive
programming, and put some checks in the code, to make sure your
factors have the right number of levels.
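
The kind of check meant here might look like the sketch below. The helper name `check_levels` is made up for the illustration and is not part of any package; it simply stops with an error when a factor does not have exactly the expected levels. The second half shows the "be warned" case from the code above, where re-creating the factor silently drops the unused level:

```r
fac <- rep(c("a", "b", "d"), each = 4)
expected <- c("a", "b", "c", "d")
fac.f <- factor(fac, levels = expected)

# Hypothetical helper: fail loudly if the levels are not as expected.
check_levels <- function(f, expected) {
  stopifnot(is.factor(f), identical(levels(f), expected))
  invisible(f)
}

check_levels(fac.f, expected)   # passes silently

# Re-creating the factor drops the unused level "c" ...
fac.f2 <- factor(fac.f)
levels(fac.f2)

# ... and the check catches it.
ok <- tryCatch({
  check_levels(fac.f2, expected)
  TRUE
}, error = function(e) FALSE)
print(ok)
```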

Bob

On 9 May 2017 at 13:36,   wrote:
> Hi Bob,
>
> many thanks for your reply.
>
> I have read the documentation. In my current project I use "item
> batteries" for dimensions of touchpoints which are rated by our customers.
> I wrote functions to analyse them. If I create a factor before filtering
> and analysing I lose the original values of the variable. If I use the
> original variable for filtering and analysis I might happen that for some
> dimensions values were not selected. This means they are not NA but none
> of the respondents chose "4" for instance on a scale from 1 to 6. That
> means that creating a factor from the analysed data with the complete
> scale (1:6) fails due the different vector length (amount of remaining
> unique values in the analysis vs values in the scale). As I have a
> function doing the analysis I am looking for a way to make my function
> robust to such circumstances and be able to use it to analyse all "item
> batteries". Thus my question. I believe my findings are not odd. Maybe
> there is a way dealing with that kind of problems in R and I am eager to
> learn how it can be solved using R.
>
> What would you suggest?
>
> Kind regards
>
> Georg
>
>
>
>
> From:    "Bob O'Hara"
> To:      g.maub...@weinwolf.de
> Cc:      r-help
> Date:    09.05.2017 12:26
> Subject: Re: [R] Factors and Alternatives
>
>
>
> That's easy! First
>> str(test3)
>  Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1
>
> tells you that the internal values are 1 and 2, and the labels are
> "WITHOUT Contact" and "WITH Contact". If you read the help page for
> factor() you'll see this:
>
> levels: an optional vector of the values (as character strings) that
>   ‘x’ might have taken.  The default is the unique set of
>   values taken by ‘as.character(x)’, sorted into increasing
>   order _of ‘x’_.  Note that this set can be specified as
>   smaller than ‘sort(unique(x))’.
>
>   labels: _either_ an optional character vector of (unique) labels for
>   the levels (in the same order as ‘levels’ after removing
>   those in ‘exclude’), _or_ a character string of length 1.
>
> So, when you create test3 you say that test1 can take values 0 and 1,
> and these should be labelled as "WITHOUT Contact" and "WITH Contact".
> So R internally codes "0" as 1 and "1" as 2 (internally R codes
> factors as integers, which can be both useful and dangerous), and then
> gives them labels "WITHOUT Contact" and "WITH Contact". It now doesn't
> care that they were 1 and 0, because you've told it to change the
> labels.
>
> If you want to filter by the original values, then don't change the
> labels (or at least not until after you've filtered by the original
> labels), or convert the filter to the new labels. You're asking for a
> data structure with two sets of labels, which sounds odd in general.
>
> Bob
>
> On 9 May 2017 at 12:12,   wrote:
>> Hi All,
>>
>> I am using factors in a study for the social sciences.
>>
>> I discovered the following:
>>
>> -- cut --
>>
>> library(dplyr)
>>
>> test1 <- c(rep(1, 4), rep(0, 6))
>> d_test1 <- data.frame(test1)
>>
>> test2 <- factor(test1)
>> d_test2 <- data.frame(test2)
>>
>> test3 <- factor(test1,
>> levels = c(0, 1),
>> labels = c("WITHOUT Contact", "WITH Contact"))
>> d_test3 <- data.frame(test3)
>>
>> d_test1 %>% filter(test1 == 0)  # works OK
>> d_test2 %>% filter(test2 == 0)  # works OK
>> d_test3 %>% filter(test3 == 0)  # does not work, why?
>>
>> myf <- function(ds) {
>>   print(levels(ds$test3))
>>   print(labels(ds$test3))
>>   print(as.numeric(ds$test3))
>>   print(as.character(ds$test3))
>> }
>>
>> # This shows that it is not possible to access the original
>> # values which were the basis to build the factor:
>> myf(d_test3)
>>
>> -- cut --
>>
>> Why is it not possible to use a factor with labels for filtering with
> the
>> original values?
>> Is there a data structure that works like a factor but gives also access
>> to the original values?
>>
>> Kind regards
>>
>> Georg
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> 

[R] Antwort: Re: Factors and Alternatives

2017-05-09 Thread G . Maubach
Hi Bob,

many thanks for your reply.

I have read the documentation. In my current project I use "item 
batteries" for dimensions of touchpoints which are rated by our customers. 
I wrote functions to analyse them. If I create a factor before filtering 
and analysing, I lose the original values of the variable. If I use the 
original variable for filtering and analysis, it might happen that for some 
dimensions certain values were not selected. This means they are not NA, but 
none of the respondents chose "4", for instance, on a scale from 1 to 6. That 
means that creating a factor from the analysed data with the complete 
scale (1:6) fails due to the different vector lengths (number of remaining 
unique values in the analysis vs. values in the scale). As I have a 
function doing the analysis, I am looking for a way to make my function 
robust to such circumstances and be able to use it to analyse all "item 
batteries". Thus my question. I believe my findings are not odd. Maybe 
there is a way of dealing with this kind of problem in R, and I am eager to 
learn how it can be solved using R.

What would you suggest?

Kind regards

Georg




From:    "Bob O'Hara"
To:      g.maub...@weinwolf.de
Cc:      r-help
Date:    09.05.2017 12:26
Subject: Re: [R] Factors and Alternatives



That's easy! First
> str(test3)
 Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1

tells you that the internal values are 1 and 2, and the labels are
"WITHOUT Contact" and "WITH Contact". If you read the help page for
factor() you'll see this:

levels: an optional vector of the values (as character strings) that
  ‘x’ might have taken.  The default is the unique set of
  values taken by ‘as.character(x)’, sorted into increasing
  order _of ‘x’_.  Note that this set can be specified as
  smaller than ‘sort(unique(x))’.

  labels: _either_ an optional character vector of (unique) labels for
  the levels (in the same order as ‘levels’ after removing
  those in ‘exclude’), _or_ a character string of length 1.

So, when you create test3 you say that test1 can take values 0 and 1,
and these should be labelled as "WITHOUT Contact" and "WITH Contact".
So R internally codes "0" as 1 and "1" as 2 (internally R codes
factors as integers, which can be both useful and dangerous), and then
gives them labels "WITHOUT Contact" and "WITH Contact". It now doesn't
care that they were 1 and 0, because you've told it to change the
labels.

If you want to filter by the original values, then don't change the
labels (or at least not until after you've filtered by the original
labels), or convert the filter to the new labels. You're asking for a
data structure with two sets of labels, which sounds odd in general.

Bob

On 9 May 2017 at 12:12,   wrote:
> Hi All,
>
> I am using factors in a study for the social sciences.
>
> I discovered the following:
>
> -- cut --
>
> library(dplyr)
>
> test1 <- c(rep(1, 4), rep(0, 6))
> d_test1 <- data.frame(test1)
>
> test2 <- factor(test1)
> d_test2 <- data.frame(test2)
>
> test3 <- factor(test1,
> levels = c(0, 1),
> labels = c("WITHOUT Contact", "WITH Contact"))
> d_test3 <- data.frame(test3)
>
> d_test1 %>% filter(test1 == 0)  # works OK
> d_test2 %>% filter(test2 == 0)  # works OK
> d_test3 %>% filter(test3 == 0)  # does not work, why?
>
> myf <- function(ds) {
>   print(levels(ds$test3))
>   print(labels(ds$test3))
>   print(as.numeric(ds$test3))
>   print(as.character(ds$test3))
> }
>
> # This shows that it is not possible to access the original
> # values which were the basis to build the factor:
> myf(d_test3)
>
> -- cut --
>
> Why is it not possible to use a factor with labels for filtering with 
the
> original values?
> Is there a data structure that works like a factor but gives also access
> to the original values?
>
> Kind regards
>
> Georg
>



-- 
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org



Re: [R] Factors and Alternatives

2017-05-09 Thread Bob O'Hara
That's easy! First
> str(test3)
 Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1

tells you that the internal values are 1 and 2, and the labels are
"WITHOUT Contact" and "WITH Contact". If you read the help page for
factor() you'll see this:

levels: an optional vector of the values (as character strings) that
  ‘x’ might have taken.  The default is the unique set of
  values taken by ‘as.character(x)’, sorted into increasing
  order _of ‘x’_.  Note that this set can be specified as
  smaller than ‘sort(unique(x))’.

  labels: _either_ an optional character vector of (unique) labels for
  the levels (in the same order as ‘levels’ after removing
  those in ‘exclude’), _or_ a character string of length 1.

So, when you create test3 you say that test can take values 0 and 1,
and these should be labelled as "WITHOUT Contact" and "WITH Contact".
So R internally codes "0" as 1 and "1" as 2 (internally R codes
factors as integers, which can be both useful and dangerous), and then
gives them labels "WITHOUT Contact" and "WITH Contact". It now doesn't
care that they were 0 and 1, because you've told it to change the
labels.

If you want to filter by the original values, then don't change the
labels (or at least not until after you've filtered by the original
labels), or convert the filter to the new labels. You're asking for a
data structure with two sets of labels, which sounds odd in general.
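
A minimal sketch of the options above, using base R subsetting rather
than dplyr so it runs without extra packages (dplyr's
filter(test3 == "WITHOUT Contact") would behave the same way). The names
by_label, by_value, original_levels and recovered are made up for
illustration and are not from the original post.

```r
# Rebuild the data from the question, keeping the original vector as a column.
test1 <- c(rep(1, 4), rep(0, 6))
test3 <- factor(test1,
                levels = c(0, 1),
                labels = c("WITHOUT Contact", "WITH Contact"))
d_test3 <- data.frame(test1, test3)

# Option 1: convert the filter to the new labels.
by_label <- d_test3[d_test3$test3 == "WITHOUT Contact", ]

# Option 2: keep the original values in their own column and filter on that.
by_value <- d_test3[d_test3$test1 == 0, ]

# Option 3: map the internal integer codes back to the original values.
# This only works if you still know the 'levels' you passed to factor().
original_levels <- c(0, 1)
recovered <- original_levels[as.integer(test3)]
```

Keeping both columns side by side (option 2) is probably the least
surprising way to get "two sets of labels".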

Bob

On 9 May 2017 at 12:12,   wrote:
> Hi All,
>
> I am using factors in a study for the social sciences.
>
> I discovered the following:
>
> -- cut --
>
> library(dplyr)
>
> test1 <- c(rep(1, 4), rep(0, 6))
> d_test1 <- data.frame(test1)
>
> test2 <- factor(test1)
> d_test2 <- data.frame(test2)
>
> test3 <- factor(test1,
> levels = c(0, 1),
> labels = c("WITHOUT Contact", "WITH Contact"))
> d_test3 <- data.frame(test3)
>
> d_test1 %>% filter(test1 == 0)  # works OK
> d_test2 %>% filter(test2 == 0)  # works OK
> d_test3 %>% filter(test3 == 0)  # does not work, why?
>
> myf <- function(ds) {
>   print(levels(ds$test3))
>   print(labels(ds$test3))
>   print(as.numeric(ds$test3))
>   print(as.character(ds$test3))
> }
>
> # This shows that it is not possible to access the original
> # values which were the basis to build the factor:
> myf(d_test3)
>
> -- cut --
>
> Why is it not possible to use a factor with labels for filtering with the
> original values?
> Is there a data structure that works like a factor but gives also access
> to the original values?
>
> Kind regards
>
> Georg
>



-- 
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org


[R] Factors and Alternatives

2017-05-09 Thread G . Maubach
Hi All,

I am using factors in a study for the social sciences.

I discovered the following:

-- cut --

library(dplyr)

test1 <- c(rep(1, 4), rep(0, 6))
d_test1 <- data.frame(test1)

test2 <- factor(test1)
d_test2 <- data.frame(test2)

test3 <- factor(test1, 
levels = c(0, 1),
labels = c("WITHOUT Contact", "WITH Contact"))
d_test3 <- data.frame(test3)

d_test1 %>% filter(test1 == 0)  # works OK
d_test2 %>% filter(test2 == 0)  # works OK
d_test3 %>% filter(test3 == 0)  # does not work, why?

myf <- function(ds) {
  print(levels(ds$test3))
  print(labels(ds$test3))
  print(as.numeric(ds$test3))
  print(as.character(ds$test3))
}

# This shows that it is not possible to access the original
# values which were the basis to build the factor:
myf(d_test3)

-- cut --

Why is it not possible to use a factor with labels for filtering with the 
original values?
Is there a data structure that works like a factor but gives also access 
to the original values?

Kind regards

Georg



Re: [R] Joining tables with different order and matched values

2017-05-09 Thread Ulrik Stervbo
Hi Abo,

Please keep the list in cc.

I think the function documentation is pretty straightforward - two
data.frames are required, and if you wish to keep elements that are not
present in both data.frames, you set the flag all = TRUE. You also have the
option to specify which columns to join by.
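
A minimal sketch of such a join, assuming the two tables from the
question are data.frames with a shared column. The names drugs, wanted,
drug and indication are made up for illustration. It uses all.x = TRUE
(keep every row of the first table only) rather than all = TRUE, because
the desired result lists exactly the drugs of the second table.

```r
# First table: drug names with their indications.
drugs <- data.frame(
  drug       = c("Ibuprofen", "Simvastatin", "losartan"),
  indication = c("Pain", "hyperlipidemia", "hypertension"),
  stringsAsFactors = FALSE
)

# Second table: drug names in a different order, plus one unknown drug.
wanted <- data.frame(
  drug = c("Simvastatin", "losartan", "Ibuprofen", "Metformin"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of 'wanted'; unmatched drugs get NA.
result <- merge(wanted, drugs, by = "drug", all.x = TRUE)

# merge() does not promise to keep the row order of 'wanted', so restore it.
result <- result[match(wanted$drug, result$drug), ]
rownames(result) <- NULL
print(result)
```

Metformin has no match in the first table, so its indication comes back
as NA.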

If you need more assistance with joining two data.frames, you should
provide a reproducible example, and if you have trouble with a function you
should provide an example of what you have tried so far.

Best wishes,
Ulrik



On Tue, 9 May 2017 at 10:00 abo dalash  wrote:

> Could you please show me the correct form of the syntax? I
> have read the help page and other online resources about inner and left
> joins but wasn't able to formulate the correct syntax.
>
>
> Sent from my Samsung device
>
>
>  Original message 
> From: Ulrik Stervbo 
> Date: 09/05/2017 7:42 a.m. (GMT+00:00)
> To: abo dalash , "r-help@R-project.org" <
> r-help@r-project.org>
> Subject: Re: [R] Joining tables with different order and matched values
>
> Hi Abo,
>
> ?merge
>
> or the join functions from dplyr.
>
> HTH
> Ulrik
>
> On Tue, 9 May 2017 at 06:44 abo dalash  wrote:
>
>> Hi All ..,
>>
>>
>> I have 2 tables and I'm trying to have some information from the 1st
>> table to appear in the second table with different order.
>>
>>
>> For Example, let's say this is my 1st table :-
>>
>>
>>
>> Drug name     indications
>> Ibuprofen     Pain
>> Simvastatin   hyperlipidemia
>> losartan      hypertension
>>
>>
>>
>> my 2nd table is in different order for the 1st column :-
>>
>>
>> Drug name   indications
>>
>>
>> Simvastatin
>>
>> losartan
>>
>> Ibuprofen
>>
>> Metformin
>>
>>
>> I wish to see the indication of each drug in my 2nd table subsisted from
>> the information in my 1st table so the final table
>>
>> would be like this
>>
>>
>> Drug name     indications
>> Simvastatin   hyperlipidemia
>> losartan      hypertension
>> Ibuprofen     pain
>> Metformin     N/A
>>
>>
>> I have been trying to use Sqldf package and right join function but not
>> able to formulate the correct syntax.
>>
>>
>> I'm also trying to identify rows contain at least one shared value  in a
>> dataset called 'Values":
>>
>>
>> >Values
>>
>> A B
>>
>> 1,2,5   3,8,7
>>
>> 2,4,6   7,6,3
>>
>>
>>
>> Columns A & B in the first row do not share any value while in the 2nd
>> row they have a single shared value which is 6.
>>
>> The result I wish to see :-
>>
>>
>> A B shared values
>>
>> 1,2,5   3,8,7 N/A
>>
>> 2,4,6   7,6,3   6
>>
>>
>> I tried this syntax : SharedValues <- Values$A == Values$B but this
>> returns logical results and what I wish to have
>>
>> is a new data frame including the new vector "shared values" showing the
>> information exactly as above.
>>
>>
>>
>>
>> Kind Regards
>>
>>
>>
>>
>>
>>
>>
>



Re: [R] Joining tables with different order and matched values

2017-05-09 Thread Ulrik Stervbo
Hi Abo,

?merge

or the join functions from dplyr.

HTH
Ulrik

On Tue, 9 May 2017 at 06:44 abo dalash  wrote:

> Hi All ..,
>
>
> I have 2 tables and I'm trying to have some information from the 1st table
> to appear in the second table with different order.
>
>
> For Example, let's say this is my 1st table :-
>
>
>
> Drug name     indications
> Ibuprofen     Pain
> Simvastatin   hyperlipidemia
> losartan      hypertension
>
>
>
> my 2nd table is in different order for the 1st column :-
>
>
> Drug name   indications
>
>
> Simvastatin
>
> losartan
>
> Ibuprofen
>
> Metformin
>
>
> I wish to see the indication of each drug in my 2nd table subsisted from
> the information in my 1st table so the final table
>
> would be like this
>
>
> Drug name     indications
> Simvastatin   hyperlipidemia
> losartan      hypertension
> Ibuprofen     pain
> Metformin     N/A
>
>
> I have been trying to use Sqldf package and right join function but not
> able to formulate the correct syntax.
>
>
> I'm also trying to identify rows contain at least one shared value  in a
> dataset called 'Values":
>
>
> >Values
>
> A B
>
> 1,2,5   3,8,7
>
> 2,4,6   7,6,3
>
>
>
> Columns A & B in the first row do not share any value while in the 2nd row
> they have a single shared value which is 6.
>
> The result I wish to see :-
>
>
> A B shared values
>
> 1,2,5   3,8,7 N/A
>
> 2,4,6   7,6,3   6
>
>
> I tried this syntax : SharedValues <- Values$A == Values$B but this
> returns logical results and what I wish to have
>
> is a new data frame including the new vector "shared values" showing the
> information exactly as above.
>
>
>
>
> Kind Regards
>
>
>
>
>
>
>



Re: [R] passing arguments to simple plotting program.

2017-05-09 Thread Ulrik Stervbo
Hi Gerard,

You get the literals because variables are not expanded inside a string -
'Placebo(N=n1)  ' is just a string containing the characters N=n1.

What you want is to use paste() or paste0():
c(paste0("Placebo(N=", n1, ")"), paste0("Low Dose (N=", n2, ")"),
paste0("High Dose (N=", n3, ")"))
should do it.

I was taught long ago that attach() should be avoided to avoid name
conflicts. Also, it makes it difficult to figure out which data is actually
being used.
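
A minimal sketch that combines both points: the labels are built with
paste0(), and the data frame is passed as an argument instead of being
attached. The helper make_labels(), the data argument and the simulated
v5 data are assumptions for illustration; the real xyz.csv is not
available here.

```r
# Build the three axis labels from the group sizes.
make_labels <- function(n1, n2, n3) {
  c(paste0("Placebo (N=", n1, ")"),
    paste0("Low Dose (N=", n2, ")"),
    paste0("High Dose (N=", n3, ")"))
}

# Pass the data frame explicitly instead of calling attach().
plot_f1 <- function(data, n1, n2, n3) {
  boxplot(d_comp ~ rx_grp, data = data,
          names = make_labels(n1, n2, n3),
          ylab = "Change from Baseline")
  abline(h = 0, col = "lightgray")
}

# Simulated stand-in for the visit 5 subset (made-up values).
set.seed(1)
v5 <- data.frame(
  rx_grp = factor(rep(c("placebo", "low", "high"), each = 10),
                  levels = c("placebo", "low", "high")),
  d_comp = rnorm(30, mean = -2, sd = 2)
)

plot_f1(v5, n1 = 10, n2 = 10, n3 = 10)
```

With the real data the call would look like
plot_f1(v5, n1 = 114, n2 = 119, n3 = 116).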

HTH
Ulrik

On Tue, 9 May 2017 at 06:44 Gerard Smits  wrote:

> Hi All,
>
> I thought I’d try to get a function working instead of block copying code
> and editing. My background is more SAS, so using a SAS Macro would be easy,
> but not so lucky with R functions.
>
>
> R being used on Mac Sierra 10.12.4:
>
> R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> Copyright (C) 2016 The R Foundation for Statistical Computing
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>
>
> resp<-read.csv("//users//gerard//gs//r_work//xyz.csv", header = TRUE)
>
> v5  <-subset(resp, subset=visit==5 & pp==1)
>
> plot_f1 <-function(n1,n2,n3) {
>   attach(v8)
>   par(oma=c(2,2,2,2))
>   boxplot(formula = d_comp ~ rx_grp,
>   main="Figure 2\nChange in Composite Score at Visit 5 (Day
> 31)\nPer Protocol Population",
>   ylim=c(-10,5),
>   names=c('Placebo(N=n1)  ',
>   'Low Dose(N=n2) ',
>   'High Dose(N=n3)'),
>   ylab='Change from Baseline')
>   abline(h=c(0), col="lightgray")
> }
>
> plot_f1(n1=114, n2=119, n3=116)
>
> The above is a simplified example where I am trying to pass 3 arguments,
> n1-n3, to be shown in the x-axis tables,  Instead of the numbers, I get the
> literal n1, n2, n3.
>
> Any help appreciated.
>
> Thanks,
>
> Gerard
>
>
>
>


Re: [R] How do I use R to build a dictionary of proper nouns?

2017-05-09 Thread θ "

Hi Boris,
Thank you very much for your reply and your suggestions.
To show my workflow more clearly, I have attached my code and document file.
My research goal is to identify the main technical topics of CMP (chemical
mechanical polishing), so I want to apply text mining to related patent texts.
My text mining process uses two methods:
1. Tf-idf
2. CMP ontology
I built the CMP ontology myself. It is used to build the dictionary and
extract the proper nouns of CMP.
Here is my workflow for building a dictionary of proper nouns:
1. Read the ontology file into R.
2. Extract proper nouns from the ontology.
3. Use the tm package for preprocessing:
(remove "_", tolower, stripWhitespace, stemDocument)
4. Build a dictionary of proper nouns.

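The preprocessing and dictionary steps can be sketched in base R as
follows. This is only an illustration under stated assumptions: the
vector ontology_terms is made up (the real ontology file is not shown),
and stemming is left out because it needs the tm and SnowballC packages.

```r
# Hypothetical ontology terms; the real ontology file is not shown here.
ontology_terms <- c("Polishing_Head", "Slurry  Flow_Rate", "PAD",
                    "polishing_head")

# Base R stand-ins for three of the preprocessing steps listed above.
clean_term <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)  # remove "_" (replace with a space)
  x <- tolower(x)                       # tolower
  x <- gsub("\\s+", " ", x)             # collapse runs of whitespace
  trimws(x)
}

# Build the dictionary: cleaned, de-duplicated, sorted terms.
dictionary_word <- sort(unique(clean_term(ontology_terms)))
print(dictionary_word)
```
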
Finally, I want to extract each proper noun that appears in my patent
documents (corpus_tm), together with its frequency.

Thanks
Eva

From: Boris Steipe 
Sent: 8 May 2017, 05:09 PM
To: θ "
Cc: r-help@r-project.org
Subject: Re: [R] How do I use R to build a dictionary of proper nouns?

Your workflow is not clear to me, so I can't give any specific advice.

1: I don't understand what you need. Do you need the column names changed?
They correspond to the matched words.

2: How was the vector dictionary_word created? These are (mostly) stemmed 
nouns, but some of them are two or even three words? Did you do this by hand? 
But this also contains "cmp" which is not a stemmed word, or "particl", or 
"recoveri" which is not correctly stemmed. This doesn't look promising, I think 
at least you will need to place hyphens between the words, but since you are 
using stemmed words this will be difficult.

3: Since the default tokenizer is "words", I think the two-word and three-word 
elements of the dictionary_word vector will not be found. They don't exist as 
tokens.

4: Don't use "list" as a variable name.

In summary - I think your problems have to do with stemming and tokenizing and 
not with formatting the output of DocumentTermMatrix(). I don't think tm has 
functions to produce stemmed multi-word tokens like the elements in your 
dictionary_word vector. You may need to do the analysis with your own 
functions, using regular expressions.
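
A minimal sketch of that regular-expression approach in base R, assuming
the documents are already stemmed plain strings. The docs vector and the
helper count_term() are made up for illustration; only a few entries of
dictionary_word are used.

```r
# Hypothetical stemmed documents; the real corpus_tm is not shown here.
docs <- c(
  doc1 = "the slurri flow rate and the polish head affect the pad",
  doc2 = "pad condit system control the slurri ph valu of the slurri"
)

# A few single- and multi-word entries from the dictionary.
dictionary_word <- c("pad", "slurri", "slurri flow rate", "polish head",
                     "pad condit system")

# Count the matches of one dictionary term in one document.
count_term <- function(term, text) {
  # \\b anchors stop "pad" from matching inside a longer word.
  pattern <- paste0("\\b", term, "\\b")
  hits <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Rows are documents, columns are dictionary terms.
freq <- sapply(dictionary_word, function(term) {
  sapply(docs, count_term, term = term)
})
print(freq)
```

Note that a single-word entry such as "pad" is also counted where it
occurs inside a longer entry such as "pad condit system". Terms that
contain regular-expression metacharacters would need escaping first.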


B.


> On May 8, 2017, at 3:56 AM, θ "  wrote:
>
> Hi Steipe:
> Thanks for your recommendation.
> I have tried the DocumentTermMatrix function of the tm package.
> But I would prefer the resulting matrix to show the frequency of each
> dictionary word.
> Is there any way to do this?
> The following are my code and result:
>
> dictionary_word <- c("neutral", "abras particl", "acid", "apparatus",
>   "back film", "basic", "carrier", "chemic", "chromat confoc",
>   "clean system", "cmp", "compens type", "compress", "comsum",
>   "control system", "down pressur", "dresser condition", "detect system",
>   "flow rate control", "fractal type", "groov", "hard", "improv type",
>   "infrar", "laser confoc", "layer", "measur system", "micro stuctur",
>   "monitor system", "multi layer", "none-por", "nonwoven pad", "pad",
>   "pad applic", "pad condit system", "pad materi", "pad properti",
>   "pad structur", "ph sensor", "planet type", "plate", "plat",
>   "poisson ratio", "polish head", "polish system", "polym pad",
>   "polyurethan pad", "porous", "process paramet", "process path",
>   "process time", "recoveri", "rotat speed", "rough", "scatter",
>   "semiconductor cmp", "sensor", "signal acceptor", "singl layer",
>   "slurri", "slurri flow rate", "slurri ph valu", "slurri stirrer",
>   "slurri suppli system", "slurri temperatur", "slurri weight percentag",
>   "storag cmp", "stylus profil", "substrat cmp", "thick",
>   "transfer robot", "ultrason", "urethan pad", "wafer cassett",
>   "wafer transfer system", "white light interferomet", "young modulus")
>
> list<-inspect(DocumentTermMatrix(corpus_tm,
>  list(weighting =weightTf,
>   dictionary = dictionary_word)))
>
> 
>
>
> From: Boris Steipe 
> Sent: 5 May 2017, 04:39 PM
> To: θ "
> Cc: r-help@r-project.org
> Subject: Re: [R] How do I use R to build a dictionary of proper nouns?
>
> Did you try using the table() function, possibly in combination with sort() 
> or rank()?
>
>
> Consider:
>
> myNouns <- c("proper", "nouns", "domain", "ontology", "dictionary",
>  "dictionary", "corpus", "patent", "files", "proper", "nouns",
>  "word", "frequency", "file", "preprocess", "corpus", "proper",
>  "nouns", "domain", "ontology", "idea", "nouns", "dictionary",
>  "dictionary", "corpus", "attachments", "texts", "corpus",
>  "preprocesses", "proper", "nouns")
>
> myNounFrequencies <- table(myNouns)
> myNounFrequencies
>
> myNounFrequencies <- sort(myNounFrequencies, decreasing = TRUE)
> myNounFrequencies
>
> which(names(myNounFrequencies) == "corpus")
>
>
>
>
>
> > On May 5, 2017, at 1:58 AM, θ "  wrote:
> >
> > θ "