Re: [R] NADA Package install disappearance

2013-05-23 Thread Prof Brian Ripley

On 22/05/2013 21:06, Rich Shepard wrote:

On Wed, 22 May 2013, rwillims wrote:


I have been using the NADA package to do some statistical analysis,
however I have just found that the package is no longer available for
install. I've downloaded an older version (NADA_1.5-4.tar.gz) and tried
to use install.packages() to install it in two versions of R (3.0.0 and
2.15.1), and I have gotten the same error message for both: package
‘path/NADA_1.5-4.tar.gz’ is not available (for R version 2.15.1).


Rachel,

   I'm running R-3.0.0 and had no problems re-installing NADA from the
osuosl.org ftp server. I've no idea what version of NADA is installed.

   Have you tried another repository?

Rich



A good idea for things like this is to check the CRAN package web page: 
http://cran.r-project.org/package=NADA


That shows it is archived, and last updated over a year ago.

You should be able to download the source file, and

install.packages('NADA_1.5-4.tar.gz', repos=NULL)

worked for me in 3.0.1.

The package was archived because it was unmaintained and uses the 
long-obsolete \synopsis syntax that is being removed.
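For completeness, archived sources remain available under CRAN's Archive area. A minimal sketch, assuming the standard src/contrib/Archive/<pkg>/ URL layout; the download/install step is guarded so it only runs in a live session:

```r
## Archived packages live under CRAN's Archive area; the URL below follows
## CRAN's standard src/contrib/Archive/<pkg>/ layout.
url <- "http://cran.r-project.org/src/contrib/Archive/NADA/NADA_1.5-4.tar.gz"
tarball <- basename(url)           # "NADA_1.5-4.tar.gz"
if (interactive()) {               # only fetch/install in a live session
  download.file(url, tarball)
  ## repos=NULL makes install.packages() treat the argument as a file path
  install.packages(tarball, repos = NULL, type = "source")
}
```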


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] convert a character string to a name

2013-05-23 Thread jpm miao
Hi,
   From time to time I need to do aggregation. To illustrate, I present
a toy example below. In this example, the task is to aggregate x and y
by z with the function mean.
   Could I call the aggregation function with x_test, where
   x_test=c("x","y")? Thanks

Miao


 dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
 dftest
x y z
1   1 1 1
2   2 2 0
3   3 3 1
4   4 0 0
5   5 1 1
6   6 2 0
7   7 3 1
8   8 0 0
9   9 1 1
10 10 2 0
11 11 3 1
12 12 0 0
 aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2
 x_test=c("x","y")
 aggregate(cbind(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
a1 <- aggregate(cbind(factor(x_test))~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(factor(x_test)) ~ z, data =
dftest) :
  variable lengths differ (found for 'z')
 aggregate(factor(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = factor(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')



Re: [R] adding rows without loops

2013-05-23 Thread Blaser Nello
Merge should do the trick. How to best use it will depend on what you
want to do with the data after. 
The following is an example of what you could do. This will perform
best, if the rows are missing at random and do not cluster.

DF1 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:5,7:9)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2=c(29,24,28,27,35,32,32))
DF2 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:8)*100,
                  VALUE=c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2=c(29,24,28,27,35,32,32))

DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)

while(any(is.na(DFm))){
  if (any(is.na(DFm[1,]))) stop("Complete first row required!")
  ind <- which(is.na(DFm), arr.ind=TRUE)
  prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
  DFm[is.na(DFm)] <- DFm[prind]
}
DFm
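The loop above fills each NA cell from the cell one row up, iterating until no NAs remain. When the fill is purely column-wise last-observation-carried-forward, the same effect can be had fully vectorized; a base-R sketch (the helper name `locf` is made up here):

```r
## Vectorized last-observation-carried-forward for one vector:
## cumsum() over the non-NA indicator "stretches" each observed value
## forward until the next observed value.
locf <- function(x) {
  idx <- cumsum(!is.na(x))      # 0 until the first non-NA value
  if (any(idx == 0)) stop("Complete first value required!")
  x[which(!is.na(x))[idx]]
}

locf(c(37, NA, 45, NA, NA, 42))  # 37 37 45 45 45 42
```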

Best,
Nello

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Adeel Amin
Sent: Thursday, 23 May 2013 07:01
To: r-help@r-project.org
Subject: [R] adding rows without loops

I'm comparing a variety of datasets with over 4M rows.  I've solved this
problem 5 different ways using a for/while loop, but the processing time
is murder (over 8 hours doing this row by row per data set).  As such
I'm trying to find whether a solution is possible without a loop, or
with one in which the processing time is much faster.

Each dataset is a time series as such:

DF1:

  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
...
...
...
n

DF2

  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32

...
...
n+4000

In other words, there are 4000 more rows in DF2 than in DF1; thus the
datasets are of unequal length.

I'm trying to ensure that all dataframes have the same number of X.DATE
and X.TIME entries.  Where they are missing, I'd like to insert a new
row.

In the above example, when comparing DF2 to DF1, entry 01052007 0600
entry is missing in DF1.  The solution would add a row to DF1 at the
appropriate index.

so new dataframe would be


  X.DATE   X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     27
6 01052007   0700    45     35
7 01052007   0800    42     32
8 01052007   0900    45     32

Value and Value2 would be the same as row 4.

Of course this is simple to accomplish using a row-by-row analysis, but
with 4M rows the processing time spent destroying and rebinding the
datasets is very long, and I believe the approach is highly un-R-ish.
What am I missing?

Thanks!



Re: [R] convert a character string to a name

2013-05-23 Thread arun
 with(dftest,aggregate(cbind(x,y),list(z),FUN=mean))
#  Group.1 x y
#1   0 7 1
#2   1 6 2


#or
library(plyr)
ddply(dftest,.(z),numcolwise(mean))
#  z x y
#1 0 7 1
#2 1 6 2
A.K.



- Original Message -
From: jpm miao miao...@gmail.com
To: r-help r-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 3:05 AM
Subject: [R] convert a character string to a name

Hi,
   From time to time I need to do the aggregation. To illustrate, I present
a toy example as below. In this example, the task is to aggregate x and y
by z with the function mean.
   Could I call the aggregation function with x_test, where
   x_test=c("x","y")? Thanks

Miao


 dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
 dftest
    x y z
1   1 1 1
2   2 2 0
3   3 3 1
4   4 0 0
5   5 1 1
6   6 2 0
7   7 3 1
8   8 0 0
9   9 1 1
10 10 2 0
11 11 3 1
12 12 0 0
 aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2
 x_test=c("x","y")
 aggregate(cbind(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
a1 <- aggregate(cbind(factor(x_test))~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(factor(x_test)) ~ z, data =
dftest) :
  variable lengths differ (found for 'z')
 aggregate(factor(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = factor(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')



Re: [R] convert a character string to a name

2013-05-23 Thread Blaser Nello
If you want to use the character string:

attach(dftest)
aggregate(cbind(sapply(x_test, get))~z, data=dftest, FUN=mean)
# or
with(dftest,aggregate(cbind(sapply(x_test, get)),list(z),FUN=mean))
detach(dftest)

Cheers,
Nello

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of arun
Sent: Thursday, 23 May 2013 09:19
To: jpm miao
Cc: R help
Subject: Re: [R] convert a character string to a name

 with(dftest,aggregate(cbind(x,y),list(z),FUN=mean))
#  Group.1 x y
#1   0 7 1
#2   1 6 2


#or
library(plyr)
ddply(dftest,.(z),numcolwise(mean))
#  z x y
#1 0 7 1
#2 1 6 2
A.K.



- Original Message -
From: jpm miao miao...@gmail.com
To: r-help r-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 3:05 AM
Subject: [R] convert a character string to a name

Hi,
   From time to time I need to do the aggregation. To illustrate, I present a 
toy example as below. In this example, the task is to aggregate x and y by z 
with the function mean.
   Could I call the aggregation function with x_test, where
   x_test=c("x","y")? Thanks

Miao


 dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
 dftest
    x y z
1   1 1 1
2   2 2 0
3   3 3 1
4   4 0 0
5   5 1 1
6   6 2 0
7   7 3 1
8   8 0 0
9   9 1 1
10 10 2 0
11 11 3 1
12 12 0 0
 aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2
 x_test=c("x","y")
 aggregate(cbind(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
 a1 <- aggregate(cbind(factor(x_test))~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(factor(x_test)) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
 aggregate(factor(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = factor(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')



Re: [R] ordered and unordered variables

2013-05-23 Thread PIKAL Petr
Hi

Try putting your question on stackexchange, or maybe it is already answered 
there. I am not a statistical expert, but based on common sense (which can be 
counter-intuitive sometimes) I would use an ordered factor if I expect an 
influence of the tension value on breaks. Anyway, I would probably consult more 
experienced people around, or some textbook.
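For what it's worth, the difference between the two codings can be reproduced directly with the built-in warpbreaks data; a minimal sketch:

```r
## Unordered factor: treatment contrasts, so coefficients compare the M
## and H levels against the baseline L.
f1 <- lm(breaks ~ tension, data = warpbreaks)

## Ordered factor: polynomial contrasts, so coefficients are linear (.L)
## and quadratic (.Q) trend components, not level-vs-baseline differences.
wb <- transform(warpbreaks, tension = ordered(tension))
f2 <- lm(breaks ~ tension, data = wb)

names(coef(f1))  # "(Intercept)" "tensionM"  "tensionH"
names(coef(f2))  # "(Intercept)" "tension.L" "tension.Q"

## Either way the model spans the same group means:
all.equal(fitted(f1), fitted(f2))  # TRUE
```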

Regards
Petr

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of meng
 Sent: Thursday, May 23, 2013 4:44 AM
 To: Uwe Ligges
 Cc: R help
 Subject: Re: [R] ordered and unordered variables
 
 It's not homework.
 I ran into this question during my practical work with R.
 The boss is an expert in biology, but he doesn't know statistics, so I
 must find the right method for this work.
 
 At 2013-05-22 17:30:34,Uwe Ligges lig...@statistik.tu-dortmund.de
 wrote:
 
 
 On 22.05.2013 07:09, meng wrote:
  Thanks.
 
 
  As to the data  warpbreaks, if I want to analysis the impact of
 tension(L,M,H) on breaks, should I order the tension or not?
 
 No homework questions on this list, please ask your teacher.
 
 Best,
 Uwe Ligges
 
   Many thanks.
 
  At 2013-05-21 20:55:18,David Winsemius dwinsem...@comcast.net
 wrote:
 
  On May 20, 2013, at 10:35 PM, meng wrote:
 
  Hi all:
  If the explainary variables are ordinal,the result of regression
 is
  different from unordered variables.But I can't understand the
  result of regression from ordered variable.
 
  The data is warpbreaks,which belongs to R.
 
   If I use the unordered variable (tension): Levels: L M H. The
  result is easy to understand:
                Estimate Std. Error t value Pr(>|t|)
   (Intercept)    36.39       2.80  12.995  < 2e-16 ***
   tensionM      -10.00       3.96  -2.525 0.014717 *
   tensionH      -14.72       3.96  -3.718 0.000501 ***
 
   If I use the ordered variable (tension): Levels: L < M < H. I don't
   know how to explain the result:
               Estimate Std. Error t value Pr(>|t|)
   (Intercept)   28.148      1.617  17.410  < 2e-16 ***
   tension.L    -10.410      2.800  -3.718 0.000501 ***
   tension.Q      2.155      2.800   0.769 0.445182
 
   What do tension.L and tension.Q stand for? And how should the
  result then be explained?
 
  Ordered factors are handled by the R regression mechanism with
 orthogonal polynomial contrasts: .L for linear and .Q for
 quadratic. If the term had 4 levels there would also have been a .C
 (cubic) term. Treatment contrasts are used for unordered factors.
 Generally one would want to do predictions for explanations of the
 results. Trying to explain the individual coefficient values from
 polynomial contrasts is similar to and just as unproductive as trying
 to explain the individual coefficients involving interaction terms.
 
  --
 
  David Winsemius
  Alameda, CA, USA
 
 


Re: [R] convert a character string to a name

2013-05-23 Thread jim holtman
try this:

  dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
  aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2
   x_test=c("x","y")
   a <- formula(paste0('cbind('
+ , x_test[1]
+ , ','
+ , x_test[2]
+ , ') ~ z'
+ ))
  a
cbind(x, y) ~ z
 aggregate(a, data = dftest, FUN = mean)
  z x y
1 0 7 1
2 1 6 2
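The same formula can be built in one step with as.formula(); a compact variant of the paste0() idiom above:

```r
## Build the aggregation formula from a character vector of column names;
## as.formula() parses the pasted string into a formula object.
dftest <- data.frame(x = 1:12, y = (1:12) %% 4, z = (1:12) %% 2)
x_test <- c("x", "y")
f <- as.formula(sprintf("cbind(%s) ~ z", paste(x_test, collapse = ",")))
aggregate(f, data = dftest, FUN = mean)
##   z x y
## 1 0 7 1
## 2 1 6 2
```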



On Thu, May 23, 2013 at 3:05 AM, jpm miao miao...@gmail.com wrote:

 Hi,
From time to time I need to do the aggregation. To illustrate, I present
 a toy example as below. In this example, the task is to aggregate x and y
 by z with the function mean.
Could I call the aggregation function with x_test, where
    x_test=c("x","y")? Thanks

 Miao


   dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
  dftest
 x y z
 1   1 1 1
 2   2 2 0
 3   3 3 1
 4   4 0 0
 5   5 1 1
 6   6 2 0
 7   7 3 1
 8   8 0 0
 9   9 1 1
 10 10 2 0
 11 11 3 1
 12 12 0 0
  aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
   z x y
 1 0 7 1
 2 1 6 2
   x_test=c("x","y")
  aggregate(cbind(x_test)~z, data=dftest, FUN=mean)
 Error in model.frame.default(formula = cbind(x_test) ~ z, data = dftest) :
   variable lengths differ (found for 'z')
  a1 <- aggregate(cbind(factor(x_test))~z, data=dftest, FUN=mean)
 Error in model.frame.default(formula = cbind(factor(x_test)) ~ z, data =
 dftest) :
   variable lengths differ (found for 'z')
  aggregate(factor(x_test)~z, data=dftest, FUN=mean)
 Error in model.frame.default(formula = factor(x_test) ~ z, data = dftest) :
   variable lengths differ (found for 'z')





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] convert a character string to a name

2013-05-23 Thread arun
Sorry, didn't read your question properly

#Just a modification without attach():
 aggregate(cbind(sapply(x_test,get,dftest))~z,data=dftest,FUN=mean)
#  z x y
#1 0 7 1
#2 1 6 2

#if you need to aggregate() all the columns except the grouping column

 aggregate(.~z,data=dftest,FUN=mean)
# z x y
#1 0 7 1
#2 1 6 2
A.K.





- Original Message -
From: Blaser Nello nbla...@ispm.unibe.ch
To: arun smartpink...@yahoo.com; jpm miao miao...@gmail.com
Cc: R help r-help@r-project.org
Sent: Thursday, May 23, 2013 3:29 AM
Subject: RE: [R] convert a character string to a name

If you want to use the character string:

attach(dftest)
aggregate(cbind(sapply(x_test, get))~z, data=dftest, FUN=mean)
# or
with(dftest,aggregate(cbind(sapply(x_test, get)),list(z),FUN=mean))
detach(dftest)

Cheers,
Nello

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of arun
Sent: Thursday, 23 May 2013 09:19
To: jpm miao
Cc: R help
Subject: Re: [R] convert a character string to a name

 with(dftest,aggregate(cbind(x,y),list(z),FUN=mean))
#  Group.1 x y
#1   0 7 1
#2   1 6 2


#or
library(plyr)
ddply(dftest,.(z),numcolwise(mean))
#  z x y
#1 0 7 1
#2 1 6 2
A.K.



- Original Message -
From: jpm miao miao...@gmail.com
To: r-help r-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 3:05 AM
Subject: [R] convert a character string to a name

Hi,
   From time to time I need to do the aggregation. To illustrate, I present a 
toy example as below. In this example, the task is to aggregate x and y by z 
with the function mean.
   Could I call the aggregation function with x_test, where
   x_test=c("x","y")? Thanks

Miao


 dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
 dftest
    x y z
1   1 1 1
2   2 2 0
3   3 3 1
4   4 0 0
5   5 1 1
6   6 2 0
7   7 3 1
8   8 0 0
9   9 1 1
10 10 2 0
11 11 3 1
12 12 0 0
 aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
  z x y
1 0 7 1
2 1 6 2
 x_test=c("x","y")
 aggregate(cbind(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
 a1 <- aggregate(cbind(factor(x_test))~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = cbind(factor(x_test)) ~ z, data = dftest) :
  variable lengths differ (found for 'z')
 aggregate(factor(x_test)~z, data=dftest, FUN=mean)
Error in model.frame.default(formula = factor(x_test) ~ z, data = dftest) :
  variable lengths differ (found for 'z')



[R] Transform Coordinate System of an ASCII-Grid

2013-05-23 Thread jas
Dear all,


I have an ASCII-Grid for Switzerland in the Swiss National Coordinate System
of CH1903. Now for a Webapplication of the ASCII-Grid, I need to deliver the
ASCII-Grid in the WGS84 System.

Via coordinates(ascii) I can export the coordinates and convert them with
a formula into WGS84. My problem is now, how can I implement these into the
ASCII-Grid, so that the whole grid-structure is from now on gonna be saved
in the WGS84-coordinate format?
(important: I don't want to change the projection, I want to actually change
the numeric format of the coordinates)

Thank you so much for your help,
jas



--
View this message in context: 
http://r.789695.n4.nabble.com/Transform-Coordinate-System-of-an-ASCII-Grid-tp4667786.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

2013-05-23 Thread Viechtbauer Wolfgang (STAT)
The mean percentage change and the raw mean change are not directly comparable, 
even after standardization based on the SD of the percentage change or raw 
change values. So, I would not mix those in the same analysis.

Best,
Wolfgang

 -Original Message-
 From: Qiang Yue [mailto:qiangm...@gmail.com]
 Sent: Wednesday, May 22, 2013 20:38
 To: Viechtbauer Wolfgang (STAT); r-help
 Subject: Re: RE: [R] using metafor for meta-analysis of before-after
 studies (escalc, SMCC)
 
 Dear Dr. Viechtbauer:
 
 Thank you very much for sparing your precious time to answer my question.
 I still want to make sure for the third question below:  for studies which
 only reported percentage changes (something like: the metabolite
 concentration increased by 20%+/-5% after intervention), we can not use
 the percentage change to calculate SMCC, but have to get the raw change
 first?
 
 With best wishes.
 
 Qiang Yue
 
 From: Viechtbauer Wolfgang (STAT)
 Date: 2013-05-21 10:09
 To: Moon Qiang; r-help
 Subject: RE: [R] using metafor for meta-analysis of before-after studies
 (escalc, SMCC)
 Please see my answers below.
 
 Best,
 Wolfgang
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
  On Behalf Of Moon Qiang
  Sent: Thursday, May 16, 2013 19:12
  To: r-help
  Subject: [R] using metafor for meta-analysis of before-after studies
  (escalc, SMCC)
 
  Hello.
 
  I am trying to perform meta-analysis on some before-after studies. These
  studies are designed to clarify if there is any significant metabolic
  change before and after an intervention. There is only one group in these
  studies, i.e., no control group. I followed the e-mail communication of
  R-help (https://stat.ethz.ch/pipermail/r-help/2012-April/308946.html) and
  the Metafor Manual (version 1.8-0, released 2013-04-11; relevant contents
  can be found on pages 59-61 under 'Outcome Measures for Individual
  Groups'). I made a trial analysis and attached the output here; I wonder
  if anyone can look through it and give me some comments.
   I have three questions about the analysis:
 
  1) Most studies reported the before-and-after raw change as Mean+/-SD, but
  few of them have reported the values of before-intervention (mean_r and
  sd_r) and the values of after-intervention (mean_s and sd_s), and none of
  them reported the r value (correlation for the before- and after-
  intervention measurements). Based on the guideline of the Metafor manual,
  I set the raw mean change as m1i (i.e., raw mean change=mean_s=m1i), set
  the standard deviation of raw change as sd1i (i.e., the standard deviation
  of raw change=sd_s=sd1i), set all other arguments including m2i, sd2i,
  ri as 0, and then calculated the standardized mean change using change
  score (SMCC). I am not sure if all these settings are correct.
 
 This is correct. The escalc() function still will compute
 (m1i-m2i)/sqrt(sd1i^2 + sd2i^2 - 2*ri*sd1i*sd2i), but since m2i=sd2i=ri=0,
 this is equivalent to mean_change / SD_change, which is what you want.
 
 Make sure that mean_s is NOT the standard error (SE) of the change scores,
 but really the SD.
 
  2) A few studies have specified individual values of m1i, m2i, sd1i, sd2i,
  but did not report the change score or its sd. So can I set r=0 and use
  these values to calculate SMCC? Since SMCC is not calculated in the same
  way as in 1), will this be a problem?
 
 Yes, this will be a problem, since you now really assume that r=0, which is
 not correct. Maybe you can back-calculate r from other information (e.g.,
 the p or t value from a t-test -- see
 https://stat.ethz.ch/pipermail/r-help/2012-April/308946.html). Or you could
 try to get r from the authors (then you could also just directly ask for
 the change score mean and SD). If that is not successful, you will have to
 impute some kind of reasonable value for r and do a sensitivity analysis in
 the end.
 
  3) Some studies reported the percentage mean changes instead of the raw
  mean change (percentage change = (value after intervention - value before
  intervention) / value before intervention). I think it may not be the
  right way to simply substitute the raw mean change with the percentage
  mean changes. Is there any method to deal with this problem?
 
 Don't know anything off the top of my head.
 
  Any comments are welcome.
 
  With best regards.
  --
  Qiang Yue
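For reference, the mean_change / SD_change computation discussed above can be checked by hand. The sketch below uses the SMCC formulas as given in the metafor documentation (small-sample correction J and the usual variance approximation); treat it as an illustration under those assumptions, not a substitute for escalc():

```r
## Standardized mean change using change scores (SMCC), computed by hand:
## bias correction J = 1 - 3/(4*(ni-1) - 1), variance vi = 1/ni + yi^2/(2*ni)
## (formulas as documented for metafor's measure="SMCC").
smcc <- function(m_change, sd_change, ni) {
  J  <- 1 - 3 / (4 * (ni - 1) - 1)
  yi <- J * m_change / sd_change   # bias-corrected standardized change
  vi <- 1 / ni + yi^2 / (2 * ni)   # approximate sampling variance
  c(yi = yi, vi = vi)
}

smcc(m_change = 5, sd_change = 10, ni = 20)
```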



[R] Fwd: Merge

2013-05-23 Thread Keniajin Wambui
-- Forwarded message --
From: Keniajin Wambui kiang...@gmail.com
Date: Thu, May 23, 2013 at 11:36 AM
Subject: Merge
To: r-help@r-project.org


I am using R 3.0.1 on RStudio to merge two data sets, one with approx. 120
variables and the other with 140 variables, with serialno as the
unique identifier.
i.e.

Serialno  name     year  outcome
1         ken      1989  d
2         mary     1989  a
4         john     1989  a
5         tom      1989  a
6         jolly    1989  d

and

Serialno  name     year  disch_type
11        mwai     1990  d
21        wanjiku  1990  a
43        maina    1990  a
55        john     1990  a
67        welly    1990  d

How can I merge them into a common data set without having name.x and
name.y or year.x and year.y after merging?
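One way to avoid the .x/.y suffixes is to merge on every column the two frames share; a sketch with toy data mimicking the post (column names taken from the example above):

```r
## Merging on all shared columns keeps a single copy of name/year
## instead of the name.x/name.y pairs produced by key-only merges.
d1 <- data.frame(Serialno = c(1, 2), name = c("ken", "mary"),
                 year = 1989, outcome = c("d", "a"))
d2 <- data.frame(Serialno = c(11, 21), name = c("mwai", "wanjiku"),
                 year = 1990, disch_type = c("d", "a"))

keys <- intersect(names(d1), names(d2))   # "Serialno" "name" "year"
m <- merge(d1, d2, by = keys, all = TRUE) # full outer join on shared columns
names(m)  # Serialno name year outcome disch_type -- no .x/.y suffixes
```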
--
Mega Six Solutions
Web Designer and Research Consultant
Kennedy Mwai
25475211786





Re: [R] Transform Coordinate System of an ASCII-Grid

2013-05-23 Thread Daisy Englert Duursma
Hello,

Your question is a bit unclear. Do you just want to change to decimal
degrees? Can you please provide an example of your code and include a
small example ASCII grid?

On Thu, May 23, 2013 at 5:44 PM, jas
jacqueline.schwei...@wuestundpartner.com wrote:
 Dear all,


 I have an ASCII-Grid for Switzerland in the Swiss National Coordinate System
 of CH1903. Now for a Webapplication of the ASCII-Grid, I need to deliver the
 ASCII-Grid in the WGS84 System.

 Via coordinates(ascii) I can export the coordinates and convert them with
 a formula into WGS84. My problem is now, how can I implement these into the
 ASCII-Grid, so that the whole grid-structure is from now on gonna be saved
 in the WGS84-coordinate format?
 (important: I don't want to change the projection, I want to actually change
 the numeric format of the coordinates)

 Thank you so much for your help,
 jas







--
Daisy Englert Duursma
Department of Biological Sciences
Room E8C156
Macquarie University, North Ryde, NSW 2109
Australia

Tel +61 2 9850 9256



Re: [R] point.in.polygon help

2013-05-23 Thread Daisy Englert Duursma
It would be useful to know what your ultimate goal is.

On Wed, May 22, 2013 at 6:29 AM, karengrace84 kgfis...@alumni.unc.edu wrote:
 I am new to mapping with R, and I would like to use the point.in.polygon
 function from the sp package, but I am unsure of how to get my data in the
 correct format for the function. The generic form of the function is as
 follows:

 point.in.polygon(point.x, point.y, pol.x, pol.y, mode.checked=FALSE)

 I have no problem with the point.x and point.y inputs. I have a list of gps
 longitudes and latitudes that will go in fine. My problem is with the pol.x
 and pol.y input. My polygon is currently in the form of a
 SpatialPolygonsDataFrame created by inputting shp files with the rgdal
 package.

 How do I get a numerical array of the x- and y-coordinates from my polygon
 that will go into the point.in.polygon function?







-- 
Daisy Englert Duursma
Department of Biological Sciences
Room E8C156
Macquarie University, North Ryde, NSW 2109
Australia

Tel +61 2 9850 9256



[R] SEM: multigroup model

2013-05-23 Thread Amarnath Bose
Dear R Gurus,

I am trying to run a multigroup SEM using Prof. John Fox's SEM package.

The two groups are "Ready to Eat", denoted by RTE, and
"Ready to Cook", denoted by RTC.

I ran an omnibus CFA on the data of consumer perceptions & preferences and
am satisfied with what I got.

When I tried to do a multigroup SEM - my understanding is limited to the
SEM manual in CRAN - using the code below, I get the following message:
  Error in summary.msemObjectiveML(sem.MG) :
 no 'dimnames' attribute for array
   Execution halted

The relevant part of my code follows:
mod.mg <- multigroupModel(sbmod.cfa, groups=c("RTC","RTE"))

sem.MG <- sem(mod.mg, data=srt, group="RTind",
  formula = ~ inv + imp + tch + emo + loy + usg + sig + dif + ndif
+
vda + vdb + vdc + vdd + vde + vdf + vdg + vdh +
riskT + riskP + riskS + riskFi + riskFu + riskPs

)
summary(sem.MG)

I was expecting two sets of fit indices, for RTE & RTC, and want to do an
ANOVA across the models, as well as possibly check for loading equivalence.

Can somebody please throw some light on where I am making a mistake?

Thanks
Amarnath Bose
-- 


*Amarnath Bose*
* **Associate Professor   *
*Decision Sciences Department*
*Birla Institute of Management Technology
*
Tel:  +91 120 2323001 - 10 Ext.: 398
Cell: +91 9873179813

[[alternative HTML version deleted]]



Re: [R] Transform Coordinate System of an ASCII-Grid

2013-05-23 Thread Barry Rowlingson
On Thu, May 23, 2013 at 8:44 AM, jas
jacqueline.schwei...@wuestundpartner.com wrote:
 Dear all,


 I have an ASCII-Grid for Switzerland in the Swiss National Coordinate System
 of CH1903. Now for a Webapplication of the ASCII-Grid, I need to deliver the
 ASCII-Grid in the WGS84 System.

 Via coordinates(ascii) I can export the coordinates and convert them with
 a formula into WGS84. My problem is now, how can I implement these into the
 ASCII-Grid, so that the whole grid-structure is from now on gonna be saved
 in the WGS84-coordinate format?
 (important: I don't want to change the projection, I want to actually change
 the numeric format of the coordinates)

You can't change the numeric format of the coordinates without
changing the projection (unless changing from km to m).

In your original coordinate system your grid is a bunch of rectangles
with straight sides and right angles. In your WGS84 system the squares
are no longer square, the sides are no longer straight, and the angles
are no longer 90 degrees. This is all too complicated for a simple
grid data structure to comprehend.

 The solution may be to reproject your grid. This is a transformation
of values, much like stretching an image file, from one grid to
another. raster::projectRaster can do this for you.

For a dataset with a small extent, for some small values of small,
you may be able to get away with transforming the corner coordinates
and ignoring the fact that the earth is not flat. But this will make
everyone who thinks the earth is round cry.

 You should also look into the raster package for more info. You've
not said what you're using to read the data.

 You should probably ask in r-sig-geo anyway, where the mappers hang out.

Barry
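For the corner-coordinate shortcut, swisstopo publishes approximate
conversion formulas from CH1903 (LV03) to WGS84 that can be coded directly
in base R. A sketch (the coefficients are taken from swisstopo's
approximate solution; double-check them against the official document
before relying on them):

```r
# Approximate CH1903 (LV03) -> WGS84 conversion (swisstopo formulas).
# E, N are Swiss easting/northing in metres.
ch1903_to_wgs84 <- function(E, N) {
  y <- (E - 600000) / 1e6   # auxiliary values, in units of 1000 km
  x <- (N - 200000) / 1e6
  lon <- 2.6779094 + 4.728982 * y + 0.791484 * y * x +
         0.1306 * y * x^2 - 0.0436 * y^3
  lat <- 16.9023892 + 3.238272 * x - 0.270978 * y^2 -
         0.002528 * x^2 - 0.0447 * y^2 * x - 0.0140 * x^3
  # results are in units of 10000 seconds of arc; convert to degrees
  c(lon = lon * 100 / 36, lat = lat * 100 / 36)
}

ch1903_to_wgs84(600000, 200000)  # approx. lon 7.4386, lat 46.9511 (Bern)
```

Applied to all cell-corner coordinates this gives the "numeric format"
change the original poster asked about, with the caveats on grid
non-squareness noted above.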



Re: [R] SEM: multigroup model

2013-05-23 Thread John Fox
Dear Amarnath Bose,

There's nothing obviously wrong with the commands that you report -- in
fact, your commands have the same structure as the multigroup SEM example in
?sem -- so the usual advice about including reproducible code producing the
error applies. If you like, you could send me your data and the complete R
script that you used.

Best,
 John

---
John Fox
Senator McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada



 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Amarnath Bose
 Sent: Thursday, May 23, 2013 6:54 AM
 To: r-help@r-project.org
 Subject: [R] SEM: multigroup model
 
 Dear R Gurus,
 
 I am trying to run a multigroup SEM using Prof. John Fox's SEM package.
 
 The two groups are Ready to Eat denoted by RTE and
 Ready to Cook denoted by RTC.
 
 I ran an omnibus CFA on the data of consumer perceptions & preferences and
 am satisfied with what I got.
 
 When I tried to do a multigroup SEM - my understanding is limited to
 the
 SEM manual in CRAN - using the code below, I get the following message:
   Error in summary.msemObjectiveML(sem.MG) :
  no 'dimnames' attribute for array
Execution halted
 
 The relevant part of my code follows:
 mod.mg <- multigroupModel(sbmod.cfa, groups = c("RTC", "RTE"))
 
 sem.MG <- sem(mod.mg, data = srt, group = "RTind",
   formula = ~ inv + imp + tch + emo + loy + usg + sig + dif + ndif +
 vda + vdb + vdc + vdd + vde + vdf + vdg + vdh +
 riskT + riskP + riskS + riskFi + riskFu + riskPs
 
 )
 summary(sem.MG)
 
 I was expecting two sets of fit indices for RTE & RTC and want to do an
 ANOVA
 across the models; as well as possibly check for loading equivalence.
 
 Can somebody please throw some light on where I am making a mistake?
 
 Thanks
 Amarnath Bose
 --
 
 
 *Amarnath Bose*
 * **Associate Professor   *
 *Decision Sciences Department*
 *Birla Institute of Management Technology
 *
 Tel:  +91 120 2323001 - 10 Ext.: 398
 Cell: +91 9873179813
 
 


Re: [R] Transform Coordinate System of an ASCII-Grid

2013-05-23 Thread jas
Hello Barry, 

thank you for your reply. 
Yes, the flat-versus-round-earth projection is a difficulty; as my grid
isn't that far spread out, I thought I would just use the method anyway. 
I usually use raster or maptools (readAsciiGrid). I am going to look into
the mappers' forum, thank you for that tip :)

Jacqueline




--
View this message in context: 
http://r.789695.n4.nabble.com/Transform-Coordinate-System-of-an-ASCII-Grid-tp4667786p4667799.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Fwd: Merge

2013-05-23 Thread Rui Barradas

Hello,

Try the following.


rm(list = ls())


dat1 <- read.table(text = "
Serialno  name   year  outcome
1         ken    1989  d
2         mary   1989  a
4         john   1989  a
5         tom    1989  a
6         jolly  1989  d
", header = TRUE, stringsAsFactors = FALSE)

dat2 <- read.table(text = "
Serialno  name     year  disch_type
11        mwai     1990  d
21        wanjiku  1990  a
43        maina    1990  a
55        john     1990  a
67        welly    1990  d
", header = TRUE, stringsAsFactors = FALSE)

res <- merge(dat1[, c(1, 4)], dat2[, c(1, 4)], all = TRUE)
res <- merge(merge(res, dat1, all.y = TRUE), merge(res, dat2, all.y = TRUE),
             all = TRUE)

res <- res[, c(1, 4, 5, 2, 3)]
res


Hope this helps,

Rui Barradas

Em 23-05-2013 09:41, Keniajin Wambui escreveu:

-- Forwarded message --
From: Keniajin Wambui kiang...@gmail.com
Date: Thu, May 23, 2013 at 11:36 AM
Subject: Merge
To: r-help@r-project.org


I am using R 3.0.1 in RStudio to merge two data sets, one with approx 120
variables and the other with 140 variables, with a serialno as the
unique identifier.
i.e

Serialno  name   year  outcome
1         ken    1989  d
2         mary   1989  a
4         john   1989  a
5         tom    1989  a
6         jolly  1989  d

and

Serialno  name     year  disch_type
11        mwai     1990  d
21        wanjiku  1990  a
43        maina    1990  a
55        john     1990  a
67        welly    1990  d

How can I merge them to a common data set without having name.x and
name.y or year.x and year.y after merging
--
Mega Six Solutions
Web Designer and Research Consultant
Kennedy Mwai
25475211786


--
Mega Six Solutions
Web Designer and Research Consultant
Kennedy Mwai
25475211786



Re: [R] adding rows without loops

2013-05-23 Thread Adeel - SafeGreenCapital
Thank you Blaser:

This is the exact solution I came up with but when comparing 8M rows even on
an 8G machine, one runs out of memory.  To run this effectively, I have to
break the DF into smaller DFs, loop through them and then do a massive
merge at the end.  That's what takes 8+ hours to compute.

Even the bigmemory package is causing OOM issues.  

-Original Message-
From: Blaser Nello [mailto:nbla...@ispm.unibe.ch] 
Sent: Thursday, May 23, 2013 12:15 AM
To: Adeel Amin; r-help@r-project.org
Subject: RE: [R] adding rows without loops

Merge should do the trick. How to best use it will depend on what you
want to do with the data after. 
The following is an example of what you could do. This will perform
best, if the rows are missing at random and do not cluster.

DF1 <- data.frame(X.DATE=rep("01052007", 7), X.TIME=c(2:5,7:9)*100,
VALUE=c(37, 42, 45, 45, 45, 42, 45), VALE2=c(29,24,28,27,35,32,32))
DF2 <- data.frame(X.DATE=rep("01052007", 7), X.TIME=c(2:8)*100,
VALUE=c(37, 42, 45, 45, 45, 42, 45), VALE2=c(29,24,28,27,35,32,32))

DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)

while(any(is.na(DFm))){
  if (any(is.na(DFm[1,]))) stop("Complete first row required!")
  ind <- which(is.na(DFm), arr.ind=TRUE)
  prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
  DFm[is.na(DFm)] <- DFm[prind]
}
DFm

Best,
Nello
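For reference, the carry-forward fill done by the while loop above can also
be vectorized in base R. A sketch (it assumes each column's first entry is
non-NA, as the stop() in the loop enforces; the toy frame is made up):

```r
# Vectorized last-observation-carried-forward for one vector;
# requires the first element to be non-NA.
locf <- function(x) {
  idx <- cumsum(!is.na(x))   # index of the last non-NA seen so far
  x[!is.na(x)][idx]
}

DFm <- data.frame(a = c(1, NA, NA, 4), b = c(10, 20, NA, 40))
DFm[] <- lapply(DFm, locf)   # fill every column
DFm$a  # 1 1 1 4
```

This avoids rescanning the whole frame on every pass, which matters at the
row counts discussed in this thread.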

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Adeel Amin
Sent: Donnerstag, 23. Mai 2013 07:01
To: r-help@r-project.org
Subject: [R] adding rows without loops

I'm comparing a variety of datasets with over 4M rows.  I've solved this
problem 5 different ways using a for/while loop but the processing time
is murder (over 8 hours doing this row by row per data set).  As such
I'm trying to find whether this solution is possible without a loop or
one in which the processing time is much faster.

Each dataset is a time series as such:

DF1:

X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
...
...
...
n

DF2

X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32

...
...
n+4000

In other words there are 4000 more rows in DF2 than in DF1, thus the
datasets are of unequal length.

I'm trying to ensure that all dataframes have the same number of X.DATE
and X.TIME entries.  Where they are missing, I'd like to insert a new
row.

In the above example, when comparing DF2 to DF1, the entry 01052007 0600
is missing in DF1.  The solution would add a row to DF1 at the
appropriate index.

so new dataframe would be


X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     27
6 01052007   0700    45     35
7 01052007   0800    42     32
8 01052007   0900    45     32

Value and Value2 would be the same as row 4.

Of course this is simple to accomplish using a row by row analysis but
with 4M rows the processing time destroying and rebinding the
datasets is very time consuming and I believe highly un-R'ish.  What am
I missing?

Thanks!

[[alternative HTML version deleted]]



Re: [R] point.in.polygon help

2013-05-23 Thread karengrace84
I am looking at fish tagging data. I have gps coordinates of where each fish
was tagged and released, and I have a map of 10 coastal basins of the state
of Louisiana. I am trying to determine which basin each fish was tagged in. 
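For readers without the sp package at hand, the even-odd (ray-casting) rule
that sp::point.in.polygon applies can be sketched in base R for a single
polygon (toy unit-square polygon below; the real basin boundaries would come
from the Louisiana shapefile):

```r
# Even-odd rule: count how many polygon edges a horizontal ray from the
# point crosses; an odd count means the point is inside.
pip <- function(px, py, vx, vy) {
  n <- length(vx); inside <- FALSE; j <- n
  for (i in seq_len(n)) {
    if ((vy[i] > py) != (vy[j] > py) &&
        px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
      inside <- !inside
    j <- i
  }
  inside
}

pip(0.5, 0.5, c(0, 1, 1, 0), c(0, 0, 1, 1))  # TRUE  (inside unit square)
pip(2.0, 2.0, c(0, 1, 1, 0), c(0, 0, 1, 1))  # FALSE (outside)
```

For the actual basin assignment, looping this (or sp::point.in.polygon
itself) over the ten basin polygons and recording the first hit per fish
would give the basin labels.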






Re: [R] adding rows...

2013-05-23 Thread Adeel - SafeGreenCapital
Hi Rainer:

Thanks for the reply.  Posting the large dataset is a task.  There are 8M
rows between the two of them and the first discrepancy in the data doesn't
happen until at least the 40,000th row on each dataframe.  The examples I
posted are a pretty good abstraction of the root of the issue.  

The problem isn't the data.  The problem is Out Of Memory issues when doing
any operations like merge, rbind, etc.  The solution that Blaser suggested
in his post works great, but the systems quickly run out of memory.  What
does work without OOM issues are for/while loops but on average take an
inordinate time to compute and tie up a machine for hours and hours at a time.
Essentially I break the data apart, add rows and rebind.  It's a brute force
type of approach and run times are in excess of 48 hours for one full
iteration across 25 data frames.  Terrible.

I am about to go down the road of using the data.table class as it's far more
memory efficient, but the documentation is cryptic. Your idea of creating a
super set has some merit and it's what I was experimenting with prior to my
original post.  

-Original Message-
From: Rainer Schuermann [mailto:rainer.schuerm...@gmx.net] 
Sent: Thursday, May 23, 2013 12:19 AM
To: Adeel Amin
Subject: adding rows...

Can I suggest that you post the output of

dput( DF1 )
dput( DF2 )

rather than pictures of your data? Any solution attempt will depend upon
the data types...

Just shooting in the dark: Have you tried just row-binding the missing 4k
lines to DF1 and then ordering DF1 as you like? It looks as if the data are
ordered by time / date? 

Rgds,
Rainer



Re: [R] Fwd: Merge

2013-05-23 Thread arun
You could also do:

library(plyr)
res1 <- join(dat1, dat2, type = "full")
res1
#    Serialno    name year outcome disch_type
# 1         1     ken 1989       d       <NA>
# 2         2    mary 1989       a       <NA>
# 3         4    john 1989       a       <NA>
# 4         5     tom 1989       a       <NA>
# 5         6   jolly 1989       d       <NA>
# 6        11    mwai 1990    <NA>          d
# 7        21 wanjiku 1990    <NA>          a
# 8        43   maina 1990    <NA>          a
# 9        55    john 1990    <NA>          a
# 10       67   welly 1990    <NA>          d

identical(res, res1)
# [1] TRUE

# or
lst1 <- list(dat1, dat2)

Reduce(function(...) merge(..., by = c("Serialno", "name", "year"),
                           all = TRUE), lst1)

#    Serialno    name year outcome disch_type
# 1         1     ken 1989       d       <NA>
# 2         2    mary 1989       a       <NA>
# 3         4    john 1989       a       <NA>
# 4         5     tom 1989       a       <NA>
# 5         6   jolly 1989       d       <NA>
# 6        11    mwai 1990    <NA>          d
# 7        21 wanjiku 1990    <NA>          a
# 8        43   maina 1990    <NA>          a
# 9        55    john 1990    <NA>          a
# 10       67   welly 1990    <NA>          d
A.K.
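Another base-R route: the .x/.y suffixes only appear for shared columns that
are not part of the merge key, so merging on every common column avoids them
entirely. A sketch on cut-down versions of the thread's toy frames:

```r
# Toy frames: Serialno, name, year are common; the last column differs.
dat1 <- data.frame(Serialno = c(1, 2), name = c("ken", "mary"),
                   year = 1989, outcome = c("d", "a"))
dat2 <- data.frame(Serialno = c(11, 55), name = c("mwai", "john"),
                   year = 1990, disch_type = c("d", "a"))

common <- intersect(names(dat1), names(dat2))  # "Serialno" "name" "year"
res <- merge(dat1, dat2, by = common, all = TRUE)
names(res)  # no .x or .y suffixes
```

With the full 120- and 140-variable frames, `common` would pick up every
shared column automatically.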



- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: Keniajin Wambui kiang...@gmail.com
Cc: r-help@r-project.org
Sent: Thursday, May 23, 2013 8:36 AM
Subject: Re: [R] Fwd: Merge

Hello,

Try the following.


rm(list = ls())


dat1 <- read.table(text = "
Serialno  name   year  outcome
1         ken    1989  d
2         mary   1989  a
4         john   1989  a
5         tom    1989  a
6         jolly  1989  d
", header = TRUE, stringsAsFactors = FALSE)

dat2 <- read.table(text = "
Serialno  name     year  disch_type
11        mwai     1990  d
21        wanjiku  1990  a
43        maina    1990  a
55        john     1990  a
67        welly    1990  d
", header = TRUE, stringsAsFactors = FALSE)

res <- merge(dat1[, c(1, 4)], dat2[, c(1, 4)], all = TRUE)
res <- merge(merge(res, dat1, all.y = TRUE), merge(res, dat2, all.y = TRUE),
             all = TRUE)
res <- res[, c(1, 4, 5, 2, 3)]
res


Hope this helps,

Rui Barradas

Em 23-05-2013 09:41, Keniajin Wambui escreveu:
 -- Forwarded message --
 From: Keniajin Wambui kiang...@gmail.com
 Date: Thu, May 23, 2013 at 11:36 AM
 Subject: Merge
 To: r-help@r-project.org


 I am using R 3.0.1 in RStudio to merge two data sets, one with approx 120
 variables and the other with 140 variables, with a serialno as the
 unique identifier.
 i.e

 Serialno  name   year  outcome
 1         ken    1989  d
 2         mary   1989  a
 4         john   1989  a
 5         tom    1989  a
 6         jolly  1989  d

 and

 Serialno  name     year  disch_type
 11        mwai     1990  d
 21        wanjiku  1990  a
 43        maina    1990  a
 55        john     1990  a
 67        welly    1990  d

 How can I merge them to a common data set without having name.x and
 name.y or year.x and year.y after merging
 --
 Mega Six Solutions
 Web Designer and Research Consultant
 Kennedy Mwai
 25475211786


 --
 Mega Six Solutions
 Web Designer and Research Consultant
 Kennedy Mwai
 25475211786



Re: [R] using metafor for meta-analysis of before-after studies (escalc, SMCC)

2013-05-23 Thread Qiang Yue
 Dear Dr. Viechtbauer:

Thanks so much! Now all these issues are clear. 

With best regards.




Qiang Yue

From: Viechtbauer Wolfgang (STAT)
Date: 2013-05-23 05:06
To: qiangmoon; r-help
Subject: RE: RE: [R] using metafor for meta-analysis of before-after studies 
(escalc, SMCC)
The mean percentage change and the raw mean change are not directly comparable, 
even after standardization based on the SD of the percentage change or raw 
change values. So, I would not mix those in the same analysis.

Best,
Wolfgang

 -Original Message-
 From: Qiang Yue [mailto:qiangm...@gmail.com]
 Sent: Wednesday, May 22, 2013 20:38
 To: Viechtbauer Wolfgang (STAT); r-help
 Subject: Re: RE: [R] using metafor for meta-analysis of before-after
 studies (escalc, SMCC)
 
 Dear Dr. Viechtbauer:
 
 Thank you very much for sparing your precious time to answer my question.
 I still want to make sure for the third question below:  for studies which
 only reported percentage changes (something like: the metabolite
 concentration increased by 20%+/-5% after intervention), we can not use
 the percentage change to calculate SMCC, but have to get the raw change
 first?
 
 With best wishes.
 
 Qiang Yue
 
 From: Viechtbauer Wolfgang (STAT)
 Date: 2013-05-21 10:09
 To: Moon Qiang; r-help
 Subject: RE: [R] using metafor for meta-analysis of before-after studies
 (escalc, SMCC)
 Please see my answers below.
 
 Best,
 Wolfgang
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
  On Behalf Of Moon Qiang
  Sent: Thursday, May 16, 2013 19:12
  To: r-help
  Subject: [R] using metafor for meta-analysis of before-after studies
  (escalc, SMCC)
 
  Hello.
 
  I am trying to perform meta-analysis on some before-after studies. These
  studies are designed to clarify if there is any significant metabolic
  change before and after an intervention. There is only one group in these
  studies, i.e., no control group. I followed the e-mail communication of
  R-help (https://stat.ethz.ch/pipermail/r-help/2012-April/308946.html) and
  the Metafor Manual (version 1.8-0, released 2013-04-11, relevant contents
  can be found on pages 59-61 under 'Outcome Measures for Individual
  Groups'). I made a trial analysis and attached the output here, I wonder
  if anyone can look through it and give me some comments.
  I have three questions about the analysis:
 
  1) Most studies reported the before-and-after raw change as Mean+/-SD, but
  few of them have reported the values of before-intervention (mean_r and
  sd_r) and the values of after-intervention (mean_s and sd_s), and none of
  them reported the r value (correlation for the before- and
  after-intervention measurements). Based on the guideline of the Metafor
  manual, I set the raw mean change as m1i (i.e., raw mean change=mean_s=m1i),
  and set the standard deviation of raw change as sd1i (i.e., the standard
  deviation of raw change=sd_s=sd1i), and set all other arguments including
  m2i, sd2i, ri as 0, and then calculated the standardized mean change using
  change score (SMCC). I am not sure if all these settings are correct.
 
 This is correct. The escalc() function still will compute
 (m1i-m2i)/sqrt(sd1i^2 + sd2i^2 - 2*ri*sd1i*sd2i), but since m2i=sd2i=ri=0,
 this is equivalent to mean_change / SD_change, which is what you want.
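A quick numeric check of that reduction -- that with m2i = sd2i = ri = 0 the
general expression collapses to mean change over SD of change -- with made-up
numbers standing in for the change-score mean (m1i) and SD (sd1i):

```r
m1i <- 2.3; sd1i <- 1.1          # hypothetical change-score mean and SD
m2i <- 0; sd2i <- 0; ri <- 0     # zeros, as in the single-group setup

smcc_num <- (m1i - m2i) / sqrt(sd1i^2 + sd2i^2 - 2 * ri * sd1i * sd2i)
all.equal(smcc_num, m1i / sd1i)  # TRUE
```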
 
 Make sure that mean_s is NOT the standard error (SE) of the change scores,
  but really the SD.
 
  2) A few studies have specified individual values of m1i, m2i, sd1i, sd2i,
  but did not report the change score or its sd. So can I set r=0 and use
  these values to calculate SMCC? Since SMCC is not calculated in the same
  way like 1), will this be a problem?
 
 Yes, this will be a problem, since you now really assume that r=0, which is
 not correct. Maybe you can back-calculate r from other information (e.g.,
 the p or t value from a t-test -- see
 https://stat.ethz.ch/pipermail/r-help/2012-April/308946.html). Or you could
 try to get r from the authors (then you could also just directly ask for the
 change score mean and SD). If that is not successful, you will have to
 impute some kind of reasonable value for r and do a sensitivity analysis in
 the end.
 
  3) Some studies reported the percentage mean changes instead of the raw
  mean change (percentage change = (value after intervention - value before
  intervention) / value before intervention). I think it may not be the right
  way to simply substitute the raw mean change with the percentage mean
  changes. Is there any method to deal with this problem?
 
 Don't know anything off the top of my head.
 
  Any comments are welcome.
 
  With best regards.
--
   Qiang Yue


Re: [R] ordered and unordered variables

2013-05-23 Thread Greg Snow
Meng,

This really comes down to what question you are trying to answer.  Before
worrying about details of default contrasts and issues like that you first
need to work out what is really the question of interest.  The main
difference between declaring a variable ordered or not is the default
contrasts.  Defaults are provided because there are many cases where which
contrasts are used internally does not matter, so why make someone think
about it.  In cases where the choice of contrasts matter, it is rare that
any default coding is the correct/best choice and you should really think
through what contrasts answer the question of interest and use those custom
contrasts.

For example, to answer the question if Tension has any overall effect it
does not matter which contrast encoding you use (as long as it is full
rank), the test statistic and p-value for testing the whole effect will be
the same.  The predictions of the means of groups will also be the same
regardless of which contrasts are used (and this is often a clearer way to
present/explain the results).

A case where the specific contrasts would matter would be if we want to see
if we can reduce the number of groups by combining groups together, or
interpolate to certain groups.  The treatment contrasts will test if low
and medium can be combined (which makes sense) and if low and high can be
combined (which does not make sense unless the first is true and in fact
the overall factor is not significant), what makes more sense would be to
compare low to medium and medium to high (it could be that low is different
from the other 2, but med and high can be combined).  The polynomial
contrasts give a different view, the quadratic term in this case tests
whether the medium group is the average of the low group and the high group
(so we could interpolate medium), this only makes sense if the medium
tension is centered (in some sense) between the other 2, i.e. the
difference from low to medium is exactly the same as the difference from
medium to high, but if that were the case then I would expect a numerical
term rather than an ordered factor.

So, to summarize, it depends on the question of interest.  For some
questions the contrasts don't matter, in which case it does not matter, in
other cases the correct contrasts to use are determined by the question and
you should use the contrasts that answer that question (which are rarely a
default).
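To illustrate the first point on the warpbreaks data from this thread: the
overall F test for tension and the fitted group means come out identical
whichever contrasts are used (base R only):

```r
data(warpbreaks)
fit.unord <- lm(breaks ~ tension, data = warpbreaks)           # treatment contrasts
fit.ord   <- lm(breaks ~ ordered(tension), data = warpbreaks)  # polynomial contrasts

# identical overall F statistic for the tension effect ...
f1 <- anova(fit.unord)["tension", "F value"]
f2 <- anova(fit.ord)["ordered(tension)", "F value"]
all.equal(f1, f2)

# ... and identical fitted group means
all.equal(fitted(fit.unord), fitted(fit.ord))
```

The individual coefficients differ, of course, because they answer different
contrast questions; only their joint content is the same.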


On Tue, May 21, 2013 at 11:09 PM, meng laomen...@163.com wrote:

 Thanks.


 As to the data  warpbreaks, if I want to analysis the impact of
 tension(L,M,H) on breaks, should I order the tension or not?


 Many thanks.












 At 2013-05-21 20:55:18, David Winsemius dwinsem...@comcast.net wrote:
 
 On May 20, 2013, at 10:35 PM, meng wrote:
 
  Hi all:
  If the explanatory variables are ordinal, the result of regression is
  different from unordered variables. But I can't understand the result of
  regression from an ordered variable.
 
  The data is warpbreaks,which belongs to R.
 
  If I use the unordered variable(tension):Levels: L M H
  The result is easy to understand:
                Estimate Std. Error t value Pr(>|t|)
   (Intercept)    36.39       2.80  12.995  < 2e-16 ***
   tensionM      -10.00       3.96  -2.525 0.014717 *
   tensionH      -14.72       3.96  -3.718 0.000501 ***
 
  If I use the ordered variable(tension):Levels: L  M  H
  I don't know how to explain the result:
                Estimate Std. Error t value Pr(>|t|)
   (Intercept)   28.148      1.617  17.410  < 2e-16 ***
   tension.L    -10.410      2.800  -3.718 0.000501 ***
   tension.Q      2.155      2.800   0.769 0.445182
 
  What do tension.L and tension.Q stand for? And how should the result be
  explained?
 
 Ordered factors are handled by the R regression mechanism with orthogonal
 polynomial contrasts: .L for linear and .Q for quadratic. If the term
 had 4 levels there would also have been a .C (cubic) term. Treatment
 contrasts are used for unordered factors. Generally one would want to do
 predictions for explanations of the results. Trying to explain the
 individual coefficient values from polynomial contrasts is similar to and
 just as unproductive as trying to explain the individual coefficients
 involving interaction terms.
 
 --
 
 David Winsemius
 Alameda, CA, USA
 






-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com




Re: [R] adding rows without loops

2013-05-23 Thread Rainer Schuermann
Using the data generated with your code below, does

DF1 <- rbind( DF1, DF2[ !(DF2$X.TIME %in% DF1$X.TIME), ] )
DF1 <- DF1[ order( DF1$X.DATE, DF1$X.TIME ), ]

do the job?

Rgds,
Rainer
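Spelled out on Nello's toy frames (keying on date and time together, since a
time of day alone repeats across dates), the rbind-then-order idea looks like
this; note the appended rows carry DF2's values rather than repeating DF1's
previous row, which differs slightly from what the original post asked for:

```r
DF1 <- data.frame(X.DATE = "01052007", X.TIME = c(2:5, 7:9) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45))
DF2 <- data.frame(X.DATE = "01052007", X.TIME = (2:8) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45))

# append the DF2 rows whose date/time key is absent from DF1, then re-sort
key1 <- paste(DF1$X.DATE, DF1$X.TIME)
key2 <- paste(DF2$X.DATE, DF2$X.TIME)
DF1 <- rbind(DF1, DF2[!(key2 %in% key1), ])
DF1 <- DF1[order(DF1$X.DATE, DF1$X.TIME), ]

nrow(DF1)  # 8, with the missing 600 row now present
```

A single rbind plus one sort touches each row once, which should scale far
better than the row-by-row loop described in the thread.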




On Thursday 23 May 2013 05:54:26 Adeel - SafeGreenCapital wrote:
 Thank you Blaser:
 
 This is the exact solution I came up with but when comparing 8M rows even on
 an 8G machine, one runs out of memory.  To run this effectively, I have to
 break the DF into smaller DFs, loop through them and then do a massive
 merge at the end.  That's what takes 8+ hours to compute.
 
 Even the bigmemory package is causing OOM issues.  
 
 -Original Message-
 From: Blaser Nello [mailto:nbla...@ispm.unibe.ch] 
 Sent: Thursday, May 23, 2013 12:15 AM
 To: Adeel Amin; r-help@r-project.org
 Subject: RE: [R] adding rows without loops
 
 Merge should do the trick. How to best use it will depend on what you
 want to do with the data after. 
 The following is an example of what you could do. This will perform
 best, if the rows are missing at random and do not cluster.
 
 DF1 <- data.frame(X.DATE=rep("01052007", 7), X.TIME=c(2:5,7:9)*100,
 VALUE=c(37, 42, 45, 45, 45, 42, 45), VALE2=c(29,24,28,27,35,32,32))
 DF2 <- data.frame(X.DATE=rep("01052007", 7), X.TIME=c(2:8)*100,
 VALUE=c(37, 42, 45, 45, 45, 42, 45), VALE2=c(29,24,28,27,35,32,32))
 
 DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)
 
 while(any(is.na(DFm))){
   if (any(is.na(DFm[1,]))) stop("Complete first row required!")
   ind <- which(is.na(DFm), arr.ind=TRUE)
   prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
   DFm[is.na(DFm)] <- DFm[prind]
 }
 DFm
 
 Best,
 Nello
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Adeel Amin
 Sent: Donnerstag, 23. Mai 2013 07:01
 To: r-help@r-project.org
 Subject: [R] adding rows without loops
 
 I'm comparing a variety of datasets with over 4M rows.  I've solved this
 problem 5 different ways using a for/while loop but the processing time
 is murder (over 8 hours doing this row by row per data set).  As such
 I'm trying to find whether this solution is possible without a loop or
 one in which the processing time is much faster.
 
 Each dataset is a time series as such:
 
 DF1:
 
 X.DATE X.TIME VALUE VALUE2
 1 01052007   0200    37     29
 2 01052007   0300    42     24
 3 01052007   0400    45     28
 4 01052007   0500    45     27
 5 01052007   0700    45     35
 6 01052007   0800    42     32
 7 01052007   0900    45     32
 ...
 ...
 ...
 n
 
 DF2
 
 X.DATE X.TIME VALUE VALUE2
 1 01052007   0200    37     29
 2 01052007   0300    42     24
 3 01052007   0400    45     28
 4 01052007   0500    45     27
 5 01052007   0600    45     35
 6 01052007   0700    42     32
 7 01052007   0800    45     32
 
 ...
 ...
 n+4000
 
 In other words there are 4000 more rows in DF2 than in DF1, thus the
 datasets are of unequal length.
 
 I'm trying to ensure that all dataframes have the same number of X.DATE
 and X.TIME entries.  Where they are missing, I'd like to insert a new
 row.
 
 In the above example, when comparing DF2 to DF1, the entry 01052007 0600
 is missing in DF1.  The solution would add a row to DF1 at the
 appropriate index.
 
 so new dataframe would be
 
 
 X.DATE X.TIME VALUE VALUE2
 1 01052007   0200    37     29
 2 01052007   0300    42     24
 3 01052007   0400    45     28
 4 01052007   0500    45     27
 5 01052007   0600    45     27
 6 01052007   0700    45     35
 7 01052007   0800    42     32
 8 01052007   0900    45     32
 
 Value and Value2 would be the same as row 4.
 
 Of course this is simple to accomplish using a row by row analysis but
 with 4M rows the processing time destroying and rebinding the
 datasets is very time consuming and I believe highly un-R'ish.  What am
 I missing?
 
 Thanks!
 
 


Re: [R] convert a character string to a name

2013-05-23 Thread Greg Snow
Here are a couple of approaches:

 dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
 x_test <- c("x", "y")
 aggregate( dftest[, x_test], dftest['z'], FUN=mean )
  z x y
1 0 7 1
2 1 6 2

 ### Or

 tmp.f <- as.formula( paste( 'cbind(',
+   paste( x_test, collapse=',' ),
+ ') ~ z' ) )
 aggregate( tmp.f, data=dftest, FUN=mean )
  z x y
1 0 7 1
2 1 6 2


The first just uses x_test to subset the data frame and sends the
constructed subset to aggregate.  The second constructs the formula from
the strings and passes the formula to aggregate.
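A third, base-R route (not from the original reply; a sketch using only base functions) is to skip formula construction entirely: split the selected columns by z, then take column means per group.

```r
# Same toy data and column-name vector as above
dftest <- data.frame(x = 1:12, y = (1:12) %% 4, z = (1:12) %% 2)
x_test <- c("x", "y")

# Split the named columns by z, then compute per-group column means
res <- t(sapply(split(dftest[x_test], dftest$z), colMeans))
res
#   x y
# 0 7 1
# 1 6 2
```

This gives a matrix rather than a data frame, which is often all that is needed.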



On Thu, May 23, 2013 at 1:05 AM, jpm miao miao...@gmail.com wrote:

 Hi,
From time to time I need to do the aggregation. To illustrate, I present
 a toy example as below. In this example, the task is to aggregate x and y
 by z with the function mean.
Could I call the aggregation function with x_test, where
x_test=c("x","y")? Thanks

 Miao


  dftest <- data.frame(x=1:12, y=(1:12)%%4, z=(1:12)%%2)
  dftest
 x y z
 1   1 1 1
 2   2 2 0
 3   3 3 1
 4   4 0 0
 5   5 1 1
 6   6 2 0
 7   7 3 1
 8   8 0 0
 9   9 1 1
 10 10 2 0
 11 11 3 1
 12 12 0 0
  aggregate(cbind(x,y)~z, data=dftest, FUN=mean)
   z x y
 1 0 7 1
 2 1 6 2
  x_test <- c("x","y")
  aggregate(cbind(x_test)~z, data=dftest, FUN=mean)
 Error in model.frame.default(formula = cbind(x_test) ~ z, data = dftest) :
   variable lengths differ (found for 'z')
  a1 <- aggregate(cbind(factor(x_test))~z, data=dftest, FUN=mean)
 Error in model.frame.default(formula = cbind(factor(x_test)) ~ z, data =
 dftest) :
   variable lengths differ (found for 'z')
  aggregate(factor(x_test)~z, data=dftest, FUN=mean)
 Error in model.frame.default(formula = factor(x_test) ~ z, data = dftest) :
   variable lengths differ (found for 'z')





-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



[R] Removing rows w/ smaller value from data frame

2013-05-23 Thread ramoss
Hello,

I have a column called max_date in my data frame and I only want to keep the
bigger values for the same activity.  How can I do that?

data frame:

activity    max_dt
A           2013-03-05
B           2013-03-28
A           2013-03-28
C           2013-03-28
B           2013-03-01

Thank you for your help



--
View this message in context: 
http://r.789695.n4.nabble.com/Removing-rows-w-smaller-value-from-data-frame-tp4667816.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Removing rows w/ smaller value from data frame

2013-05-23 Thread PIKAL Petr
Hi

Change max_dt to Date/POSIXct class, then use a standard comparison operator and
use the result for selecting rows.

 s <- seq(c(ISOdate(2000,3,20)), by = "day", length.out = 10)
 s < s[5]
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE


Regards
Petr
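Applied to the posted data, a minimal sketch of that idea (the comparison against the per-activity maximum, computed with ave(), selects the rows to keep):

```r
d <- data.frame(activity = c("A", "B", "A", "C", "B"),
                max_dt = as.Date(c("2013-03-05", "2013-03-28", "2013-03-28",
                                   "2013-03-28", "2013-03-01")))

# Per-activity maximum date, recycled to the full length of the data frame
grp_max <- ave(as.numeric(d$max_dt), d$activity, FUN = max)

# Keep only rows that are at the maximum for their activity
d[as.numeric(d$max_dt) == grp_max, ]
```

Note this keeps ties, i.e. every row sharing the group maximum survives.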


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of ramoss
 Sent: Thursday, May 23, 2013 4:24 PM
 To: r-help@r-project.org
 Subject: [R] Removing rows w/ smaller value from data frame
 
 Hello,
 
 I have a column called max_date in my data frame and I only want to
 keep the bigger values for the same activity.  How can I do that?
 
 data frame:
 
 activity    max_dt
 A           2013-03-05
 B           2013-03-28
 A           2013-03-28
 C           2013-03-28
 B           2013-03-01
 
 Thank you for your help
 
 
 
 --
 View this message in context: http://r.789695.n4.nabble.com/Removing-
 rows-w-smaller-value-from-data-frame-tp4667816.html
 Sent from the R help mailing list archive at Nabble.com.
 


[R] error message solution: cannot allocate vector of size 200Mb

2013-05-23 Thread Ray Cheung
Dear All,

I wrote a program using R 2.15.2 but the error message "cannot allocate
vector of size 200Mb" appeared. I want to ask in general how to handle this
situation. I tried running the same program on other computers and it works
perfectly fine. Can anybody help? Thank you very much in advance.

Best Regards,
Ray



Re: [R] error message solution: cannot allocate vector of size 200Mb

2013-05-23 Thread Gyanendra Pokharel
Try 64-bit R.
Thanks

Gyanendra Pokharel
University of Guelph
Guelph, ON
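To add some context (not from the original replies): the error means a single allocation of 200Mb failed, not that R is out of memory overall. Besides moving to 64-bit R, freeing large objects and triggering garbage collection raises the headroom. A small illustration of inspecting and releasing memory:

```r
big <- numeric(1e6)                  # ~8 MB of doubles
print(object.size(big), units = "Mb")

rm(big)          # drop the reference ...
invisible(gc())  # ... and return the memory to the allocator
```

Removing intermediate objects you no longer need before the big allocation often makes the difference on a 32-bit machine.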


On Thu, May 23, 2013 at 10:53 AM, Ray Cheung ray1...@gmail.com wrote:

 Dear All,

 I wrote a program using R 2.15.2 but this error message cannot allocate
 vector of size 200Mb appeared. I want to ask in general how to handle this
 situation. I try to run the same program on other computers. It is
 perfectly fine. Can anybody help? Thank you very much in advance.

 Best Regards,
 Ray



Re: [R] group data based on row value

2013-05-23 Thread David Carlson
The OP indicated that the middle group should be closed on both ends, i.e.
[0.1, 0.6].

 dat2 <- rbind(dat, 0.1, 0.6)
 dat2$group <- factor(ifelse(dat2$Var < .1, "A", ifelse(dat2$Var > .6, "C",
"B")))
 dat2
  Var group
1 0.0 A
2 0.2 B
3 0.5 B
4 1.0 C
5 4.0 C
6 6.0 C
7 0.1 B
8 0.6 B

Does it but would be clumsy for more than three groups. Depending on the
precision of the numbers something like

 dat2$group <- cut( dat2$Var, breaks=c(-Inf, 0.1-.0001, 0.6+.0001, Inf),
labels=LETTERS[1:3])
 dat2
  Var group
1 0.0 A
2 0.2 B
3 0.5 B
4 1.0 C
5 4.0 C
6 6.0 C
7 0.1 B
8 0.6 B

would also work.
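The reason the small offset is needed (an aside, not from the original reply): cut() can close intervals on the right (the default) or on the left (right=FALSE), but never on both ends at once, so 0.1 and 0.6 cannot both land in B without shifting a break. A quick check:

```r
x <- c(0, 0.1, 0.5, 0.6, 1)

# Default: intervals are (lo, hi], so 0.1 falls in A and 0.6 in B
as.character(cut(x, breaks = c(-Inf, 0.1, 0.6, Inf), labels = c("A", "B", "C")))

# Left-closed: intervals are [lo, hi), so 0.1 falls in B but 0.6 in C
as.character(cut(x, breaks = c(-Inf, 0.1, 0.6, Inf), labels = c("A", "B", "C"),
                 right = FALSE))
```

Hence either the epsilon trick above or an explicit nested ifelse() is needed for a middle group closed on both ends.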

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Jeff Newmiller
Sent: Wednesday, May 22, 2013 5:27 PM
To: Ye Lin; R help
Subject: Re: [R] group data based on row value

dat$group <- cut( dat$Var, breaks=c(-Inf, 0.1, 0.6, Inf))
levels(dat$group) - LETTERS[1:3]

---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Ye Lin ye...@lbl.gov wrote:

hey, I want to divide my data into three groups based on the value in
one
column with group name.

dat:

Var
0
0.2
0.5
1
4
6

I tried:

dat <- cbind(dat, group=cut(dat$Var, breaks=c(0.1,0.6)))

But it doesn't work. I want to group values < 0.1 as group A, 0.1-0.6 as
group
B, and > 0.6 as group C

Thanks for your help!



Re: [R] sample(c(0, 1)...) vs. rbinom

2013-05-23 Thread Albyn Jones

After a bit of playing around, I discovered that
sample() does something similar in other situations:


set.seed(105021)
sample(1:5,1,prob=c(1,1,1,1,1))

[1] 3

set.seed(105021)
sample(1:5,1)

[1] 2



set.seed(105021)
sample(1:5,5,prob=c(1,1,1,1,1))

[1] 3 4 2 1 5

set.seed(105021)
sample(1:5,5)

[1] 2 5 1 4 3

albyn


On 2013-05-22 22:24, peter dalgaard wrote:

On May 23, 2013, at 07:01 , Jeff Newmiller wrote:

You seem to be building an elaborate structure for testing the 
reproducibility of the random number generator. I suspect that rbinom 
is calling the random number generator a different number of times 
when you pass prob=0.5 than otherwise.


Nope. It's switching 0 and 1:

set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp)); 
set.seed(1); rbinom(10,1,pp)

 [1] 1 1 0 0 1 0 0 0 0 1
 [1] 0 0 1 1 0 1 1 1 1 0

which is curious, but of course has no implication for the
distributional properties. Curiouser, if you drop the prob= in 
sample.


set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1); 
rbinom(10,1,pp)

 [1] 0 0 1 1 0 1 1 1 1 0
 [1] 0 0 1 1 0 1 1 1 1 0

However, it was never a design goal that two different random
functions (or even two code paths within the same function) should
give exactly the same values, even if they simulate the same
distribution, so this is nothing more than a curiosity.




Appendix A: some R code that exhibits the problem
=

ppp - seq(0, 1, by = 0.01)

result <- do.call(rbind, lapply(ppp, function(p) {
set.seed(1)
sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
prob=c(1-p, p))

set.seed(1)
rbinomRes <- rbinom(1, size = 1, prob = p)

data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))

}))

result


Appendix B: the output from the R code
==

prob equivalent
1   0.00   TRUE
2   0.01   TRUE
3   0.02   TRUE
4   0.03   TRUE
5   0.04   TRUE
6   0.05   TRUE
7   0.06   TRUE
8   0.07   TRUE
9   0.08   TRUE
10  0.09   TRUE
11  0.10   TRUE
12  0.11   TRUE
13  0.12   TRUE
14  0.13   TRUE
15  0.14   TRUE
16  0.15   TRUE
17  0.16   TRUE
18  0.17   TRUE
19  0.18   TRUE
20  0.19   TRUE
21  0.20   TRUE
22  0.21   TRUE
23  0.22   TRUE
24  0.23   TRUE
25  0.24   TRUE
26  0.25   TRUE
27  0.26   TRUE
28  0.27   TRUE
29  0.28   TRUE
30  0.29   TRUE
31  0.30   TRUE
32  0.31   TRUE
33  0.32   TRUE
34  0.33   TRUE
35  0.34   TRUE
36  0.35   TRUE
37  0.36   TRUE
38  0.37   TRUE
39  0.38   TRUE
40  0.39   TRUE
41  0.40   TRUE
42  0.41   TRUE
43  0.42   TRUE
44  0.43   TRUE
45  0.44   TRUE
46  0.45   TRUE
47  0.46   TRUE
48  0.47   TRUE
49  0.48   TRUE
50  0.49   TRUE
51  0.50  FALSE
52  0.51   TRUE
53  0.52   TRUE
54  0.53   TRUE
55  0.54   TRUE
56  0.55   TRUE
57  0.56   TRUE
58  0.57   TRUE
59  0.58   TRUE
60  0.59   TRUE
61  0.60   TRUE
62  0.61   TRUE
63  0.62   TRUE
64  0.63   TRUE
65  0.64   TRUE
66  0.65   TRUE
67  0.66   TRUE
68  0.67   TRUE
69  0.68   TRUE
70  0.69   TRUE
71  0.70   TRUE
72  0.71   TRUE
73  0.72   TRUE
74  0.73   TRUE
75  0.74   TRUE
76  0.75   TRUE
77  0.76   TRUE
78  0.77   TRUE
79  0.78   TRUE
80  0.79   TRUE
81  0.80   TRUE
82  0.81   TRUE
83  0.82   TRUE
84  0.83   TRUE
85  0.84   TRUE
86  0.85   TRUE
87  0.86   TRUE
88  0.87   TRUE
89  0.88   TRUE
90  0.89   TRUE
91  0.90   TRUE
92  0.91   TRUE
93  0.92   TRUE
94  0.93   TRUE
95  0.94   TRUE
96  0.95   TRUE
97  0.96   TRUE
98  0.97   TRUE
99  0.98   TRUE
100 0.99   TRUE
101 1.00   TRUE

Appendix C: Session information
===


sessionInfo()

R version 3.0.0 (2013-04-03)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   
base







[R] Could graph objects be stored in a two-dimensional list?

2013-05-23 Thread jpm miao
Hi,

  I have a few graph objects created by some graphic package (say, ggplot2,
which I use frequently). Because of the existing relationship between the
graphs, I'd like to index them in two dimensions as p[1,1], p[1,2], p[2,1],
p[2,2] for convenience.

  To my knowledge, the only data type capable of storing graph objects (and
any R object) is list, but unfortunately it is available in only one
dimension. Could the graphs be stored in any two-dimensional data type?

  One remedy that comes to my mind is to build a function f so that
f(1,1)=1
f(1,2)=2
f(2,1)=3
f(2,2)=4
  With functions f and f^{-1} (inverse function of f) , the two-dimensional
indices could be mapped to and from a set of one-dimensional indices, and
the functions are exactly the way R numbers elements in a matrix. Does R
have this built-in function for a m by n matrix or more generally, m*n*p
array? (I know this function is easy to write, but just want to make sure
whether it exists already)

   Thanks,

Miao



Re: [R] calcul of the mean in a period of time

2013-05-23 Thread arun
Hi GG,
I should have checked with multiple t=0-only rows.
Apologies!
Check if this work: (Changed the thread name as the solution applies to that 
problem)

dat2 <- read.csv("dat6.csv", header=TRUE, sep="\t", row.names=1)
str(dat2)
#'data.frame':    3896 obs. of  3 variables:
# $ patient_id: int  2 2 2 2 2 2 2 2 2 2 ...
# $ t : int  0 1 2 3 4 5 6 7 8 9 ...
# $ basdai    : num  2.83 4.05 3.12 3.12 2.42 ...
 
library(plyr)
 dat2New <- ddply(dat2,.(patient_id),summarize,t=seq(min(t),max(t)))
 res <- join(dat2New, dat2, type="full")


 lst1 <- lapply(split(res,res$patient_id),function(x) 
{x1 <- x[x$t!=0,];do.call(rbind,lapply(split(x1,((x1$t-1)%/%3)+1),function(y) 
{y1 <- if(any(y$t==1)) rbind(x[x$t==0,],y) else y; 
data.frame(patient_id=unique(y1$patient_id),t=head(y1$t,1),basdai=mean(y1$basdai,na.rm=TRUE))})
 ) })

dat3 <- dat2[unlist(with(dat2,tapply(t,patient_id,FUN=function(x) x==0 & 
length(x)==1)),use.names=FALSE),]
 head(dat3,3)
#    patient_id t basdai
#143 10 0  5.225
#555 37 0  2.450
#627 42 0  6.950

 lst2 <- split(dat3,seq_len(nrow(dat3)))
 
lst1[lapply(lst1,length)==0] <- mapply(rbind,lst1[lapply(lst1,length)==0],lst2,SIMPLIFY=FALSE)
res1 <- do.call(rbind,lst1)
 row.names(res1) <- 1:nrow(res1)
 res2 <- res1[,-2]
res2$period <- with(res2,ave(patient_id,patient_id,FUN=seq_along))
 #res2
#selected rows
res2[c(48:51,189:192,210:215),]
#    patient_id   basdai period
#48   9 3.625000  8
#49  10 5.225000  1 #t=0 only row
#50  11 6.018750  1
#51  11 6.00  2
#189 36 6.17  1
#190 37 2.45  1 #t=0 only row
#191 38 3.10  1
#192 38 3.575000  2
#210 41 1.918750  1
#211 41 4.025000  2
#212 41 2.975000  3
#213 41 1.725000  4
#214 42 6.95  1 #t=0 only row
#215 44 4.30  1

A.K.







From: GUANGUAN LUO guanguan...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Thursday, May 23, 2013 9:50 AM
Subject: Re: how to calculate the mean in a period of time?



Hello, Arun, sorry to trouble you again,
I tried your method and found that for patient_id==10 and patient_id==37 etc., 
the scores are repeated 51 times. I don't understand why this occurred.

Thank you so much.

GG



[R] apply function within different groups

2013-05-23 Thread Estefanía Gómez Galimberti


Hi,

I have a very big data frame and I would like to apply a function to one of the 
columns within different groups and obtain another data frame.
My data frame is like this:

group var1 var2 myvar 
group1 1 a 100 
group2 2 b 200 
group2 34 c 300 
group3 5 d 400 
group3 6 e 500 
group4 7 f 600 

and I would like to apply this function to column myvar: 

mifunc = function(vec) {
vec=as.vector(vec)
for (i in 1:(length(vec)-1)){
vec[i]=vec[i+1]-1
}
return(vec)
}
by the groups in column group. I would like to obtain the same dataframe but 
with f(myvar) instead of myvar.

How can I do this?

Thanks, 
Estefania


Re: [R] Removing rows w/ smaller value from data frame

2013-05-23 Thread arun
Hi,
Try:
datNew <- read.table(text="
activity    max_dt
A    2013-03-05
B    2013-03-28
A    2013-03-28
C    2013-03-28
B    2013-03-01
",sep="",header=TRUE,stringsAsFactors=FALSE)
datNew$max_dt <- as.Date(datNew$max_dt)
 aggregate(max_dt~activity,data=datNew,max)
#  activity max_dt
#1    A 2013-03-28
#2    B 2013-03-28
#3    C 2013-03-28
#or

library(plyr)
 ddply(datNew,.(activity),summarize, max_dt=max(max_dt))
#  activity max_dt
#1    A 2013-03-28
#2    B 2013-03-28
#3    C 2013-03-28
#or 
ddply(datNew,.(activity),summarize, max_dt=tail(sort(max_dt),1))
#  activity max_dt
#1    A 2013-03-28
#2    B 2013-03-28
#3    C 2013-03-28


A.K.

- Original Message -
From: ramoss ramine.mossad...@finra.org
To: r-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 10:23 AM
Subject: [R] Removing rows w/ smaller value from data frame

Hello,

I have a column called max_date in my data frame and I only want to keep the
bigger values for the same activity.  How can I do that?

data frame:

activity    max_dt
A            2013-03-05
B             2013-03-28
A             2013-03-28
C             2013-03-28
B             2013-03-01

Thank you for your help



--
View this message in context: 
http://r.789695.n4.nabble.com/Removing-rows-w-smaller-value-from-data-frame-tp4667816.html
Sent from the R help mailing list archive at Nabble.com.



[R] xml newbie

2013-05-23 Thread Alexander Coppock
Dear r-helpers,

I am trying to extract quantities of interest from my iTunes library xml file. 
 For example, i'd like to be able to run a simple regression of playcount on 
track number, under the theory that tracks near the beginning of albums get 
played more (either because they are better or because people listen to the 
beginnings of albums)

I have an xml file that is of the following form:

<key>13162</key>
<dict>
<key>Track ID</key><integer>13162</integer>
<key>Name</key><string>I'm A Wheel</string>
<key>Artist</key><string>Wilco</string>
<key>Composer</key><string>Jeff Tweedy</string>
<key>Album</key><string>A Ghost is Born</string>
<key>Genre</key><string>Rock</string>
<key>Kind</key><string>Matched AAC audio file</string>
<key>Size</key><integer>6248701</integer>
<key>Total Time</key><integer>154648</integer>
<key>Disc Number</key><integer>1</integer>
<key>Disc Count</key><integer>1</integer>
<key>Track Number</key><integer>9</integer>
<key>Track Count</key><integer>12</integer>
<key>Year</key><integer>2004</integer>
<key>Date Modified</key><date>2012-07-26T22:29:15Z</date>
<key>Date Added</key><date>2010-01-27T00:02:21Z</date>
<key>Bit Rate</key><integer>256</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Play Count</key><integer>3</integer>
<key>Play Date</key><integer>3434905791</integer>
<key>Play Date UTC</key><date>2012-11-05T00:29:51Z</date>
<key>Artwork Count</key><integer>1</integer>
<key>Sort Album</key><string>Ghost is Born</string>
<key>Persistent ID</key><string>A8B0E5CF2E86A4C6</string>
<key>Track Type</key><string>File</string>
<key>Location</key><string>file://localhost/Users/Alex/Music/iTunes/iTunes%20Media/Music/Wilco/A%20Ghost%20is%20Born/09%20I'm%20A%20Wheel.m4a</string>
<key>File Folder Count</key><integer>5</integer>
<key>Library Folder Count</key><integer>1</integer>
</dict>


From each entry, I'd like to extract Track ID, Track Number and Play Count.  
In this case, it would be 

13162, 9, 3

my guess is that this can be done using library(XML).

If anyone has any guidance, it would be appreciated.  Please note: 

a) I do not understand XML data structures, so please explain what you mean by 
children etc…
b) Not every entry in my database has a track number and a play count -- I'd 
like to have NAs associated with the appropriate Track ID, which all entries 
have.
c) it'd also be OK if this XML database just got turned into a normal R data 
frame.

Thanks!
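One robust route is indeed the XML package (xmlParse() plus XPath over the dict nodes). As a quick base-R fallback for a well-formed file, the integer fields can also be pulled with regular expressions. A sketch on a hypothetical one-track excerpt (a real library has many dict blocks, which you would loop over):

```r
# Hypothetical one-track record, tags collapsed onto one string
rec <- paste0("<key>Track ID</key><integer>13162</integer>",
              "<key>Track Number</key><integer>9</integer>",
              "<key>Play Count</key><integer>3</integer>")

# Extract the integer following a given key; NA if the key is absent
get_int <- function(txt, field) {
  pat <- sprintf("<key>%s</key>\\s*<integer>[0-9]+</integer>", field)
  m <- regmatches(txt, regexpr(pat, txt))
  if (length(m) == 0) return(NA_integer_)
  as.integer(sub(".*<integer>([0-9]+)</integer>", "\\1", m))
}

c(get_int(rec, "Track ID"), get_int(rec, "Track Number"),
  get_int(rec, "Play Count"), get_int(rec, "Year"))
# 13162 9 3 NA
```

The missing "Year" key coming back as NA gives exactly the behaviour requested in point (b).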


[R] Error in png: unable to start png() device

2013-05-23 Thread Ondrej Novak
Hi,
I use R 2.14.0 on Win XP Pro SP3 and it sometimes behaves the same way.
After I draw a lot of plots (more than 200, with 2 concurrent Rgui processes
running in parallel) to png, I get the same error message.
bmp(), jpeg(), png(): same error. Restarting Rgui does not help.

Solution: restart the system and voilà, everything is OK.

I suspect that there might be something wrong with allocation/deallocation
of Windows resources in windows() function.
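A related thing worth checking before rebooting (my addition, not part of the original report): each png() call opens a device that must be paired with a dev.off(), and leaked devices gradually exhaust Windows graphics resources. A defensive sketch with a hypothetical wrapper that guarantees the device is closed even when the plotting code errors:

```r
# Open a png device, run the plotting expression, always close the device
safe_png <- function(file, expr) {
  png(file)
  on.exit(dev.off())  # runs even if expr throws an error
  expr                # plotting code, lazily evaluated here
}

## usage, e.g.: safe_png("out.png", plot(1:10))
```

length(dev.list()) tells you how many devices are currently open; it should stay small in a long batch run.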

Ondrej Novak



Re: [R] Removing rows w/ smaller value from data frame

2013-05-23 Thread arun
From your email, it seems like aggregate() is working.

Could you please provide the sessionInfo()?
My guess is that some other loaded library is masking the summarize().
For example, if I load
library(Hmisc)
#The following object is masked from ‘package:plyr’:
#
 #   is.discrete, summarize


ddply(datNew,.(activity),summarize, max_dt=max(max_dt)) #
#Error in is.list(by) : 'by' is missing
 ddply(datNew,.(activity),plyr::summarize,max_dt=max(max_dt))
#  activity max_dt
#1    A 2013-03-28
#2    B 2013-03-28
#3    C 2013-03-28
A.K.


- Original Message -
From: Mossadegh, Ramine N. ramine.mossad...@finra.org
To: arun smartpink...@yahoo.com
Cc: 
Sent: Thursday, May 23, 2013 10:44 AM
Subject: RE: [R] Removing rows w/ smaller value from data frame

Thank but I get : Error in is.list(by) : 'by' is missing
When I tried ddply(datNew,.(activity),summarize, max_dt=max(max_dt))

-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: Thursday, May 23, 2013 10:40 AM
To: Mossadegh, Ramine N.
Cc: R help
Subject: Re: [R] Removing rows w/ smaller value from data frame

Hi,
Try:
datNew <- read.table(text="
activity    max_dt
A    2013-03-05
B    2013-03-28
A    2013-03-28
C    2013-03-28
B    2013-03-01
",sep="",header=TRUE,stringsAsFactors=FALSE)
datNew$max_dt <- as.Date(datNew$max_dt)
 aggregate(max_dt~activity,data=datNew,max)
#  activity max_dt
#1    A 2013-03-28
#2    B 2013-03-28
#3    C 2013-03-28
#or

library(plyr)
 ddply(datNew,.(activity),summarize, max_dt=max(max_dt)) #  activity max_dt
#1    A 2013-03-28
#2    B 2013-03-28
#3    C 2013-03-28
#or
ddply(datNew,.(activity),summarize, max_dt=tail(sort(max_dt),1)) #  activity
 max_dt
#1    A 2013-03-28
#2    B 2013-03-28
#3    C 2013-03-28


A.K.

- Original Message -
From: ramoss ramine.mossad...@finra.org
To: r-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 10:23 AM
Subject: [R] Removing rows w/ smaller value from data frame

Hello,

I have a column called max_date in my data frame and I only want to keep the
bigger values for the same activity.  How can I do that?

data frame:

activity    max_dt
A            2013-03-05
B             2013-03-28
A             2013-03-28
C             2013-03-28
B             2013-03-01

Thank you for your help



--
View this message in context: 
http://r.789695.n4.nabble.com/Removing-rows-w-smaller-value-from-data-frame-tp4667816.html
Sent from the R help mailing list archive at Nabble.com.





Re: [R] apply function within different groups

2013-05-23 Thread arun
Hi,

May be this helps:
dat1 <- read.table(text="
group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600
",sep="",header=TRUE,stringsAsFactors=FALSE)

library(plyr)
ddply(dat1,.(group),summarize, f_myvar=mifunc(myvar)) 
#   group f_myvar
#1 group1  NA
#2 group2 299
#3 group2 300
#4 group3 499
#5 group3 500
#6 group4  NA
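A base-R equivalent, in case plyr is not available (my addition to the thread): ave() applies a length-preserving function within groups, which is exactly what mifunc is.

```r
# OP's function: shift each value one position left, minus 1
mifunc <- function(vec) {
  vec <- as.vector(vec)
  for (i in 1:(length(vec) - 1)) {
    vec[i] <- vec[i + 1] - 1
  }
  vec
}

dat1 <- data.frame(group = c("group1", "group2", "group2",
                             "group3", "group3", "group4"),
                   var1  = c(1, 2, 34, 5, 6, 7),
                   var2  = letters[1:6],
                   myvar = c(100, 200, 300, 400, 500, 600))

# Replace myvar by mifunc(myvar), computed within each group,
# keeping the rest of the data frame intact
dat1$myvar <- ave(dat1$myvar, dat1$group, FUN = mifunc)
dat1$myvar
# NA 299 300 499 500 NA
```

Unlike the ddply() call, this keeps the original row order and the other columns without a join.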
A.K.



- Original Message -
From: Estefanía Gómez Galimberti tef...@yahoo.com
To: r help help r-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 11:30 AM
Subject: [R] apply function within different groups



Hi,

I have a very big data frame and I would like to apply a function to one of the 
columns within different groups  and obtain another dataframe
My data frame is like this:

group var1 var2 myvar 
group1 1 a 100 
group2 2 b 200 
group2 34 c 300 
group3 5 d 400 
group3 6 e 500 
group4 7 f 600 

and I would like to apply this function to column myvar: 

mifunc = function(vec) {
vec=as.vector(vec)
for (i in 1:(length(vec)-1)){
vec[i]=vec[i+1]-1
}
return(vec)
}
by the groups in column group. I would like to obtain the same dataframe but 
with f(myvar) instead of myvar.

How can I do this?

Thanks, 
Estefania


Re: [R] adding rows without loops

2013-05-23 Thread William Dunlap
 This is the exact solution I came up with ...

exact, really?

Is the time-consuming part the initial merge
   DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)

or the postprocessing to turn runs of NAs into the last non-NA
value in the column
  while(any(is.na(DFm))){
if (any(is.na(DFm[1,]))) stop("Complete first row required!")
ind <- which(is.na(DFm), arr.ind=TRUE)
prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
DFm[is.na(DFm)] <- DFm[prind]
 }

If it is the latter, you may get better results from applying zoo::na.locf()
to each non-key column of DFm.  E.g.,
   library(zoo)
   f2 <- function(DFm) {
  for(i in 3:length(DFm)) {
 DFm[[i]] <- na.locf(DFm[[i]])
  }
  DFm
   }
   f2(DFm)
gives the same result as Blaser's algorithm
  f1 <- function (DFm)  {
 while (any(is.na(DFm))) {
 if (any(is.na(DFm[1, ]))) 
 stop("Complete first row required!")
 ind <- which(is.na(DFm), arr.ind = TRUE)
 prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
 DFm[is.na(DFm)] <- DFm[prind]
 }
 DFm
 }

If there are not a huge number of columns I would guess that f2() would be much
faster.
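For readers without zoo installed, last-observation-carried-forward is only a few lines of base R. A sketch (my addition), assuming the first element of each column is non-NA, as the thread already requires:

```r
# Carry the last non-NA value forward; the first element must be non-NA
na_locf <- function(x) {
  stopifnot(!is.na(x[1]))
  ok <- !is.na(x)           # positions holding real values
  x[ok][cumsum(ok)]         # index of the most recent real value, per position
}

na_locf(c(37, NA, NA, 45, NA, 42))
# 37 37 37 45 45 42
```

Applied column-wise after the outer merge(), this fills each inserted row with the values from the preceding timestamp, matching the OP's "same as row 4" requirement, and it is vectorized, so it scales to millions of rows.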

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Adeel - SafeGreenCapital
 Sent: Thursday, May 23, 2013 5:54 AM
 To: 'Blaser Nello'; r-help@r-project.org
 Subject: Re: [R] adding rows without loops
 
 Thank you Blaser:
 
 This is the exact solution I came up with but when comparing 8M rows even on
 an 8G machine, one runs out of memory.  To run this effectively, I have to
 break the DF into smaller DFs, loop through them and then do a massive
 rmerge at the end.  That's what takes 8+ hours to compute.
 
 Even the bigmemory package is causing OOM issues.
 
 -Original Message-
 From: Blaser Nello [mailto:nbla...@ispm.unibe.ch]
 Sent: Thursday, May 23, 2013 12:15 AM
 To: Adeel Amin; r-help@r-project.org
 Subject: RE: [R] adding rows without loops
 
 Merge should do the trick. How to best use it will depend on what you
 want to do with the data after.
 The following is an example of what you could do. This will perform
 best, if the rows are missing at random and do not cluster.
 
  DF1 <- data.frame(X.DATE=rep("01052007", 7), X.TIME=c(2:5,7:9)*100,
  VALUE=c(37, 42, 45, 45, 45, 42, 45), VALUE2=c(29,24,28,27,35,32,32))
  DF2 <- data.frame(X.DATE=rep("01052007", 7), X.TIME=c(2:8)*100,
  VALUE=c(37, 42, 45, 45, 45, 42, 45), VALUE2=c(29,24,28,27,35,32,32))
 
  DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)
 
  while(any(is.na(DFm))){
    if (any(is.na(DFm[1,]))) stop("Complete first row required!")
    ind <- which(is.na(DFm), arr.ind=TRUE)
    prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
    DFm[is.na(DFm)] <- DFm[prind]
  }
 DFm
 
 Best,
 Nello
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Adeel Amin
 Sent: Donnerstag, 23. Mai 2013 07:01
 To: r-help@r-project.org
 Subject: [R] adding rows without loops
 
 I'm comparing a variety of datasets with over 4M rows.  I've solved this
 problem 5 different ways using a for/while loop but the processing time
 is murder (over 8 hours doing this row by row per data set).  As such
 I'm trying to find whether this solution is possible without a loop or
 one in which the processing time is much faster.
 
 Each dataset is a time series as such:
 
 DF1:
 
  X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0700    45     35
6 01052007   0800    42     32
7 01052007   0900    45     32
 ...
 ...
 ...
 n
 
 DF2
 
  X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     35
6 01052007   0700    42     32
7 01052007   0800    45     32
 
 ...
 ...
 n+4000
 
In other words there are 4000 more rows in DF2 than in DF1, so the
datasets are of unequal length.
 
 I'm trying to ensure that all dataframes have the same number of X.DATE
 and X.TIME entries.  Where they are missing, I'd like to insert a new
 row.
 
In the above example, when comparing DF2 to DF1, the 01052007 0600
entry is missing in DF1.  The solution would add a row to DF1 at the
appropriate index.
 
 so new dataframe would be
 
 
  X.DATE X.TIME VALUE VALUE2
1 01052007   0200    37     29
2 01052007   0300    42     24
3 01052007   0400    45     28
4 01052007   0500    45     27
5 01052007   0600    45     27
6 01052007   0700    45     35
7 01052007   0800    42     32
8 01052007   0900    45     32
 
 Value and Value2 would be the same as row 4.
 
Of course this is simple to accomplish using a row-by-row analysis, but
with over 4M rows the processing time spent destroying and rebinding the
datasets is very time consuming.

Re: [R] Could graph objects be stored in a two-dimensional list?

2013-05-23 Thread Jeff Newmiller
You could use lists of lists, and index them with vectors.

a <- list()
a[[1]] <- list()
a[[2]] <- list()
a[[c(1,1)]] <- g11
a[[c(1,2)]] <- g12
a[[c(2,1)]] <- g21
a[[c(2,2)]] <- g22
print(a[[c(2,1)]])

but this seems like an inefficient use of memory because your indexed data is 
stored more compactly than the graph object is. I would index the data and 
generate the graph object on the fly when I wanted to see it.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

jpm miao miao...@gmail.com wrote:

Hi,

I have a few graph objects created by some graphic package (say,
ggplot2,
which I use frequently). Because of the relationship between the
graphs, I'd like to index them in two dimensions as p[1,1], p[1,2],
p[2,1],
p[2,2] for convenience.

To my knowledge, the only data type capable of storing graph objects
(and
any R object) is list, but unfortunately it is available in only one
dimension. Could the graphs be stored in any two-dimensional data type?

  One remedy that comes to my mind is to build a function f so that
f(1,1)=1
f(1,2)=2
f(2,1)=3
f(2,2)=4
With functions f and f^{-1} (inverse function of f) , the
two-dimensional
indices could be mapped to and from a set of one-dimensional indices,
and
the functions are exactly the way R numbers elements in a matrix. Does
R
have this built-in function for a m by n matrix or more generally,
m*n*p
array? (I know this function is easy to write, but just want to make
sure
whether it exists already)

   Thanks,

Miao

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] apply function within different groups

2013-05-23 Thread Estefanía Gómez Galimberti
Thanks a lot!!! It works perfectly!
Just one thing, is there a way to preserve my original data frame so I don't
need to join both tables? I could do it with rbind but my original data frame
is not in order, so...
Thanks again!



 From: arun smartpink...@yahoo.com
To: Estefanía Gómez Galimberti tef...@yahoo.com 
Cc: R help r-help@r-project.org 
Sent: Thursday, May 23, 2013 12:48 PM
Subject: Re: [R] apply function within different groups


Hi,

May be this helps:
dat1 <- read.table(text="
group var1 var2 myvar
group1 1 a 100
group2 2 b 200
group2 34 c 300
group3 5 d 400
group3 6 e 500
group4 7 f 600
", sep="", header=TRUE, stringsAsFactors=FALSE)

library(plyr)
ddply(dat1,.(group),summarize, f_myvar=mifunc(myvar)) 
#   group f_myvar
#1 group1  NA
#2 group2 299
#3 group2 300
#4 group3 499
#5 group3 500
#6 group4  NA
A.K.



- Original Message -
From: Estefanía Gómez Galimberti tef...@yahoo.com
To: r help help r-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 11:30 AM
Subject: [R] apply function within different groups



Hi,

I have a very big data frame and I would like to apply a function to one of the 
columns within different groups  and obtain another dataframe
My data frame is like this:

group var1 var2 myvar 
group1 1 a 100 
group2 2 b 200 
group2 34 c 300 
group3 5 d 400 
group3 6 e 500 
group4 7 f 600 

and I would like to apply this function to column myvar: 

mifunc = function(vec) {
vec=as.vector(vec)
for (i in 1:(length(vec)-1)){
vec[i]=vec[i+1]-1
}
return(vec)
}
by the groups in column "group". I would like to obtain the same dataframe but 
with f(myvar) instead of myvar.

How can I do this?

Thanks, 
Estefania
    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] FW: Kernel smoothing with bandwidth which varies with x

2013-05-23 Thread IOANNA
Hello all, 

I would like to use the Nadaraya-Watson estimator assuming a Gaussian
kernel. So far I used the sm package:

library(sm)
x <- runif(5000)
y <- rnorm(5000)
plot(x, y, col='black')
h1 <- h.select(x, y, method='aicc')
lines(ksmooth(x, y, bandwidth=h1))

which works fine. What if my data were clustered requiring a bandwidth that
varies with x? How can I do that?

Thanks in advance, 
Ioanna
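
Neither ksmooth() nor h.select() takes an x-dependent bandwidth directly, so one option is to code the estimator by hand and plug in any bandwidth function you like. A minimal sketch using the x and y above (the bandwidth function h() here is purely illustrative):

```r
# Nadaraya-Watson with a bandwidth h(u) that varies with the evaluation point
nw_var <- function(x0, x, y, h) {
  sapply(x0, function(u) {
    w <- dnorm((u - x) / h(u))   # Gaussian kernel weights at point u
    sum(w * y) / sum(w)          # locally weighted mean of y
  })
}
h <- function(u) 0.05 + 0.1 * u  # e.g. bandwidth growing with x (illustrative)
grid <- seq(min(x), max(x), length.out = 200)
lines(grid, nw_var(grid, x, y, h), col = "red")
```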

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Could graph objects be stored in a two-dimensional list?

2013-05-23 Thread William Dunlap
To my knowledge, the only data type capable of storing graph objects
(and
any R object) is list, but unfortunately it is available in only one
dimension. Could the graphs be stored in any two-dimensional data type?

Lists can have any number of dimensions you want, just as with other vector
types.  The default printout of such a thing is not very pretty, but the 
information
is in the object.

> M <- matrix(list(as.roman(99), "Two", c(3,pi), c(4,44,444)), nrow=2, ncol=2)
> M
     [,1]  [,2]     
[1,] 99    Numeric,2
[2,] "Two" Numeric,3
> M[[1,1]]
[1] XCIX
> M[[1,2]]
[1] 3.000000 3.141593

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
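
As for the mapping function the original poster asked about: R already stores matrices and arrays in column-major order, so for an m-by-n matrix the linear index of element (i, j) is i + (j - 1)*m, and arrayInd() is the built-in inverse. A quick sketch:

```r
m <- 2; n <- 2
f <- function(i, j) i + (j - 1) * m   # 2-D index -> linear (column-major) index
f(1, 2)                               # gives 3
arrayInd(3, .dim = c(m, n))           # recovers row 1, column 2
```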



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample(c(0, 1)...) vs. rbinom

2013-05-23 Thread Nordlund, Dan (DSHS/RDA)
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Albyn Jones
 Sent: Thursday, May 23, 2013 8:30 AM
 To: r-help@r-project.org
 Subject: Re: [R] sample(c(0, 1)...) vs. rbinom
 
 After a bit of playing around, I discovered that
 sample() does something similar in other situations:
 
  set.seed(105021)
  sample(1:5,1,prob=c(1,1,1,1,1))
 [1] 3
  set.seed(105021)
  sample(1:5,1)
 [1] 2
 
 
  set.seed(105021)
  sample(1:5,5,prob=c(1,1,1,1,1))
 [1] 3 4 2 1 5
  set.seed(105021)
  sample(1:5,5)
 [1] 2 5 1 4 3
 
 albyn


What is the "something similar" you are referring to?  And I guess I still 
don't understand what it is that concerns you about the sample function.


Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204



 
 
 On 2013-05-22 22:24, peter dalgaard wrote:
  On May 23, 2013, at 07:01 , Jeff Newmiller wrote:
 
  You seem to be building an elaborate structure for testing the
  reproducibility of the random number generator. I suspect that
 rbinom
  is calling the random number generator a different number of times
  when you pass prob=0.5 than otherwise.
 
  Nope. It's switching 0 and 1:
 
  set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp));
  set.seed(1); rbinom(10,1,pp)
   [1] 1 1 0 0 1 0 0 0 0 1
   [1] 0 0 1 1 0 1 1 1 1 0
 
  which is curious, but of course has no implication for the
  distributional properties. Curiouser, if you drop the prob= in
  sample.
 
  set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1);
  rbinom(10,1,pp)
   [1] 0 0 1 1 0 1 1 1 1 0
   [1] 0 0 1 1 0 1 1 1 1 0
 
  However, it was never a design goal that two different random
  functions (or even two code paths within the same function) should
  give exactly the same values, even if they simulate the same
  distribution, so this is nothing more than a curiosity.
 
 
 
  Appendix A: some R code that exhibits the problem
  =
 
ppp <- seq(0, 1, by = 0.01)

result <- do.call(rbind, lapply(ppp, function(p) {
  set.seed(1)
  sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
                      prob=c(1-p, p))

  set.seed(1)
  rbinomRes <- rbinom(1, size = 1, prob = p)

  data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))

}))

result
 
 
  Appendix B: the output from the R code
  ==
 
  prob equivalent
  1   0.00   TRUE
  2   0.01   TRUE
  3   0.02   TRUE
  4   0.03   TRUE
  5   0.04   TRUE
  6   0.05   TRUE
  7   0.06   TRUE
  8   0.07   TRUE
  9   0.08   TRUE
  10  0.09   TRUE
  11  0.10   TRUE
  12  0.11   TRUE
  13  0.12   TRUE
  14  0.13   TRUE
  15  0.14   TRUE
  16  0.15   TRUE
  17  0.16   TRUE
  18  0.17   TRUE
  19  0.18   TRUE
  20  0.19   TRUE
  21  0.20   TRUE
  22  0.21   TRUE
  23  0.22   TRUE
  24  0.23   TRUE
  25  0.24   TRUE
  26  0.25   TRUE
  27  0.26   TRUE
  28  0.27   TRUE
  29  0.28   TRUE
  30  0.29   TRUE
  31  0.30   TRUE
  32  0.31   TRUE
  33  0.32   TRUE
  34  0.33   TRUE
  35  0.34   TRUE
  36  0.35   TRUE
  37  0.36   TRUE
  38  0.37   TRUE
  39  0.38   TRUE
  40  0.39   TRUE
  41  0.40   TRUE
  42  0.41   TRUE
  43  0.42   TRUE
  44  0.43   TRUE
  45  0.44   TRUE
  46  0.45   TRUE
  47  0.46   TRUE
  48  0.47   TRUE
  49  0.48   TRUE
  50  0.49   TRUE
  51  0.50  FALSE
  52  0.51   TRUE
  53  0.52   TRUE
  54  0.53   TRUE
  55  0.54   TRUE
  56  0.55   TRUE
  57  0.56   TRUE
  58  0.57   TRUE
  59  0.58   TRUE
  60  0.59   TRUE
  61  0.60   TRUE
  62  0.61   TRUE
  63  0.62   TRUE
  64  0.63   TRUE
  65  0.64   TRUE
  66  0.65   TRUE
  67  0.66   TRUE
  68  0.67   TRUE
  69  0.68   TRUE
  70  0.69   TRUE
  71  0.70   TRUE
  72  0.71   TRUE
  73  0.72   TRUE
  74  0.73   TRUE
  75  0.74   TRUE
  76  0.75   TRUE
  77  0.76   TRUE
  78  0.77   TRUE
  79  0.78   TRUE
  80  0.79   TRUE
  81  0.80   TRUE
  82  0.81   TRUE
  83  0.82   TRUE
  84  0.83   TRUE
  85  0.84   TRUE
  86  0.85   TRUE
  87  0.86   TRUE
  88  0.87   TRUE
  89  0.88   TRUE
  90  0.89   TRUE
  91  0.90   TRUE
  92  0.91   TRUE
  93  0.92   TRUE
  94  0.93   TRUE
  95  0.94   TRUE
  96  0.95   TRUE
  97  0.96   TRUE
  98  0.97   TRUE
  99  0.98   TRUE
  100 0.99   TRUE
  101 1.00   TRUE
 
  Appendix C: Session information
  ===
 
  sessionInfo()
  R version 3.0.0 (2013-04-03)
  Platform: x86_64-redhat-linux-gnu (64-bit)
 
  locale:
   [1] LC_CTYPE=en_US.UTF-8   

Re: [R] apply function within different groups

2013-05-23 Thread arun
Hi,
No problem.
Try:

dat2 <- within(dat1, f_myvar <- ave(myvar, group, FUN=mifunc))
dat2
#   group var1 var2 myvar f_myvar
#1 group1    1    a   100      NA
#2 group2    2    b   200     299
#3 group2   34    c   300     300
#4 group3    5    d   400     499
#5 group3    6    e   500     500
#6 group4    7    f   600      NA
A.K.
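
The loop in mifunc() can also be vectorized, which matters on a "very big" data frame. A sketch of my reading of its semantics (each element becomes the next value minus 1; the last element is left unchanged; a length-1 vector comes out NA because the loop range 1:0 runs backwards):

```r
# vectorized equivalent of mifunc(), assuming groups of length >= 1
mifunc_vec <- function(vec) {
  n <- length(vec)
  if (n < 2) return(NA_real_)   # matches mifunc's 1:0 edge case
  c(vec[-1] - 1, vec[n])        # shift forward, subtract 1; keep last value
}
dat1$f_myvar <- ave(dat1$myvar, dat1$group, FUN = mifunc_vec)
```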



From: Estefanía Gómez Galimberti tef...@yahoo.com
To: arun smartpink...@yahoo.com 
Cc: R help r-help@r-project.org 
Sent: Thursday, May 23, 2013 12:08 PM
Subject: Re: [R] apply function within different groups



Thanks a lot!!! It works perfectly!
Just one thing, is there a way to preserve my original data frame so I don't
need to join both tables? I could do it with rbind but my original data frame
is not in order, so...
Thanks again!




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] apply function within different groups

2013-05-23 Thread arun
Using the previous solution:
dat3 <- mutate(dat1, f_myvar=ddply(dat1, .(group), summarize, f_myvar=mifunc(myvar))[,2])
identical(dat2, dat3)
#[1] TRUE
A.K.



- Original Message -
From: arun smartpink...@yahoo.com
To: Estefanía Gómez Galimberti tef...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Thursday, May 23, 2013 1:01 PM
Subject: Re: [R] apply function within different groups

Hi,
No problem.
Try:

dat2 <- within(dat1, f_myvar <- ave(myvar, group, FUN=mifunc))
dat2
#   group var1 var2 myvar f_myvar
#1 group1    1    a   100      NA
#2 group2    2    b   200     299
#3 group2   34    c   300     300
#4 group3    5    d   400     499
#5 group3    6    e   500     500
#6 group4    7    f   600      NA
A.K.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample(c(0, 1)...) vs. rbinom

2013-05-23 Thread Albyn Jones
The "something similar" is returning a different result in two
situations where one might expect the same result, i.e. when a
probability vector of equal probabilities is supplied versus the
default of equal probabilities.

And, assuming that by "concerns me" you mean "worries me",
I have no clue why you think it does!  It is a curiosity.

albyn

On Thu, May 23, 2013 at 04:38:18PM +, Nordlund, Dan (DSHS/RDA) wrote:
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Albyn Jones
  Sent: Thursday, May 23, 2013 8:30 AM
  To: r-help@r-project.org
  Subject: Re: [R] sample(c(0, 1)...) vs. rbinom
  
  After a bit of playing around, I discovered that
  sample() does something similar in other situations:
  
   set.seed(105021)
   sample(1:5,1,prob=c(1,1,1,1,1))
  [1] 3
   set.seed(105021)
   sample(1:5,1)
  [1] 2
  
  
   set.seed(105021)
   sample(1:5,5,prob=c(1,1,1,1,1))
  [1] 3 4 2 1 5
   set.seed(105021)
   sample(1:5,5)
  [1] 2 5 1 4 3
  
  albyn
 
 
 What is the "something similar" you are referring to?  And I guess I still 
 don't understand what it is that concerns you about the sample function.
 
 
 Dan
 
 Daniel J. Nordlund
 Washington State Department of Social and Health Services
 Planning, Performance, and Accountability
 Research and Data Analysis Division
 Olympia, WA 98504-5204
 
 
 
  
  

Re: [R] data frame sum

2013-05-23 Thread arun
Hi,
ab <- cbind(a, b)
indx <- duplicated(names(ab)) | duplicated(names(ab), fromLast=TRUE)
res1 <- cbind(ab[!indx], v2=rowSums(ab[indx]))
res1[, order(as.numeric(gsub("[A-Za-z]", "", names(res1))))]
#  v1 v2 v3
#1  3  4  5

#Another example:

a2 <- data.frame(v1=c(3,6,7), v2=c(2,4,8))
b2 <- data.frame(v2=c(2,6,7), v3=c(5,4,9))
ab2 <- cbind(a2, b2)
indx <- duplicated(names(ab2)) | duplicated(names(ab2), fromLast=TRUE)
res1 <- cbind(ab2[!indx], v2=rowSums(ab2[indx]))
res1[, order(as.numeric(gsub("[A-Za-z]", "", names(res1))))]
#  v1 v2 v3
#1  3  4  5
#2  6 10  4
#3  7 15  9
A.K.
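
An alternative sketch in base R: transpose so that columns become rows, then rowsum() adds up the rows that share a name (this assumes all columns are numeric):

```r
a <- data.frame(v1 = 3, v2 = 2)
b <- data.frame(v2 = 2, v3 = 5)
ab <- cbind(a, b)                    # columns: v1, v2, v2, v3
t(rowsum(t(ab), group = names(ab)))  # one row: v1 = 3, v2 = 4, v3 = 5
```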


Dear R expert, 
I have two data frame a and b: 
a <- data.frame(v1=3, v2=2) 
b <- data.frame(v2=2, v3=5) 

Is it possible to obtain a new data frame resulting from the sum of the 
previous df with the 3 variables? namely 
v1,v2,v3 
3,4,5 

Thanx, 
Gianandrea

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] strings

2013-05-23 Thread Robin Mjelle
I have two files containing words. I want to print the words that are in
file 1 but NOT in file 2.
How do I go about?

file 1:
     ABL1
1    ALKBH1
2    ALKBH2
3    ALKBH3
4    ANKRD17
5    APEX1
6    APEX2
7    APTX
8    ASF1A
9    ASTE1
10   ATM
11   ATR
12   ATRIP
13   ATRX
14   ATXN3
15   BCCIP
16   BLM
17   BRCA1
18   BRCA2


file2:
     ALKBH2
1    ALKBH3
2    APEX1
3    APEX2
4    APLF
5    APTX
6    ATM
7    ATR
8    ATRIP
9    BLM
10   BRCA1
11   BRCA2
12   BRIP1
13   BTBD12
14   CCNH

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strings

2013-05-23 Thread arun
Hi,
Try:

dat1 <- structure(list(V2 = c("ALKBH1", "ALKBH2", "ALKBH3", "ANKRD17",
"APEX1", "APEX2", "APTX", "ASF1A", "ASTE1", "ATM", "ATR", "ATRIP",
"ATRX", "ATXN3", "BCCIP", "BLM", "BRCA1", "BRCA2")), .Names = "V2",
class = "data.frame", row.names = c(NA, 18L))


dat2 <- structure(list(V2 = c("ALKBH3", "APEX1", "APEX2", "APLF", "APTX",
"ATM", "ATR", "ATRIP", "BLM", "BRCA1", "BRCA2", "BRIP1", "BTBD12",
"CCNH")), .Names = "V2", class = "data.frame", row.names = c(NA, 14L))


library(sqldf)
sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2')
#   V2
#1  ALKBH1
#2  ALKBH2
#3 ANKRD17
#4   ASF1A
#5   ASTE1
#6    ATRX
#7   ATXN3
#8   BCCIP



#or
dat2$id <- 1
res <- merge(dat1, dat2, all=TRUE)
subset(res, is.na(res$id))[1]
#    V2
#1   ALKBH1
#2   ALKBH2
#4  ANKRD17
#9    ASF1A
#10   ASTE1
#14    ATRX
#15   ATXN3
#16   BCCIP
A.K.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] point.in.polygon help

2013-05-23 Thread MacQueen, Don
In that case I'd definitely look more at the over() function than that
ugly bit I suggested before.

Get your fish info into a SpatialPointsDataFrame

Since your polygons are in a SpatialPolygonsDataFrame, I would expect the
data frame part has one row per basin, and it contains the basin names or
other unique identifier. Loop through the basin names, subsetting the
SpatialPolygonsDataFrame for each basin, then use the over() function
with the fish SpatialPointsDataFrame to tell you which fish are in the
current basin.

That's an outline; there are obviously lots of details that would be
needed.

This should work even if, for example, a single basin consists of more
than one polygon (presumably non-overlapping).

There may be a more efficient way, but I don't know it off the top of my
head.
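
A minimal sketch of that outline (assumptions: "basins" is the SpatialPolygonsDataFrame with a hypothetical attribute column "basin", and "fish" is a data.frame with lon/lat columns for the tagging locations; in fact over() can assign every point at once, without the per-basin loop):

```r
library(sp)
coordinates(fish) <- ~ lon + lat          # promote fish to a SpatialPointsDataFrame
proj4string(fish) <- proj4string(basins)  # the two layers must share a CRS
fish$basin <- over(fish, basins)$basin    # basin per fish; NA if outside all basins
```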

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 5/23/13 6:03 AM, karengrace84 kgfis...@alumni.unc.edu wrote:

I am looking at fish tagging data. I have gps coordinates of where each
fish
was tagged and released, and I have a map of 10 coastal basins of the
state
of Louisiana. I am trying to determine which basin each fish was tagged
in. 



--
View this message in context:
http://r.789695.n4.nabble.com/point-in-polygon-help-tp4667645p4667808.html
Sent from the R help mailing list archive at Nabble.com.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strings

2013-05-23 Thread MacQueen, Don
See the
   setdiff()
function

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 5/23/13 11:04 AM, Robin Mjelle robinmje...@gmail.com wrote:

I have two files containing words. I want to print the words that are in
file 1 but NOT in file 2.
How do I go about?

file 1:
     ABL1
1    ALKBH1
2    ALKBH2
3    ALKBH3
4    ANKRD17
5    APEX1
6    APEX2
7    APTX
8    ASF1A
9    ASTE1
10   ATM
11   ATR
12   ATRIP
13   ATRX
14   ATXN3
15   BCCIP
16   BLM
17   BRCA1
18   BRCA2


file2:
     ALKBH2
1    ALKBH3
2    APEX1
3    APEX2
4    APLF
5    APTX
6    ATM
7    ATR
8    ATRIP
9    BLM
10   BRCA1
11   BRCA2
12   BRIP1
13   BTBD12
14   CCNH

   [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strings

2013-05-23 Thread William Dunlap
You recommended
   library(sqldf)
   sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2')

Using nothing but the core R packages setdiff() returns the difference between 
two sets.
   setdiff(dat1$V2, dat2$V2)
  [1] "ALKBH1"  "ALKBH2"  "ANKRD17" "ASF1A"   "ASTE1"   "ATRX"    "ATXN3"   "BCCIP"
If there are possibly duplicates in dat1$V2, so it is not a set, and you want 
the duplicates in the result, use
   dat1$V2[ !is.element(dat1$V2, dat2$V2) ]
  [1] "ALKBH1"  "ALKBH2"  "ANKRD17" "ASF1A"   "ASTE1"   "ATRX"    "ATXN3"   "BCCIP"

 a - c(1, 2, 3, 2, 1, 4)
 b - c(1, 3)
 setdiff(a, b)
[1] 2 4
 a[ !is.element(a, b) ]
[1] 2 2 4
 

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of arun
 Sent: Thursday, May 23, 2013 12:05 PM
 To: R help
 Subject: Re: [R] strings
 
 Hi,
 Try:
 
 dat1 <- structure(list(V2 = c("ALKBH1", "ALKBH2", "ALKBH3", "ANKRD17",
 "APEX1", "APEX2", "APTX", "ASF1A", "ASTE1", "ATM", "ATR", "ATRIP",
 "ATRX", "ATXN3", "BCCIP", "BLM", "BRCA1", "BRCA2")), .Names = "V2",
 class = "data.frame", row.names = c(NA, 18L))
 
 
 dat2 <- structure(list(V2 = c("ALKBH3", "APEX1", "APEX2", "APLF", "APTX",
 "ATM", "ATR", "ATRIP", "BLM", "BRCA1", "BRCA2", "BRIP1", "BTBD12",
 "CCNH")), .Names = "V2", class = "data.frame", row.names = c(NA, 14L))
 
 
 library(sqldf)
 sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2')
 #   V2
 #1  ALKBH1
 #2  ALKBH2
 #3 ANKRD17
 #4   ASF1A
 #5   ASTE1
 #6    ATRX
 #7   ATXN3
 #8   BCCIP
 
 
 
 #or
 dat2$id <- 1
 res <- merge(dat1, dat2, all=TRUE)
 subset(res, is.na(res$id))[1]
 #    V2
 #1   ALKBH1
 #2   ALKBH2
 #4  ANKRD17
 #9    ASF1A
 #10   ASTE1
 #14    ATRX
 #15   ATXN3
 #16   BCCIP
 A.K.
 
 
 
 
 I have two files containing words. I want to print the words that are in file 1 but
 NOT in file 2.
 How do I go about it?
 
 file 1:
  ABL1
 1     ALKBH1
 2     ALKBH2
 3     ALKBH3
 4    ANKRD17
 5      APEX1
 6      APEX2
 7       APTX
 8      ASF1A
 9      ASTE1
 10       ATM
 11       ATR
 12     ATRIP
 13      ATRX
 14     ATXN3
 15     BCCIP
 16       BLM
 17     BRCA1
 18     BRCA2
 
 
 file2:
  ALKBH2
 1    ALKBH3
 2     APEX1
 3     APEX2
 4      APLF
 5      APTX
 6       ATM
 7       ATR
 8     ATRIP
 9       BLM
 10    BRCA1
 11    BRCA2
 12    BRIP1
 13   BTBD12
 14     CCNH
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] strings

2013-05-23 Thread arun
#or
 dat1$V2[is.na(match(dat1$V2, dat2$V2))]
#[1] "ALKBH1"  "ALKBH2"  "ANKRD17" "ASF1A"   "ASTE1"   "ATRX"    "ATXN3"
#[8] "BCCIP"
 a[is.na(match(a, b))]
#[1] 2 2 4
A.K.




- Original Message -
From: William Dunlap wdun...@tibco.com
To: arun smartpink...@yahoo.com; R help r-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 3:18 PM
Subject: RE: [R] strings

You recommended
   library(sqldf)
   sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2')

Using nothing but the core R packages setdiff() returns the difference between 
two sets.
   setdiff(dat1$V2, dat2$V2)
   [1] "ALKBH1"  "ALKBH2"  "ANKRD17" "ASF1A"   "ASTE1"   "ATRX"    "ATXN3"   "BCCIP"
If there are possibly duplicates in dat1$V2, so it is not a set, and you want 
the duplicates in the result, use
   dat1$V2[ !is.element(dat1$V2, dat2$V2) ]
   [1] "ALKBH1"  "ALKBH2"  "ANKRD17" "ASF1A"   "ASTE1"   "ATRX"    "ATXN3"   "BCCIP"

> a <- c(1, 2, 3, 2, 1, 4)
> b <- c(1, 3)
> setdiff(a, b)
[1] 2 4
> a[ !is.element(a, b) ]
[1] 2 2 4


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of arun
 Sent: Thursday, May 23, 2013 12:05 PM
 To: R help
 Subject: Re: [R] strings
 
 Hi,
 Try:
 
 dat1 <- structure(list(V2 = c("ALKBH1", "ALKBH2", "ALKBH3", "ANKRD17",
 "APEX1", "APEX2", "APTX", "ASF1A", "ASTE1", "ATM", "ATR", "ATRIP",
 "ATRX", "ATXN3", "BCCIP", "BLM", "BRCA1", "BRCA2")), .Names = "V2",
 class = "data.frame", row.names = c(NA, 18L))
 
 
 dat2 <- structure(list(V2 = c("ALKBH3", "APEX1", "APEX2", "APLF", "APTX",
 "ATM", "ATR", "ATRIP", "BLM", "BRCA1", "BRCA2", "BRIP1", "BTBD12",
 "CCNH")), .Names = "V2", class = "data.frame", row.names = c(NA, 14L))
 
 
 library(sqldf)
 sqldf('SELECT * FROM dat1 EXCEPT SELECT * FROM dat2')
 #   V2
 #1  ALKBH1
 #2  ALKBH2
 #3 ANKRD17
 #4   ASF1A
 #5   ASTE1
 #6    ATRX
 #7   ATXN3
 #8   BCCIP
 
 
 
 #or
 dat2$id <- 1
 res <- merge(dat1, dat2, all=TRUE)
 subset(res, is.na(res$id))[1]
 #    V2
 #1   ALKBH1
 #2   ALKBH2
 #4  ANKRD17
 #9    ASF1A
 #10   ASTE1
 #14    ATRX
 #15   ATXN3
 #16   BCCIP
 A.K.
 
 
 
 
 I have two files containing words. I want to print the words that are in file 1 but
 NOT in file 2.
 How do I go about it?
 
 file 1:
  ABL1
 1     ALKBH1
 2     ALKBH2
 3     ALKBH3
 4    ANKRD17
 5      APEX1
 6      APEX2
 7       APTX
 8      ASF1A
 9      ASTE1
 10       ATM
 11       ATR
 12     ATRIP
 13      ATRX
 14     ATXN3
 15     BCCIP
 16       BLM
 17     BRCA1
 18     BRCA2
 
 
 file2:
  ALKBH2
 1    ALKBH3
 2     APEX1
 3     APEX2
 4      APLF
 5      APTX
 6       ATM
 7       ATR
 8     ATRIP
 9       BLM
 10    BRCA1
 11    BRCA2
 12    BRIP1
 13   BTBD12
 14     CCNH
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[R] glmnet package: command meanings

2013-05-23 Thread C W
Hi List,
I am a little confused about when to use glmnet() vs cv.glmnet().

I know that:
glmnet(): gives the fit
cv.glmnet(): does the cross-validation after the fit

I just want to get the beta coefficients after the fit, that's it!

But in all of the glmnet examples I've seen, the beta coefficients are
obtained ONLY AFTER cv.glmnet().

Why is that?  Also, why are there so many extra betas after the fit?

Thanks,
Mike
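
For what it's worth (an editorial note, not from the thread): glmnet() fits the
entire regularization path, so coef() on a glmnet fit has one column of betas
per lambda value, which is where the "extra" betas come from; cv.glmnet() is
only needed to choose a lambda. A sketch, assuming the glmnet package (the toy
data here are made up):

```r
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)  # toy predictors
y <- rnorm(100)                      # toy response

fit <- glmnet(x, y)
dim(coef(fit))        # 6 rows (intercept + 5 betas), one column per lambda

coef(fit, s = 0.1)    # betas at a lambda chosen by hand -- no CV required

cvfit <- cv.glmnet(x, y)             # CV exists only to pick a lambda
coef(cvfit, s = "lambda.min")        # betas at the CV-selected lambda
```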

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in png: unable to start png() device

2013-05-23 Thread Uwe Ligges



On 23.05.2013 17:06, Ondrej Novak wrote:

Hi,
I use R 2.14.0 on Win XP Pro SP3 and it behaves the same, sometimes.
After I draw a lot of plots (more than 200, with 2 concurrent Rgui processes
running in parallel) to png, I get the same error message.
bmp(), jpeg(), png() - same error. Restarting Rgui does not help.

Solution: restart the system, and voila, everything is OK.

I suspect that there might be something wrong with the allocation/deallocation
of Windows resources in the windows() function.



R-2.14.0 is ancient; can you try this with a recent R such as R-3.0.1, 
please, and if the problem persists, please provide reproducible code so 
that we can try to reproduce it in order to find the problem.


Best,
Uwe Ligges




Ondrej Novak

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] FW: Kernel smoothing with bandwidth which varies with x

2013-05-23 Thread Uwe Ligges



On 23.05.2013 18:10, IOANNA wrote:

Hello all,

I would like to use the Nadaraya-Watson estimator assuming a Gaussian
kernel. So far I used:
library(sm)
x <- runif(5000)
y <- rnorm(5000)
plot(x, y, col='black')
h1 <- h.select(x, y, method='aicc')
lines(ksmooth(x, y, bandwidth=h1))

which works fine. What if my data were clustered, requiring a bandwidth that
varies with x? How can I do that?


I'd start with trying to transform x so that the bandwidth can be fixed.

Uwe Ligges
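
One concrete reading of that suggestion (my sketch, not from the thread):
smooth on the rank/probability scale, where a fixed bandwidth corresponds to a
bandwidth on the original x scale that widens where the data are sparse.

```r
set.seed(1)
x <- rexp(5000)               # clustered x: dense near 0, sparse in the tail
y <- sin(x) + rnorm(5000, sd = 0.2)

u <- rank(x) / length(x)      # transform x to a roughly uniform scale
fit <- ksmooth(u, y, kernel = "normal", bandwidth = 0.05)

plot(x, y, col = "grey")
# Map the fitted grid back to the original x scale via quantiles.
lines(quantile(x, probs = fit$x), fit$y, col = "blue", lwd = 2)
```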








Thanks in advance,
Ioanna

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





[R] Distance calculation

2013-05-23 Thread eliza botto
Dear useRs,
I have the following data arranged in two columns:

structure(c(0.492096635764151, 0.42688044914, 0.521585941816778, 
1.66472272302545, 2.61878329527404, 2.19154489521664, 0.493876245329722, 
0.4915787202584, 0.889477365620806, 0.609135860199222, 0.739201878930367, 
0.854663750519518, 2.06195904001605, 1.41493262330451, 1.35748791897328, 
1.19490680241894, 0.702488756183322, 0.338258418490199, 0.123398398622741, 
0.138548982660226, 0.16170889185798, 0.414543218677095, 1.84629295875002, 
2.24547399004563), .Dim = c(12L, 2L))

The distance is to be calculated by subtracting each value of each column from 
the corresponding column value, in the following way:
=> The column values are cyclic. For example, after row 12 there is once again 
row 1. So, in a way, row 3 is closer to row 12 than to row 8. 
=> The peak value is the maximum value of a column. Values falling within 80% 
of the maximum can also be considered maximum values, provided they are not 
falling immediately next to each other. 
=> If we plot column 1 and column 2, the peak value of column 1 is at the 5th 
grade of the x-axis and for column 2 it is at the 12th. For column 2 the value 
at x=1 is very close to the value at x=12 (within 80% of it), but it cannot be 
considered a peak value, as it falls immediately next to the maximum value. 
Now the peaks are moved towards each other in the shortest possible way until 
the maximum values are under each other,
more precisely,

column 1:  1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2:  1 2 3 4 5 6 7 8 9 10 11 12(max)

Now distance is measured in the following way

column 1:  1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2:  12(max) 1 2 3 4 5 6 7 8 9 10 11
a <- sum(abs(col1-col2))
======
column 1:  1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2:  11 12(max) 1 2 3 4 5 6 7 8 9 10
b <- sum(abs(col1-col2))
======
column 1:  1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2:  10 11 12(max) 1 2 3 4 5 6 7 8 9
c <- sum(abs(col1-col2))
======
column 1:  1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2:  9 10 11 12(max) 1 2 3 4 5 6 7 8
d <- sum(abs(col1-col2))
======
column 1:  1 2 3 4 5(max) 6 7 8 9 10 11 12
column 2:  8 9 10 11 12(max) 1 2 3 4 5 6 7
e <- sum(abs(col1-col2))

total distance = a+b+c+d+e
For the following two columns it should work in the following way

structure(c(0.948228727226247, 1.38569091844218, 0.910510759802679, 
1.25991218521949, 0.993123416952421, 0.553640392997634, 0.357487763503204, 
0.368328033777003, 0.344255688489322, 0.423679560916755, 1.32093576037521, 
3.13420679229785, 0.766278117577654, 0.751997501086888, 0.836280758630117, 
1.188156460303, 1.56771616670373, 1.15928168139479, 0.522523036011874, 
0.561678840701488, 1.11155735914479, 1.26467106348848, 1.09378883406298, 
1.17607018089421), .Dim = c(12L, 2L))
column 1:  1 2 3 4 5 6 7 8 9 10 11 12(max)
column 2:  1 2 3 4 5(max) 6 7 8 9 10(max) 11 12

Now, as for column 2, the 10th value is closer to column 1's maximum value, 
therefore the distance is measured in the following way:

column 1:  1 2 3 4 5 6 7 8 9 10 11 12(max)
column 2:  12 1 2 3 4 5 6 7 8 9 10(max) 11
a <- sum(abs(col1-col2))
---
column 1:  1 2 3 4 5 6 7 8 9 10 11 12(max)
column 2:  11 12 1 2 3 4 5 6 7 8 9 10(max)
b <- sum(abs(col1-col2))

total distance = a+b
How can I do it?
Thank you very much in advance,
Elisa
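
(An editorial sketch, not part of Elisa's message: the repeated shift-and-sum
step can be written in base R as below; the function names and the shift count
are my own, and the 80%-peak rule is left out.)

```r
# Rotate a vector forward by one position, so that element 12 moves in
# front of element 1 -- the cyclic behaviour described above.
rotate1 <- function(v) c(v[length(v)], v[-length(v)])

# Sum of |col1 - col2| over nshifts successive forward rotations of
# col2, i.e. the a + b + c + d + e total of the first example.
cyclic_distance <- function(col1, col2, nshifts) {
  total <- 0
  for (i in seq_len(nshifts)) {
    col2 <- rotate1(col2)
    total <- total + sum(abs(col1 - col2))
  }
  total
}
```

With the first data set, cyclic_distance(m[, 1], m[, 2], 5) would give
a + b + c + d + e, where 5 is the number of rotations needed to bring the
column-2 peak (position 12) under the column-1 peak (position 5).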
Re: [R] Could graph objects be stored in a two-dimensional list?

2013-05-23 Thread David Winsemius

On May 23, 2013, at 8:30 AM, jpm miao wrote:

 Hi,
 
  I have a few graph objects created by some graphic package (say, ggplot2,
 which I use frequently). Because of the existing relationship between the
 graphs, I'd like to index them in two dimensions as p[1,1], p[1,2], p[2,1],
 p[2,2] for convenience.
 
  To my knowledge, the only data type capable of storing graph objects

(This all depends on what you mean by graph objects.)

 (and
 any R object) is list, but unfortunately it is available in only one
 dimension.

I think both of these presumptions are incorrect.

 Could the graphs be stored in any two-dimensional data type?
 
  One remedy that comes to my mind is to build a function f so that
 f(1,1)=1
 f(1,2)=2
 f(2,1)=3
 f(2,2)=4
  With functions f and f^{-1} (inverse function of f) , the two-dimensional
 indices could be mapped to and from a set of one-dimensional indices, and
 the functions are exactly the way R numbers elements in a matrix. Does R
 have this built-in function for a m by n matrix or more generally, m*n*p
 array? (I know this function is easy to write, but just want to make sure
 whether it exists already)
 
Matrices can hold list elements:

> matrix( list(a="a"), 2, 2)
     [,1] [,2]
[1,] "a"  "a" 
[2,] "a"  "a" 
> matrix( list(a="a"), 2, 2)[1,1]
[[1]]
[1] "a"


And lists may be nested inside a regular list:

> list( list( list(a="a"), list(b="bb") ), 
+       list( list(c="ccc"), list(d="dddd") ) )[[1]][[2]]
$b
[1] "bb"


So storing in this manner for access by an appropriately designed function 
should also be straight-forward. You could argue that the lattice-object panel 
structure depends on this fact.
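
A compact sketch of the matrix-of-lists idea (a string and a fitted model
stand in for arbitrary plot objects):

```r
# A 2 x 2 matrix whose cells are list elements; any R object fits,
# so p[[1, 2]] could just as well hold a ggplot or lattice object.
p <- matrix(vector("list", 4), nrow = 2, ncol = 2)
p[[1, 1]] <- "top-left"                    # stand-in for a plot object
p[[2, 2]] <- lm(mpg ~ wt, data = mtcars)   # any object works
p[[1, 1]]           # retrieve by a two-dimensional index
class(p[[2, 2]])    # "lm"
```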

 
   [[alternative HTML version deleted]]
Please learn to post  in plain text.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] order panels in xyplot by increasing slope

2013-05-23 Thread Belair, Ethan D
I am creating a few dozen multi-panel time series plots using lattice graphics 
with the lme4 package. Each panel in a given plot represents a tree; each 
multi-panel plot is a particular treatment. Here's my issue: when you use 
xyplot() to plot these data, it orders the panels alphabetically. I would 
prefer to have them in order of increasing slope of the regression line plotted 
in each panel. I've read everything I can find regarding the index.cond 
argument, and the best I can come up with is to manually order the panels to 
get the correct increasing slope order, i.e. index.cond=c(7,6,23,4,15,8,...). 
This would take an inordinate amount of time and I'm sure there is a better, 
more elegant solution. Please help!

Sorry for the long dataset below; I'm unsure how to create a reproducible 
example otherwise.

example.plot = xyplot(ht ~ time | tree, data = data,
                      type = c("r", "g", "p"),
                      par.settings = simpleTheme(col = "blue"),
                      main = "abc")
example.plot
data=
   tree treat site plot rx mr rxl t w l spp time  dia   ht
1   1 1  C-H 2002  1  1  Mn N N 14.55  ac1  9.6  74.5
2   2 1  C-H 2002  1  1  Mn N N 14.55  ac1  7.4  69.5
3   3 1  C-H 2003  1  1  Mn N N 13.34  ac1  6.0  66.7
4   4 1  C-H 2003  1  1  Mn N N 13.34  ac1  7.1  75.4
5   5 1  C-H 2003  1  1  Mn N N 13.34  ac1  7.5  57.5
6   6 1  C-H 2008  2  1  Mc N N 11.63  ac1  5.7  71.5
7   7 1  C-H 2008  2  1  Mc N N 11.63  ac1  5.2  50.0
8   8 1  C-H 2011  2  1  Mc N N 13.04  ac1  6.3  62.0
9   9 1  C-H 2011  2  1  Mc N N 13.04  ac1  6.7  60.5
10 10 1  C-H 2017  3  1   H N N 11.38  ac1 10.7  82.0
11 11 1  C-H 2017  3  1   H N N 11.38  ac1  4.4  27.0
12 12 1  C-H 2018  3  1   H N N 11.08  ac1  5.8  49.0
13 13 1  C-H 2018  3  1   H N N 11.08  ac1  4.3  64.2
14 14 1  C-H 2013  4  1 McH N N 15.09  ac1 11.4  86.0
15 15 1  C-H 2013  4  1 McH N N 15.09  ac1  7.6  87.5
16 16 1  C-H 2014  4  1 McH N N 14.17  ac1  5.8  60.1
17 17 1  C-H 2014  4  1 McH N N 14.17  ac1 11.5 100.5
18 18 1  C-H 2014  4  1 McH N N 14.17  ac1  4.7  53.2
19 19 1  C-H 2019  5  1 MnH N N 11.72  ac1  8.1  56.0
20 20 1  C-H 2019  5  1 MnH N N 11.72  ac1  7.1  56.0
21 21 1  C-H 2019  5  1 MnH N N 11.72  ac1  7.1  56.0
22 22 1  C-H 2020  5  1 MnH N N 14.71  ac1  7.0  78.2
23 23 1  C-H 2020  5  1 MnH N N 14.71  ac1  5.2  47.2
24 24 1  C-H 2020  5  1 MnH N N 14.71  ac1  7.0  83.5
595 1 1  C-H 2002  1  1  Mn N N 14.55  ac2  9.6  96.0
596 2 1  C-H 2002  1  1  Mn N N 14.55  ac2  6.0  72.0
597 3 1  C-H 2003  1  1  Mn N N 13.34  ac2  5.7  75.0
598 4 1  C-H 2003  1  1  Mn N N 13.34  ac2  7.5 101.0
599 5 1  C-H 2003  1  1  Mn N N 13.34  ac2  6.9  58.0
600 6 1  C-H 2008  2  1  Mc N N 11.63  ac2  6.0  84.0
601 7 1  C-H 2008  2  1  Mc N N 11.63  ac2  6.3  72.0
602 8 1  C-H 2011  2  1  Mc N N 13.04  ac2  7.4 101.0
603 9 1  C-H 2011  2  1  Mc N N 13.04  ac2  5.6  62.0
604    10 1  C-H 2017  3  1   H N N 11.38  ac2 10.7 110.0
605    11 1  C-H 2017  3  1   H N N 11.38  ac2  4.7  60.0
606    12 1  C-H 2018  3  1   H N N 11.08  ac2  6.4  48.0
607    13 1  C-H 2018  3  1   H N N 11.08  ac2  5.6  70.0
608    14 1  C-H 2013  4  1 McH N N 15.09  ac2 11.0 116.0
609    15 1  C-H 2013  4  1 McH N N 15.09  ac2  7.5 104.0
610    16 1  C-H 2014  4  1 McH N N 14.17  ac2  6.5  61.0
611    17 1  C-H 2014  4  1 McH N N 14.17  ac2 10.9 110.0
612    18 1  C-H 2014  4  1 McH N N 14.17  ac2  5.9  50.0
613    19 1  C-H 2019  5  1 MnH N N 11.72  ac2  8.1  76.0
614    20 1  C-H 2019  5  1 MnH N N 11.72  ac2  7.1  82.0
615    21 1  C-H 2019  5  1 MnH N N 11.72  ac2  7.1  82.0
616    22 1  C-H 2020  5  1 MnH N N 14.71  ac2  7.6  98.0
617    23 1  C-H 2020  5  1 MnH N N 14.71  ac2  6.1  70.0
618    24 1  C-H 2020  5  1 MnH N N 14.71  ac2  8.4  95.0
1189    1 1  C-H 2002  1  1  Mn N N 14.55  ac3 13.0 109.0
1190    2 1  C-H 2002  1  1  Mn N N 14.55  ac3  9.8  77.0
1191    3 1  C-H 2003  1  1  Mn N N 13.34  ac3  8.0  80.0
1192    4 1  C-H 2003  1  1  Mn N N 13.34  ac3 13.0 113.0
1193    5 1  C-H 2003  1  1  Mn N N 13.34  ac3   NA    NA
1194    6 1  C-H 2008  2  1  Mc N N 11.63  ac3  7.7  89.0
1195    7 1  C-H 2008  2  1  Mc N N 11.63  ac3  9.5  84.0
1196    8 1  C-H 2011  2  1  Mc N N 13.04  ac3  6.2 122.0
1197    9 1  C-H 2011  2  1  Mc N N 13.04  ac3   NA    NA
1198   10 1  C-H 2017  3  1   H N N 11.38  ac3 11.5 104.0
1199   11 1  C-H 2017  3  1   H N N 11.38  ac3  6.1  62.0
1200   12 1  C-H 2018  3  

Re: [R] order panels in xyplot by increasing slope

2013-05-23 Thread Jim Lemon

On 05/24/2013 06:21 AM, Belair, Ethan D wrote:

example.plot = xyplot(ht ~ time | tree, data = data,
 type = c("r", "g", "p"),
 par.settings = simpleTheme(col = "blue"),
  main = "abc",
  )
example.plot

 ...

Hi Ethan,
This may be what you want:

panel.slope <- function(panel) {
 return(diff(range(panel$y, na.rm=TRUE))/
  diff(range(panel$x, na.rm=TRUE)))
}
panel.order <-
 order(unlist(lapply(example.plot$panel.args, panel.slope)))

Jim
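
If it helps, the resulting order can then be fed back through index.cond when
updating the trellis object (my sketch, untested against Ethan's data):

```r
# Redraw the existing plot with panels arranged by increasing slope.
update(example.plot, index.cond = list(panel.order))
```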

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subsetting and Dates

2013-05-23 Thread Denis Chabot
Hi,

I am trying to understand why creating Date variables does not work if I subset 
to avoid NAs. 

I had problems creating these Date variables in my code and I thought that the 
presence of NAs was the cause. So I used a condition to avoid NAs.

It turns out that NAs are not a problem and I do not need to subset, but I'd 
like to understand why subsetting causes the problem.
The strange numbers I start with are what I get when I read an Excel sheet with 
the function read.xls() from package gdata.  

dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 41390, 
41397)
dat2 = dat1
dat2[c(5,9)]=NA
Data = data.frame(dat1,dat2)

keep1 = !is.na(Data$dat1)
keep2 = !is.na(Data$dat2)


Data$Dat1a = as.Date(Data[,"dat1"], origin="1899-12-30") 
Data$Dat1b[keep1] = as.Date(Data[keep1,"dat1"], origin="1899-12-30") 
Data$Dat2a = as.Date(Data[,"dat2"], origin="1899-12-30") 
Data$Dat2b[keep2] = as.Date(Data[keep2,"dat2"], origin="1899-12-30") 

Data
dat1  dat2  Dat1a Dat1b  Dat2a Dat2b
1  41327 41327 2013-02-22 15758 2013-02-22 15758
2  41334 41334 2013-03-01 15765 2013-03-01 15765
3  41341 41341 2013-03-08 15772 2013-03-08 15772
4  41348 41348 2013-03-15 15779 2013-03-15 15779
5  41355NA 2013-03-22 15786   NANA
6  41362 41362 2013-03-29 15793 2013-03-29 15793
7  41369 41369 2013-04-05 15800 2013-04-05 15800
8  41376 41376 2013-04-12 15807 2013-04-12 15807
9  41383NA 2013-04-19 15814   NANA
10 41390 41390 2013-04-26 15821 2013-04-26 15821
11 41397 41397 2013-05-03 15828 2013-05-03 15828

So variables Dat1b and Dat2b are not converted to Date class.


sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] gdata_2.12.0

loaded via a namespace (and not attached):
[1] gtools_2.7.0

Thanks in advance,

Denis
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] When the interaction term should be interpreted in AIC table?

2013-05-23 Thread Galina Kamorina
Hi,
I would be very grateful if someone could help me figure out my problem.

I used mixed-effects models to analyse my data and an AIC approach for model 
selection. I am studying the effect of Labrador tea on the basal diameter of 
spruce in 2 different habitats (wet and dry zones) over 3 years.
This is one example of my AIC table:

 
  
   
Candidate models                                     K    AICc    ΔAICc   AICc Wt
Zone + Labrador tea + Year                           9   -17.75    0.00     0.80
Zone + Labrador tea + Year + Zone × Labrador tea    10   -14.69    3.06     0.17
Zone + Labrador tea + Year + Year × Labrador tea    12   -11.21    6.53     0.03
Zone + Labrador tea                                  6    71.14   88.88     0.00
Zone + Labrador tea + Zone × Labrador tea            7    73.85   91.59     0.00

I interpreted the main effects of Zone, Labrador tea and Year. My question is: 
should I also interpret the interaction term Zone × Labrador tea? 
Normally I interpret the effect of variables that appear in models with 
ΔAICc < 4. 
One professor said I should not interpret the interaction term if the main 
effect is stronger. But at the same time I have seen articles where the authors 
interpreted the interaction term when its Akaike weight was still high.

Thank you in advance.
Galina
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsetting and Dates

2013-05-23 Thread arun
You could convert those columns to Date class by:


Data[,c(4,6)] <- lapply(Data[,c(4,6)], as.Date, origin="1970-01-01")
#or
Data[,c(4,6)] <- lapply(Data[,c(4,6)], function(x) structure(x, class="Date"))


#  dat1  dat2  Dat1a  Dat1b  Dat2a  Dat2b
#1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
#2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
#3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
#4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
#5  41355    NA 2013-03-22 2013-03-22   NA   NA
#6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
#7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
#8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
#9  41383    NA 2013-04-19 2013-04-19   NA   NA
#10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
#11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03
A.K.

- Original Message -
From: Denis Chabot chabot.de...@gmail.com
To: R-help@r-project.org
Cc: 
Sent: Thursday, May 23, 2013 5:35 PM
Subject: [R] subsetting and Dates

Hi,

I am trying to understand why creating Date variables does not work if I subset 
to avoid NAs. 

I had problems creating these Date variables in my code and I thought that the 
presence of NAs was the cause. So I used a condition to avoid NAs.

It turns out that NAs are not a problem and I do not need to subset, but I'd 
like to understand why subsetting causes the problem.
The strange numbers I start with are what I get when I read an Excel sheet with 
the function read.xls() from package gdata.  

dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 41390, 
41397)
dat2 = dat1
dat2[c(5,9)]=NA
Data = data.frame(dat1,dat2)

keep1 = !is.na(Data$dat1)
keep2 = !is.na(Data$dat2)


Data$Dat1a = as.Date(Data[,"dat1"], origin="1899-12-30") 
Data$Dat1b[keep1] = as.Date(Data[keep1,"dat1"], origin="1899-12-30") 
Data$Dat2a = as.Date(Data[,"dat2"], origin="1899-12-30") 
Data$Dat2b[keep2] = as.Date(Data[keep2,"dat2"], origin="1899-12-30") 

Data
    dat1  dat2      Dat1a Dat1b      Dat2a Dat2b
1  41327 41327 2013-02-22 15758 2013-02-22 15758
2  41334 41334 2013-03-01 15765 2013-03-01 15765
3  41341 41341 2013-03-08 15772 2013-03-08 15772
4  41348 41348 2013-03-15 15779 2013-03-15 15779
5  41355    NA 2013-03-22 15786       NA    NA
6  41362 41362 2013-03-29 15793 2013-03-29 15793
7  41369 41369 2013-04-05 15800 2013-04-05 15800
8  41376 41376 2013-04-12 15807 2013-04-12 15807
9  41383    NA 2013-04-19 15814       NA    NA
10 41390 41390 2013-04-26 15821 2013-04-26 15821
11 41397 41397 2013-05-03 15828 2013-05-03 15828

So variables Dat1b and Dat2b are not converted to Date class.


sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] gdata_2.12.0

loaded via a namespace (and not attached):
[1] gtools_2.7.0

Thanks in advance,

Denis
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Create and read symbolic links in Windows

2013-05-23 Thread Santosh
Dear R experts,

This time I am unable to create symbolic links to files as I had done last
time. I could not replicate what I had successfully tried before (rerunning
the same code without any modifications).

I get the following error message:
[1] FALSE
Warning message:
In file.link(.file1, file2) :
  cannot link './File1' to './file2', reason 'The specified network name is
no longer available'

file.exists(), however, returns TRUE when I test for the source and target
folders and the source file. I tried mapping drives and relative folder
paths, and nothing worked.

The R version (on 64-bit Windows 7):

> version
   _
platform   x86_64-w64-mingw32
arch   x86_64
os mingw32
system x86_64, mingw32
status
major  3
minor  0.0
year   2013
month  04
day03
svn rev62481
language   R
version.string R version 3.0.0 (2013-04-03)
nickname   Masked Marvel

Any suggestions are highly welcome!

Thanks,
Santosh
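
For reference, a minimal sketch of the two calls discussed in this thread (the
paths are invented; on Windows, creating symbolic links typically requires
suitable privileges):

```r
src <- "C:/data/File1.csv"   # hypothetical existing file
lnk <- "C:/work/File1.csv"   # hypothetical link to create

# file.symlink() makes a symbolic link; file.link() makes a hard link,
# which only works within one volume (not across network shares).
ok <- file.symlink(src, lnk)
if (ok) {
  dat <- read.csv(lnk)       # a link is read like any ordinary file
} else {
  warning("could not create the symlink; check privileges and paths")
}
```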




On Fri, May 3, 2013 at 11:30 AM, Santosh santosh2...@gmail.com wrote:

 Just got it right; please ignore the previous posting...

 It worked!
  Prof Ripley made my day!! :) THANK YOU!


 On Fri, May 3, 2013 at 11:23 AM, Santosh santosh2...@gmail.com wrote:

 Thanks for your suggestion... I upgraded to R.3.0.0 in 64-bit Windows 7
 environment..

  This time when I use file.link,
  I get the following error message: 'Cannot create a file when that file
  already exists'.
  And I don't see the link.

  The other function, file.copy, correctly copies to the target location.

  Still confused by the error messages...

 Thanks,
 Santosh


 On Thu, May 2, 2013 at 11:42 PM, Prof Brian Ripley rip...@stats.ox.ac.uk
  wrote:

 On 03/05/2013 07:33, Santosh wrote:

 Thanks for the suggestions. In windows (Windows 7, 64-bit), I couldn't
 get file.symlink to work, but file.link did return the result to be
 TRUE but at the target location, I did not see any link.

 Not sure I am missing anything more.. Hope it's nothing to do with
 administrator accounts and administrator rights... Is it something I
 should check with my system administrator?


 You may need to update your R: although the posting guide asked you to
 do that before posting.  There was a relevant bug fix in 2.15.3.


 Thanks,
 Santosh


 On Thu, May 2, 2013 at 12:22 PM, Prof Brian Ripley
  rip...@stats.ox.ac.uk wrote:

 On 02/05/2013 19:50, Santosh wrote:

 Dear Rxperts..
 Got a couple of quick q's..
 I am using R in windows environment (both 32-bit and 64-bit)
 a) Is there a way to create symbolic links to some data files?


 See ?file.symlink.  ??'symbolic link' should have got you there.

 Note that this is not very useful for files, but that is a Windows
 and not an R restriction.


   b) How do I read data from symbolic links?

 The same ways you read data from files.


 Thanks so much..
 Santosh



  --
  Brian D. Ripley, rip...@stats.ox.ac.uk
  Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
  University of Oxford, Tel: +44 1865 272861 (self)
  1 South Parks Road, +44 1865 272866 (PA)
  Oxford OX1 3TG, UK    Fax: +44 1865 272595

  ______________________________
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.




 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595








Re: [R] subsetting and Dates

2013-05-23 Thread Denis Chabot
Thank you for the 2 methods to make the columns class Date, but I would really 
like to know why these variables were not in Date class with my code. Do you 
know?

Denis


On 2013-05-23, at 21:44, arun smartpink...@yahoo.com wrote:

 You could convert those columns to Date class by:
 
 
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], as.Date, origin="1970-01-01")
 #or
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], function(x) structure(x, class="Date"))
 
 
 #  dat1  dat2  Dat1a  Dat1b  Dat2a  Dat2b
 #1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
 #2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
 #3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
 #4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
 #5  41355    NA 2013-03-22 2013-03-22         NA         NA
 #6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
 #7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
 #8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
 #9  41383    NA 2013-04-19 2013-04-19         NA         NA
 #10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
 #11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03
 A.K.
 
 - Original Message -
 From: Denis Chabot chabot.de...@gmail.com
 To: R-help@r-project.org
 Cc: 
 Sent: Thursday, May 23, 2013 5:35 PM
 Subject: [R] subsetting and Dates
 
 Hi,
 
 I am trying to understand why creating Date variables does not work if I 
 subset to avoid NAs. 
 
 I had problems creating these Date variables in my code and I thought that 
 the presence of NAs was the cause. So I used a condition to avoid NAs.
 
 It turns out that NAs are not a problem and I do not need to subset, but I'd 
 like to understand why subsetting causes the problem.
 The strange numbers I start with are what I get when I read an Excel sheet 
 with the function read.xls() from package gdata.  
 
 dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 
 41390, 41397)
 dat2 = dat1
 dat2[c(5,9)]=NA
 Data = data.frame(dat1,dat2)
 
 keep1 = !is.na(Data$dat1)
 keep2 = !is.na(Data$dat2)
 
 
 Data$Dat1a = as.Date(Data[,"dat1"], origin="1899-12-30")
 Data$Dat1b[keep1] = as.Date(Data[keep1,"dat1"], origin="1899-12-30")
 Data$Dat2a = as.Date(Data[,"dat2"], origin="1899-12-30")
 Data$Dat2b[keep2] = as.Date(Data[keep2,"dat2"], origin="1899-12-30")
 
 Data
 dat1  dat2  Dat1a Dat1b  Dat2a Dat2b
 1  41327 41327 2013-02-22 15758 2013-02-22 15758
 2  41334 41334 2013-03-01 15765 2013-03-01 15765
 3  41341 41341 2013-03-08 15772 2013-03-08 15772
 4  41348 41348 2013-03-15 15779 2013-03-15 15779
 5  41355    NA 2013-03-22 15786         NA    NA
 6  41362 41362 2013-03-29 15793 2013-03-29 15793
 7  41369 41369 2013-04-05 15800 2013-04-05 15800
 8  41376 41376 2013-04-12 15807 2013-04-12 15807
 9  41383    NA 2013-04-19 15814         NA    NA
 10 41390 41390 2013-04-26 15821 2013-04-26 15821
 11 41397 41397 2013-05-03 15828 2013-05-03 15828
 
 So variables Dat1b and Dat2b are not converted to Date class.
 
 
 sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 locale:
 [1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 other attached packages:
 [1] gdata_2.12.0
 
 loaded via a namespace (and not attached):
 [1] gtools_2.7.0
 
 Thanks in advance,
 
 Denis
 



Re: [R] ordered and unordered variables

2013-05-23 Thread meng
Many thanks for your detailed reply.


I'll read your mail thoroughly. Thanks!






At 2013-05-23 21:56:29,Greg Snow 538...@gmail.com wrote:

Meng,


This really comes down to what question you are trying to answer.  Before 
worrying about details of default contrasts and issues like that you first need 
to work out what is really the question of interest.  The main difference 
between declaring a variable ordered or not is the default contrasts.  Defaults 
are provided because there are many cases where which contrasts are used 
internally does not matter, so why make someone think about it.  In cases where 
the choice of contrasts matter, it is rare that any default coding is the 
correct/best choice and you should really think through what contrasts answer 
the question of interest and use those custom contrasts.


For example, to answer the question if Tension has any overall effect it does 
not matter which contrast encoding you use (as long as it is full rank), the 
test statistic and p-value for testing the whole effect will be the same.  The 
predictions of the means of groups will also be the same regardless of which 
contrasts are used (and this is often a clearer way to present/explain the 
results).


A case where the specific contrasts would matter would be if we want to see if 
we can reduce the number of groups by combining groups together, or interpolate 
to certain groups.  The treatment contrasts will test if low and medium can be 
combined (which makes sense) and if low and high can be combined (which does 
not make sense unless the first is true and in fact the overall factor is not 
significant), what makes more sense would be to compare low to medium and 
medium to high (it could be that low is different from the other 2, but med and 
high can be combined).  The polynomial contrasts give a different view, the 
quadratic term in this case tests whether the medium group is the average of 
the low group and the high group (so we could interpolate medium), this only 
makes sense if the medium tension is centered (in some sense) between the other 
2, i.e. the difference from low to medium is exactly the same as the difference 
from medium to high, but if that were the case then !
 I would expect a numerical term rather than an ordered factor.


So, to summarize, it depends on the question of interest.  For some questions 
the contrasts don't matter, in which case it does not matter, in other cases 
the correct contrasts to use are determined by the question and you should use 
the contrasts that answer that question (which are rarely a default).
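One way to sketch the "compare low to medium and medium to high" comparisons suggested above is successive-difference contrasts, e.g. `contr.sdif()` from MASS (assumed installed; it ships with standard R distributions). This is one reasonable custom coding, not the only one:

```r
# Successive-difference contrasts: with the built-in warpbreaks data,
# the two slope coefficients estimate M - L and H - M directly.
library(MASS)

wb <- warpbreaks
contrasts(wb$tension) <- contr.sdif(3)   # replace the default coding
fit <- lm(breaks ~ tension, data = wb)
coef(fit)   # intercept (mean of level means), then M-L and H-M
```

The two slope estimates here are the group-mean differences themselves (M − L = −10.00, H − M = −4.72 for warpbreaks), which answers the "can adjacent groups be combined?" question directly.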



On Tue, May 21, 2013 at 11:09 PM, meng laomen...@163.com wrote:
Thanks.


As to the data  warpbreaks, if I want to analysis the impact of 
tension(L,M,H) on breaks, should I order the tension or not?


Many thanks.













At 2013-05-21 20:55:18,David Winsemius dwinsem...@comcast.net wrote:

On May 20, 2013, at 10:35 PM, meng wrote:

 Hi all:
 If the explainary variables are ordinal,the result of regression is 
 different from
 unordered variables.But I can't understand the result of regression from 
 ordered
 variable.

 The data is warpbreaks,which belongs to R.

 If I use the unordered variable(tension):Levels: L M H
 The result is easy to understand:
               Estimate Std. Error t value Pr(>|t|)
  (Intercept)    36.39       2.80  12.995  < 2e-16 ***
  tensionM      -10.00       3.96  -2.525 0.014717 *
  tensionH      -14.72       3.96  -3.718 0.000501 ***

 If I use the ordered variable(tension):Levels: L  M  H
 I don't know how to explain the result:
               Estimate Std. Error t value Pr(>|t|)
  (Intercept)   28.148      1.617  17.410  < 2e-16 ***
  tension.L    -10.410      2.800  -3.718 0.000501 ***
  tension.Q      2.155      2.800   0.769 0.445182

 What's tension.L and tension.Q stands for?And how to explain the result 
 then?

Ordered factors are handled by the R regression mechanism with orthogonal 
polynomial contrasts: .L for linear and .Q for quadratic. If the term had 
4 levels there would also have been a .C (cubic) term. Treatment contrasts 
are used for unordered factors. Generally one would want to do predictions for 
explanations of the results. Trying to explain the individual coefficient 
values from polynomial contrasts is similar to and just as unproductive as 
trying to explain the individual coefficients involving interaction terms.
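As a quick sketch of the point above that the coding changes only the parameterization, not the fit, the warpbreaks regressions under both factor types can be compared directly:

```r
# Same model space under both codings: fitted group means agree;
# only the coefficients (treatment vs polynomial contrasts) differ.
fit1 <- lm(breaks ~ tension, data = warpbreaks)      # unordered: treatment
wb   <- transform(warpbreaks,
                  tension = ordered(tension, levels = c("L", "M", "H")))
fit2 <- lm(breaks ~ tension, data = wb)              # ordered: .L and .Q

all.equal(fitted(fit1), fitted(fit2))   # TRUE
contr.poly(3)   # the linear (.L) and quadratic (.Q) contrast columns
```

Predicting the group means (identical from either fit) is usually the clearer way to present the result, as suggested above.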

--

David Winsemius
Alameda, CA, USA











--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


Re: [R] subsetting and Dates

2013-05-23 Thread David Winsemius

On May 23, 2013, at 7:06 PM, Denis Chabot wrote:

 Thank you for the 2 methods to make the columns class Date, but I would 
 really like to know why these variables were not in Date class with my code. 
 Do you know?

I suspect that the problem lies in the dispatch to `[<-.class` or `$<-`. When 
the first argument is 'logical', then the first argument is not of class Date 
and so not dispatched to `[<-.Date` but rather to .Primitive("[<-"), there 
being no `$<-.logical` or `[<-.logical`. 

Arguably, as it were, someone should write S4 methods for `[<-` and `$<-` 
that would dispatch to the expected method on a signature where the second 
argument is a Date or POSIXct class. We might also then want methods for the 
other two [/$ indexing classes, numeric and character.
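A minimal sketch of the dispatch behaviour described above: assigning Date values into a vector that is not itself of class Date goes through the default subassignment, which keeps the target's attributes and drops the value's class.

```r
d <- as.Date(c("2013-02-22", "2013-03-01"))

out <- rep(NA, 2)   # logical, not Date: like Data$Dat1b[keep1] in the post
out[1:2] <- d       # default `[<-` dispatches on 'out'; class is dropped
class(out)          # "numeric"

out2 <- d           # start from a Date vector instead
out2[1:2] <- d      # `[<-.Date` is used and the class survives
class(out2)         # "Date"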

-- 

David.
 
 Denis
 
 
 On 2013-05-23, at 21:44, arun smartpink...@yahoo.com wrote:
 
 You could convert those columns to Date class by:
 
 
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], as.Date, origin="1970-01-01")
 #or
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], function(x) structure(x, class="Date"))
 
 
 #  dat1  dat2  Dat1a  Dat1b  Dat2a  Dat2b
 #1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
 #2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
 #3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
 #4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
 #5  41355    NA 2013-03-22 2013-03-22         NA         NA
 #6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
 #7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
 #8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
 #9  41383    NA 2013-04-19 2013-04-19         NA         NA
 #10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
 #11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03
 A.K.
 
 - Original Message -
 From: Denis Chabot chabot.de...@gmail.com
 To: R-help@r-project.org
 Cc: 
 Sent: Thursday, May 23, 2013 5:35 PM
 Subject: [R] subsetting and Dates
 
 Hi,
 
 I am trying to understand why creating Date variables does not work if I 
 subset to avoid NAs. 
 
 I had problems creating these Date variables in my code and I thought that 
 the presence of NAs was the cause. So I used a condition to avoid NAs.
 
 It turns out that NAs are not a problem and I do not need to subset, but I'd 
 like to understand why subsetting causes the problem.
 The strange numbers I start with are what I get when I read an Excel sheet 
 with the function read.xls() from package gdata.  
 
 dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 
 41390, 41397)
 dat2 = dat1
 dat2[c(5,9)]=NA
 Data = data.frame(dat1,dat2)
 
 keep1 = !is.na(Data$dat1)
 keep2 = !is.na(Data$dat2)
 
 
 Data$Dat1a = as.Date(Data[,"dat1"], origin="1899-12-30")
 Data$Dat1b[keep1] = as.Date(Data[keep1,"dat1"], origin="1899-12-30")
 Data$Dat2a = as.Date(Data[,"dat2"], origin="1899-12-30")
 Data$Dat2b[keep2] = as.Date(Data[keep2,"dat2"], origin="1899-12-30")
 
 Data
dat1  dat2  Dat1a Dat1b  Dat2a Dat2b
 1  41327 41327 2013-02-22 15758 2013-02-22 15758
 2  41334 41334 2013-03-01 15765 2013-03-01 15765
 3  41341 41341 2013-03-08 15772 2013-03-08 15772
 4  41348 41348 2013-03-15 15779 2013-03-15 15779
 5  41355    NA 2013-03-22 15786         NA    NA
 6  41362 41362 2013-03-29 15793 2013-03-29 15793
 7  41369 41369 2013-04-05 15800 2013-04-05 15800
 8  41376 41376 2013-04-12 15807 2013-04-12 15807
 9  41383    NA 2013-04-19 15814         NA    NA
 10 41390 41390 2013-04-26 15821 2013-04-26 15821
 11 41397 41397 2013-05-03 15828 2013-05-03 15828
 
 So variables Dat1b and Dat2b are not converted to Date class.
 
 
 sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 locale:
 [1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 other attached packages:
 [1] gdata_2.12.0
 
 loaded via a namespace (and not attached):
 [1] gtools_2.7.0
 
 Thanks in advance,
 
 Denis


David Winsemius
Alameda, CA, USA



Re: [R] subsetting and Dates

2013-05-23 Thread arun
I guess it is due to vectorization.
vec1 <- as.Date(Data[,2], origin="1899-12-30")
class(vec1)
#[1] Date
 as.vector(vec1)
# [1] 15758 15765 15772 15779    NA 15793 15800 15807    NA 15821 15828


 head(as.list(vec1),2)
#[[1]]
#[1] 2013-02-22
#
#[[2]]
#[1] 2013-03-01
 head(data.frame(vec1),2)
#    vec1
#1 2013-02-22
#2 2013-03-01


unlist(as.list(vec1))
# [1] 15758 15765 15772 15779    NA 15793 15800 15807    NA 15821 15828
 

Also, please check:

http://r.789695.n4.nabble.com/as-vector-with-mode-quot-list-quot-and-POSIXct-td4667533.html

A.K.

- Original Message -
From: Denis Chabot chabot.de...@gmail.com
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Thursday, May 23, 2013 10:06 PM
Subject: Re: [R] subsetting and Dates

Thank you for the 2 methods to make the columns class Date, but I would really 
like to know why these variables were not in Date class with my code. Do you 
know?

Denis


On 2013-05-23, at 21:44, arun smartpink...@yahoo.com wrote:

 You could convert those columns to Date class by:
 
 
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], as.Date, origin="1970-01-01")
 #or
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], function(x) structure(x, class="Date"))
 
 
 #  dat1  dat2      Dat1a      Dat1b      Dat2a      Dat2b
 #1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
 #2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
 #3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
 #4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
 #5  41355    NA 2013-03-22 2013-03-22       NA       NA
 #6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
 #7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
 #8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
 #9  41383    NA 2013-04-19 2013-04-19       NA       NA
 #10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
 #11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03
 A.K.
 
 - Original Message -
 From: Denis Chabot chabot.de...@gmail.com
 To: R-help@r-project.org
 Cc: 
 Sent: Thursday, May 23, 2013 5:35 PM
 Subject: [R] subsetting and Dates
 
 Hi,
 
 I am trying to understand why creating Date variables does not work if I 
 subset to avoid NAs. 
 
 I had problems creating these Date variables in my code and I thought that 
 the presence of NAs was the cause. So I used a condition to avoid NAs.
 
 It turns out that NAs are not a problem and I do not need to subset, but I'd 
 like to understand why subsetting causes the problem.
 The strange numbers I start with are what I get when I read an Excel sheet 
 with the function read.xls() from package gdata.  
 
 dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 
 41390, 41397)
 dat2 = dat1
 dat2[c(5,9)]=NA
 Data = data.frame(dat1,dat2)
 
 keep1 = !is.na(Data$dat1)
 keep2 = !is.na(Data$dat2)
 
 
 Data$Dat1a = as.Date(Data[,"dat1"], origin="1899-12-30")
 Data$Dat1b[keep1] = as.Date(Data[keep1,"dat1"], origin="1899-12-30")
 Data$Dat2a = as.Date(Data[,"dat2"], origin="1899-12-30")
 Data$Dat2b[keep2] = as.Date(Data[keep2,"dat2"], origin="1899-12-30")
 
 Data
     dat1  dat2      Dat1a Dat1b      Dat2a Dat2b
 1  41327 41327 2013-02-22 15758 2013-02-22 15758
 2  41334 41334 2013-03-01 15765 2013-03-01 15765
 3  41341 41341 2013-03-08 15772 2013-03-08 15772
 4  41348 41348 2013-03-15 15779 2013-03-15 15779
 5  41355    NA 2013-03-22 15786       NA    NA
 6  41362 41362 2013-03-29 15793 2013-03-29 15793
 7  41369 41369 2013-04-05 15800 2013-04-05 15800
 8  41376 41376 2013-04-12 15807 2013-04-12 15807
 9  41383    NA 2013-04-19 15814       NA    NA
 10 41390 41390 2013-04-26 15821 2013-04-26 15821
 11 41397 41397 2013-05-03 15828 2013-05-03 15828
 
 So variables Dat1b and Dat2b are not converted to Date class.
 
 
 sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 locale:
 [1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
 
 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base    
 
 other attached packages:
 [1] gdata_2.12.0
 
 loaded via a namespace (and not attached):
 [1] gtools_2.7.0
 
 Thanks in advance,
 
 Denis
 




[R] Download data

2013-05-23 Thread Christofer Bogaso
Hello again,

I need to download 'WTI - Cushing, Oklahoma' from '
http://www.eia.gov/dnav/pet/pet_pri_spt_s1_d.htm' which is available under
the column 'View
History'

While I can get the data manually, however I was looking for some R
implementation which can directly download data into R.

Can somebody point me how to achieve that?

Thanks for your help.
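As a hedged sketch of the general pattern (the exact file URL on the EIA page is an assumption and may change), any direct-download link can be fetched with `download.file()` and then parsed. The network call is commented out below; the parsing step is demonstrated on a local placeholder file with made-up values:

```r
# Hypothetical EIA spreadsheet URL -- verify the real link on the page:
# url  <- "http://www.eia.gov/dnav/pet/hist_xls/RWTCd.xls"
# dest <- file.path(tempdir(), "wti.xls")
# download.file(url, dest, mode = "wb")   # fetch (needs a network connection)
# dat  <- gdata::read.xls(dest)           # then parse the spreadsheet

# Offline stand-in showing the same read step on a local CSV:
tmp <- file.path(tempdir(), "wti_demo.csv")
write.csv(data.frame(Date = "2013-05-23", WTI = 93.25),  # fabricated demo row
          tmp, row.names = FALSE)
dat <- read.csv(tmp)
dat$WTI   # 93.25
```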




Re: [R] adding rows without loops

2013-05-23 Thread Adeel Amin
Rainer...I can't believe this did the trick.  You're a genius.  Thank you
sir.


On Thu, May 23, 2013 at 7:07 AM, Rainer Schuermann 
rainer.schuerm...@gmx.net wrote:

 Using the data generated with your code below, does

  DF1 <- rbind( DF1, DF2[ !(DF2$X.TIME %in% DF1$X.TIME), ] )
  DF1 <- DF1[ order( DF1$X.DATE, DF1$X.TIME ), ]

 do the job?
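A self-contained toy version of those two lines (values assumed; real data with several dates would likely need the date in the key as well, e.g. `paste(X.DATE, X.TIME)`):

```r
DF1 <- data.frame(X.TIME = c(200, 300, 500),      VALUE = c(37, 42, 45))
DF2 <- data.frame(X.TIME = c(200, 300, 400, 500), VALUE = c(37, 42, 45, 45))

# Append the rows of DF2 whose key is absent from DF1, then re-sort:
DF1 <- rbind(DF1, DF2[!(DF2$X.TIME %in% DF1$X.TIME), ])
DF1 <- DF1[order(DF1$X.TIME), ]
DF1$X.TIME   # 200 300 400 500
```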

 Rgds,
 Rainer




 On Thursday 23 May 2013 05:54:26 Adeel - SafeGreenCapital wrote:
  Thank you Blaser:
 
  This is the exact solution I came up with but when comparing 8M rows
 even on
  an 8G machine, one runs out of memory.  To run this effectively, I have
 to
  break the DF into smaller DFs, loop through them and then do a massive
  rmerge at the end.  That's what takes 8+ hours to compute.
 
  Even the bigmemory package is causing OOM issues.
 
  -Original Message-
  From: Blaser Nello [mailto:nbla...@ispm.unibe.ch]
  Sent: Thursday, May 23, 2013 12:15 AM
  To: Adeel Amin; r-help@r-project.org
  Subject: RE: [R] adding rows without loops
 
  Merge should do the trick. How to best use it will depend on what you
  want to do with the data after.
  The following is an example of what you could do. This will perform
  best, if the rows are missing at random and do not cluster.
 
   DF1 <- data.frame(X.DATE=rep("01052007", 7), X.TIME=c(2:5,7:9)*100,
   VALUE=c(37, 42, 45, 45, 45, 42, 45), VALUE2=c(29,24,28,27,35,32,32))
   DF2 <- data.frame(X.DATE=rep("01052007", 7), X.TIME=c(2:8)*100,
   VALUE=c(37, 42, 45, 45, 45, 42, 45), VALUE2=c(29,24,28,27,35,32,32))
 
   DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)
  
   while(any(is.na(DFm))){
     if (any(is.na(DFm[1,]))) stop("Complete first row required!")
     ind <- which(is.na(DFm), arr.ind=TRUE)
     prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
     DFm[is.na(DFm)] <- DFm[prind]
   }
  DFm
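An alternative sketch for the fill-down step, avoiding the while loop with a base-R last-observation-carried-forward helper (assuming, as above, that a missing row should inherit the previous row's values; `na_locf` is a hypothetical helper, not part of base R):

```r
# Carry the last non-NA value forward within a vector:
na_locf <- function(x) {
  idx <- cumsum(!is.na(x))                         # count of non-NAs so far
  replace(x, idx > 0, x[!is.na(x)][idx[idx > 0]])  # index into non-NA values
}

DF1 <- data.frame(X.DATE = "01052007", X.TIME = c(2:5, 7:9) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45))
DF2 <- data.frame(X.DATE = "01052007", X.TIME = c(2:8) * 100,
                  VALUE = c(37, 42, 45, 45, 45, 42, 45))

DFm <- merge(DF1, DF2, by = c("X.DATE", "X.TIME"), all = TRUE)
DFm[] <- lapply(DFm, na_locf)    # fill each column downward in one pass
DFm$VALUE.x[DFm$X.TIME == 600]   # 45, copied from the 0500 row
```

Vectorizing the fill this way touches each column once rather than re-scanning the whole frame per iteration, which matters at millions of rows.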
 
  Best,
  Nello
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
  On Behalf Of Adeel Amin
  Sent: Donnerstag, 23. Mai 2013 07:01
  To: r-help@r-project.org
  Subject: [R] adding rows without loops
 
  I'm comparing a variety of datasets with over 4M rows.  I've solved this
  problem 5 different ways using a for/while loop but the processing time
  is murder (over 8 hours doing this row by row per data set).  As such
  I'm trying to find whether this solution is possible without a loop or
  one in which the processing time is much faster.
 
  Each dataset is a time series as such:
 
  DF1:
 
     X.DATE X.TIME VALUE VALUE2
   1 01052007   0200    37     29
   2 01052007   0300    42     24
   3 01052007   0400    45     28
   4 01052007   0500    45     27
   5 01052007   0700    45     35
   6 01052007   0800    42     32
   7 01052007   0900    45     32
  ...
  ...
  ...
  n
 
  DF2
 
     X.DATE X.TIME VALUE VALUE2
   1 01052007   0200    37     29
   2 01052007   0300    42     24
   3 01052007   0400    45     28
   4 01052007   0500    45     27
   5 01052007   0600    45     35
   6 01052007   0700    42     32
   7 01052007   0800    45     32
 
  ...
  ...
  n+4000
 
  In other words there are 4000 more rows in DF2 then DF1 thus the
  datasets are of unequal length.
 
  I'm trying to ensure that all dataframes have the same number of X.DATE
  and X.TIME entries.  Where they are missing, I'd like to insert a new
  row.
 
  In the above example, when comparing DF2 to DF1, entry 01052007 0600
  entry is missing in DF1.  The solution would add a row to DF1 at the
  appropriate index.
 
  so new dataframe would be
 
 
     X.DATE X.TIME VALUE VALUE2
   1 01052007   0200    37     29
   2 01052007   0300    42     24
   3 01052007   0400    45     28
   4 01052007   0500    45     27
   5 01052007   0600    45     27
   6 01052007   0700    45     35
   7 01052007   0800    42     32
   8 01052007   0900    45     32
 
  Value and Value2 would be the same as row 4.
 
  Of course this is simple to accomplish using a row by row analysis but
  with over 4M rows the processing time destroying and rebinding the
  datasets is very time consuming and I believe highly un-R'ish.  What am
  I missing?
 
  Thanks!
 
 
 




Re: [R] subsetting and Dates

2013-05-23 Thread David Winsemius

On May 23, 2013, at 7:56 PM, arun wrote:

 I guess it is due to vectorization.

The concept of vectorization is much broader than the activities of 
`as.vector`, but it needs a specific functional mechanism to be considered an 
explanation.

 vec1 <- as.Date(Data[,2], origin="1899-12-30")
 class(vec1)
 #[1] Date
  as.vector(vec1)
 # [1] 15758 15765 15772 15779NA 15793 15800 15807NA 15821 15828

It is certainly true that `as.vector` could unclass a Date-classed vector, but 
why do you believe this has anything to do with how `$<-` returns its 
functional result? Setting `trace` on `as.vector` does not result in any signal 
suggesting that it was called:

 trace('as.vector')
 Data$Dat1b = as.Date(Data[ ,"dat1"], origin="1899-12-30")

# Nothing

 trace( .Primitive("$<-") )
 Data$Dat1b = as.Date(Data[ ,"dat1"], origin="1899-12-30")
trace: `$<-`(`*tmp*`, Dat1b, value = c(15758, 15765, 15772, 15779, 15786, 
15793, 15800, 15807, 15814, 15821, 15828))


 
  head(as.list(vec1),2)
 #[[1]]
 #[1] 2013-02-22
 #
 #[[2]]
 #[1] 2013-03-01
  head(data.frame(vec1),2)
 #vec1
 #1 2013-02-22
 #2 2013-03-01
 
 
 unlist(as.list(vec1))
 # [1] 15758 15765 15772 15779NA 15793 15800 15807NA 15821 15828
  
 
 Also, please check:
 
 http://r.789695.n4.nabble.com/as-vector-with-mode-quot-list-quot-and-POSIXct-td4667533.html

Interesting but I fail to see the connection to this instance other than R 
behaving somewhat differently than we might at one time have expected.

-- 
David.


 
 A.K.
 
 - Original Message -
 From: Denis Chabot chabot.de...@gmail.com
 To: arun smartpink...@yahoo.com
 Cc: R help r-help@r-project.org
 Sent: Thursday, May 23, 2013 10:06 PM
 Subject: Re: [R] subsetting and Dates
 
 Thank you for the 2 methods to make the columns class Date, but I would 
 really like to know why these variables were not in Date class with my code. 
 Do you know?
 
 Denis
 
 
 On 2013-05-23, at 21:44, arun smartpink...@yahoo.com wrote:
 
 You could convert those columns to Date class by:
 
 
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], as.Date, origin="1970-01-01")
 #or
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], function(x) structure(x, class="Date"))
 
 
 #  dat1  dat2  Dat1a  Dat1b  Dat2a  Dat2b
 #1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
 #2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
 #3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
 #4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
 #5  41355    NA 2013-03-22 2013-03-22         NA         NA
 #6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
 #7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
 #8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
 #9  41383    NA 2013-04-19 2013-04-19         NA         NA
 #10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
 #11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03
 A.K.
 
 - Original Message -
 From: Denis Chabot chabot.de...@gmail.com
 To: R-help@r-project.org
 Cc: 
 Sent: Thursday, May 23, 2013 5:35 PM
 Subject: [R] subsetting and Dates
 
 Hi,
 
 I am trying to understand why creating Date variables does not work if I 
 subset to avoid NAs. 
 
 I had problems creating these Date variables in my code and I thought that 
 the presence of NAs was the cause. So I used a condition to avoid NAs.
 
 It turns out that NAs are not a problem and I do not need to subset, but I'd 
 like to understand why subsetting causes the problem.
 The strange numbers I start with are what I get when I read an Excel sheet 
 with the function read.xls() from package gdata.  
 
 dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 
 41390, 41397)
 dat2 = dat1
 dat2[c(5,9)]=NA
 Data = data.frame(dat1,dat2)
 
 keep1 = !is.na(Data$dat1)
 keep2 = !is.na(Data$dat2)
 
 
 Data$Dat1a = as.Date(Data[,"dat1"], origin="1899-12-30")
 Data$Dat1b[keep1] = as.Date(Data[keep1,"dat1"], origin="1899-12-30")
 Data$Dat2a = as.Date(Data[,"dat2"], origin="1899-12-30")
 Data$Dat2b[keep2] = as.Date(Data[keep2,"dat2"], origin="1899-12-30")
 
 Data
  dat1  dat2  Dat1a Dat1b  Dat2a Dat2b
 1  41327 41327 2013-02-22 15758 2013-02-22 15758
 2  41334 41334 2013-03-01 15765 2013-03-01 15765
 3  41341 41341 2013-03-08 15772 2013-03-08 15772
 4  41348 41348 2013-03-15 15779 2013-03-15 15779
 5  41355    NA 2013-03-22 15786         NA    NA
 6  41362 41362 2013-03-29 15793 2013-03-29 15793
 7  41369 41369 2013-04-05 15800 2013-04-05 15800
 8  41376 41376 2013-04-12 15807 2013-04-12 15807
 9  41383    NA 2013-04-19 15814         NA    NA
 10 41390 41390 2013-04-26 15821 2013-04-26 15821
 11 41397 41397 2013-05-03 15828 2013-05-03 15828
 
 So variables Dat1b and Dat2b are not converted to Date class.
 
 
 sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 locale:
 [1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base