date:20110206

Re: [R] different results in MASS's mca and SAS's corresp

2011-02-06 Thread Mark Difford


 When I came to David's comment, I understood the theory, but not the 
 numbers in his answer.  I wanted to see the MASS mca answers match 
 up with SAS, and the example did not (yet).

I am inclined to write, O yea of little faith. David showed perfectly well
that when the results of the two algorithms are put on the same scale
(z-scored) then they are identical to 4th decimal place (except for a change
of sign in the second dimension, which is of no import).

Surely that is miracle-enough?

Mark.


-- 
View this message in context: 
http://r.789695.n4.nabble.com/different-results-in-MASS-s-mca-and-SAS-s-corresp-tp3261494p3262600.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Legend outside the plot? xpd?

2011-02-06 Thread David Winsemius



On Feb 6, 2011, at 2:41 AM, Matt Cooper wrote:


Hi All,

BG: Will try be brief. I'd like 3 graphs on a page (below each other
mfrow=c(3,1)), saved to pdf. The three plot data on the same subject  
so I'm

having one legend, to the right of the center graph. I'm using
mar=c(5,15,4,15) to bring the sides in so that the graphs are square  
and not
stretched wide. To have the graph to the side I'm thinking xpd=T.  
Each graph

has a number of points and lines overlaid, so plot
();lines(),lines(),lines() etc.

The data is somewhat of a subset of a larger set, so I'm limiting  
what is

being displayed. The lines plot trends of the wider data.

P: With xpd=T, the lines are going right out of the graph boxes to  
the outer
limit of the plot boundaries, as would be the intended behaviour of  
xpd.

Goes without saying that this is undesirable.

Q: Is there a better way to achieve what I want? At this stage I  
have xpd=T
in the par(), and xpd=F in the plot() commands and all the line  
commands,

just so I can get the legend to actually print...


Why not just use xpd=T inside legend()?



Aside: Given the way I've done this, the lines() are all clipped  
RIGHT at
the limit of the plot box. So given the plot is called first these  
lines
leave little coloured dashes on the black of the plot box. To sort  
this I've

called box() at the end. This all seems very redundant?

Thanks
Matt


--
mattcst...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Legend outside the plot? xpd?

2011-02-06 Thread Matt Cooper

On Sun, Feb 6, 2011 at 3:56 PM, David Winsemius dwinsem...@comcast.netwrote:


 Why not just use xpd=T inside legend()?


 David Winsemius, MD
 West Hartford, CT


I had a feeling I'd look stupid after sending that. Thanks, works great. For
some reason I thought if xpd would have to be set to T at the par() level to
ever be able to get anything outside of a plot square. Alas.

Thanks David
Matt

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] manipulate dataframe

2011-02-06 Thread André de Boer

Hello,

Can someone give me hint to change a data.frame.
I want to split a column in more columns depending on the value of a other
column.
Thanks for the reaction,
Andre

Example:
 dat
  x1 x2
1  1  a
2  1  b
3  1  c
4  2  d
5  2  e
6  2  f
7  3  g
8  3  h
9  3  i

in

 dur
  d1 d2 d3
1  a  d  g
2  b  e  h
3  c  f  i

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] anova() interpretation and error message

2011-02-06 Thread Jinsong Zhao


Hi there,

I have a data frame as listed below:

 Ca.P.Biomass.A
 P   Biomass
1 334.5567 0.287
2 737.5400 0.571
3 894.5300 0.639
4 782.3800 0.5836667
5 857.5900 0.600
6 829.2700 0.588

I have fit the data using logistic, Michaelis–Menten, and linear model, 
they all give significance.


 fm1 - nls(Biomass~SSlogis(P, phi1, phi2, phi3), data=Ca.P.Biomass.A)
 fm2 - nls(Biomass~SSmicmen(P, phi1, phi2), data=Ca.P.Biomass.A)
 fm3 - lm(Biomass~P, data = Ca.P.Biomass.A)

I hope to compare the difference among the three models, and I using 
anova(). As for the example here, the three models seem not have 
significant difference. However, I am confused by the negative df in the 
following ANOVA table. And my question is how to interpret the results, 
if the Pr  0.05.


 anova(fm1,fm2,fm3)
Analysis of Variance Table

Model 1: Biomass ~ SSlogis(P, phi1, phi2, phi3)
Model 2: Biomass ~ SSmicmen(P, phi1, phi2)
Model 3: Biomass ~ P
  Res.Df Res.Sum Sq Df  Sum Sq F value Pr(F)
1  3 0.00063741
2  4 0.00087249 -1 -0.00023508  1.1064 0.3701
3  4 0.00142751  0  0.

And when the argument position changed, the anova() give different 
results. It seems the anova() compare the first model with all other models.


 anova(fm2,fm1,fm3)
Analysis of Variance Table

Model 1: Biomass ~ SSmicmen(P, phi1, phi2)
Model 2: Biomass ~ SSlogis(P, phi1, phi2, phi3)
Model 3: Biomass ~ P
  Res.Df Res.Sum Sq Df  Sum Sq F value Pr(F)
1  4 0.00087249
2  3 0.00063741  1  0.00023508  1.1064 0.3701
3  4 0.00142751 -1 -0.00079010  3.7187 0.1494

When I put the fm3, a linear model, in the first position, and two nls 
model following it, anova() give the following error message. It seems 
abnormal.


 anova(fm3,fm1,fm2)
Analysis of Variance Table

Response: Biomass
  Df   Sum Sq  Mean Sq F valuePr(F)
P  1 0.081163 0.081163  227.43 0.0001127 ***
Residuals  4 0.001428 0.000357
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In anova.lmlist(object, ...) :
  models with response c(NULL, NULL) removed because response 
differs from model 1


Any suggestions and comments will be really appreciated. Thanks in advance.

Regards,
Jinsong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] boot() versus loop, and statistics option

2011-02-06 Thread Sascha Vieweg


Hello R users

I am quite new to bootstrapping. Now, having some data x,

R: set.seed(1234)
R: x - runif(300)

I want to bootstrap simple statistics, mean and quantiles (.025, 
.975). Currently, I run a loop


R: res - as.data.frame(matrix(ncol = 3, dimnames = list(NULL,
...c(M, Lo, Hi
R: for (i in 1:100) {
...y - x[sample(1:length(x), length(x), repl = T)]
...res[i, ] - c(mean(y), quantile(y, c(0.025, 0.975)))
...}

and then apply mean()

R: apply(res, 2, mean)
 M Lo Hi
0.49377715 0.03089873 0.98120235

to get the indices of interest.

I found the package 'boot' with the function of the same name. I 
tried to replicate my tiny simulation using this code:


R: library(boot)
R: myfun - function(x) {
...return(c(mean(x), quantile(x, c(0.025, 0.975
...}
R: boot(x, myfun, 100, sim = parametric)
PARAMETRIC BOOTSTRAP

Call:
boot(data = x, statistic = myfun, R = 100, sim = parametric)

Bootstrap Statistics :
  original  biasstd. error
t1* 0.48925194   0   0
t2* 0.02806586   0   0
t3* 0.98335435   0   0

The outcome looks quite similar to what my loop returned, so 
that would be fine. Yet, there is three things I don't understand:


(1) I have to use the option 'sim=parametric'. If I don't use 
this option the function (provided via the statistic option) 
requires a second argument, which -- according to '?boot' will be 
a vector of indices, frequencies or weights which define the 
bootstrap sample. What is that? Or is my simulation simply 
parametric? Why?


(2) What are the advantages and/or disadvantages of 'boot()' over 
my loop?


(3) Can I in principle use 'boot()' to return all of the 100 
different data vectors used in the loop, or does 'boot()' by 
default return already-calculated statistics?


Thanks for hints and help, *S*


--
Sascha Vieweg, saschav...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Confidence interval based on MLE

2011-02-06 Thread Jinsong Zhao


Hi there,

I have fitted a sample (with size 20) to a normal and/or logistic 
distribution using fitdistr() in MASS or fitdist() in fitdistrplus 
package. It's easy to get the parameter estimates. Now, I hope to report 
the confidence interval for those parameter estimates. However, I don't 
find a function that could give the confidence interval in R.


I hope to write a function, however, I don't find some detailed 
information on the CI based on MLE. Would you please to give me some 
hints on the CI calculation based on MLE?


Any suggestions will be really appreciated. Thanks in advance.

Regards,
Jinsong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] boot() versus loop, and statistics option

2011-02-06 Thread Prof Brian Ripley


Package boot is support software for a book: have you consulted it?
It answers all your questions, and has copious examples.

On Sun, 6 Feb 2011, Sascha Vieweg wrote:


Hello R users

I am quite new to bootstrapping. Now, having some data x,

R: set.seed(1234)
R: x - runif(300)

I want to bootstrap simple statistics, mean and quantiles (.025, .975). 
Currently, I run a loop


R: res - as.data.frame(matrix(ncol = 3, dimnames = list(NULL,
...c(M, Lo, Hi
R: for (i in 1:100) {
...y - x[sample(1:length(x), length(x), repl = T)]
...res[i, ] - c(mean(y), quantile(y, c(0.025, 0.975)))
...}

and then apply mean()

R: apply(res, 2, mean)
M Lo Hi
0.49377715 0.03089873 0.98120235

to get the indices of interest.

I found the package 'boot' with the function of the same name. I tried to 
replicate my tiny simulation using this code:


R: library(boot)
R: myfun - function(x) {
...return(c(mean(x), quantile(x, c(0.025, 0.975
...}
R: boot(x, myfun, 100, sim = parametric)
PARAMETRIC BOOTSTRAP

Call:
boot(data = x, statistic = myfun, R = 100, sim = parametric)

Bootstrap Statistics :
 original  biasstd. error
t1* 0.48925194   0   0
t2* 0.02806586   0   0
t3* 0.98335435   0   0

The outcome looks quite similar to what my loop returned, so that would be 
fine. Yet, there is three things I don't understand:


(1) I have to use the option 'sim=parametric'. If I don't use this option 
the function (provided via the statistic option) requires a second argument, 
which -- according to '?boot' will be a vector of indices, frequencies or 
weights which define the bootstrap sample. What is that? Or is my simulation 
simply parametric? Why?


(2) What are the advantages and/or disadvantages of 'boot()' over my loop?

(3) Can I in principle use 'boot()' to return all of the 100 different data 
vectors used in the loop, or does 'boot()' by default return 
already-calculated statistics?


Thanks for hints and help, *S*


--
Sascha Vieweg, saschav...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] manipulate dataframe

2011-02-06 Thread jim holtman

Here is one way of doing it:

 x
  x1 x2 row
1  1  a   1
2  1  b   2
3  1  c   3
4  2  d   1
5  2  e   2
6  2  f   3
7  3  g   1
8  3  h   2
9  3  i   3
 # create indices for new table
 x$row - ave(x$x1, x$x1, FUN=seq_along)
 # create output matrix
 result - matrix('', max(x$row), max(x$x1))
 result[cbind(x$row, x$x1)] - x$x2
 result
 [,1] [,2] [,3]
[1,] a  d  g
[2,] b  e  h
[3,] c  f  i


On Sun, Feb 6, 2011 at 6:13 AM, André de Boer rnie...@gmail.com wrote:
 Hello,

 Can someone give me hint to change a data.frame.
 I want to split a column in more columns depending on the value of a other
 column.
 Thanks for the reaction,
 Andre

 Example:
 dat
  x1 x2
 1  1  a
 2  1  b
 3  1  c
 4  2  d
 5  2  e
 6  2  f
 7  3  g
 8  3  h
 9  3  i

 in

 dur
  d1 d2 d3
 1  a  d  g
 2  b  e  h
 3  c  f  i

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] manipulate dataframe

2011-02-06 Thread Patrick Hausmann


Hi André,

try this:

df1 - data.frame(x1 = rep(1:3, each=3), x2=letters[1:9])

dfs - split(df1, df1$x1)

df2 - data.frame(sapply(dfs, FUN=[[, x2))
colnames(df2) - paste(d, unique(df1$x1), sep=)
df2

HTH
Patrick


Am 06.02.2011 12:13, schrieb André de Boer:

Hello,

Can someone give me hint to change a data.frame.
I want to split a column in more columns depending on the value of a other
column.
Thanks for the reaction,
Andre

Example:

dat

   x1 x2
1  1  a
2  1  b
3  1  c
4  2  d
5  2  e
6  2  f
7  3  g
8  3  h
9  3  i

in


dur

   d1 d2 d3
1  a  d  g
2  b  e  h
3  c  f  i

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence interval based on MLE

2011-02-06 Thread Ben Bolker

Jinsong Zhao jszhao at yeah.net writes:

 
 Hi there,
 
 I have fitted a sample (with size 20) to a normal and/or logistic 
 distribution using fitdistr() in MASS or fitdist() in fitdistrplus 
 package. It's easy to get the parameter estimates. Now, I hope to report 
 the confidence interval for those parameter estimates. However, I don't 
 find a function that could give the confidence interval in R.
 
 I hope to write a function, however, I don't find some detailed 
 information on the CI based on MLE. Would you please to give me some 
 hints on the CI calculation based on MLE?

   Well, for the normal distribution I believe that the standard-error-
based confidence intervals are the same as those based on the MLE,
but in general I would suggest something along these lines:

 library(bbmle)
 z - rnorm(20)
 m - mle2(z~dnorm(mean=mu,sd=sd),start=list(mu=0,sd=1),data=data.frame(z))
Warning message:
In dnorm(x, mean, sd, log) : NaNs produced
 confint(m)
Profiling...
 2.5 %   97.5 %
mu -0.07880835 0.985382
sd  0.87314467 1.633600

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] manipulate dataframe

2011-02-06 Thread Victor F Seabra


   another option is to use reshape()
   x-data.frame(x1=rep(1:3,each=3),x2=letters[1:9])
   x$id-rep(1:3,3)
   dur-reshape(x,timevar=x1,idvar=id,direction=wide)
   dur-dur[,-1]
   colnames(dur) - paste(d, unique(x$x1), sep=)
   dur
   cheers, Victor
 _

   Em 06/02/2011 12:50, Patrick Hausmann  patrick.hausm...@uni-bremen.de 
   escreveu:
   Hi AndrÃ©,
   try this:
   df1 - data.frame(x1 = rep(1:3, each=3), x2=letters[1:9])
   dfs - split(df1, df1$x1)
   df2 - data.frame(sapply(dfs, FUN=[[, x2))
   colnames(df2) - paste(d, unique(df1$x1), sep=)
   df2
   HTH
   Patrick
   Am 06.02.2011 12:13, schrieb AndrÃ© de Boer:
Hello,
   
Can someone give me hint to change a data.frame.
I want to split a column in more columns depending on the value of a other
column.
Thanks for the reaction,
Andre
   
Example:
dat
x1 x2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
7 3 g
8 3 h
9 3 i
   
in
   
dur
d1 d2 d3
1 a d g
2 b e h
3 c f i
   
[[alternative HTML version deleted]]
   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] anova() interpretation and error message

2011-02-06 Thread Peter Ehlers


See comments inline.

On 2011-02-06 03:17, Jinsong Zhao wrote:

Hi there,

I have a data frame as listed below:

Ca.P.Biomass.A
   P   Biomass
1 334.5567 0.287
2 737.5400 0.571
3 894.5300 0.639
4 782.3800 0.5836667
5 857.5900 0.600
6 829.2700 0.588

I have fit the data using logistic, Michaelis–Menten, and linear model,
they all give significance.

fm1- nls(Biomass~SSlogis(P, phi1, phi2, phi3), data=Ca.P.Biomass.A)
fm2- nls(Biomass~SSmicmen(P, phi1, phi2), data=Ca.P.Biomass.A)
fm3- lm(Biomass~P, data = Ca.P.Biomass.A)

I hope to compare the difference among the three models, and I using
anova(). As for the example here, the three models seem not have
significant difference. However, I am confused by the negative df in the
following ANOVA table. And my question is how to interpret the results,
if the Pr  0.05.

anova(fm1,fm2,fm3)
Analysis of Variance Table

Model 1: Biomass ~ SSlogis(P, phi1, phi2, phi3)
Model 2: Biomass ~ SSmicmen(P, phi1, phi2)
Model 3: Biomass ~ P
Res.Df Res.Sum Sq Df  Sum Sq F value Pr(F)
1  3 0.00063741
2  4 0.00087249 -1 -0.00023508  1.1064 0.3701
3  4 0.00142751  0  0.


Read the Details section in help(anova.lm) to see why
you get negative DF values. The reason is simply that
3 - 4 =  -1. Compare

 anova(fm1, fm2)

with

 anova(fm2, fm1)

That's why I prefer to list my models in order of increasing
complexity. Note also that for interpretation models should
be nested. Yours aren't.



And when the argument position changed, the anova() give different
results. It seems the anova() compare the first model with all other models.

anova(fm2,fm1,fm3)
Analysis of Variance Table

Model 1: Biomass ~ SSmicmen(P, phi1, phi2)
Model 2: Biomass ~ SSlogis(P, phi1, phi2, phi3)
Model 3: Biomass ~ P
Res.Df Res.Sum Sq Df  Sum Sq F value Pr(F)
1  4 0.00087249
2  3 0.00063741  1  0.00023508  1.1064 0.3701
3  4 0.00142751 -1 -0.00079010  3.7187 0.1494

When I put the fm3, a linear model, in the first position, and two nls
model following it, anova() give the following error message. It seems
abnormal.


Not so strange. If you look at

 methods(anova)

you will see that there are different functions for
different classes of model. So, if your first model is
of class nls, then anova.nls will be used; for a
first model of class lm anova.lm is used. The main
thing is not to mix models of different class. If you
want a linear model to compare with the nonlinear
ones, use nls() to estimate the linear model. But you
still have the problem of non-nestedness.



anova(fm3,fm1,fm2)
Analysis of Variance Table

Response: Biomass
Df   Sum Sq  Mean Sq F valuePr(F)
P  1 0.081163 0.081163  227.43 0.0001127 ***
Residuals  4 0.001428 0.000357
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In anova.lmlist(object, ...) :
models with response c(NULL, NULL) removed because response
differs from model 1

Any suggestions and comments will be really appreciated. Thanks in advance.



After the above comments, I have one more:

You have 6 observations and you want to fit relatively
complex models whose estimated coefficients will be
*extremely* sensitive to *one* of your observations
(I assume that you have looked at a plot of the data).
The only way this could make any sense is if there is
well established theory that specifies one particular
model and you want to check that your data are at
least not obviously inconsistent with that theory.
Fishing through several models with six data points
makes no sense. I think that the best you can conclude
from your data is that Biomass probably increases with P.

Peter Ehlers


Regards,
Jinsong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] random interaction effect in lmer

2011-02-06 Thread Federico Bonofiglio

Hi dears while modeling an interaction random effect in lmer i receive the
instantaneous error message

 ldlM4-lmer(ldl~rt*cd4+age+rf+pharmac+factor(hcv)+
+ hivdur+(rt:cd4|id),na.action=na.omit,REML=F)
*Warning message:
In mer_finalize(ans) : false convergence (8)
*
I think the matter lies in syntax, 'cause i sistematically receive the same
message even when changing response...
PS: the above model runs quite well in lme and only rarely I recive :
*
Errore in lme.formula(fixed = ldl ~ rt + cd4 + age + rf + pharmac +
factor(hcv) +  :
  nlminb problem, convergence error code = 1
  message = iteration limit reached without convergence (9)
*
..reason for which I switch  into lmer

Thank u in advance foe any hints...;)
-- 
*Little u can do against ignorance,it will always disarm u:
is the 2nd principle of thermodinamics made manifest, ...entropy in
expansion.**But setting order is the real quest 4 truth, ..and the
mission of a (temporally) wise dude.
*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiplying elements of vectors

2011-02-06 Thread Mariana Martinez-Morales

Super Phil!!
Thank you very much!!

Mariana

On Sat, Feb 5, 2011 at 8:54 PM, Phil Spector spec...@stat.berkeley.edu wrote:
 If the seq(5,205) was a typo, and should have been
 seq(5,20,5), then what you're looking for is the outer
 product of x and y:

 x = seq(5,20,5)
 y = seq(5,20,5)
 x %o% y

     [,1] [,2] [,3] [,4]
 [1,]   25   50   75  100
 [2,]   50  100  150  200
 [3,]   75  150  225  300
 [4,]  100  200  300  400

 outer(x,y)

     [,1] [,2] [,3] [,4]
 [1,]   25   50   75  100
 [2,]   50  100  150  200
 [3,]   75  150  225  300
 [4,]  100  200  300  400

 The outer() function will accepts a FUN= argument which defaults to '*'.

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu


 On Sat, 5 Feb 2011, Mariana Martinez-Morales wrote:

 Hi guys:

 Sorry if this question is very basic. I’m learning basic matrix and
 vectors multiplication to develop a population matrix model for
 plants. I’m trying to multiply the elements of two vectors (each of
 the “x” values by each of the “y” values) to obtain a square matrix of
 xy values.

 f.e.
 x-seq(5,205)
 y-seq(5,20,5)
 stages-c(“Sdl”, “Juv”, “Ad1”, “Ad2”)

 If I just multiply xy as a matrix
 xy-matrix(x,y,nrow=4,ncol=4,dimnames=list(stages,stages))

 I obtain this

 xy
       Sdl Juv A1 A2
 Sdl   5  10 15 20
 Juv   5  10 15 20
 A1    5  10 15 20
 A2    5  10 15 20

 but what I want to obtain is this matrix

       Sdl    Juv     A1    A2
 Sdl   25    50      75     100
 Juv   50    100    150   200
 A1    75    50      225   300
 A2    100  200    300   400

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Help with integrating R and c/c++

2011-02-06 Thread Rohit Pandey

Hi,

I have been using R for close to two years now and have grown quite
comfortable with the language. I am presently trying to implement an
optimization routine in R (Newton Rhapson). I have some R functions that
calculate the gradient and hessian (pre requisite matrices) fairly
efficiently. Now, I have to call this function iteratively until some
convergance criterion is reached. I think the standard method of doing this
in most programming languages is a while loop. However, I know R can get
pretty slow when you use loops. In order to make this efficient, I want to
transfer this part of my code to a more efficient programming language like
c++ or c. However, I have been trying to learn this all day without any
luck. I found a package called Rcpp that makes this easier. However, it
seems some functional knowledge of writing R packages is a pre requisite. I
tried to follow the standard manual for doing this, but could not find a
simple example to get me started. I know I am supposed to make a cpp file
and put it some where before it can be called from R, but I'm confused as to
how this can be done.

My requirement is to start with a parameter vector, update it according to
the gradient and hessian, check if the parameter satisfies some convergance
criterion and continue doing this until it does. Is there a way to
efficiently do this through an R function (replicate?). The problem is that
the number of iterations is not fixed. If there is no function in R, is
there a way I can quickly use Rcpp or some thing to have this last part of
my code in a C or C++ program which repeatedly calls my R functions for
updating the parameters?

-- 
Thanks in advance,
Rohit
Mob: 91 9819926213

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] random interaction effect in lmer

2011-02-06 Thread Jason Morgan

Hello Federico,

You should try sending this to the mixed models mailing list (link
below). Also, it would probably help to know what the data looks like. With the
information you provide, it's hard to say what the problem could be.

r-sig-mixed-mod...@r-project.org

Best,
~Jason

On 2011.02.06, Federico Bonofiglio wrote:
 Hi dears while modeling an interaction random effect in lmer i receive the
 instantaneous error message
 
  ldlM4-lmer(ldl~rt*cd4+age+rf+pharmac+factor(hcv)+
 + hivdur+(rt:cd4|id),na.action=na.omit,REML=F)
 *Warning message:
 In mer_finalize(ans) : false convergence (8)
 *
 I think the matter lies in syntax, 'cause i sistematically receive the same
 message even when changing response...
 PS: the above model runs quite well in lme and only rarely I recive :
 *
 Errore in lme.formula(fixed = ldl ~ rt + cd4 + age + rf + pharmac +
 factor(hcv) +  :
   nlminb problem, convergence error code = 1
   message = iteration limit reached without convergence (9)
 *
 ..reason for which I switch  into lmer
 
 Thank u in advance foe any hints...;)
 -- 
 *Little u can do against ignorance,it will always disarm u:
 is the 2nd principle of thermodinamics made manifest, ...entropy in
 expansion.**But setting order is the real quest 4 truth, ..and the
 mission of a (temporally) wise dude.
 *
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Jason W. Morgan
Ph.D. Candidate
Department of Political Science
*The Ohio State University*
154 North Oval Mall
Columbus, Ohio 43210

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with integrating R and c/c++

2011-02-06 Thread David Winsemius



On Feb 6, 2011, at 10:28 AM, Rohit Pandey wrote:


Hi,

I have been using R for close to two years now and have grown quite
comfortable with the language. I am presently trying to implement an
optimization routine in R (Newton Rhapson). I have some R functions  
that

calculate the gradient and hessian (pre requisite matrices) fairly
efficiently. Now, I have to call this function iteratively until some
convergance criterion is reached. I think the standard method of  
doing this
in most programming languages is a while loop. However, I know R can  
get
pretty slow when you use loops. In order to make this efficient, I  
want to
transfer this part of my code to a more efficient programming  
language like
c++ or c. However, I have been trying to learn this all day without  
any
luck. I found a package called Rcpp that makes this easier. However,  
it
seems some functional knowledge of writing R packages is a pre  
requisite. I
tried to follow the standard manual for doing this, but could not  
find a
simple example to get me started. I know I am supposed to make a cpp  
file
and put it some where before it can be called from R, but I'm  
confused as to

how this can be done.


Dirk Eddelbuettel has 8 vignettes linked from his Rcpp page:

http://dirk.eddelbuettel.com/code/rcpp.html

And reading further you will see about 20 packages written with Rcpp.

And there is a mailing list link which I had originally sought on the  
main Mailing List page (unsuccessfully):


http://lists.r-forge.r-project.org/pipermail/rcpp-devel/

--
David.


My requirement is to start with a parameter vector, update it  
according to
the gradient and hessian, check if the parameter satisfies some  
convergance

criterion and continue doing this until it does. Is there a way to
efficiently do this through an R function (replicate?). The problem  
is that
the number of iterations is not fixed. If there is no function in R,  
is
there a way I can quickly use Rcpp or some thing to have this last  
part of
my code in a C or C++ program which repeatedly calls my R functions  
for

updating the parameters?

--
Thanks in advance,
Rohit
Mob: 91 9819926213

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with integrating R and c/c++

2011-02-06 Thread Douglas Bates

On Sun, Feb 6, 2011 at 10:56 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Feb 6, 2011, at 10:28 AM, Rohit Pandey wrote:

 Hi,

 I have been using R for close to two years now and have grown quite
 comfortable with the language. I am presently trying to implement an
 optimization routine in R (Newton Rhapson). I have some R functions that
 calculate the gradient and hessian (pre requisite matrices) fairly
 efficiently. Now, I have to call this function iteratively until some
 convergance criterion is reached. I think the standard method of doing
 this
 in most programming languages is a while loop. However, I know R can get
 pretty slow when you use loops. In order to make this efficient, I want to
 transfer this part of my code to a more efficient programming language
 like
 c++ or c. However, I have been trying to learn this all day without any
 luck. I found a package called Rcpp that makes this easier. However, it
 seems some functional knowledge of writing R packages is a pre requisite.
 I
 tried to follow the standard manual for doing this, but could not find a
 simple example to get me started. I know I am supposed to make a cpp file
 and put it some where before it can be called from R, but I'm confused as
 to
 how this can be done.

 Dirk Eddelbuettel has 8 vignettes linked from his Rcpp page:

 http://dirk.eddelbuettel.com/code/rcpp.html

 And reading further you will see about 20 packages written with Rcpp.

Including one called RcppDE which does exactly what you want to do,
implementing an optimizer in C++ to be called from R.

Another package to look at is RcppArmadillo which provides a linkage
between Rcpp and the Armadillo C++ linear algebra library (although it
still doesn't do as much as I would like it to for writing
optimizers).

 And there is a mailing list link which I had originally sought on the main
 Mailing List page (unsuccessfully):

 http://lists.r-forge.r-project.org/pipermail/rcpp-devel/

 --
 David.

 My requirement is to start with a parameter vector, update it according to
 the gradient and hessian, check if the parameter satisfies some
 convergance
 criterion and continue doing this until it does. Is there a way to
 efficiently do this through an R function (replicate?). The problem is
 that
 the number of iterations is not fixed. If there is no function in R, is
 there a way I can quickly use Rcpp or some thing to have this last part of
 my code in a C or C++ program which repeatedly calls my R functions for
 updating the parameters?

 --
 Thanks in advance,
 Rohit
 Mob: 91 9819926213

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 David Winsemius, MD
 West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with integrating R and c/c++

2011-02-06 Thread Martin Morgan

On 02/06/2011 07:28 AM, Rohit Pandey wrote:
 Hi,
 
 I have been using R for close to two years now and have grown quite
 comfortable with the language. I am presently trying to implement an
 optimization routine in R (Newton Rhapson). I have some R functions that
 calculate the gradient and hessian (pre requisite matrices) fairly
 efficiently. Now, I have to call this function iteratively until some
 convergance criterion is reached. I think the standard method of doing this
 in most programming languages is a while loop. However, I know R can get
 pretty slow when you use loops. In order to make this efficient, I want to

Have you looked at ?optim or ?optimize or perhaps ?nlm ? The bulk of the
computation is in repeatedly evaluating the function and its gradient,
rather than the machinery to iterate to convergence. Maybe these parts
of your code can be written more efficiently?

Martin

 transfer this part of my code to a more efficient programming language like
 c++ or c. However, I have been trying to learn this all day without any
 luck. I found a package called Rcpp that makes this easier. However, it
 seems some functional knowledge of writing R packages is a pre requisite. I
 tried to follow the standard manual for doing this, but could not find a
 simple example to get me started. I know I am supposed to make a cpp file
 and put it some where before it can be called from R, but I'm confused as to
 how this can be done.
 
 My requirement is to start with a parameter vector, update it according to
 the gradient and hessian, check if the parameter satisfies some convergance
 criterion and continue doing this until it does. Is there a way to
 efficiently do this through an R function (replicate?). The problem is that
 the number of iterations is not fixed. If there is no function in R, is
 there a way I can quickly use Rcpp or some thing to have this last part of
 my code in a C or C++ program which repeatedly calls my R functions for
 updating the parameters?
 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with integrating R and c/c++

2011-02-06 Thread Dirk Eddelbuettel


On 6 February 2011 at 20:58, Rohit Pandey wrote:
| Hi,
| 
| I have been using R for close to two years now and have grown quite
| comfortable with the language. I am presently trying to implement an
| optimization routine in R (Newton Rhapson). I have some R functions that
| calculate the gradient and hessian (pre requisite matrices) fairly
| efficiently. Now, I have to call this function iteratively until some
| convergance criterion is reached. I think the standard method of doing this
| in most programming languages is a while loop. However, I know R can get
| pretty slow when you use loops. In order to make this efficient, I want to
| transfer this part of my code to a more efficient programming language like
| c++ or c. However, I have been trying to learn this all day without any
| luck. I found a package called Rcpp that makes this easier. However, it
| seems some functional knowledge of writing R packages is a pre requisite. I

What gave you that impression?  

Here is a counter-example, using the packages inline (for cxxfunction) and Rcpp:

  R library(inline)
  R src - 'std::cout  Hello C++_From_R World  std::endl; 
return(Rcpp::wrap(42));'
  R rohit - cxxfunction(signature(), src, plugin=Rcpp)
  R rohit()
  Hello C++_From_R World
  [1] 42
  R  

This compiled, linked and loaded a C++ routine built from the two-statement
program submitted as character variable.

The Rcpp documentation, including its eight vignettes, is full of other
examples. Start with Rcpp-introduction and maybe the Rcpp-FAQ.

| tried to follow the standard manual for doing this, but could not find a
| simple example to get me started. I know I am supposed to make a cpp file
| and put it some where before it can be called from R, but I'm confused as to
| how this can be done.
| 
| My requirement is to start with a parameter vector, update it according to
| the gradient and hessian, check if the parameter satisfies some convergance
| criterion and continue doing this until it does. Is there a way to
| efficiently do this through an R function (replicate?). The problem is that
| the number of iterations is not fixed. If there is no function in R, is
| there a way I can quickly use Rcpp or some thing to have this last part of
| my code in a C or C++ program which repeatedly calls my R functions for
| updating the parameters?

Give the example above a first try, and then read some more. The archives of
the rcpp-devel (CC'ed; post there for follow-ups after subscribing) list are
full of examples, and the CRAN page for Rcpp lists almost two dozen other
packages using Rcpp giving you plenty of examples should you want to write a
package using Rcpp. Several of these packages do optimization giving you
examples of you to pass parameters etc pp.

Hope this helps, Dirk

| 
| -- 
| Thanks in advance,
| Rohit
| Mob: 91 9819926213
| 
|   [[alternative HTML version deleted]]
| 
| __
| R-help@r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-help
| PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
| and provide commented, minimal, self-contained, reproducible code.

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to use a value of an aboject in a line command?

2011-02-06 Thread Leandro Colli

This is my first time here at R Forum!

I am a new user of R. I am very happy with this fabulous software!

I know how to use greps, fors, seeds, %in%, paste, etc.

But I need to know how to use  a value of an aboject in a line command?

Example:
 i
[1] TP53

 TP53
  V1   V2  V3
1  1 TP53 1.1
2  2 TP53 1.2
3  3 TP53 1.3

I would like to do a t.test of TP53 x control.
But this is a gene list.
My line command would be

result = t.test( i[,3], control[,3]).

But using this, the test will be in the object i and I would like to do in
the object which is value of i.

I though to use `print(i)`, like Shell, to modify my test line command, but
i does not works.

Do I was clear? I need to do it because the test will be done in a list of
genes (for i in genes).

Thank you very much,

regards,

Leandro Colli

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Applying 'cbind/rbind' among different list object

2011-02-06 Thread B. Jonathan B. Jonathan

Hi, I am wondering whether we can apply 'cbind/rbind' on many **equivalent**
list objects. For example please consider following:

 list1 - list2 - vector(list, length=2); names(list1) - names(list2)
- c(a, b)
 list1[[1]] - matrix(1:25, 5)
 list1[[2]] - matrix(2:26, 5)
 list2[[1]] - 10:14
 list2[[2]] - 11:15
 list1
$a
 [,1] [,2] [,3] [,4] [,5]
[1,]16   11   16   21
[2,]27   12   17   22
[3,]38   13   18   23
[4,]49   14   19   24
[5,]5   10   15   20   25
$b
 [,1] [,2] [,3] [,4] [,5]
[1,]27   12   17   22
[2,]38   13   18   23
[3,]49   14   19   24
[4,]5   10   15   20   25
[5,]6   11   16   21   26
 list2
$a
[1] 10 11 12 13 14
$b
[1] 11 12 13 14 15

Here I want to rbind these 2 list-s according to a and b i.e. I want
to get another list of length 2 where each element will be matrix with (6x5)
dimension. How can I do that?

Thanks for your time

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fortran and long integers

2011-02-06 Thread Peter Langfelder

Hi all,

I'm hoping someone more knowledgeable in Fortran than I can chime in
with opinion.

I'm the maintainer of the flashClust package that implements fast
hierarchical clustering. The fortran code fails when the number of
clustered objects is larger than about 46300. My guess is that this is
because the code uses the following construct:

IOFFSET=J+(I-1)*N-(I*(I+1))/2


where N is the number of clustered objects and I, J can vary between 1
and N. The result is used as index to access an array (namely the
distance structure).

When N is more than 46341 (or 2^16/sqrt(2)), the expressions I*(I+1)
and (I-1)*N can overflow and turn negative, messing up the indexing
and crashing the code and the entire R session with a segmentation
fault.

My solution is to turn the integers into double precision's, calculate
the index and convert back as follows:

XI = DBLE(I)
IOFFSET=J+NINT( (XI-1)*N - (XI*(XI+1))/2)

I'm wondering if there's a better way, something along the lines of
unsigned and/or long integers that are available in C?

Thanks,

Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Applying 'cbind/rbind' among different list object

2011-02-06 Thread jim holtman

will this do it for you:

 lapply(seq(length(list1)), function(i)rbind(list1[[i]], list2[[i]]))
[[1]]
 [,1] [,2] [,3] [,4] [,5]
[1,]16   11   16   21
[2,]27   12   17   22
[3,]38   13   18   23
[4,]49   14   19   24
[5,]5   10   15   20   25
[6,]   10   11   12   13   14

[[2]]
 [,1] [,2] [,3] [,4] [,5]
[1,]27   12   17   22
[2,]38   13   18   23
[3,]49   14   19   24
[4,]5   10   15   20   25
[5,]6   11   16   21   26
[6,]   11   12   13   14   15


On Sun, Feb 6, 2011 at 1:32 PM, B. Jonathan B. Jonathan
bkheijonat...@gmail.com wrote:
 Hi, I am wondering whether we can apply 'cbind/rbind' on many **equivalent**
 list objects. For example please consider following:

 list1 - list2 - vector(list, length=2); names(list1) - names(list2)
 - c(a, b)
 list1[[1]] - matrix(1:25, 5)
 list1[[2]] - matrix(2:26, 5)
 list2[[1]] - 10:14
 list2[[2]] - 11:15
 list1
 $a
     [,1] [,2] [,3] [,4] [,5]
 [1,]    1    6   11   16   21
 [2,]    2    7   12   17   22
 [3,]    3    8   13   18   23
 [4,]    4    9   14   19   24
 [5,]    5   10   15   20   25
 $b
     [,1] [,2] [,3] [,4] [,5]
 [1,]    2    7   12   17   22
 [2,]    3    8   13   18   23
 [3,]    4    9   14   19   24
 [4,]    5   10   15   20   25
 [5,]    6   11   16   21   26
 list2
 $a
 [1] 10 11 12 13 14
 $b
 [1] 11 12 13 14 15

 Here I want to rbind these 2 list-s according to a and b i.e. I want
 to get another list of length 2 where each element will be matrix with (6x5)
 dimension. How can I do that?

 Thanks for your time

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] using character vector as input argument to setkey (data.table pakcage)

2011-02-06 Thread Sean Zhang

Dear R helpers,

I wonder how to use a character vector as an input argument to setkey
(data.table package).
The following works:

library(data.table)
test.dt - data.table(expand.grid(a=1:30,b=LETTERS),c=seq(30*26))
setkey(test.dt,a,b)

I like a similar function, but can accept c('a','b') as an input argument as
below
setkey.wanted(test.dt,c('a','b'))

Your help will be highly appreciated.

-Sean

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with integrating R and c/c++

2011-02-06 Thread Ravi Varadhan

Hi Rohit,

It is fine if you are using the Newton's method just as an excuse to learn 
Rcpp.  If, however, your main goal is to develop a package to implement 
Newton's method, then you need to stop and look at the numerous optimization 
packages available in R.  A good place to start would be the optimx package, 
which unifies a large number of optimiaztion tools under one umbrella.

Hope this helps,
Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Rohit Pandey rohitpandey...@gmail.com
Date: Sunday, February 6, 2011 11:33 am
Subject: [R] Help with integrating R and c/c++
To: r-help@r-project.org


 Hi,
  
  I have been using R for close to two years now and have grown quite
  comfortable with the language. I am presently trying to implement an
  optimization routine in R (Newton Rhapson). I have some R functions that
  calculate the gradient and hessian (pre requisite matrices) fairly
  efficiently. Now, I have to call this function iteratively until some
  convergance criterion is reached. I think the standard method of 
 doing this
  in most programming languages is a while loop. However, I know R can 
 get
  pretty slow when you use loops. In order to make this efficient, I 
 want to
  transfer this part of my code to a more efficient programming 
 language like
  c++ or c. However, I have been trying to learn this all day without any
  luck. I found a package called Rcpp that makes this easier. However, 
 it
  seems some functional knowledge of writing R packages is a pre 
 requisite. I
  tried to follow the standard manual for doing this, but could not 
 find a
  simple example to get me started. I know I am supposed to make a cpp 
 file
  and put it some where before it can be called from R, but I'm 
 confused as to
  how this can be done.
  
  My requirement is to start with a parameter vector, update it 
 according to
  the gradient and hessian, check if the parameter satisfies some convergance
  criterion and continue doing this until it does. Is there a way to
  efficiently do this through an R function (replicate?). The problem 
 is that
  the number of iterations is not fixed. If there is no function in R, 
 is
  there a way I can quickly use Rcpp or some thing to have this last 
 part of
  my code in a C or C++ program which repeatedly calls my R functions for
  updating the parameters?
  
  -- 
  Thanks in advance,
  Rohit
  Mob: 91 9819926213
  
   [[alternative HTML version deleted]]
  
  __
  R-help@r-project.org mailing list
  
  PLEASE do read the posting guide 
  and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to use a value of an aboject in a line command?

2011-02-06 Thread Petr Savicky

On Sun, Feb 06, 2011 at 04:31:43PM -0200, Leandro Colli wrote:
 This is my first time here at R Forum!
 
 I am a new user of R. I am very happy with this fabulous software!
 
 I know how to use greps, fors, seeds, %in%, paste, etc.
 
 But I need to know how to use  a value of an aboject in a line command?
 
 Example:
  i
 [1] TP53
 
  TP53
   V1   V2  V3
 1  1 TP53 1.1
 2  2 TP53 1.2
 3  3 TP53 1.3
 
 I would like to do a t.test of TP53 x control.
 But this is a gene list.
 My line command would be
 
 result = t.test( i[,3], control[,3]).
 
 But using this, the test will be in the object i and I would like to do in
 the object which is value of i.
 
 I though to use `print(i)`, like Shell, to modify my test line command, but
 i does not works.
 
 Do I was clear? I need to do it because the test will be done in a list of
 genes (for i in genes).

If i understand correctly, then what you ask for is

  result = t.test( get(i)[,3], control[,3])

However, get() is only rarely needed. If you have a list of tables,
you can have them all in a single list variable and use a loop
over the list. I mean something like the following

  tab - list()
  tab[[1]] - data.frame(V1=1:3, V2=AA, V3=seq(1.1, 1.3, by=0.1))
  tab[[2]] - data.frame(V1=1:3, V2=BB, V3=seq(2.1, 2.3, by=0.1))
  tab[[3]] - data.frame(V1=1:3, V2=CC, V3=seq(3.1, 3.3, by=0.1))
  
  for (j in 1:3) {
  # here any command using tab[[j]] may be used
  # using print() for simplicity
  print(tab[[j]])
  }

See chapter 6 Lists and data frames of R-intro.pdf available at

  http://cran.r-project.org/manuals.html

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] aggregate function - na.action

2011-02-06 Thread Gene Leynes

On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote:

 
  However, I don't think you've told us what you're actually trying to
  accomplish...
 


I'm trying to aggregate the y value of a big data set which has several x's
and a y.
I'm using an abstracted example for many reasons.  Partially, I'm using an
abstracted example to comply with the posting guidelines of having a
reproducible example.  I'm really aggregating some incredibly boring and
complex customer data for an undisclosed client.

As it turns out,
Aggregate will not work when some of x's are NA, unless you convert them to
factors, with NA's included.

In my case, the data is so big that doing the conversions causes other
memory problems, and renders some of my numeric values useless for other
calculations.

My real data looks more like this (except with a few more categories and
records):

set.seed(100)
library(plyr)
dat=data.frame(
x1=sample(c(NA,'m','f'), 2e6, replace=TRUE),
x2=sample(c(NA, 1:10), 2e6, replace=TRUE),
x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE),
x4=sample(c(NA,T,F), 2e6, replace=TRUE),
x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6,
replace=TRUE),
x6=sample(c(NA, 1:10), 2e6, replace=TRUE),
x7=sample(c(NA,'married','divorced','separated','single','etc'),
2e6, replace=TRUE),
x8=sample(c(NA,T,F), 2e6, replace=TRUE),
y=trunc(rnorm(2e6)*1), stringsAsFactors=F)
str(dat)
## The control total
sum(dat$y, na.rm=T)
## The aggregate total
sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x)
## The ddply total
sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x)
{data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum)

ddply worked a little better than I expected at first, but it slows to a
crawl or has runs out of memory too often for me to invest the time learning
how to use it.  Just now it worked for 1m records, and it was just a bit
slower than aggregate.  But for the 2m example it hasn't finished
calculating.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] very basic HLM question

2011-02-06 Thread Sebastián Daza


Thank you for your reply and sorry for my ambiguity.

I computed:

summary(anova1 - aov(math ~ as.factor(schoolid), data=nels88))
ICC1(anova1)

ICC1 comes from the multilevel package. I have found an article where it 
is pointed out that with this formula:


  34.011/(34.011+72.256)
[1] 0.3200523

I should get practically an identical result than using ICC1. I have 
used the database of that article and I have got an identical result. 
But with the dataset I am using (schools) it doesn't work.


This is the example of the article mentioned above:

library(multilevel)
base(bh1996)

summary(lmer(WBEING ~ 1 + (1|GRP), data=bh1996))

0.035801/(0.035801+0.789497)

summary(test - aov(WBEING~as.factor(GRP),data=bh1996))
ICC1(test)


I have attached the small database where this equality doesn't exist.
Thank you in advance.


On 2/5/2011 11:07 PM, Paul Johnson wrote:

2011/2/5 Sebastián Dazasebastian.d...@gmail.com:

Hi everyone,

I need to get a between-component variance (e.g. random effects Anova), but
using lmer I don't get the same results (variance component) than using
random effects Anova. I am using a database of students, clustered on
schools (there is not the same number of students by school).

According to the ICC1 command, the interclass correlation is .44


ICC1(anova1)

[1] 0.4414491


If you don't tell us exactly what model you are calculating in
anova1, how would we guess if there is something wrong?

Similarly, I get this

ICC1

Error: object 'ICC1' not found

so it must mean you've loaded a package or written a function, which
you've not shown us.

I googled my way to a package called multilevel that has ICC1, and
its code for ICC1 shows a formula that does not match the one you used
to calculate ICC from lmer.

function (object)
{
 MOD- summary(object)
 MSB- MOD[[1]][1, 3]
 MSW- MOD[[1]][2, 3]
 GSIZE- (MOD[[1]][2, 1] + (MOD[[1]][1, 1] + 1))/(MOD[[1]][1,
 1] + 1)
 OUT- (MSB - MSW)/(MSB + ((GSIZE - 1) * MSW))
 return(OUT)
}

I'm not saying that's right or wrong, just not obviously identical to
the formula you proposed.



However, I cannot get the same ICC from the lmer output:


anova2- lmer(math ~ 1 + (1|schoolid), data=nels88)
summary(anova2- lmer(math ~ 1 + (1|schoolid), data=nels88))




Instead, do this (same thing, fits model only once):


anova2- lmer(math ~ 1 + (1|schoolid), data=nels88)
summary(anova2)


Note that lmer is going to estimate a normally distributed random
effect for each school, as well as an individual observation random
effect (usual error term) that is assumed independent of the
school-level effect.  What is anova1 estimating?




--
Sebastián Daza
sebastian.d...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Subsampling out of site*abundance matrix

2011-02-06 Thread B77S


Hello, 
How can I randomly sample individuals within a sites from a site (row) X
species abundance (column) data frame or matrix?  As an example, the matrix
abund2 made below.

# (sorry, Im a newbie and this is the only way I know to get an example
on here)

abund1 -c(150,  300,  0,  360,  150,  300,  0,  240,  150,   0,  60,  
0, 150,  0, 540, 0, 0, 300, 0, 240, 300, 300, 0, 360, 300, 0, 600, 0)
abund2 - matrix(data=abund1, nrow=4, ncol=7)
colnames(abund2) - c(spA, spB, spC, spD, spa, spF, spG)
rownames(abund2)-c(site1, site2, site3, site4) 

#

 abund2
  spA spB spC spD spa spF spG
site1 150 150 150 150   0 300 300
site2 300 300   0   0 300 300   0
site3   0   0  60 540   0   0 600
site4 360 240   0   0 240 360   0

How can I make a random subsample of 100 individuals from the abundances
given for each site?

This is probably really easy.
Thanks.
Bubba 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263148.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] aggregate function - na.action

2011-02-06 Thread Denis Kazakiewicz

Try to use formula notation and use na.action=na.pass
It is all described in the help(aggregate)


У Няд, 06/02/2011 у 14:54 -0600, Gene Leynes піша:
 On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote:
 
  
   However, I don't think you've told us what you're actually trying to
   accomplish...
  
 
 
 I'm trying to aggregate the y value of a big data set which has several x's
 and a y.
 I'm using an abstracted example for many reasons.  Partially, I'm using an
 abstracted example to comply with the posting guidelines of having a
 reproducible example.  I'm really aggregating some incredibly boring and
 complex customer data for an undisclosed client.
 
 As it turns out,
 Aggregate will not work when some of x's are NA, unless you convert them to
 factors, with NA's included.
 
 In my case, the data is so big that doing the conversions causes other
 memory problems, and renders some of my numeric values useless for other
 calculations.
 
 My real data looks more like this (except with a few more categories and
 records):
 
 set.seed(100)
 library(plyr)
 dat=data.frame(
 x1=sample(c(NA,'m','f'), 2e6, replace=TRUE),
 x2=sample(c(NA, 1:10), 2e6, replace=TRUE),
 x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE),
 x4=sample(c(NA,T,F), 2e6, replace=TRUE),
 x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6,
 replace=TRUE),
 x6=sample(c(NA, 1:10), 2e6, replace=TRUE),
 x7=sample(c(NA,'married','divorced','separated','single','etc'),
 2e6, replace=TRUE),
 x8=sample(c(NA,T,F), 2e6, replace=TRUE),
 y=trunc(rnorm(2e6)*1), stringsAsFactors=F)
 str(dat)
 ## The control total
 sum(dat$y, na.rm=T)
 ## The aggregate total
 sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x)
 ## The ddply total
 sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x)
 {data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum)
 
 ddply worked a little better than I expected at first, but it slows to a
 crawl or has runs out of memory too often for me to invest the time learning
 how to use it.  Just now it worked for 1m records, and it was just a bit
 slower than aggregate.  But for the 2m example it hasn't finished
 calculating.
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fund style categorizing

2011-02-06 Thread Ning Cheng

I've got a fund performance report for the last year,in which one
column states their management style,i.e. long/short,market neutral. I
want to categorize all the fund into subgroups by their management
style.How can I do that?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] aggregate function - na.action

2011-02-06 Thread Gene Leynes

By the way, thanks for sending that formula, it's quite thoughtful of you to
send an answer with an actual working line of code!

When I experimented with ddply earlier last week I couldn't figure out the
syntax for a single line aggregation, so it's good to have this example. I
will likely use it for other things.

On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote:

 oops. For clarity, that should have been

 sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y,
 na.rm=TRUE))})$y.sum)

 -Ista

 On Fri, Feb 4, 2011 at 7:52 PM, Ista Zahn iz...@psych.rochester.edu
 wrote:
  Hi again,
 
  On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes gleyne...@gmail.com wrote:
  Ista,
 
  Thank you again.
 
  I had figured that out... and was crafting another message when you
 replied.
 
  The NAs do come though on the variable that is being aggregated,
  However, they do not come through on the categorical variable(s).
 
  The aggregate function must be converting the data frame variables to
  factors, with the default omit=NA parameter.
 
  The help on aggregate says:
  na.action A function which indicates what should happen when the
 data
  contain NA values.
The default is to ignore missing values in the given
  variables.
  By data it must only refer to the aggregated variable, and not the
  categorical variables.  I thought it referred to both, because I thought
 it
  referred to the data argument, which is the underlying data frame.
 
  I think the proper way to accomplish this would be to recast my x
  (categorical) variables as factors.
 
  Yes, that would work.
 
  This is not feasible for me due to
  other complications.
  Also, (imho) the help should be more clear about what the na.action
  modifies.
 
  So, unless someone has a better idea, I guess I'm out of luck?
 
  Well, you can use ddply from the plyr package:
 
  library(plyr) # may need to install first.
  sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y,
  na.rm=TRUE))})$y)
 
  However, I don't think you've told us what you're actually trying to
  accomplish...
 
  Best,
  Ista
 
 
 
  On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn iz...@psych.rochester.edu
 wrote:
 
  Hi,
 
  On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes gleyne...@gmail.com
 wrote:
   Thank you both for the thoughtful (and funny) replies.
  
   I agree with both of you that sum is the one picking up aggregate.
   Although
   I didn't mention it, I did realize that in the first place.
   Also, thank you Phil for pointing out that aggregate only accepts a
   formula
   value in more recent versions!  I actually thought that was an older
   feature, but I must be thinking of other functions.
  
   I still don't see why these two values are not the same!
  
   It seems like a bug to me
 
  No, not a bug (see below).
 
  
   set.seed(100)
   dat=data.frame(
   + x1=sample(c(NA,'m','f'), 100, replace=TRUE),
   + x2=sample(c(NA, 1:10), 100, replace=TRUE),
   + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
   + x4=sample(c(NA,T,F), 100, replace=TRUE),
   + y=sample(c(rep(NA,5), rnorm(95
   sum(dat$y, na.rm=T)
   [1] 0.0815244116598
   sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass,
   na.rm=T)$y)
   [1] -4.45087666247
  
 
  Because in the first one you are only removing missing data in dat$y.
  In the second one you are removeing all rows that contain missing data
  in any of the columns.
 
  all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
  sum, na.action=na.pass, na.rm=T)$y))
  [1] TRUE
 
  Best,
  Ista
 
  
  
  
   On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn iz...@psych.rochester.edu
 
   wrote:
  
   Sorry, I didn't see Phil's reply, which is better than mine anyway.
  
   -Ista
  
   On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn 
 iz...@psych.rochester.edu
   wrote:
Hi,
   
Please see ?na.action
   
(just kidding!)
   
So it seems to me the problem is that you are passing na.rm to the
sum
function. So there is no missing data for the na.action argument
 to
operate on!
   
Compare
   
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
   
   
Best,
Ista
   
On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes gleyne...@gmail.com
wrote:
Can someone please tell me what is up with na.action in
 aggregate?
   
My (somewhat) reproducible example:
(I say somewhat because some lines wouldn't run in a separate
session,
more
below)
   
set.seed(100)
dat=data.frame(
   x1=sample(c(NA,'m','f'), 100, replace=TRUE),
   x2=sample(c(NA, 1:10), 100, replace=TRUE),
   x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
   x4=sample(c(NA,T,F), 100, replace=TRUE),
   y=sample(c(rep(NA,5),

[R] function optimization

2011-02-06 Thread garciap


Dear all,

this is my first time, and just begin to use R. But I've a question about
optimization using optimx library. It could sound stupid by I'm a bit
affraid with the problem, because anything I try, anything Error. 
The procedure was:
optimx(par=10,71,1,fn=(Prey*Provisioning)/Risk, control=list(maximize))

well the problem lies in the initial value parameters, again and again, R
tells me that it is not possible to optimize the function at this values,
even though these are real estimations.

Any suggestion?

Pablo
-- 
View this message in context: 
http://r.789695.n4.nabble.com/function-optimization-tp3263244p3263244.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function optimization

2011-02-06 Thread Ben Bolker

garciap garciap at usal.es writes:


 this is my first time, and just begin to use R. But I've a question about
 optimization using optimx library. It could sound stupid by I'm a bit
 affraid with the problem, because anything I try, anything Error. 
 The procedure was:
 optimx(par=10,71,1,fn=(Prey*Provisioning)/Risk, control=list(maximize))
 
 well the problem lies in the initial value parameters, again and again, R
 tells me that it is not possible to optimize the function at this values,
 even though these are real estimations.
 

  I'm afraid there are quite a few things wrong with your approach.

* parameters need to be a **numeric vector** (e.g. c(10,71,1)) rather
than characters
* you haven't defined a function that takes the vector of parameters
as an argument and returns a goodness-of-fit measure or other objective
function value.  This is the biggest problem
(the rest are just basic misunderstandings of R syntax, easily corrected).
* if you want the function to maximize you need control=list(maximize=TRUE).
* I think you have not given us exactly the command you entered. If I enter
what you have typed above, the unmatched quotation mark before 'maximize'
means that R doesn't even accept my command: it gives me a 'continuation
character' (+) telling me that it is waiting for further input.  It's very
important to specify *exactly* the commands you typed; cutting and pasting
from the R session (or better, keeping a script file and copying from there)
is a good way to ensure this.

  I would suggest that you work carefully through the examples
in the optimx help page, making sure that you understand them
(or if you get totally stuck there, you can post to the help list
with a *specific* question about what you haven't understood).
I might also recommend that you try starting with the optim()
help page, which is a little bit simpler than the optimx() page.
optimx has better optimization routines, but for getting a basic
understanding of how R does optimization optim() should suffice.

  Don't forget to read the posting guide before you come back with
more questions.

  good luck,
   Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] aggregate function - na.action

2011-02-06 Thread jim holtman

Try 'data.table' package.  It took 3 seconds to aggregate the 500K
levels:  Is this what you were after?

 # note the characters are converted to factors that 'data.table' likes
 dat=data.frame(
+x1=sample(c(NA,'m','f'), 2e6, replace=TRUE),
+x2=sample(c(NA, 1:10), 2e6, replace=TRUE),
+x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE),
+x4=sample(c(NA,T,F), 2e6, replace=TRUE),
+x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6,
+ replace=TRUE),
+x6=sample(c(NA, 1:10), 2e6, replace=TRUE),
+x7=sample(c(NA,'married','divorced','separated','single','etc'),
+ 2e6, replace=TRUE),
+x8=sample(c(NA,T,F), 2e6, replace=TRUE),
+y=trunc(rnorm(2e6)*1))
 str(dat)
'data.frame':   200 obs. of  9 variables:
 $ x1: Factor w/ 2 levels f,m: NA NA 2 NA NA NA NA 1 1 1 ...
 $ x2: int  4 5 3 10 10 7 1 1 3 5 ...
 $ x3: Factor w/ 5 levels a,b,c,d,..: 3 2 1 2 1 5 1 1 2 1 ...
 $ x4: logi  NA TRUE TRUE NA FALSE NA ...
 $ x5: Factor w/ 4 levels active,deleted,..: 4 3 3 2 2 1 1 NA 3 3 ...
 $ x6: int  NA 2 7 2 1 9 NA 1 1 9 ...
 $ x7: Factor w/ 5 levels divorced,etc,..: 1 3 5 NA 2 3 1 2 2 2 ...
 $ x8: logi  NA NA NA FALSE FALSE FALSE ...
 $ y : num  3066 -13237 -7840 9728 1596 ...
 require(data.table)
 dat - data.table(dat)
 system.time(result - dat[, sum(y), by = list(x1,x2,x3,x4,x5,x6,x7,x8)])
   user  system elapsed
   3.110.163.26
 str(result)
Classes ‘data.table’ and 'data.frame':  568594 obs. of  9 variables:
 $ x1: Factor w/ 2 levels f,m: NA NA NA NA NA NA NA NA NA NA ...
 $ x2: int  NA NA NA NA NA NA NA NA NA NA ...
 $ x3: Factor w/ 5 levels a,b,c,d,..: NA NA NA NA NA NA NA NA NA NA ...
 $ x4: logi  NA NA NA NA NA NA ...
 $ x5: Factor w/ 4 levels active,deleted,..: NA NA NA NA NA NA NA
NA NA NA ...
 $ x6: int  NA NA NA NA NA NA NA NA NA NA ...
 $ x7: Factor w/ 5 levels divorced,etc,..: NA NA NA 1 1 1 2 2 2 3 ...
 $ x8: logi  NA FALSE TRUE NA FALSE TRUE ...
 $ V1: num  6641 -18158 3 -11202 -14437 ...




On Sun, Feb 6, 2011 at 3:54 PM, Gene Leynes gleyne...@gmail.com wrote:
 On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote:

 
  However, I don't think you've told us what you're actually trying to
  accomplish...
 


 I'm trying to aggregate the y value of a big data set which has several x's
 and a y.
 I'm using an abstracted example for many reasons.  Partially, I'm using an
 abstracted example to comply with the posting guidelines of having a
 reproducible example.  I'm really aggregating some incredibly boring and
 complex customer data for an undisclosed client.

 As it turns out,
 Aggregate will not work when some of x's are NA, unless you convert them to
 factors, with NA's included.

 In my case, the data is so big that doing the conversions causes other
 memory problems, and renders some of my numeric values useless for other
 calculations.

 My real data looks more like this (except with a few more categories and
 records):

 set.seed(100)
 library(plyr)
 dat=data.frame(
        x1=sample(c(NA,'m','f'), 2e6, replace=TRUE),
        x2=sample(c(NA, 1:10), 2e6, replace=TRUE),
        x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE),
        x4=sample(c(NA,T,F), 2e6, replace=TRUE),
        x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6,
 replace=TRUE),
        x6=sample(c(NA, 1:10), 2e6, replace=TRUE),
        x7=sample(c(NA,'married','divorced','separated','single','etc'),
 2e6, replace=TRUE),
        x8=sample(c(NA,T,F), 2e6, replace=TRUE),
        y=trunc(rnorm(2e6)*1), stringsAsFactors=F)
 str(dat)
 ## The control total
 sum(dat$y, na.rm=T)
 ## The aggregate total
 sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x)
 ## The ddply total
 sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x)
        {data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum)

 ddply worked a little better than I expected at first, but it slows to a
 crawl or has runs out of memory too often for me to invest the time learning
 how to use it.  Just now it worked for 1m records, and it was just a bit
 slower than aggregate.  But for the 2m example it hasn't finished
 calculating.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subsampling out of site*abundance matrix

2011-02-06 Thread David Winsemius



On Feb 6, 2011, at 3:25 PM, B77S wrote:



Hello,
How can I randomly sample individuals within a sites from a site  
(row) X
species abundance (column) data frame or matrix?  As an example, the  
matrix

abund2 made below.

# (sorry, Im a newbie and this is the only way I know to get an  
example

on here)

abund1 -c(150,  300,  0,  360,  150,  300,  0,  240,  150,
0,  60,

0, 150,  0, 540, 0, 0, 300, 0, 240, 300, 300, 0, 360, 300, 0, 600, 0)
abund2 - matrix(data=abund1, nrow=4, ncol=7)
colnames(abund2) - c(spA, spB, spC, spD, spa, spF, spG)
rownames(abund2)-c(site1, site2, site3, site4)


Perfect. Best submission of an example by a newbie in weeks.



#


abund2

 spA spB spC spD spa spF spG
site1 150 150 150 150   0 300 300
site2 300 300   0   0 300 300   0
site3   0   0  60 540   0   0 600
site4 360 240   0   0 240 360   0

How can I make a random subsample of 100 individuals from the  
abundances

given for each site?


samptbl - apply(abund2, 1, function(x) sample(colnames(abund2), 100,  
prob=x, replace=TRUE) )

samptbl

   site1 site2 site3 site4
  [1,] spG spa spD spF
  [2,] spF spF spG spB
  [3,] spF spB spC spA
  [4,] spD spa spG spA
  [5,] spF spa spD spa
  [6,] spA spB spD spF
  [7,] spA spF spD spA
  [8,] spG spF spG spa
  [9,] spF spF spG spa
 [10,] spG spB spD spA

Snipped

apply() always transposes the results when called with row margins.  
The t() function would fix this if it needed to be arranged with  
rows by site. You could check by further apply-(cation) of table to  
the columns:

 apply(samptbl, 2, table)
$site1

spA spB spC spD spF spG
  8  13   6  13  32  28

$site2

spa spA spB spF
 25  31  25  19

$site3

spC spD spG
  9  51  40

$site4

spa spA spB spF
 22  27  19  32



This is probably really easy.




Thanks.
Bubba
--
View this message in context: 
http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263148.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fund style categorizing

2011-02-06 Thread Jim Holtman

?aggregate
?tapply
data.table package

example data and output would have been useful

Sent from my iPad

On Feb 6, 2011, at 15:02, Ning Cheng wakinchaue...@gmail.com wrote:

 I've got a fund performance report for the last year,in which one
 column states their management style,i.e. long/short,market neutral. I
 want to categorize all the fund into subgroups by their management
 style.How can I do that?
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Applying 'cbind/rbind' among different list object

2011-02-06 Thread Phil Spector


Another possibility is

mapply(rbind,list1,list2,SIMPLIFY=FALSE)

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Sun, 6 Feb 2011, jim holtman wrote:


will this do it for you:


lapply(seq(length(list1)), function(i)rbind(list1[[i]], list2[[i]]))

[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,]16   11   16   21
[2,]27   12   17   22
[3,]38   13   18   23
[4,]49   14   19   24
[5,]5   10   15   20   25
[6,]   10   11   12   13   14

[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,]27   12   17   22
[2,]38   13   18   23
[3,]49   14   19   24
[4,]5   10   15   20   25
[5,]6   11   16   21   26
[6,]   11   12   13   14   15


On Sun, Feb 6, 2011 at 1:32 PM, B. Jonathan B. Jonathan
bkheijonat...@gmail.com wrote:

Hi, I am wondering whether we can apply 'cbind/rbind' on many **equivalent**
list objects. For example please consider following:


list1 - list2 - vector(list, length=2); names(list1) - names(list2)

- c(a, b)

list1[[1]] - matrix(1:25, 5)
list1[[2]] - matrix(2:26, 5)
list2[[1]] - 10:14
list2[[2]] - 11:15
list1

$a
    [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
$b
    [,1] [,2] [,3] [,4] [,5]
[1,]    2    7   12   17   22
[2,]    3    8   13   18   23
[3,]    4    9   14   19   24
[4,]    5   10   15   20   25
[5,]    6   11   16   21   26

list2

$a
[1] 10 11 12 13 14
$b
[1] 11 12 13 14 15

Here I want to rbind these 2 list-s according to a and b i.e. I want
to get another list of length 2 where each element will be matrix with (6x5)
dimension. How can I do that?

Thanks for your time

       [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Legend outside the plot? xpd?

2011-02-06 Thread Jim Holtman

you can use layout to setup 4 plot areas; 3 for the graphs and then along one 
on the right for the legend

Sent from my iPad

On Feb 6, 2011, at 2:41, Matt Cooper mattcst...@gmail.com wrote:

 Hi All,
 
 BG: Will try be brief. I'd like 3 graphs on a page (below each other
 mfrow=c(3,1)), saved to pdf. The three plot data on the same subject so I'm
 having one legend, to the right of the center graph. I'm using
 mar=c(5,15,4,15) to bring the sides in so that the graphs are square and not
 stretched wide. To have the graph to the side I'm thinking xpd=T. Each graph
 has a number of points and lines overlaid, so plot
 ();lines(),lines(),lines() etc.
 
 The data is somewhat of a subset of a larger set, so I'm limiting what is
 being displayed. The lines plot trends of the wider data.
 
 P: With xpd=T, the lines are going right out of the graph boxes to the outer
 limit of the plot boundaries, as would be the intended behaviour of xpd.
 Goes without saying that this is undesirable.
 
 Q: Is there a better way to achieve what I want? At this stage I have xpd=T
 in the par(), and xpd=F in the plot() commands and all the line commands,
 just so I can get the legend to actually print...
 
 Aside: Given the way I've done this, the lines() are all clipped RIGHT at
 the limit of the plot box. So given the plot is called first these lines
 leave little coloured dashes on the black of the plot box. To sort this I've
 called box() at the end. This all seems very redundant?
 
 Thanks
 Matt
 
 
 -- 
 mattcst...@gmail.com
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] aggregate function - na.action

2011-02-06 Thread Dennis Murphy

Hi:

There's definitely something amiss with aggregate() here since similar
functions from other packages can reproduce your 'control' sum. I expect
ddply() will have some timing issues because of all the subgrouping in your
data frame, but data.table did very well and the summaryBy() function in the
doBy package did OK:

library(data.table)
library(doBy)
# Utility function to remove missing values when computing the sum
# for use in summaryBy()
f - function(x) sum(x, na.rm = TRUE)

 system.time({
+ dt - data.table(dat)
+ setkey(dt, x1, x2, x3, x4, x5, x6, x7, x8)
+ udt - dt[, list(y = sum(y, na.rm = TRUE)),
+ by = 'x1, x2, x3, x4, x5, x6, x7, x8']
+ })
   user  system elapsed
   7.521.569.14
 system.time(
+ udb - summaryBy(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
+data = dat, FUN = f)
+)
   user  system elapsed
  83.820.83   85.21

# To verify against the control sum:
 sum(udt$y)
[1] -5611158
 sum(udb$y)
[1] -5611158

Notice the difference in the number of rows of uda in comparison to the
other two:

 uda - aggregate(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
data = dat, FUN = f)
 sum(uda$y)
[1] 6396661
 dim(uda)
[1] 77321 9
 dim(udt)
[1] 568353  9
 dim(udb)
[1] 568353  9

I used four different approaches to aggregate(), excluding the one above:

aggregate(y ~ ., data = dat, FUN = f)
aggregate(y ~ ., data = dat, FUN = sum, na.rm = TRUE)
aggregate(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
 data = dat, FUN = sum, na.rm = TRUE)
aggregate(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
 data = dat, FUN = sum, na.action= na.pass)

All yielded the same answer as in uda.

data.table has some distinct advantages in situations such as this - see its
vignette and FAQ for details.

HTH,
Dennis


On Sun, Feb 6, 2011 at 12:54 PM, Gene Leynes gleyne...@gmail.com wrote:

 On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu
 wrote:

  
   However, I don't think you've told us what you're actually trying to
   accomplish...
  
 

 I'm trying to aggregate the y value of a big data set which has several x's
 and a y.
 I'm using an abstracted example for many reasons.  Partially, I'm using an
 abstracted example to comply with the posting guidelines of having a
 reproducible example.  I'm really aggregating some incredibly boring and
 complex customer data for an undisclosed client.

 As it turns out,
 Aggregate will not work when some of x's are NA, unless you convert them to
 factors, with NA's included.

 In my case, the data is so big that doing the conversions causes other
 memory problems, and renders some of my numeric values useless for other
 calculations.

 My real data looks more like this (except with a few more categories and
 records):

 set.seed(100)
 library(plyr)
 dat=data.frame(
x1=sample(c(NA,'m','f'), 2e6, replace=TRUE),
x2=sample(c(NA, 1:10), 2e6, replace=TRUE),
x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE),
x4=sample(c(NA,T,F), 2e6, replace=TRUE),
x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6,
 replace=TRUE),
x6=sample(c(NA, 1:10), 2e6, replace=TRUE),
x7=sample(c(NA,'married','divorced','separated','single','etc'),
 2e6, replace=TRUE),
x8=sample(c(NA,T,F), 2e6, replace=TRUE),
y=trunc(rnorm(2e6)*1), stringsAsFactors=F)
 str(dat)
 ## The control total
 sum(dat$y, na.rm=T)
 ## The aggregate total
 sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x)
 ## The ddply total
 sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x)
{data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum)

 ddply worked a little better than I expected at first, but it slows to a
 crawl or has runs out of memory too often for me to invest the time
 learning
 how to use it.  Just now it worked for 1m records, and it was just a bit
 slower than aggregate.  But for the 2m example it hasn't finished
 calculating.

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] aggregate function - na.action

2011-02-06 Thread Hadley Wickham

 There's definitely something amiss with aggregate() here since similar
 functions from other packages can reproduce your 'control' sum. I expect
 ddply() will have some timing issues because of all the subgrouping in your
 data frame, but data.table did very well and the summaryBy() function in the
 doBy package did OK:

Well, if you use the right plyr function, it works just fine:

system.time(count(dat, c(x1, x2, x3, x4, x4, x5, x6,
x7, x8), y))
#   user  system elapsed
#  9.754   1.314  11.073

Which illustrates something that I've believed for a while about
data.table - it's not the indexing that speed things up, it's the
custom data structure.  If you use ddply with data frames, it's slow
because data frames are slow.  I think the right way to resolve this
is to to make data frames more efficient, perhaps using some kind of
mutable interface where necessary for high-performance operations.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] questions about counting numbers

2011-02-06 Thread Carrie Li

Hello R-helpers,

I have a question about counting numbers.
Here is a simple example.

a=c(2, 3, 3,4)
 table(a)
a
2 3 4
1 2 1

so, I can to create another variables that has the corresponding counting
numbers.
In this case, I want to have:

b=c(1,2,2,1)

Is there any way coding for this ?

Thanks for helps!

Carrie--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] questions about counting numbers

2011-02-06 Thread Jorge Ivan Velez

Hi Carrie,

Try

 x - rle(a)
 rep(x$lengths, x$lengths)
[1] 1 2 2 1

HTH,
Jorge


On Sun, Feb 6, 2011 at 8:21 PM, Carrie Li  wrote:

 Hello R-helpers,

 I have a question about counting numbers.
 Here is a simple example.

 a=c(2, 3, 3,4)
  table(a)
 a
 2 3 4
 1 2 1

 so, I can to create another variables that has the corresponding counting
 numbers.
 In this case, I want to have:

 b=c(1,2,2,1)

 Is there any way coding for this ?

 Thanks for helps!

 Carrie--

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] questions about counting numbers

2011-02-06 Thread Ray Brownrigg

It's not quite clear what the OP really wanted.  A more general solution may be:

 a=c(4, 3, 2, 3)
 ta - table(a)
 ta
a
2 3 4
1 2 1
 ta[as.character(a)]
a
4 3 2 3
1 2 1 2

HTH
Ray Brownrigg

On Mon, 07 Feb 2011, Jorge Ivan Velez wrote:
 Hi Carrie,

 Try

  x - rle(a)
  rep(x$lengths, x$lengths)

 [1] 1 2 2 1

 HTH,
 Jorge

 On Sun, Feb 6, 2011 at 8:21 PM, Carrie Li  wrote:
  Hello R-helpers,
 
  I have a question about counting numbers.
  Here is a simple example.
 
  a=c(2, 3, 3,4)
 
   table(a)
 
  a
  2 3 4
  1 2 1
 
  so, I can to create another variables that has the corresponding counting
  numbers.
  In this case, I want to have:
 
  b=c(1,2,2,1)
 
  Is there any way coding for this ?
 
  Thanks for helps!
 
  Carrie--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] aggregate function - na.action

2011-02-06 Thread David Winsemius



On Feb 6, 2011, at 7:41 PM, Hadley Wickham wrote:

There's definitely something amiss with aggregate() here since  
similar
functions from other packages can reproduce your 'control' sum. I  
expect
ddply() will have some timing issues because of all the subgrouping  
in your
data frame, but data.table did very well and the summaryBy()  
function in the

doBy package did OK:


Well, if you use the right plyr function, it works just fine:

system.time(count(dat, c(x1, x2, x3, x4, x4, x5, x6,
x7, x8), y))
#   user  system elapsed
#  9.754   1.314  11.073

Which illustrates something that I've believed for a while about
data.table - it's not the indexing that speed things up, it's the
custom data structure.  If you use ddply with data frames, it's slow
because data frames are slow.  I think the right way to resolve this
is to to make data frames more efficient, perhaps using some kind of
mutable interface where necessary for high-performance operations.


Data.frames are also fat. Simply adding a single new column to a  
dataset bordering on large (5 million rows by 200 columns) requires  
more memory than even twice the size of the full dataframe. (Paging  
ensues on a Mac with 24GB.)  Unless, of course, there is a more memory- 
efficient strategy than:


 dfrm$newcol - with(dfrm, func(variables) ).

The table() operation on the other hand is blazingly fast and requires  
practically no memory.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] questions about counting numbers

2011-02-06 Thread Carrie Li

Many thanks!
all methods work well! thanks again!

On Sun, Feb 6, 2011 at 8:42 PM, Ray Brownrigg
ray.brownr...@ecs.vuw.ac.nzwrote:

 It's not quite clear what the OP really wanted.  A more general solution
 may be:

  a=c(4, 3, 2, 3)
  ta - table(a)
  ta
 a
 2 3 4
 1 2 1
  ta[as.character(a)]
 a
 4 3 2 3
 1 2 1 2
 
 HTH
 Ray Brownrigg

 On Mon, 07 Feb 2011, Jorge Ivan Velez wrote:
  Hi Carrie,
 
  Try
 
   x - rle(a)
   rep(x$lengths, x$lengths)
 
  [1] 1 2 2 1
 
  HTH,
  Jorge
 
  On Sun, Feb 6, 2011 at 8:21 PM, Carrie Li  wrote:
   Hello R-helpers,
  
   I have a question about counting numbers.
   Here is a simple example.
  
   a=c(2, 3, 3,4)
  
table(a)
  
   a
   2 3 4
   1 2 1
  
   so, I can to create another variables that has the corresponding
 counting
   numbers.
   In this case, I want to have:
  
   b=c(1,2,2,1)
  
   Is there any way coding for this ?
  
   Thanks for helps!
  
   Carrie--


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] questions about counting numbers

2011-02-06 Thread jim holtman

?ave

 a
[1] 2 3 3 4
 cbind(a, ave(a, a, FUN=length))
 a
[1,] 2 1
[2,] 3 2
[3,] 3 2
[4,] 4 1



On Sun, Feb 6, 2011 at 8:21 PM, Carrie Li carrieands...@gmail.com wrote:
 Hello R-helpers,

 I have a question about counting numbers.
 Here is a simple example.

 a=c(2, 3, 3,4)
 table(a)
 a
 2 3 4
 1 2 1

 so, I can to create another variables that has the corresponding counting
 numbers.
 In this case, I want to have:

 b=c(1,2,2,1)

 Is there any way coding for this ?

 Thanks for helps!

 Carrie--

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] delete rows

2011-02-06 Thread Christopher Porter

Hello. I came across your response in an R forum and could use your help. I 
have a data set with 472 rows. I want to delete rows 416 through 472. The name 
of my data set is MERGE.

I am an extreme R novice. How do I write a script to accomplish this?


Thank you.



---
Christopher H. Porter, M.A., M.Ed.
Director, Undergraduate Recruitment
College of Engineering and Science
Clemson University

106B Holtzendorff Hall
(864) 656-7870
(864) 656-1327 - Fax
AIM: ClemsonCES
http://www.clemson.edu/ces/psu/


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Finding non-normal distributions per row of data frame?

2011-02-06 Thread DB1984


Hugo - thanks for the link, but that's a bit beyond me right now. I have
ideas of the distribution for some 'nuggets', but others might be new. I
hope that there's an easier way to pick them out than going after
pre-defined patterns...

Greg - I use bioconductor for a lot of processing, but I'm not aware of
anything that does this job in particular. Still looking though.

Thanks for your input.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3263461.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subsampling out of site*abundance matrix

2011-02-06 Thread B77S


I figured there would be an even more straightforward way, but that works
David, thanks.  

There has to be a way to get the output I want/need (see below).  I tried to
bind or merge the elements of apply(samptbl, 2, table) but with no
success.  I could probably make a for loop with a merge statement, it would
work.. but I'm guessing unnecessary and just plain ugly.  

## what I want/need

 spA spB spC spD spa  spF  spG
site1   813   613   320 28
site2   31  25   0  0   25   19  0
site300   9 510 040
site4   27   19  0 022   32  0

If you know, I'd appreciate it.. thanks again for the help. 




David Winsemius wrote:
 
 
 On Feb 6, 2011, at 3:25 PM, B77S wrote:
 

 Hello,
 How can I randomly sample individuals within a sites from a site  
 (row) X
 species abundance (column) data frame or matrix?  As an example, the  
 matrix
 abund2 made below.

 # (sorry, Im a newbie and this is the only way I know to get an  
 example
 on here)

 abund1 -c(150,  300,  0,  360,  150,  300,  0,  240,  150,
 0,  60,
 0, 150,  0, 540, 0, 0, 300, 0, 240, 300, 300, 0, 360, 300, 0, 600, 0)
 abund2 - matrix(data=abund1, nrow=4, ncol=7)
 colnames(abund2) - c(spA, spB, spC, spD, spa, spF, spG)
 rownames(abund2)-c(site1, site2, site3, site4)
 
 Perfect. Best submission of an example by a newbie in weeks.
 

 #

 abund2
  spA spB spC spD spa spF spG
 site1 150 150 150 150   0 300 300
 site2 300 300   0   0 300 300   0
 site3   0   0  60 540   0   0 600
 site4 360 240   0   0 240 360   0

 How can I make a random subsample of 100 individuals from the  
 abundances
 given for each site?
 
 samptbl - apply(abund2, 1, function(x) sample(colnames(abund2), 100,  
 prob=x, replace=TRUE) )
 samptbl
 
 site1 site2 site3 site4
[1,] spG spa spD spF
[2,] spF spF spG spB
[3,] spF spB spC spA
[4,] spD spa spG spA
[5,] spF spa spD spa
[6,] spA spB spD spF
[7,] spA spF spD spA
[8,] spG spF spG spa
[9,] spF spF spG spa
   [10,] spG spB spD spA
 
 Snipped
 
 apply() always transposes the results when called with row margins.  
 The t() function would fix this if it needed to be arranged with  
 rows by site. You could check by further apply-(cation) of table to  
 the columns:
   apply(samptbl, 2, table)
 $site1
 
 spA spB spC spD spF spG
8  13   6  13  32  28
 
 $site2
 
 spa spA spB spF
   25  31  25  19
 
 $site3
 
 spC spD spG
9  51  40
 
 $site4
 
 spa spA spB spF
   22  27  19  32
 

 This is probably really easy.
 
 
 Thanks.
 Bubba
 -- 
 View this message in context:
 http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263148.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 David Winsemius, MD
 West Hartford, CT
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263488.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] delete rows

2011-02-06 Thread Joshua Wiley

Hi,

On Sun, Feb 6, 2011 at 6:16 PM, Christopher Porter cpor...@clemson.edu wrote:
 Hello. I came across your response in an R forum and could use your help. I 
 have a data set with 472 rows. I want to delete rows 416 through 472. The 
 name of my data set is MERGE.

Try this:

MERGE2 - MERGE[-c(416:472), ]

Cheers,

Josh


 I am an extreme R novice. How do I write a script to accomplish this?


 Thank you.



 ---
 Christopher H. Porter, M.A., M.Ed.
 Director, Undergraduate Recruitment
 College of Engineering and Science
 Clemson University

 106B Holtzendorff Hall
 (864) 656-7870
 (864) 656-1327 - Fax
 AIM: ClemsonCES
 http://www.clemson.edu/ces/psu/


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] delete rows

2011-02-06 Thread Santosh Srinivas

?subset

On Mon, Feb 7, 2011 at 7:46 AM, Christopher Porter cpor...@clemson.edu wrote:
 Hello. I came across your response in an R forum and could use your help. I 
 have a data set with 472 rows. I want to delete rows 416 through 472. The 
 name of my data set is MERGE.

 I am an extreme R novice. How do I write a script to accomplish this?


 Thank you.



 ---
 Christopher H. Porter, M.A., M.Ed.
 Director, Undergraduate Recruitment
 College of Engineering and Science
 Clemson University

 106B Holtzendorff Hall
 (864) 656-7870
 (864) 656-1327 - Fax
 AIM: ClemsonCES
 http://www.clemson.edu/ces/psu/


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] delete rows

2011-02-06 Thread Bert Gunter

On Sun, Feb 6, 2011 at 6:16 PM, Christopher Porter cpor...@clemson.eduwrote:

 Hello. I came across your response in an R forum and could use your help. I
 have a data set with 472 rows. I want to delete rows 416 through 472. The
 name of my data set is MERGE.

 I am an extreme R novice. How do I write a script to accomplish this?


By first reading An Introduction to R before posting such basic questions
on this list??

-- Bert



 Thank you.



 ---
 Christopher H. Porter, M.A., M.Ed.
 Director, Undergraduate Recruitment
 College of Engineering and Science
 Clemson University

 106B Holtzendorff Hall
 (864) 656-7870
 (864) 656-1327 - Fax
 AIM: ClemsonCES
 http://www.clemson.edu/ces/psu/


[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Bert Gunter
Genentech Nonclinical Biostatistics

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subsampling out of site*abundance matrix

2011-02-06 Thread David Winsemius



On Feb 6, 2011, at 9:35 PM, B77S wrote:



I figured there would be an even more straightforward way, but that  
works

David, thanks.


I am rather puzzled. What prior experience with computing would lead  
you to believe that one line of code was not a straightforward method  
to do multinomial sampling from 4 different sets of probabilities???




There has to be a way to get the output I want/need (see below).  I  
tried to

bind or merge the elements of apply(samptbl, 2, table) but with no
success.  I could probably make a for loop with a merge statement,  
it would

work.. but I'm guessing unnecessary and just plain ugly.

## what I want/need

spA spB spC spD spa  spF  spG
site1   813   613   320 28
site2   31  25   0  0   25   19  0
site300   9 510 040
site4   27   19  0 022   32  0

If you know, I'd appreciate it.. thanks again for the help.


In order to keep the zero entries straight, there needs to be some  
sort of structure that records the fact that there were not  
occurrences in a column. Using a factor construct is the method R uses  
for that purpose. Factors work better in data.frames:


 set.seed(123)
 samptbl - apply(abund2, 1, function(x) sample(colnames(abund2),  
100, prob=x, replace=TRUE) )

 sampdf - as.data.frame(samptbl)
 sampdf[[1]] -  factor( sampdf[[1]], levels= colnames(abund2) )
 sampdf[[2]] -  factor( sampdf[[2]], levels= colnames(abund2) )
 sampdf[[3]] -  factor( sampdf[[3]], levels= colnames(abund2) )
 sampdf[[4]] -  factor( sampdf[[4]], levels= colnames(abund2) )
 sapply(sampdf, table)
site1 site2 site3 site4
spA1420 031
spB1230 020
spC 8 0 7 0
spD13 041 0
spa 020 019
spF2630 030
spG27 052 0

Again, the t() function would flip that:

 t( sapply(sampdf, table) )
  spA spB spC spD spa spF spG
site1  14  12   8  13   0  26  27
site2  20  30   0   0  20  30   0
site3   0   0   7  41   0   0  52
site4  31  20   0   0  19  30   0

--
david.



David Winsemius wrote:



On Feb 6, 2011, at 3:25 PM, B77S wrote:



Hello,
How can I randomly sample individuals within a sites from a site
(row) X
species abundance (column) data frame or matrix?  As an example, the
matrix
abund2 made below.

# (sorry, Im a newbie and this is the only way I know to get an
example
on here)

abund1 -c(150,  300,  0,  360,  150,  300,  0,  240,  150,
0,  60,
0, 150,  0, 540, 0, 0, 300, 0, 240, 300, 300, 0, 360, 300, 0, 600,  
0)

abund2 - matrix(data=abund1, nrow=4, ncol=7)
colnames(abund2) - c(spA, spB, spC, spD, spa, spF,  
spG)

rownames(abund2)-c(site1, site2, site3, site4)


Perfect. Best submission of an example by a newbie in weeks.



#


abund2

spA spB spC spD spa spF spG
site1 150 150 150 150   0 300 300
site2 300 300   0   0 300 300   0
site3   0   0  60 540   0   0 600
site4 360 240   0   0 240 360   0

How can I make a random subsample of 100 individuals from the
abundances
given for each site?


samptbl - apply(abund2, 1, function(x) sample(colnames(abund2), 100,
prob=x, replace=TRUE) )
samptbl

   site1 site2 site3 site4
  [1,] spG spa spD spF
  [2,] spF spF spG spB
  [3,] spF spB spC spA
  [4,] spD spa spG spA
  [5,] spF spa spD spa
  [6,] spA spB spD spF
  [7,] spA spF spD spA
  [8,] spG spF spG spa
  [9,] spF spF spG spa
 [10,] spG spB spD spA

Snipped

apply() always transposes the results when called with row margins.
The t() function would fix this if it needed to be arranged with
rows by site. You could check by further apply-(cation) of table to
the columns:

apply(samptbl, 2, table)

$site1

spA spB spC spD spF spG
  8  13   6  13  32  28

$site2

spa spA spB spF
 25  31  25  19

$site3

spC spD spG
  9  51  40

$site4

spa spA spB spF
 22  27  19  32



This is probably really easy.




Thanks.
Bubba
--
View this message in context:
http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263148.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
View this message in context: 
http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263488.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do

Re: [R] Fortran and long integers

2011-02-06 Thread Earl F Glynn



Peter Langfelder wrote:

Hi all,

I'm hoping someone more knowledgeable in Fortran than I can chime in
with opinion.

I'm the maintainer of the flashClust package that implements fast
hierarchical clustering. The fortran code fails when the number of
clustered objects is larger than about 46300. My guess is that this is
because the code uses the following construct:


2-byte (16 bit) signed integers would have a range from -32768 to 
+37267.  So, it looks like you may be using 2-byte integers and 46,300 
would definitely cause an overflow with 16-bit integers.


I haven't used Fortran for a long time, but there could be a compiler 
switch that forces all 2-byte integers, or a specific declaration that 
says I, J, N, IOFFSET are only 2-byte (16-bit) integers.


I'm guess, but you might try a specification like

  INTEGER*4 I, J, N, IOFFSET

assuming INTEGER*4 is legal with your Fortran compiler:

http://gcc.gnu.org/onlinedocs/gfortran/Old_002dstyle-kind-specifications.html#Old_002dstyle-kind-specifications

efg

Earl F Glynn
Overland Park, KS

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Giving vector of colors to line in plots

2011-02-06 Thread statmobile


Hey all,

I can't for the life of me figure out what I'm missing here.  I'm trying 
to change the color of the line in a time series type plot.  I can 
change the point colors and symbols no problem, but for some reason the 
colors do not get passed to the lines, regardless of if I do type=b or 
type=l.  The sample code I'm using is below.


Any help would be greatly appreciated.

Also, please CC me, as I only get daily summaries of the mailing list.

Thanks,
Brian



## Changing plot attributes through the plot
set.seed(33)
x - rpois(7,lambda=7)
y - rpois(7,lambda=5)

cols.x - c(rep(black,2),rep(red,3),rep(black,2))
cols.y - c(rep(blue,3),rep(yellow,2),rep(blue,2))

points.x - c(rep(x,2),rep(O,3),rep(x,2))
points.y - c(rep(8,3),rep(17,2),rep(8,2))

plot(x,col=cols.x,pch=points.x,type=b,ylim=c(0,15))
points(y,col=cols.y,pch=points.y,type=b)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fortran and long integers

2011-02-06 Thread Berend Hasselman



Earl F Glynn wrote:
 
 
 2-byte (16 bit) signed integers would have a range from -32768 to 
 +37267.  So, it looks like you may be using 2-byte integers and 46,300 
 would definitely cause an overflow with 16-bit integers.
 
 I haven't used Fortran for a long time, but there could be a compiler 
 switch that forces all 2-byte integers, or a specific declaration that 
 says I, J, N, IOFFSET are only 2-byte (16-bit) integers.
 
 I'm guess, but you might try a specification like
 
INTEGER*4 I, J, N, IOFFSET
 
 assuming INTEGER*4 is legal with your Fortran compiler:
 

The overflow is not caused by 16 bits integers.
I'm quite sure the OP is using 32 bit integers.
The overflow is caused by  the multiplication N*(i-1) and/or i*(i+1).

In Fortran there's not much you can do about this unless your compiler
supports larger integers.
A pity that fortran doesn't have a posint.

Either your solution with doubles or a small C function looks like the way
out.

Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Fortran-and-long-integers-tp3263054p3263605.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fortran and long integers

2011-02-06 Thread Tsjerk Wassenaar

Hi,

Does it alleviate things if you rewrite the sums to avoid large products?

For I even:
J+I*(N-I/2)-(N-I/2)

For I odd:
J+I*(N-(I+1)/2)-(N-(I+1)/2)+(I+1)/2

Hope it helps,

Tsjerk


On Mon, Feb 7, 2011 at 7:21 AM, Berend Hasselman b...@xs4all.nl wrote:


 Earl F Glynn wrote:


 2-byte (16 bit) signed integers would have a range from -32768 to
 +37267.  So, it looks like you may be using 2-byte integers and 46,300
 would definitely cause an overflow with 16-bit integers.

 I haven't used Fortran for a long time, but there could be a compiler
 switch that forces all 2-byte integers, or a specific declaration that
 says I, J, N, IOFFSET are only 2-byte (16-bit) integers.

 I'm guess, but you might try a specification like

    INTEGER*4 I, J, N, IOFFSET

 assuming INTEGER*4 is legal with your Fortran compiler:


 The overflow is not caused by 16 bits integers.
 I'm quite sure the OP is using 32 bit integers.
 The overflow is caused by  the multiplication N*(i-1) and/or i*(i+1).

 In Fortran there's not much you can do about this unless your compiler
 supports larger integers.
 A pity that fortran doesn't have a posint.

 Either your solution with doubles or a small C function looks like the way
 out.

 Berend
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Fortran-and-long-integers-tp3263054p3263605.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Tsjerk A. Wassenaar, Ph.D.

post-doctoral researcher
Molecular Dynamics Group
* Groningen Institute for Biomolecular Research and Biotechnology
* Zernike Institute for Advanced Materials
University of Groningen
The Netherlands

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fortran and long integers

2011-02-06 Thread Berend Hasselman



Tsjerk Wassenaar wrote:
 
 Hi,
 
 Does it alleviate things if you rewrite the sums to avoid large products?
 
 For I even:
 J+I*(N-I/2)-(N-I/2)
 

Shouldn't that be 

J+I*(N-I/2)-(N+I/2)  ?

Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Fortran-and-long-integers-tp3263054p3263647.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fortran and long integers

2011-02-06 Thread Tsjerk Wassenaar

Yes, thnx... Typo :$

On Mon, Feb 7, 2011 at 8:23 AM, Berend Hasselman b...@xs4all.nl wrote:


 Tsjerk Wassenaar wrote:

 Hi,

 Does it alleviate things if you rewrite the sums to avoid large products?

 For I even:
 J+I*(N-I/2)-(N-I/2)


 Shouldn't that be

 J+I*(N-I/2)-(N+I/2)  ?

 Berend
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Fortran-and-long-integers-tp3263054p3263647.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Tsjerk A. Wassenaar, Ph.D.

post-doctoral researcher
Molecular Dynamics Group
* Groningen Institute for Biomolecular Research and Biotechnology
* Zernike Institute for Advanced Materials
University of Groningen
The Netherlands

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Image to plot

2011-02-06 Thread Alaios

Dear all I would like to plot the contents of a matrix as an Image. I found 
this code here http://www.phaget4.org/R/image_matrix.html but this is not only 
what I want. Instead of having inside every cell the color of the cell it would 
be nice to have also the arithmetic value over the background color. 

Having only the color sometimes does not make it clear to understand what is 
the value each cell has .. so I was thinking to combine colors and text inside 
every cell.

I would like to thank you in advance for your help
Best Regards
Alex.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] lattice: Synchronizing key and lines; or: why is lty used as lwd in legend?

2011-02-06 Thread Dieter Menne

Dear latticists,

Getting legend and graph line types synchronized has been a challenge in
lattice; the last example below that works looks ridiculously ugly for such
a simple job, I am sure there is a more elegant solution.

When trying this out, I found a feature which I found confusing: why are
line types (lty) translated to line widths (lwd) in the legend? See
attached pdf in case this should not be reproducible on your computer. 

Dieter
 

Library(lattice)
os = OrchardSprays[OrchardSprays$rowpos 4,]
os$rowpos = as.factor(os$rowpos)

# Set it globally
# This is confusing: why is lty used as lwd in legend??
trellis.par.set(superpose.line = list(col=c(blue,red,green), 
  lwd = 2, lty=c(1,4,6)))
xyplot(decrease ~ treatment, os, groups = rowpos,
   type = a,  auto.key =
 list(space = right, points = FALSE, lines = TRUE))
dev.off()  

# To me, this seems to be the most consistent method if it worked.
s= simpleTheme(lty=c(1,4,6),col=c(red,blue,green),lwd=3)
xyplot(decrease ~ treatment, os, groups = rowpos,
   par.settings=s,
   type = a,  auto.key =
 list(space = right, points = FALSE, lines = TRUE))

# !!! There must be an easier way to synchronize legend and plot
xyplot(decrease~treatment,data=os,groups=rowpos,
  type = a,par.settings=s, 
  key = list(text=list(levels(os$rowpos)),
  lines=list(lty=s$superpose.line$lty,
  lwd=s$superpose.line$lwd,
  col= s$superpose.line$col)))


R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C   
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base 

other attached packages:
[1] lattice_0.19-17

loaded via a namespace (and not attached):
[1] grid_2.12.1  tools_2.12.1



lwdAsLty.pdf
Description: lwdAsLty.pdf
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subsampling out of site*abundance matrix

2011-02-06 Thread B77S


hehe...
very true sir; I apologize, that was very straightforward.  Thank you for
your time. 


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263598.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] FW: multivariate regression

2011-02-06 Thread Deniz SIGIRLI



#I have got 3 dependent variables:

Y-matrix(c(3,5,6,3,4,2,4,5,3,2,3,5,6,3,4,2,4,5,3,2,3,5,6,3,4,2,4,5,3,2), nrow 
= 10, ncol=3, byrow=TRUE)
#I've got one independent variable:

X-matrix(c(42,54,67,76,45,76,54,87,34,65), nrow = 10, ncol=1, byrow=TRUE)
summary(lm(Y~X))


and the result is as below:
 Response Y1 :

Call:
lm(formula = Y1 ~ X)

Residuals:
Min  1Q  Median  3Q Max 
-1.5040 -0.8838 -0.3960  1.1174  2.1162 

Coefficients:
Estimate Std. Error t value Pr(|t|)  
(Intercept)  4.435071.70369   2.603   0.0315 *
X   -0.012250.02742  -0.447   0.6668  
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1 

Residual standard error: 1.401 on 8 degrees of freedom
Multiple R-squared: 0.02435,Adjusted R-squared: -0.09761 
F-statistic: 0.1997 on 1 and 8 DF,  p-value: 0.6668 


Response Y2 :

Call:
lm(formula = Y2 ~ X)

Residuals:
Min  1Q  Median  3Q Max 
-1.4680 -0.8437 -0.2193  0.9050  1.9960 

Coefficients:
Estimate Std. Error t value Pr(|t|)
(Intercept)  1.379941.50111   0.9190.385
X0.038670.02416   1.6010.148

Residual standard error: 1.235 on 8 degrees of freedom
Multiple R-squared: 0.2426, Adjusted R-squared: 0.1479 
F-statistic: 2.562 on 1 and 8 DF,  p-value: 0.1481 


Response Y3 :

Call:
lm(formula = Y3 ~ X)

Residuals:
Min  1Q  Median  3Q Max 
-1.7689 -0.7316 -0.1943  1.1448  2.0933 

Coefficients:
Estimate Std. Error t value Pr(|t|)  
(Intercept)  4.389131.70626   2.5720.033 *
X   -0.011490.02746  -0.4180.687  
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1 

Residual standard error: 1.403 on 8 degrees of freedom
Multiple R-squared: 0.0214, Adjusted R-squared: -0.1009 
F-statistic: 0.175 on 1 and 8 DF,  p-value: 0.6867 

 

There are 3 F statistics, R2 and p-values. But I want just one R2 and pvalue 
for my multivariate regression model. 

 

 

 

 

 Date: Fri, 4 Feb 2011 08:23:39 -0500
 From: jsor...@grecc.umaryland.edu
 To: denizsigi...@hotmail.com; r-help@r-project.org
 Subject: Re: [R] multivariate regression
 
 Please help us help you. Follow the posting rules and send us a copy of your 
 code and output.
 John
 John Sorkin
 Chief Biostatistics and Informatics
 Univ. of Maryland School of Medicine
 Division of Gerontology and Geriatric Medicine
 jsor...@grecc.umaryland.edu 
 -Original Message-
 From: Deniz SIGIRLI denizsigi...@hotmail.com
 To: r-help@r-project.org
 
 Sent: 2/4/2011 7:54:56 AM
 Subject: [R] multivariate regression
 
 
 How can I run multivariate linear regression in R (I have got 3 dependent 
 variables and only 1 independent variable)? I tried lm function, but it gave 
 different R2 and p values for every dependent variable. I need one R2 and p 
 value for the model. 
 [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 Confidentiality Statement:
 This email message, including any attachments, is for ...{{dropped:5}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

66 matches

Mail list logo