[R] Background fill and border for a legend in dotplot

2011-09-02 Thread markm0705
Dear R help group

I've been working on this plot for a while now and now getting around to the
minor adjusments.  I would like to be able to put a border and background
fill around the legend in this plot.  

I understand the legend 'bty' should do this have this capablity but not
sure how the syntax works in this case

## initalise
library(lattice)
library(latticeExtra) # for mergedTrellisLegendGrob()

##read the data to a variable
#

Cal_dat - read.table(Calibration2.dat,header = TRUE,sep = \t,)

## set up plotting colours
#
col.pat-c(violet,cyan,green,red,blue,black,yellow) 
sym.pat-c(19,20,21)

##set up the plot key
#
key1 -
   draw.key(list(text=list(levels(Cal_dat$Commodity)),
 title=Ore type,
 points=list(pch=22, cex=1.3, fill=col.pat, col=black)),
draw = FALSE)
key2 -
   draw.key(list(text=list(levels(factor(Cal_dat$Year))),
 title=Year,
 points = list(pch = c(21, 22, 23), cex=1.3, col=black)),
draw = FALSE)

mkey -
   mergedTrellisLegendGrob(list(fun = key2),
   list(fun = key1),
   vertical = TRUE
)   

##set some parameters for the plot
#
trellis.par.set(
dot.line=list(col = grey90, lty=dashed),
axis.line=list(col = grey50),
axis.text=list(col =grey50, cex=0.8),
panel.background=list(col=transparent),
par.xlab.text= list(col=grey50),  
)

## Create the dot plot
#
with(Cal_dat,
dotplot(reorder(paste(Mine,Company), Resc_Gt) ~ Resc_Gt,
fill_var = Commodity,
pch_var = factor(Year),
cex=1.2,
pch = c(21, 22, 23),
col = black,
fill = col.pat,
aspect = 2.0,
alpha=0.6,
legend = list(inside = list(fun = mkey,corner = c(0.95, 0.01))),
scales = list(x = list(log = 10)), 
xscale.components = xscale.components.log10ticks,
origin = 0,
type = c(p,a), 
main = Mineral resources,
xlab= Total tonnes (billions),
panel = function(x, y, ..., subscripts,
 fill, pch, fill_var, pch_var) {
pch - pch[pch_var[subscripts]]
fill - fill[fill_var[subscripts]]
panel.dotplot(x, y, pch = pch, fill = fill, ...)
}))

panel = function(x, y, ..., subscripts,
 fill, pch, fill_var, pch_var) {
pch - pch[pch_var[subscripts]]
fill - fill[fill_var[subscripts]]
panel.dotplot(x, y, pch = pch, fill = fill, ...)
}

http://r.789695.n4.nabble.com/file/n3785003/Calibration2.dat
Calibration2.dat 

--
View this message in context: 
http://r.789695.n4.nabble.com/Background-fill-and-border-for-a-legend-in-dotplot-tp3785003p3785003.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Urgente

2011-09-02 Thread Frank Mill
 - This mail is in HTML. Some elements may be ommited in plain text. -

Meu nome é Frank P, eu trabalhei com o Banco SNS em Holland.I habe Você 
contactado por conta de parente falecido com os nossos Banj e que foi deixado 
sem pretensões.
Por favor, me envia um email com um número de telefone e também indica o melhor 
momento que eu possa chamá-lo por telefone.
Isto irá permitir-me dar mais informações sobre este assunto. Por favor, note 
que eu estou atualmente na Inglaterra, em estudos adicionais para o meu 
trabalho.
Regards
Frank P
www.snsbank.nl
447741782385


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about BIC of two different regression models? how should we compare two regression models?

2011-09-02 Thread Ben Bolker
Andra Isan andra_isan at yahoo.com writes:

 
 Hi All, 
 In order to compare two different logistic regressions, 
 I think I need to compare them based on their BIC
 values, but I am not sure if the smaller BIC would mean a better 
 model or the reverse is true?
 Thanks a lot,Andra

  Smaller (i.e. lower value) BIC is always better 
(even if BIC happens to be negative, as can happen in some cases; 
i.e. BIC=-1002 is better than BIC=-1000, BIC=1000 is better than BIC=1002).

  I would suggest however that (a) there are better venues for this
question (e.g. stats.stackexchange.com), since it's a stats and not
an R question; (b) it might be a good idea to review a stats text,
or even http://en.wikipedia.org/wiki/Bayesian_information_criterion ,
since this is a pretty basic question.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] two question about plot

2011-09-02 Thread Petr PIKAL
Hi
 
 Re: [R] two question about plot
 
 The help for boxplot offers suggestions for both those things. You may 
be
 particularly interested in:
 
names: group labels which will be printed under each boxplot.  Can
   be a character vector or an expression (see plotmath).
 
 add: logical, if true _add_ boxplot to current plot.
 
 Sarah

It could be also worth to consult also bwplot from lattice or especially 
ggplot2 package for plotting.

Regards
Petr



 
 
 On Thu, Sep 1, 2011 at 1:31 PM, Jie TANG totang...@gmail.com wrote:
  1) how to modify the the tickment of x-axis or y-axis.
boxplot(data[,1:5])
   the tickment in x-axis in V1 V2 V3 V4 V5 ,I want to be some name for
  example
   name-c(1day,2day,3day,4day,5day)
 
  2) how to overlap two plot into one figure?
   plot(data[1:5])
   boxplot(newdata[,1:5])
  ?
 
  --
  TANG Jie
 
 
 
 
 -- 
 Sarah Goslee
 http://www.functionaldiversity.org
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 to create a square plot

2011-09-02 Thread Paul Hiemstra
 On 09/01/2011 09:02 PM, baptiste auguie wrote:
 Hi,

 Are you after this?

 last_plot() + opts(aspect.ratio=1)

Even better (I once got correct by Hadley for using aspect.ratio, but
this was plotting spatial data...)

last_plot() + coord_equal()

cheers,
Paul

 Also, see https://github.com/hadley/ggplot2/wiki/Themes for some
 settings re: plot margins.

 HTH,

 baptiste

 On 1 September 2011 05:18, Alaios ala...@yahoo.com wrote:
 Dear all,
 I am using ggplot with geom_tile to print as an image a matrix  I have. My 
 matrix is a squared one of 512*512 cells.

 The code that does that is written below


 print(v + geom_tile(aes(fill=dB))+ 
 opts(axis.text.x=theme_text(size=20),axis.text.y=theme_text(size=20), 
 axis.title.x=theme_text(size=25) , axis.title.y=theme_text(size=25), 
 legend.title=theme_text(size=25,hjust=-0.4) , 
 legend.text=theme_text(size=20)) + scale_x_continuous('km')  + 
 scale_y_continuous('km'))


 as you can see from the picture below

 http://imageshack.us/photo/my-images/171/backupf.jpg/

 this squared matrix is printed a bit squeezed with the height being bigger 
 than the width. Would be possible somehow to print that plot by keeping the 
 square-look of the matrix in the plot? Of course the other elements like 
 axis and legend will make the over all plot to not be square but I do not 
 care as the blue and red region forms a square.

 I would like to thank you in advance for your help
 B.R
 Alex

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 to create a square plot

2011-09-02 Thread Alaios
Dear Dennis,
I wouldl ike to thank you for your answer.
That did it work.
how can I know remove the space between the axis labels and the image (blue-red 
area)?

Right now things look like are floating a bit. I want the x,y labels to reduce 
the overall gap.

I would like to thank you in advance for your help

B.R
Alex




From: Dennis Murphy djmu...@gmail.com

Cc: R-help@r-project.org R-help@r-project.org
Sent: Thursday, September 1, 2011 3:18 PM
Subject: Re: [R] ggplot2 to create a square plot

Hi:


 Dear Dennis,
 I would like to thank you for your reply.
 I also checked the web sites that you gave me, it is hard to find everything
 about ggplot2 at one place with concrete examples that help you understand
 directly what you are plotting.

There is work in progress to alleviate that problem, but it is still
being worked on and it may be a little while before it becomes
publicly available. That could mean anywhere from a few days to a few
months. I don't know what is the current status wrt the project.

 As you have already mentioned ggsave can save the image as I want to and
 that is what I am using now.
 One minor issue (as you have also mentioned is to remove they gray border
 between the x and y legend and the red and blue area.

See scale_continuous and look at the expand = argument; something like
  scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0))
might be what you need, but try it for yourself.

HTH,
Dennis

 Thus I checked the website and tried by applying the options to remove it.
 Unfortunately I ended up with a full list of more arguments
 print(v + geom_tile(aes(fill=dB))+
 opts(axis.text.x=theme_text(size=20),axis.text.y=theme_text(size=20),
 axis.title.x=theme_text(size=25) , axis.title.y=theme_text(size=25),
 legend.title=theme_text(size=25,hjust=-0.4) ,
 legend.text=theme_text(size=20) ,   panel.background=theme_blank() ,
 panel.margin=unit(100,lines) , panel.grid.major=theme_line(size=0.1) ,
 plot.margin=unit(c(0,0,0,0),lines) ) + scale_x_continuous('km')  +
 scale_y_continuous('km')    )
 so far I have not remove that extra space..
 Do you have any suggestion of how I can remove it?
 I would like to thank all for their time.
 B.R
 Alex
 
 From: Dennis Murphy djmu...@gmail.com

 Cc: R-help@r-project.org R-help@r-project.org
 Sent: Wednesday, August 31, 2011 9:34 PM
 Subject: Re: [R] ggplot2 to create a square plot

 Hi:

 I'd suggest using ggsave(); in particular, see its height = and width
 = arguments. If you have some time, you could look at some examples of
 ggplot2 themes:
 https://github.com/hadley/ggplot2/wiki/themes
 and some examples of how to use various opts():
 https://github.com/hadley/ggplot2/wiki/%2Bopts%28%29-List

 These can be useful if you need to reduce the amount of space around
 the plot or reposition the legend to the top or bottom to get more
 horizontal space for the plot. This sometimes is a germane issue when
 the plot is intended to be square.

 HTH,
 Dennis


 Dear all,
 I am using ggplot with geom_tile to print as an image a matrix  I have. My
 matrix is a squared one of 512*512 cells.

 The code that does that is written below


 print(v + geom_tile(aes(fill=dB))+
 opts(axis.text.x=theme_text(size=20),axis.text.y=theme_text(size=20),
 axis.title.x=theme_text(size=25) , axis.title.y=theme_text(size=25),
 legend.title=theme_text(size=25,hjust=-0.4) ,
 legend.text=theme_text(size=20)) + scale_x_continuous('km')  +
 scale_y_continuous('km')    )



 as you can see from the picture below

 http://imageshack.us/photo/my-images/171/backupf.jpg/

 this squared matrix is printed a bit squeezed with the height being bigger
 than the width. Would be possible somehow to print that plot by keeping the
 square-look of the matrix in the plot? Of course the other elements like
 axis and legend will make the over all plot to not be square but I do not
 care as the blue and red region forms a square.

 I would like to thank you in advance for your help
 B.R
 Alex

        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automatic Recoding

2011-09-02 Thread thomas.chesney
Thank you both for your replies. I've tried it with a small sample of the
data and it works perfectly. I have no idea yet how it works but I will
spend some time to figure it out.

Thank you!

Thomas

--
View this message in context: 
http://r.789695.n4.nabble.com/Automatic-Recoding-tp3784043p3785565.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] calling R from PHP error

2011-09-02 Thread Katerina Karayianni
Hello,
I am having the following error while calling an R script through PHP.

/usr/local/bin/R: line 227: /kk/Programs/R-2.13.0/etc/ldpaths: Permission
denied
ERROR: R_HOME ('/kk/Programs/R-2.13.0') not found

I had compiled R from source and placed the generated R shell script in
/usr/local/bin.

Can you give me an insight of how to give permission to access the ldpaths
file and why is the R_HOME tree not found?

Thank you and regards

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : P values for vglm(zibinomial) function in VGAM

2011-09-02 Thread justin bem
Estimation is realized by MLE, estimators are asymptotically normal

Try this reg-vglm(...)
p.value-1-pnorm(abs(coef(reg)/sqrt(diag(vcov(reg)


 

Justin BEM
BP 1917 Yaoundé
Tél (237) 76043774



De : suuz suuz_b...@hotmail.com
À : r-help@r-project.org
Envoyé le : Mardi 23 Août 2011 15h04
Objet : [R] P values for vglm(zibinomial) function in VGAM

Hi ,

I know this question has been asked twice in the past but to my knowldege,
it still hasn't been solved.

I am doing a zero inflated binomial model using the VGAM package, I need to
obtain p values for my Tvalues in the vglm output. code is as follows

 mod2=vglm(dmat~Season+Diel+Tidal.phase+Tidal.cycle,zibinomial, data=mp1)
 summary(mod2)

Call:
vglm(formula = dmat ~ Season + Diel + Tidal.phase + Tidal.cycle, 
    family = zibinomial, data = mp1)

Pearson Residuals:
               Min        1Q    Median        3Q     Max
logit(phi) -3.6496  0.273679  0.285619  0.296763  1.0974
logit(mu)  -6.3631 -0.029027 -0.020786 -0.011719 80.6051

Coefficients:
                  Value Std. Error   t value
(Intercept):1  2.365835   0.029142  81.18334
(Intercept):2 -3.182376   0.050054 -63.57944
SeasonSpring  -0.080840   0.054201  -1.49147
SeasonSummer   0.204781   0.049936   4.10083
SeasonWinter   0.385078   0.043874   8.77692
DielE         -0.079190   0.079859  -0.99163
DielM          0.071607   0.074620   0.95963
DielN          0.132377   0.036419   3.63488
Tidal.phaseNT -0.252715   0.054053  -4.67536
Tidal.phaseST  0.145777   0.045554   3.20010
Tidal.cycleF   0.114808   0.044897   2.55713
Tidal.cycleH  -0.074224   0.048063  -1.54428
Tidal.cycleL  -0.049681   0.047717  -1.04116

Number of linear predictors:  2 

Names of linear predictors: logit(phi), logit(mu)

Dispersion Parameter for zibinomial family:   1

Log-likelihood: -7831.693 on 30569 degrees of freedom

Number of Iterations: 13 

I have tried the suggestions 
pt(coef(summary(mod2)), 30569, lower.tail = TRUE, log.p = FALSE)

Value Std. Error      t value
(Intercept):1 0.9910021735  0.5116242 1.00e+00
(Intercept):2 0.0007310901  0.5199600 0.00e+00
SeasonSpring  0.4677849543  0.5216125 6.792427e-02
SeasonSummer  0.5811276264  0.5199133 9.999794e-01
SeasonWinter  0.6499089525  0.5174974 1.00e+00
DielE         0.4684409796  0.5318250 1.606939e-01
DielM         0.5285424555  0.5297410 8.313752e-01
DielN         0.5526565030  0.5145256 9.998607e-01
Tidal.phaseNT 0.4002451101  0.5215532 1.473475e-06
Tidal.phaseST 0.5579507181  0.5181669 9.993124e-01
Tidal.cycleF  0.5457011061  0.5179053 9.947207e-01
Tidal.cycleH  0.4704164848  0.5191670 6.126507e-02
Tidal.cycleL  0.4801883607  0.5190291 1.489050e-01

pt(coef(mod2), 30569)

(Intercept):1 (Intercept):2  SeasonSpring  SeasonSummer  SeasonWinter
0.9910021735  0.0007310901  0.4677849543  0.5811276264  0.6499089525 
        DielE         DielM         DielN Tidal.phaseNT Tidal.phaseST 
0.4684409796  0.5285424555  0.5526565030  0.4002451101  0.5579507181 
Tidal.cycleF  Tidal.cycleH  Tidal.cycleL 
0.5457011061  0.4704164848  0.4801883607


But these seem to be giving strange results, those which are clearly more
different to the basline levels (Autumn, D, bl and E) are coming up as least
significant. Perhaps it is my interpretation.

I couldnt follow
2*min(pt(coef(summary(mod2)), 30569), 1-pt(coef(summary(mod2)), 30569)) 
??

Does anyone know what I might be doing wrong or how to go about the last
code? 

I have read as much as I can find on VGAM and zib but have ran out of ideas.
I apologise if this appears to be a beginners question.

many thanks in advance

--
View this message in context: 
http://r.789695.n4.nabble.com/P-values-for-vglm-zibinomial-function-in-VGAM-tp3762858p3762858.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Using capture.output within a function

2011-09-02 Thread Kristian Lind
Dear R-users

I'm running a maximum likelihood procedure using the spg package. I'd like
to save some output produced in each iteration to a file, but if I put the
capture.output() within the function I get the following message; Error in
spg(par = startval, fn = loglik, gr = NULL, method = 3, lower = lo,  :
  Failure in initial function evaluation!Error in -fn(par, ...) : invalid
argument to unary operator

I have considered putting the capture.output() after the function, but there
are some issues with R stalling on me so I'd like that the output is saved
for each iteration and not only at completion.

Any suggestions on how to get this done would be much appreciated.

Kristian Lind

*Below an example of what I'm trying to do...*


loglik - function(w){

state - c(b_1 = 0,
b_2 = 0,
a = 0)
#declaring ODEs
Kristian -function(t, state, w){
with(as.list(c(state, w)),
{
db_1 = -((w[1]+w[8])*b_1+(w[2]+w[6]*w[8]
+w[7]*w[9])*b_2+0.5*(b_1)^2+w[6]*b_1*b_2+0.5*
((w[6])^2+(w[7])^2)*(b_2)^2)
db_2 = -w[3]*b_2+1
da = w[1]*w[4]*b_1+(w[2]*w[4]+w[3]*w[5])*b_2
list(c(db_1, db_2, da))
})
}

# time making a sequence from t to T evaluated at each delta seq(t, T,
by = delta)
times - seq(0, 10, by = 0.5)

outmat - ode(y = state, times = times, func = Kristian, parms = w)
print(w)
print(outmat)
.
.
.
f-rep(NA, 1)
f[1] - 1/(T-1)*sum(log(pJ$p)-log(pJ$J))
f
capture.output(outmat, file = spgoutput.txt, append = TRUE)
}
fit - spg(fn =loglik, ...)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Maximum Likelihood using optim()

2011-09-02 Thread Thiem Alrik
Dear mailing list,

I would like to use the optim() command in order to maximize the logged 
likelihood of the following function, where p is the parameter of interest and 
should be constrained between 0 and positive infinity.

y =  1/2 * ((te - x)/(te - tc))^p

x and y are given by

x - c(5.18, 6.28, 7.00, 7.08, 7.54, 7.90, 8.24, 8.64, 12.17, 12.89, 14.27, 
15.38, 15.80, 16.46, 20.41, 21.27, 22.91)
y - c(0.63, 0.64, 0.66, 0.68, 0.69, 0.71, 0.73, 0.75, 0.76, 0.78, 0.8, 0.81, 
0.83, 0.85, 0.86, 0.88, 0.9)

te and tc are fixed at 5 and 25.

What is a good way to achieve this? Thanks a lot.

Alrik Thiem
Department of Humanities, Social and Political Sciences
ETH Zurich

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] hydroTSM 0.3-0 and hydroGOF 0.3-0

2011-09-02 Thread Mauricio Zambrano-Bigiarini

Dear R users and hydrological/environmental community,

I'm glad to announce that a major (and recommended) update for the 
packages hydroTSM and hydroGOF are now available on CRAN:


-) hydroTSM: http://cran.r-project.org/web/packages/hydroTSM/
-) hydroGOF: http://cran.r-project.org/web/packages/hydroGOF/


###
# hydroTSM v0.3-0 #
###

hydroTSM is a package for management, analysis, interpolation and 
plotting of time series used in hydrology and related environmental 
sciences.


This new release collects feedback received since the first public 
release of the package (~ 1 year ago).
Major changes are related to improved plotting of time series, better 
and faster support for zoo objects, and new features in some functions.


A full list of changes can be found on:

http://www.rforge.net/hydroTSM/news.html


###
# hydroGOF v0.3-0 #
###

hydroGOF is a package implementing both statistical and graphical 
goodness-of-fit measures between observed and simulated values, mainly 
oriented to be used during the calibration, validation, and application 
of hydrological/environmental models.


Major changes in this new release are related to improved plotting of 
simulated vs observed values and a new vignette.


A full list of changes can be found on:

http://www.rforge.net/hydroGOF/news.html


###
#  Related Links  #
###

http://meetingorganizer.copernicus.org/EGU2010/EGU2010-13008.pdf

http://www.slideshare.net/hzambran/egu2010-ra-statisticalenvironmentfordoinghydrologicalanalysis-9095709

http://www.r-project.org/conferences/useR-2009/slides/Zambrano+Bigiarini.pdf


###
Bugs / comments / questions / collaboration of any kind are very
welcomed, and in particular, datasets that can be included in the
packages for academic purposes.



Kind regards,

Mauricio Zambrano-Bigiarini

--
===
FLOODS Action
Land Management and Natural Hazards Unit
Institute for Environment and Sustainability (IES)
European Commission, Joint Research Centre (JRC)
webinfo: http://floods.jrc.ec.europa.eu/
===
DISCLAIMER:\ The views expressed are purely those of th...{{dropped:7}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] betareg question - keeping the mean fixed?

2011-09-02 Thread betty_d
Thanks for your response, that does work, however, it is still not quite what
want. I would like to tell betareg what the mean is (in my case, 0.5) and
force it to use that value. Is this possible?

--
View this message in context: 
http://r.789695.n4.nabble.com/betareg-question-keeping-the-mean-fixed-tp3783303p3785683.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using capture.output within a function

2011-09-02 Thread Duncan Murdoch

On 11-09-02 5:24 AM, Kristian Lind wrote:

Dear R-users

I'm running a maximum likelihood procedure using the spg package. I'd like
to save some output produced in each iteration to a file, but if I put the
capture.output() within the function I get the following message; Error in
spg(par = startval, fn = loglik, gr = NULL, method = 3, lower = lo,  :
   Failure in initial function evaluation!Error in -fn(par, ...) : invalid
argument to unary operator


It looks as though you put capture.output() last in your function, so 
the result of the function is the result of the capture.output call, not 
the function value.


Duncan Murdoch



I have considered putting the capture.output() after the function, but there
are some issues with R stalling on me so I'd like that the output is saved
for each iteration and not only at completion.

Any suggestions on how to get this done would be much appreciated.

Kristian Lind

*Below an example of what I'm trying to do...*


loglik- function(w){

 state- c(b_1 = 0,
 b_2 = 0,
 a = 0)
 #declaring ODEs
 Kristian-function(t, state, w){
 with(as.list(c(state, w)),
 {
 db_1 = -((w[1]+w[8])*b_1+(w[2]+w[6]*w[8]
+w[7]*w[9])*b_2+0.5*(b_1)^2+w[6]*b_1*b_2+0.5*
((w[6])^2+(w[7])^2)*(b_2)^2)
 db_2 = -w[3]*b_2+1
 da = w[1]*w[4]*b_1+(w[2]*w[4]+w[3]*w[5])*b_2
 list(c(db_1, db_2, da))
 })
 }

 # time making a sequence from t to T evaluated at each delta seq(t, T,
by = delta)
 times- seq(0, 10, by = 0.5)

 outmat- ode(y = state, times = times, func = Kristian, parms = w)
 print(w)
 print(outmat)
.
.
.
f-rep(NA, 1)
f[1]- 1/(T-1)*sum(log(pJ$p)-log(pJ$J))
f
capture.output(outmat, file = spgoutput.txt, append = TRUE)
}
fit- spg(fn =loglik, ...)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Advice on large data structures

2011-09-02 Thread Jim Holtman
i would suggest that if you want to use R that you get a 64-bit version with 
24GB of memory to start.  if your data is a numeric matrix, you will need 8GB 
for a single copy.

Do you really need it all in memory at once, or can you partition the problem?  
Can you use a database to access the portion you need at any time?

If you only need one, or two, columns at a time, then the use of a database 
storing the columns might work.  You probably need some more analysis on 
exactly how you want to solve your problem understanding the limitations of the 
system.

Sent from my iPad

On Sep 2, 2011, at 1:13, Worik R wor...@gmail.com wrote:

 Friends
 
 I am starting on a (section of the) project where I need to build a matrix
 with on the order of 5 million rows and 200 columns
 
 I am wondering if I can stay in R.
 
 I need to do rollapply type operations on the columns, including some that
 will be functions of (windows of) two columns.
 
 I have been looking at the ff and bigmemory packages but am not sure that
 they will do.
 
 Before I get too deep can some one offer some wisdom about what the best
 direction to go would be?
 
 Switching to C/C++ is definitely an option if it is all too hard
 
 cheers
 Worik
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using capture.output within a function

2011-09-02 Thread Kristian Lind
Yes, that's true. It works as it's supposed to now.

Thank you.

Kristian Lind

2011/9/2 Duncan Murdoch murdoch.dun...@gmail.com

 On 11-09-02 5:24 AM, Kristian Lind wrote:

 Dear R-users

 I'm running a maximum likelihood procedure using the spg package. I'd like
 to save some output produced in each iteration to a file, but if I put the
 capture.output() within the function I get the following message; Error in
 spg(par = startval, fn = loglik, gr = NULL, method = 3, lower = lo,  :
   Failure in initial function evaluation!Error in -fn(par, ...) : invalid
 argument to unary operator


 It looks as though you put capture.output() last in your function, so the
 result of the function is the result of the capture.output call, not the
 function value.

 Duncan Murdoch


 I have considered putting the capture.output() after the function, but
 there
 are some issues with R stalling on me so I'd like that the output is saved
 for each iteration and not only at completion.

 Any suggestions on how to get this done would be much appreciated.

 Kristian Lind

 *Below an example of what I'm trying to do...*


 loglik- function(w){

 state- c(b_1 = 0,
 b_2 = 0,
 a = 0)
 #declaring ODEs
 Kristian-function(t, state, w){
 with(as.list(c(state, w)),
 {
 db_1 = -((w[1]+w[8])*b_1+(w[2]+w[6]***w[8]
 +w[7]*w[9])*b_2+0.5*(b_1)^2+w[**6]*b_1*b_2+0.5*
 ((w[6])^2+(w[7])^2)*(b_2)^2)
 db_2 = -w[3]*b_2+1
 da = w[1]*w[4]*b_1+(w[2]*w[4]+w[3]***w[5])*b_2
 list(c(db_1, db_2, da))
 })
 }

 # time making a sequence from t to T evaluated at each delta seq(t, T,
 by = delta)
 times- seq(0, 10, by = 0.5)

 outmat- ode(y = state, times = times, func = Kristian, parms = w)
 print(w)
 print(outmat)
 .
 .
 .
 f-rep(NA, 1)
 f[1]- 1/(T-1)*sum(log(pJ$p)-log(pJ$**J))
 f
 capture.output(outmat, file = spgoutput.txt, append = TRUE)
 }
 fit- spg(fn =loglik, ...)

[[alternative HTML version deleted]]

 __**
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/**
 posting-guide.html http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] If NA Problem!

2011-09-02 Thread Anna Dunietz
Hi All!

Please find code and the respective lists below.  My problem: I specify the
case that lilwin[[p]] is not an NA and want the code found in iwish to be
returned ONLY for that case.  Why do I get a list of length 2 (and why is
NULL the first element)? I understand that the code below is quite
senseless.  I have run into a problem while working on a large project and
wanted to simplify it in order for it to be more understandable and
accessible.  If I should not be using the if function, please let me know
what I should be doing instead.  I know that I must use the for function for
my project.  The thing I most want to understand is how, after specifying a
certain condition, one may save certain data that occurs when that condition
is met.   I hope I have been clear enough!

Thank you very much for your help!
Anna

biglist-list(a=1:4,b=2:6)
lilwin-list(x=NA,y=2)
lilloss-list(m=1,n=3)



 biglist$a
[1] 1 2 3 4

$b
[1] 2 3 4 5 6
 lilwin$x
[1] NA

$y
[1] 2
 lilloss$m
[1] 1

$n
[1] 3


iwish-list()
for(p in 1:length(biglist)){
if(is.na(lilwin[[p]])==F) iwish[p]-list(biglist[[p]][lilwin[[p]]])
}


 iwish[[1]]
NULL

[[2]]
[1] 3

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Background fill and border for a legend in dotplot

2011-09-02 Thread Dennis Murphy
Hi:

Try this:

key1 -
  draw.key(list(text=list(levels(Cal_dat$Commodity)),
title=Ore type,
border = TRUE,
background = 'ivory',
points=list(pch=22, cex=1.3, fill=col.pat, col=black)),
   draw = FALSE)
key2 -
  draw.key(list(text=list(levels(factor(Cal_dat$Year))),
title=Year,
border = TRUE,
background = 'ivory',
points = list(pch = c(21, 22, 23), cex=1.3, col=black)),
   draw = FALSE)

mkey -
  mergedTrellisLegendGrob(list(fun = key2),
  list(fun = key1),
  vertical = TRUE
)

Now rerun your dotplot; from the result I got, you may need to do some
positional tweaking and may well want to change the background color
of the legend to something else..

HTH,
Dennis

On Thu, Sep 1, 2011 at 4:47 PM, markm0705 markm0...@gmail.com wrote:
 Dear R help group

 I've been working on this plot for a while now and now getting around to the
 minor adjusments.  I would like to be able to put a border and background
 fill around the legend in this plot.

 I understand the legend 'bty' should do this have this capablity but not
 sure how the syntax works in this case

 ## initalise
 library(lattice)
 library(latticeExtra) # for mergedTrellisLegendGrob()

 ##read the data to a variable
 #

 Cal_dat - read.table(Calibration2.dat,header = TRUE,sep = \t,)

 ## set up plotting colours
 #
 col.pat-c(violet,cyan,green,red,blue,black,yellow)
 sym.pat-c(19,20,21)

 ##set up the plot key
 #
 key1 -
   draw.key(list(text=list(levels(Cal_dat$Commodity)),
                 title=Ore type,
                 points=list(pch=22, cex=1.3, fill=col.pat, col=black)),
            draw = FALSE)
 key2 -
   draw.key(list(text=list(levels(factor(Cal_dat$Year))),
                 title=Year,
                 points = list(pch = c(21, 22, 23), cex=1.3, col=black)),
            draw = FALSE)

 mkey -
   mergedTrellisLegendGrob(list(fun = key2),
                           list(fun = key1),
                           vertical = TRUE
 )

 ##set some parameters for the plot
 #
 trellis.par.set(
        dot.line=list(col = grey90, lty=dashed),
        axis.line=list(col = grey50),
        axis.text=list(col =grey50, cex=0.8),
        panel.background=list(col=transparent),
        par.xlab.text= list(col=grey50),
 )

 ## Create the dot plot
 #
 with(Cal_dat,
    dotplot(reorder(paste(Mine,Company), Resc_Gt) ~ Resc_Gt,
            fill_var = Commodity,
            pch_var = factor(Year),
            cex=1.2,
            pch = c(21, 22, 23),
            col = black,
            fill = col.pat,
            aspect = 2.0,
                alpha=0.6,
            legend = list(inside = list(fun = mkey,corner = c(0.95, 0.01))),
                scales = list(x = list(log = 10)),
                xscale.components = xscale.components.log10ticks,
            origin = 0,
            type = c(p,a),
            main = Mineral resources,
            xlab= Total tonnes (billions),
            panel = function(x, y, ..., subscripts,
                             fill, pch, fill_var, pch_var) {
                pch - pch[pch_var[subscripts]]
                fill - fill[fill_var[subscripts]]
                panel.dotplot(x, y, pch = pch, fill = fill, ...)
            }))

            panel = function(x, y, ..., subscripts,
                             fill, pch, fill_var, pch_var) {
                pch - pch[pch_var[subscripts]]
                fill - fill[fill_var[subscripts]]
                panel.dotplot(x, y, pch = pch, fill = fill, ...)
            }

 http://r.789695.n4.nabble.com/file/n3785003/Calibration2.dat
 Calibration2.dat

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Background-fill-and-border-for-a-legend-in-dotplot-tp3785003p3785003.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] merge some columns

2011-09-02 Thread Joao Fadista
Dear all,

I would like to know how to merge columns like:

Input file:
  V1 V2 V3 V4 V5 V6
1  G  A  G  G  G  G
2  A  A  G  A  A  G

Desired output file:
V1  V2   V3
1  G/A G/G G/G
2  A/A G/A A/G

So for every 2 consecutive columns merge their content into one.
Thanks in advance.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] If NA Problem!

2011-09-02 Thread Jean V Adams
Anna Dunietz wrote on 09/02/2011 07:16:45 AM:
 
 Hi All!
 
 Please find code and the respective lists below.  My problem: I specify 
the
 case that lilwin[[p]] is not an NA and want the code found in iwish to 
be
 returned ONLY for that case.  Why do I get a list of length 2 (and why 
is
 NULL the first element)? 


The list is length 2 because when p=2 in your for loop you save the output 
to iwish[2].  Since nothing was saved to iwish[1] (when lilwin[[1]] was 
NA) that gives a NULL value for iwish[1].  If you want to allow the length 
of iwish to be shorter, you need to change the way you save your output. 
For example,

iwish - list() 
count - 0
for(p in 1:length(biglist)){ 
if(is.na(lilwin[[p]])==F) {
count - count + 1
iwish[count] - list(biglist[[p]][lilwin[[p]]]) 
}
} 

Jean


 I understand that the code below is quite
 senseless.  I have run into a problem while working on a large project 
and
 wanted to simplify it in order for it to be more understandable and
 accessible.  If I should not be using the if function, please let me 
know
 what I should be doing instead.  I know that I must use the for function 
for
 my project.  The thing I most want to understand is how, after 
specifying a
 certain condition, one may save certain data that occurs when that 
condition
 is met.   I hope I have been clear enough!
 
 Thank you very much for your help!
 Anna
 
 biglist-list(a=1:4,b=2:6)
 lilwin-list(x=NA,y=2)
 lilloss-list(m=1,n=3)
 
 
 
  biglist$a
 [1] 1 2 3 4
 
 $b
 [1] 2 3 4 5 6
  lilwin$x
 [1] NA
 
 $y
 [1] 2
  lilloss$m
 [1] 1
 
 $n
 [1] 3
 
 
 iwish-list()
 for(p in 1:length(biglist)){
 if(is.na(lilwin[[p]])==F) iwish[p]-list(biglist[[p]][lilwin[[p]]])
 }
 
 
  iwish[[1]]
 NULL
 
 [[2]]
 [1] 3

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about BIC of two different regression models? how should we compare two regression models?

2011-09-02 Thread John Sorkin
I believe when using BIC one needs to compare nested models, i.e. , when 
comparing models A and B one must make sure that model A contains all the 
parameters of model B and additionally A contains one or more extra parameter 
beyond those in B. Further the comparison of BICs requires that models A and B 
be run on the same data. Thus if we have model A  y=age+sex, model B y=age, if 
some subjects are missing data on their sex, model A would be run on a subset 
of the data used when running model B. In this case the comparison of BICs 
would not be valid.
John 

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

 Ben Bolker bbol...@gmail.com 9/2/2011 2:35 AM 
Andra Isan andra_isan at yahoo.com writes:

 
 Hi All, 
 In order to compare two different logistic regressions, 
 I think I need to compare them based on their BIC
 values, but I am not sure if the smaller BIC would mean a better 
 model or the reverse is true?
 Thanks a lot,Andra

  Smaller (i.e. lower value) BIC is always better 
(even if BIC happens to be negative, as can happen in some cases; 
i.e. BIC=-1002 is better than BIC=-1000, BIC=1000 is better than BIC=1002).

  I would suggest however that (a) there are better venues for this
question (e.g. stats.stackexchange.com), since it's a stats and not
an R question; (b) it might be a good idea to review a stats text,
or even http://en.wikipedia.org/wiki/Bayesian_information_criterion ,
since this is a pretty basic question.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge some columns

2011-09-02 Thread Dennis Murphy
Hi:

Here's one approach:

d - read.table(textConnection(
V1 V2 V3 V4 V5 V6
1  G  A  G  G  G  G
2  A  A  G  A  A  G), header = TRUE, stringsAsFactors = FALSE)
closeAllConnections()

# Create two vectors of variable names, one for odd numbered,
# one for even numbered
vars1 - names(d)[seq_along(names(d)) %% 2 == 1]
vars2 - names(d)[seq_along(names(d)) %% 2 == 0]

# Apply the paste sequentially to corresponding pairs
# in vars1 and vars2; get() is used to get the data associated
# with the variable names in vars1 and vars2
d2 - sapply(seq_along(vars1),
 function(j) with(d, paste(get(vars1[j]), get(vars2[j]), sep = '/')))

# Convert to data frame:
d2 - as.data.frame(d2, stringsAsFactors = FALSE)
str(d2)

HTH,
Dennis

On Fri, Sep 2, 2011 at 5:34 AM, Joao Fadista joao.fadi...@med.lu.se wrote:
 Dear all,

 I would like to know how to merge columns like:

 Input file:
  V1 V2 V3 V4 V5 V6
 1  G  A  G  G  G  G
 2  A  A  G  A  A  G

 Desired output file:
    V1  V2   V3
 1  G/A G/G G/G
 2  A/A G/A A/G

 So for every 2 consecutive columns merge their content into one.
 Thanks in advance.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automatic Recoding

2011-09-02 Thread David Winsemius


On Sep 2, 2011, at 4:14 AM, thomas.chesney wrote:

Thank you both for your replies. I've tried it with a small sample  
of the
data and it works perfectly. I have no idea yet how it works but I  
will

spend some time to figure it out.



When you get around to putting in the time to figure it out, the  
solution to start with would be Dunlap's match strategy. Mine is  
really a perversion of an aspect of how factors work but isn't really  
how you are supposed to use them.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weights using Survreg

2011-09-02 Thread Boris Beranger
Thank you for your reply, it has been helpful.

Do you know if the parameters estimators are MLE estimators?

One more question:
In my case study I have failures that occured on different objects that have
different age and length, could I use weight to find the estimates of a
weibull law and so to find the probabilty of failure per unit of length for
example?


Thank you very much again for your help,

Boris

--
View this message in context: 
http://r.789695.n4.nabble.com/Weights-using-Survreg-tp3781803p3785931.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bandwith selectors for multivariate local regression

2011-09-02 Thread Soberon Velez, Alexandra Pilar
Dear all,



I'm trying to do a multivariate local regression whit the locfit instruction, 
such as:Y=m(Z,X,W)+u



However, I have a problem at the moment of calculte the bandwith of the 
regression. If I had a univarite local regression model I could use the 
instruction regband. However, this command only works in models with only one 
predictor.



Somebody knows how can I do that?



Please help I completely lose.



Thanks in advance

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Standard errors of sexual dimorphism?

2011-09-02 Thread ErikiLund
Hello!

I am working on a manuscript on sexual dimorphism in an aquatic
invertebrate, where we have estimated sexual dimorphism (SD) for 7 different
traits in four populations (a total of 28 SD-estimates). We have used the
following formula for estimating SD: 100 * (mean male trait value - mean
female trait value)/overall trait mean).

Then, we have used these SD-estimates to perform a GLM against other
interesting variables, such as the intersexual genetic correlations for each
of the traits.

Here are my questions:

1. Is there any procedure in R you would recommend that takes in to
account the sampling variance of the SD-estimates, rather than using the
mean value of each (which is supposed to reduce error and increase Type
I-error rates?

2. Is there a procedure to estimate SE for ratios such as this SD-estimate?

3. The data in these GLM:s might not be entirely statistically
non-independent (i. e. intersexual genetic correlations). Can you recommend
any R-procedure (package) that can deal with this problem (e. g.
bootstrapping or resampling)?

Many thanks in advance for input!

Erik Svensson

--
View this message in context: 
http://r.789695.n4.nabble.com/Standard-errors-of-sexual-dimorphism-tp3785770p3785770.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge some columns

2011-09-02 Thread David Winsemius


On Sep 2, 2011, at 8:34 AM, Joao Fadista wrote:


Dear all,

I would like to know how to merge columns like:

Input file:
 V1 V2 V3 V4 V5 V6
1  G  A  G  G  G  G
2  A  A  G  A  A  G



Looked like an mapply-type problem:

 with(dat,
  mapply(paste,
   list(V1, V3, V5),
   list(V2, V4, V6),
   MoreArgs=list(sep=/) )
   )

 [,1]  [,2]  [,3]
[1,] G/A G/G G/G
[2,] A/A G/A A/G



Desired output file:
   V1  V2   V3
1  G/A G/G G/G
2  A/A G/A A/G

So for every 2 consecutive columns merge their content into one.
Thanks in advance.


[[alternative HTML version deleted]]

--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge some columns

2011-09-02 Thread David Winsemius


On Sep 2, 2011, at 9:30 AM, David Winsemius wrote:



On Sep 2, 2011, at 8:34 AM, Joao Fadista wrote:


Dear all,

I would like to know how to merge columns like:

Input file:
V1 V2 V3 V4 V5 V6
1  G  A  G  G  G  G
2  A  A  G  A  A  G



Looked like an mapply-type problem:

 with(dat,
 mapply(paste,
  list(V1, V3, V5),
  list(V2, V4, V6),
  MoreArgs=list(sep=/) )
  )


There is a further refinement that is possible that will result in  
naming of the columns made possible by the behavior of the USE.NAMES  
feature of mapply. From the help page: use names if the first ...  
argument has names, or if it is a character vector, use that character  
vector as the names;


with(dat, mapply(paste,
   list(V1 =V1, V2=V3, V3=V5),
list(V2, V4, V6),
  MoreArgs=list(sep=/) ) )
 V1V2V3
[1,] G/A G/G G/G
[2,] A/A G/A A/G



[,1]  [,2]  [,3]
[1,] G/A G/G G/G
[2,] A/A G/A A/G



Desired output file:
  V1  V2   V3
1  G/A G/G G/G
2  A/A G/A A/G

So for every 2 consecutive columns merge their content into one.
Thanks in advance.


[[alternative HTML version deleted]]

--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Maximum Likelihood using optim()

2011-09-02 Thread Sam Stewart
I think the following pdf will explain the details of how to use the
optim function.

http://www.unc.edu/~monogan/computing/r/MLE_in_R.pdf

Hope that helps,
Sam

On Fri, Sep 2, 2011 at 7:06 AM, Thiem  Alrik th...@sipo.gess.ethz.ch wrote:
 Dear mailing list,

 I would like to use the optim() command in order to maximize the logged 
 likelihood of the following function, where p is the parameter of interest 
 and should be constrained between 0 and positive infinity.

 y =  1/2 * ((te - x)/(te - tc))^p

 x and y are given by

 x - c(5.18, 6.28, 7.00, 7.08, 7.54, 7.90, 8.24, 8.64, 12.17, 12.89, 14.27, 
 15.38, 15.80, 16.46, 20.41, 21.27, 22.91)
 y - c(0.63, 0.64, 0.66, 0.68, 0.69, 0.71, 0.73, 0.75, 0.76, 0.78, 0.8, 0.81, 
 0.83, 0.85, 0.86, 0.88, 0.9)

 te and tc are fixed at 5 and 25.

 What is a good way to achieve this? Thanks a lot.

 Alrik Thiem
 Department of Humanities, Social and Political Sciences
 ETH Zurich

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] !!!function to do the knn!!!

2011-09-02 Thread jiliguala

really thxs to David Winsemius..


this websits helps a lot,

--
View this message in context: 
http://r.789695.n4.nabble.com/function-to-do-the-knn-tp3781137p3786251.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] post hoc testing of glmer in lme4

2011-09-02 Thread suuz
I have a mixed model with a binomial response, four factor variables and one
random factor.

m1=glmer(nbhf.hour~Season+Diel+Tidal.phase+Tidal.cycle+(1|POD.ID),family=binomial,data
=bl1,control=list(msVerbose=TRUE))

I have really need to try and find a post hoc test for this model and
finding the pariwise comparisons, note the dataset is unbalanced. I had read
many questions on this and there doesn't seem to be an acceptable/agreeable
answer although perhaps this was some time ago and the question can be
answered?

Any help is greatly appreciated. 

Thanks in advance

--
View this message in context: 
http://r.789695.n4.nabble.com/post-hoc-testing-of-glmer-in-lme4-tp3786201p3786201.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] !!!function to do the knn!!!

2011-09-02 Thread jiliguala

ths a lot, david. it helps a lot 

--
View this message in context: 
http://r.789695.n4.nabble.com/function-to-do-the-knn-tp3781137p3786253.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about BIC of two different regression models? how should we compare two regression models?

2011-09-02 Thread Patrick Breheny

On 09/02/2011 08:48 AM, John Sorkin wrote:

I believe when using BIC one needs to compare nested models


This is wrong.  Hypothesis tests rely on nested models; information 
criteria do not.


--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] If NA Problem!

2011-09-02 Thread Peter Ehlers

On 2011-09-02 05:16, Anna Dunietz wrote:

Hi All!

Please find code and the respective lists below.  My problem: I specify the
case that lilwin[[p]] is not an NA and want the code found in iwish to be
returned ONLY for that case.  Why do I get a list of length 2 (and why is
NULL the first element)? I understand that the code below is quite
senseless.  I have run into a problem while working on a large project and
wanted to simplify it in order for it to be more understandable and
accessible.  If I should not be using the if function, please let me know
what I should be doing instead.  I know that I must use the for function for
my project.  The thing I most want to understand is how, after specifying a
certain condition, one may save certain data that occurs when that condition
is met.   I hope I have been clear enough!

Thank you very much for your help!
Anna

biglist-list(a=1:4,b=2:6)
lilwin-list(x=NA,y=2)
lilloss-list(m=1,n=3)




biglist$a

[1] 1 2 3 4

$b
[1] 2 3 4 5 6

lilwin$x

[1] NA

$y
[1] 2

lilloss$m

[1] 1

$n
[1] 3


iwish-list()
for(p in 1:length(biglist)){
if(is.na(lilwin[[p]])==F) iwish[p]-list(biglist[[p]][lilwin[[p]]])
}



iwish[[1]]

NULL

[[2]]
[1] 3


Jean has given you one fix. Here's another (see ?'c'):

 iwish-list()
 for(p in seq_along(biglist)){
   if(!is.na(lilwin[[p]]))
   iwish - c(iwish, biglist[[p]][lilwin[[p]]])
 }

BTW, it's not a good idea to use 'F' instead of FALSE and the
negation operator is usually a better way to test.

Peter Ehlers



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about BIC of two different regression models? how should we compare two regression models?

2011-09-02 Thread Prof Brian Ripley

On Fri, 2 Sep 2011, Patrick Breheny wrote:


On 09/02/2011 08:48 AM, John Sorkin wrote:

I believe when using BIC one needs to compare nested models


This is wrong.  Hypothesis tests rely on nested models; information criteria 
do not.


Actually, this is off-topic on this list. But blanket statements are 
often themselves untrue: there are hypothesis tests of non-nested 
models (most famously due to Cox, 1961), and Akaike explicitly 
considered only nested models in his paper introducing AIC. 
Certainly criteria such as AIC and BIC (in the sense of Schwarz: there 
are several criteria of that name) can be used with non-nested models 
but are much sharper tools for nested models.




--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about BIC of two different regression models? how should we compare two regression models?

2011-09-02 Thread Bert Gunter
Inline:

On Fri, Sep 2, 2011 at 8:09 AM, Patrick Breheny patrick.breh...@uky.edu wrote:
 On 09/02/2011 08:48 AM, John Sorkin wrote:

 I believe when using BIC one needs to compare nested models

 This is wrong.  Hypothesis tests rely on nested models; information criteria
 do not.


Yes, indeed. It may additionally be worth noting what has has been
noted on this list before: the actual definition of such criteria is
given only up to a constant, so different **software** may give
different answers on the same data. Hence be sure to compare results
using the same software or make any necessary additive adjustments
based on the details of how the software does the calculation when
results from different software are being compared.

Cheers,
Bert

 --
 Patrick Breheny
 Assistant Professor
 Department of Biostatistics
 Department of Statistics
 University of Kentucky

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mann Kendall Test for Trend

2011-09-02 Thread ScottM
Hi there, 

I'm trying to apply the Mann Kendall test for trend analysis of a time
series.  I have downloaded and installed the package Kendall and
subsequently loaded it into the software.

My time series is a .txt file with 2 columns - column 1 is the year (1985 -
2009) and column 2 is the corresponding entry variable.

According to the R guidelines, the call should be:

MannKendall(x) [whereby x is a data vector, usually a time series]

As such, I've loaded in my file 'data.txt' and then called on:

MannKendall(data), to which I get the following:

Error in Kendall(1:length(x), x) : length(x)3

What do I need to do to get beyond this highly annoying error?  I've tried
MannKendall(1:27(data), data), but then keep getting this:

Error in Kendall(1:length(x), x) : attempt to apply non-function

Any help greatly received!

S

--
View this message in context: 
http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786392.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] previous monday date

2011-09-02 Thread Ben qant
Hello,

I'm attempting to return the date (in form '%Y-%m-%d') of the Monday
previous to the current date. For example: since it is 2011-09-02 today, I
would expect 2011-08-29 to be the return value.

I found the following in:
http://www.mail-archive.com/r-help@r-project.org/msg144184.html

Start quote from link:
prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)

For example,

 prevmonday(Sys.Date())
[1] 2011-08-15
 prevmonday(prevmonday(Sys.Date()))
[1] 2011-08-15

End quote from link.

But when I do it I get:
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)
 prevmonday(Sys.Date())
Error in as.Date.numeric(1 - 4) : 'origin' must be supplied

I've tried setting the 'origin' argument in as.Date() in different ways, but
it returns inaccurate results.

Thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Advice on large data structures

2011-09-02 Thread Joe Conway
On 09/01/2011 10:13 PM, Worik R wrote:
 I am starting on a (section of the) project where I need to build a matrix
 with on the order of 5 million rows and 200 columns
 
 I am wondering if I can stay in R.
 
 I need to do rollapply type operations on the columns, including some that
 will be functions of (windows of) two columns.

Perhaps useful to you -- I recently added WINDOW FUNCTION support to
PL/R*. Currently this new feature is only available in git master, but
within a few days I will push a new release. You can download the source
from git here if you want:

https://github.com/jconway/plr

The official docs have not been updated yet, but see the pre-release
docs here (specifically chapter 9):

http://www.joeconway.com/plr/doc/plr-git-US.pdf

HTH,

Joe

*PL/R allows you to execute R functions from within a PostgreSQL database

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about BIC of two different regression models? how should we compare two regression models?

2011-09-02 Thread Patrick Breheny

On 09/02/2011 11:26 AM, Prof Brian Ripley wrote:

This is wrong.  Hypothesis tests rely on nested models; information criteria
do not.


Actually, this is off-topic on this list. But blanket statements are
often themselves untrue: there are hypothesis tests of non-nested
models (most famously due to Cox, 1961), and Akaike explicitly
considered only nested models in his paper introducing AIC.
Certainly criteria such as AIC and BIC (in the sense of Schwarz: there
are several criteria of that name) can be used with non-nested models
but are much sharper tools for nested models.


Good point; my remark was only meant to refer to the simple case of 
logistic regression in the original post, and certainly should not be 
construed as a blanket statement applying to all possible hypothesis 
tests of all possible models.


--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mann Kendall Test for Trend

2011-09-02 Thread R. Michael Weylandt
I don't have the Kendall package at my fingers, but it seems like you have
some deeper trouble with R syntax if you are writing things like

MannKendall(1:27(data),data)

when you know that MannKendall only takes one argument.

Can you verify that your data object actually has a full set of data to it?
(use some combination of length(),nrow(),dput(),head(),tail() etc)

For a general tip, data() is a function to R so its not the best idea for an
object name. Outside chance that might be what's causing your problem.

Michael Weylandt

On Fri, Sep 2, 2011 at 11:35 AM, ScottM scott.mcgr...@abdn.ac.uk wrote:

 Hi there,

 I'm trying to apply the Mann Kendall test for trend analysis of a time
 series.  I have downloaded and installed the package Kendall and
 subsequently loaded it into the software.

 My time series is a .txt file with 2 columns - column 1 is the year (1985 -
 2009) and column 2 is the corresponding entry variable.

 According to the R guidelines, the call should be:

 MannKendall(x) [whereby x is a data vector, usually a time series]

 As such, I've loaded in my file 'data.txt' and then called on:

 MannKendall(data), to which I get the following:

 Error in Kendall(1:length(x), x) : length(x)3

 What do I need to do to get beyond this highly annoying error?  I've tried
 MannKendall(1:27(data), data), but then keep getting this:

 Error in Kendall(1:length(x), x) : attempt to apply non-function

 Any help greatly received!

 S

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786392.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] previous monday date

2011-09-02 Thread Marc Schwartz
On Sep 2, 2011, at 10:35 AM, Ben qant wrote:

 Hello,
 
 I'm attempting to return the date (in form '%Y-%m-%d') of the Monday
 previous to the current date. For example: since it is 2011-09-02 today, I
 would expect 2011-08-29 to be the return value.
 
 I found the following in:
 http://www.mail-archive.com/r-help@r-project.org/msg144184.html
 
 Start quote from link:
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)
 
 For example,
 
 prevmonday(Sys.Date())
 [1] 2011-08-15
 prevmonday(prevmonday(Sys.Date()))
 [1] 2011-08-15
 
 End quote from link.
 
 But when I do it I get:
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)
 prevmonday(Sys.Date())
 Error in as.Date.numeric(1 - 4) : 'origin' must be supplied
 
 I've tried setting the 'origin' argument in as.Date() in different ways, but
 it returns inaccurate results.
 
 Thanks,
 
 Ben


If memory serves, this is because Gabor used the version of as.Date() from his 
'zoo' package in that post, which does not require an origin to be specified, 
whereas the default as.Date() function in R's base package does:

prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)

 prevmonday(Sys.Date())
Error in as.Date.numeric(1 - 4) : 'origin' must be supplied

 require(zoo)
Loading required package: zoo

Attaching package: 'zoo'

The following object(s) are masked from 'package:base':

as.Date

 prevmonday(Sys.Date())
[1] 2011-08-29


# Remove 'zoo' to use the base function
detach(package:zoo)

 prevmonday(Sys.Date())
Error in as.Date.numeric(1 - 4) : 'origin' must be supplied


# Fix the function to use base::as.Date()
prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4, 
origin = 1970-01-01)

 prevmonday(Sys.Date())
[1] 2011-08-29


See ?as.Date

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mann Kendall Test for Trend

2011-09-02 Thread ScottM
Michael,

Cheers for the input - still learning syntax as I go - very new to the use of 
programming language and having to self teach as part of my PhD.

The input data was fine - it was the call on the R website which was confusing 
me - by defining the location of the actual data in the second column and 
calling result - MannKendall(filename[,2]), it worked without issue.

Cheers again though,

Scott McGrane MA (Hons), MRes
SAGES Theme 1 PhD Student
Northern Rivers Institute
St Mary's Building
University of Aberdeen

http://www.abdn.ac.uk/nri

From: R. Michael Weylandt michael.weyla...@gmail.com [via R] 
[ml-node+3786437-1846790326-252...@n4.nabble.com]
Sent: 02 September 2011 04:47 PM
To: Mcgrane, Scott
Subject: Re: Mann Kendall Test for Trend

I don't have the Kendall package at my fingers, but it seems like you have
some deeper trouble with R syntax if you are writing things like

MannKendall(1:27(data),data)

when you know that MannKendall only takes one argument.

Can you verify that your data object actually has a full set of data to it?
(use some combination of length(),nrow(),dput(),head(),tail() etc)

For a general tip, data() is a function to R so its not the best idea for an
object name. Outside chance that might be what's causing your problem.

Michael Weylandt

On Fri, Sep 2, 2011 at 11:35 AM, ScottM [hidden 
email]/user/SendEmail.jtp?type=nodenode=3786437i=0 wrote:

 Hi there,

 I'm trying to apply the Mann Kendall test for trend analysis of a time
 series.  I have downloaded and installed the package Kendall and
 subsequently loaded it into the software.

 My time series is a .txt file with 2 columns - column 1 is the year (1985 -
 2009) and column 2 is the corresponding entry variable.

 According to the R guidelines, the call should be:

 MannKendall(x) [whereby x is a data vector, usually a time series]

 As such, I've loaded in my file 'data.txt' and then called on:

 MannKendall(data), to which I get the following:

 Error in Kendall(1:length(x), x) : length(x)3

 What do I need to do to get beyond this highly annoying error?  I've tried
 MannKendall(1:27(data), data), but then keep getting this:

 Error in Kendall(1:length(x), x) : attempt to apply non-function

 Any help greatly received!

 S

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786392.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=1 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
[hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=2 mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



If you reply to this email, your message will be added to the discussion below:
http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786437.html
To unsubscribe from Mann Kendall Test for Trend, click 
herehttp://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=3786392code=c2NvdHQubWNncmFuZUBhYmRuLmFjLnVrfDM3ODYzOTJ8OTM1MjEwNDY5.


The University of Aberdeen is a charity registered in Scotland, No SC013683.


--
View this message in context: 
http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786488.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] previous monday date

2011-09-02 Thread Ben qant
I didn't sort out the issue in  my email below but here is a (not very
R'ish) solution:

 pm = function(x) {
+   for(i in 1:7){
+  if(format(as.Date(Sys.Date()-i),'%w') == 1){
+  d = Sys.Date() - i;
+ }
+   }
+   d
+ }
 pm(Sys.Date())
[1] 2011-08-29

On Fri, Sep 2, 2011 at 9:35 AM, Ben qant ccqu...@gmail.com wrote:

 Hello,

 I'm attempting to return the date (in form '%Y-%m-%d') of the Monday
 previous to the current date. For example: since it is 2011-09-02 today, I
 would expect 2011-08-29 to be the return value.

 I found the following in:
 http://www.mail-archive.com/r-help@r-project.org/msg144184.html

 Start quote from link:
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)

 For example,

  prevmonday(Sys.Date())
 [1] 2011-08-15
  prevmonday(prevmonday(Sys.Date()))
 [1] 2011-08-15

 End quote from link.

 But when I do it I get:
  prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)
  prevmonday(Sys.Date())
 Error in as.Date.numeric(1 - 4) : 'origin' must be supplied

 I've tried setting the 'origin' argument in as.Date() in different ways,
 but it returns inaccurate results.

 Thanks,

 Ben


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] previous monday date

2011-09-02 Thread Ben qant
Oh OK, missed that.

Here is a solution using base: (already posted)

I didn't sort out the issue in  my email below but here is a (not very
R'ish) solution:

 pm = function(x) {
+   for(i in 1:7){
+  if(format(as.Date(Sys.Date()-
i),'%w') == 1){
+  d = Sys.Date() - i;
+ }
+   }
+   d
+ }
 pm(Sys.Date())
[1] 2011-08-29


On Fri, Sep 2, 2011 at 9:59 AM, Marc Schwartz marc_schwa...@me.com wrote:

 On Sep 2, 2011, at 10:35 AM, Ben qant wrote:

  Hello,
 
  I'm attempting to return the date (in form '%Y-%m-%d') of the Monday
  previous to the current date. For example: since it is 2011-09-02 today,
 I
  would expect 2011-08-29 to be the return value.
 
  I found the following in:
  http://www.mail-archive.com/r-help@r-project.org/msg144184.html
 
  Start quote from link:
  prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)
 
  For example,
 
  prevmonday(Sys.Date())
  [1] 2011-08-15
  prevmonday(prevmonday(Sys.Date()))
  [1] 2011-08-15
 
  End quote from link.
 
  But when I do it I get:
  prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) +
 as.Date(1-4)
  prevmonday(Sys.Date())
  Error in as.Date.numeric(1 - 4) : 'origin' must be supplied
 
  I've tried setting the 'origin' argument in as.Date() in different ways,
 but
  it returns inaccurate results.
 
  Thanks,
 
  Ben


 If memory serves, this is because Gabor used the version of as.Date() from
 his 'zoo' package in that post, which does not require an origin to be
 specified, whereas the default as.Date() function in R's base package does:

 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)

  prevmonday(Sys.Date())
 Error in as.Date.numeric(1 - 4) : 'origin' must be supplied

  require(zoo)
 Loading required package: zoo

 Attaching package: 'zoo'

 The following object(s) are masked from 'package:base':

as.Date

  prevmonday(Sys.Date())
 [1] 2011-08-29


 # Remove 'zoo' to use the base function
 detach(package:zoo)

  prevmonday(Sys.Date())
 Error in as.Date.numeric(1 - 4) : 'origin' must be supplied


 # Fix the function to use base::as.Date()
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4,
 origin = 1970-01-01)

  prevmonday(Sys.Date())
 [1] 2011-08-29


 See ?as.Date

 HTH,

 Marc Schwartz



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mann Kendall Test for Trend

2011-09-02 Thread R. Michael Weylandt
I'd suggest that you look into using a time series class to organize your
data rather than just keeping levels and times next to each other. This will
also make it alot easier to work with your data in specific time-series sort
of ways.

I really like the xts class and you can get to it with something like the
following

install.package(xts)
library(xts)
x = read.table(HOWEVER YOU NEED TO READ THE DATA)
x = xts(x[,2],as.Date(x[,1], format = %Y))

The syntax to the last line is to construct an xts using the second column
of the data as the levels of the time series while the first reads the
years, coverts them to date objects (you supply the format, here its just a
year number)) and puts the thing together.

Then, I think you can just run MannKendall(x)

Hope this helps,

Michael Weylandt

PS -- There are all sorts of wonderful introductions to R for people with a
variety of programming backgrounds. If you want a recommendation, just say a
little more about any programming experience you may or may not have had and
I'm sure someone could recommend a few good ones.

On Fri, Sep 2, 2011 at 12:00 PM, ScottM scott.mcgr...@abdn.ac.uk wrote:

 Michael,

 Cheers for the input - still learning syntax as I go - very new to the
 use of programming language and having to self teach as part of my PhD.

 The input data was fine - it was the call on the R website which was
 confusing me - by defining the location of the actual data in the second
 column and calling result - MannKendall(filename[,2]), it worked without
 issue.

 Cheers again though,

 Scott McGrane MA (Hons), MRes
 SAGES Theme 1 PhD Student
 Northern Rivers Institute
 St Mary's Building
 University of Aberdeen

 http://www.abdn.ac.uk/nri
 
 From: R. Michael Weylandt michael.weyla...@gmail.com [via R] [
 ml-node+3786437-1846790326-252...@n4.nabble.com]
 Sent: 02 September 2011 04:47 PM
 To: Mcgrane, Scott
 Subject: Re: Mann Kendall Test for Trend

 I don't have the Kendall package at my fingers, but it seems like you have
 some deeper trouble with R syntax if you are writing things like

 MannKendall(1:27(data),data)

 when you know that MannKendall only takes one argument.

 Can you verify that your data object actually has a full set of data to it?
 (use some combination of length(),nrow(),dput(),head(),tail() etc)

 For a general tip, data() is a function to R so its not the best idea for
 an
 object name. Outside chance that might be what's causing your problem.

 Michael Weylandt

 On Fri, Sep 2, 2011 at 11:35 AM, ScottM [hidden
 email]/user/SendEmail.jtp?type=nodenode=3786437i=0 wrote:

  Hi there,
 
  I'm trying to apply the Mann Kendall test for trend analysis of a time
  series.  I have downloaded and installed the package Kendall and
  subsequently loaded it into the software.
 
  My time series is a .txt file with 2 columns - column 1 is the year (1985
 -
  2009) and column 2 is the corresponding entry variable.
 
  According to the R guidelines, the call should be:
 
  MannKendall(x) [whereby x is a data vector, usually a time series]
 
  As such, I've loaded in my file 'data.txt' and then called on:
 
  MannKendall(data), to which I get the following:
 
  Error in Kendall(1:length(x), x) : length(x)3
 
  What do I need to do to get beyond this highly annoying error?  I've
 tried
  MannKendall(1:27(data), data), but then keep getting this:
 
  Error in Kendall(1:length(x), x) : attempt to apply non-function
 
  Any help greatly received!
 
  S
 
  --
  View this message in context:
 
 http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786392.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=1 mailing
 list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 [[alternative HTML version deleted]]

 __
 [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=2 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 
 If you reply to this email, your message will be added to the discussion
 below:

 http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786437.html
 To unsubscribe from Mann Kendall Test for Trend, click here
 http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=3786392code=c2NvdHQubWNncmFuZUBhYmRuLmFjLnVrfDM3ODYzOTJ8OTM1MjEwNDY5
 .


 The University of Aberdeen is a charity registered in Scotland, No
 SC013683.


 --
 View this message in context:
 http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786488.html
 

Re: [R] previous monday date

2011-09-02 Thread Marc Schwartz
 On Fri, Sep 2, 2011 at 9:59 AM, Marc Schwartz marc_schwa...@me.com wrote:
 
 On Sep 2, 2011, at 10:35 AM, Ben qant wrote:
 
 Hello,
 
 I'm attempting to return the date (in form '%Y-%m-%d') of the Monday
 previous to the current date. For example: since it is 2011-09-02 today,
 I
 would expect 2011-08-29 to be the return value.
 
 I found the following in:
 http://www.mail-archive.com/r-help@r-project.org/msg144184.html
 
 Start quote from link:
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)
 
 For example,
 
 prevmonday(Sys.Date())
 [1] 2011-08-15
 prevmonday(prevmonday(Sys.Date()))
 [1] 2011-08-15
 
 End quote from link.
 
 But when I do it I get:
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) +
 as.Date(1-4)
 prevmonday(Sys.Date())
 Error in as.Date.numeric(1 - 4) : 'origin' must be supplied
 
 I've tried setting the 'origin' argument in as.Date() in different ways,
 but
 it returns inaccurate results.
 
 Thanks,
 
 Ben
 
 
 If memory serves, this is because Gabor used the version of as.Date() from
 his 'zoo' package in that post, which does not require an origin to be
 specified, whereas the default as.Date() function in R's base package does:
 
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4)
 
 prevmonday(Sys.Date())
 Error in as.Date.numeric(1 - 4) : 'origin' must be supplied
 
 require(zoo)
 Loading required package: zoo
 
 Attaching package: 'zoo'
 
 The following object(s) are masked from 'package:base':
 
   as.Date
 
 prevmonday(Sys.Date())
 [1] 2011-08-29
 
 
 # Remove 'zoo' to use the base function
 detach(package:zoo)
 
 prevmonday(Sys.Date())
 Error in as.Date.numeric(1 - 4) : 'origin' must be supplied
 
 
 # Fix the function to use base::as.Date()
 prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4,
 origin = 1970-01-01)
 
 prevmonday(Sys.Date())
 [1] 2011-08-29
 
 
 See ?as.Date
 
 HTH,
 
 Marc Schwartz

On Sep 2, 2011, at 11:02 AM, Ben qant wrote:

 Oh OK, missed that.
 
 Here is a solution using base: (already posted)
 
 I didn't sort out the issue in  my email below but here is a (not very
 R'ish) solution:
 
 pm = function(x) {
 +   for(i in 1:7){
 +  if(format(as.Date(Sys.Date()-
 i),'%w') == 1){
 +  d = Sys.Date() - i;
 + }
 +   }
 +   d
 + }
 pm(Sys.Date())
 [1] 2011-08-29


It occurs to me that another solution would be:

 as.Date(cut(Sys.Date(), weeks))
[1] 2011-08-29


This uses cut.Date() to create a vector that contains the beginning of weekly 
intervals for each date value, with the week beginning on a Monday by default. 
Note that cut.Date() returns a factor, not a Date class object, hence the 
additional coercion of the result.

See ?cut.Date

So if you want that in a function wrapper, you could do:

prev.monday - function(x) as.Date(cut(x, weeks))

 prev.monday(Sys.Date())
[1] 2011-08-29


HTH,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Avoiding for Loop for moving average

2011-09-02 Thread Noah Silverman
Hello,

I need to calculate a moving average and an exponentially weighted moving 
average over a fairly large data set (500K rows).

Doing this in a for loop works nicely, but is slow.

ewma - data$col[1]
N - dim(data)[1]
for(i in 2:N){
data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
}


Since the moving average accumulates as we move through the data, I'm not 
sure on the best/fastest way to do this.  

Does anyone have any suggestions on how to avoid a loop doing this?




--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building #8208
Los Angeles, CA 90095


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding for Loop for moving average

2011-09-02 Thread R. Michael Weylandt
Have you looked at SMA/EMA from the TTR package? That's a pretty quick
implementation.

runmean from caTools is even better for the SMA but I don't think there's an
easy way to turn that into an EWMA.

Hope this helps,

Michael Weylandt

On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman noahsilver...@ucla.eduwrote:

 Hello,

 I need to calculate a moving average and an exponentially weighted moving
 average over a fairly large data set (500K rows).

 Doing this in a for loop works nicely, but is slow.

 ewma - data$col[1]
 N - dim(data)[1]
 for(i in 2:N){
data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
 }


 Since the moving average accumulates as we move through the data, I'm not
 sure on the best/fastest way to do this.

 Does anyone have any suggestions on how to avoid a loop doing this?




 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building #8208
 Los Angeles, CA 90095


[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Hints for Data Clustering

2011-09-02 Thread Lorenzo Isella

Dear All,
I will be confronted (relatively soon) with the following problem:
given a set of known statistical indicators {s_i} , i=1,2...N for a N 
countries I would like to be able to do some data clustering i.e. 
determining the best way to partition the N countries according to their 
known properties, encoded by the {s_i} set of indicators for those 
countries.
Some properties of these countries may be categorical or anyway 
non-numerical variables (e.g. the fact of belonging/not belonging to a 
certain group; joining/not joining a certain treaty etc...). I have seen 
some data clustering examples, but without categorical variables and I 
wonder if this is an inherent limitation of the methodology (on the top 
of my head, I would not know how to define the distance between 
non-numerical variables). Any suggestions about the general methodology 
and R packages/code snippets is really appreciated.
And also: do the units in which I express a statistical indicator play a 
role? For instance: for 2 given countries I could have the average age 
of the population, the average life expectancy and the average income 
per year in thousands of dollars. This would give rise e.g. to 
(40,72,26) and (44,75,36), but if I measure the average income in 
dollars, then I would get (40,72,26000) (44,75,36000). Would the units 
that I choose for an indicator impact on the clustering results? They 
should not, in my view, since the income does not change whichever way I 
express it, but I am not sure about the algorithm results.

Many thanks

Lorenzo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding for Loop for moving average

2011-09-02 Thread Joshua Ulrich
On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:
 Have you looked at SMA/EMA from the TTR package? That's a pretty quick
 implementation.

 runmean from caTools is even better for the SMA but I don't think there's an
 easy way to turn that into an EWMA.

SMA still calls Fortran code, so that's why it's slower than
caTools::runmean.  I've moved the EMA code to C, so it's about as fast
as it can be.

Noah, use EMA's ratio argument to replicate your for loop.

 Hope this helps,

 Michael Weylandt


Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com



 On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman noahsilver...@ucla.eduwrote:

 Hello,

 I need to calculate a moving average and an exponentially weighted moving
 average over a fairly large data set (500K rows).

 Doing this in a for loop works nicely, but is slow.

 ewma - data$col[1]
 N - dim(data)[1]
 for(i in 2:N){
        data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
 }


 Since the moving average accumulates as we move through the data, I'm not
 sure on the best/fastest way to do this.

 Does anyone have any suggestions on how to avoid a loop doing this?




 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building #8208
 Los Angeles, CA 90095


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding for Loop for moving average

2011-09-02 Thread Noah Silverman
Joshua,

Thanks for the tip.

I need to roll my own code on this.  But perhaps I can borrow some code from 
the package you mentioned.

Is the package just performing the loop, but in a faster language?


--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building #8208
Los Angeles, CA 90095

On Sep 2, 2011, at 9:58 AM, Joshua Ulrich wrote:

 On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:
 Have you looked at SMA/EMA from the TTR package? That's a pretty quick
 implementation.
 
 runmean from caTools is even better for the SMA but I don't think there's an
 easy way to turn that into an EWMA.
 
 SMA still calls Fortran code, so that's why it's slower than
 caTools::runmean.  I've moved the EMA code to C, so it's about as fast
 as it can be.
 
 Noah, use EMA's ratio argument to replicate your for loop.
 
 Hope this helps,
 
 Michael Weylandt
 
 
 Best,
 --
 Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
 
 
 
 On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman 
 noahsilver...@ucla.eduwrote:
 
 Hello,
 
 I need to calculate a moving average and an exponentially weighted moving
 average over a fairly large data set (500K rows).
 
 Doing this in a for loop works nicely, but is slow.
 
 ewma - data$col[1]
 N - dim(data)[1]
 for(i in 2:N){
data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
 }
 
 
 Since the moving average accumulates as we move through the data, I'm not
 sure on the best/fastest way to do this.
 
 Does anyone have any suggestions on how to avoid a loop doing this?
 
 
 
 
 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building #8208
 Los Angeles, CA 90095
 
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding for Loop for moving average

2011-09-02 Thread Abhijit Dasgupta
There is a recent blog post by Dirk Eddelbeutel on how to do something 
similar using his Rcpp package and C++, with massive time improvements.

http://dirk.eddelbuettel.com/blog/

On 9/2/2011 12:43 PM, Noah Silverman wrote:
 Hello,

 I need to calculate a moving average and an exponentially weighted moving 
 average over a fairly large data set (500K rows).

 Doing this in a for loop works nicely, but is slow.

 ewma- data$col[1]
 N- dim(data)[1]
 for(i in 2:N){
   data$ewma- alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
 }


 Since the moving average accumulates as we move through the data, I'm not 
 sure on the best/fastest way to do this.

 Does anyone have any suggestions on how to avoid a loop doing this?




 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building #8208
 Los Angeles, CA 90095


   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding for Loop for moving average

2011-09-02 Thread Joshua Ulrich
On Fri, Sep 2, 2011 at 12:06 PM, Noah Silverman noahsilver...@ucla.edu wrote:
 Joshua,

 Thanks for the tip.

 I need to roll my own code on this.  But perhaps I can borrow some code 
 from the package you mentioned.

 Is the package just performing the loop, but in a faster language?

As I said, the function is in C.  You could also use the compiler
package to compile your pure R function for a 3-4x speedup.

Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com



 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building #8208
 Los Angeles, CA 90095

 On Sep 2, 2011, at 9:58 AM, Joshua Ulrich wrote:

 On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:
 Have you looked at SMA/EMA from the TTR package? That's a pretty quick
 implementation.

 runmean from caTools is even better for the SMA but I don't think there's an
 easy way to turn that into an EWMA.

 SMA still calls Fortran code, so that's why it's slower than
 caTools::runmean.  I've moved the EMA code to C, so it's about as fast
 as it can be.

 Noah, use EMA's ratio argument to replicate your for loop.

 Hope this helps,

 Michael Weylandt


 Best,
 --
 Joshua Ulrich  |  FOSS Trading: www.fosstrading.com



 On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman 
 noahsilver...@ucla.eduwrote:

 Hello,

 I need to calculate a moving average and an exponentially weighted moving
 average over a fairly large data set (500K rows).

 Doing this in a for loop works nicely, but is slow.

 ewma - data$col[1]
 N - dim(data)[1]
 for(i in 2:N){
        data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
 }


 Since the moving average accumulates as we move through the data, I'm not
 sure on the best/fastest way to do this.

 Does anyone have any suggestions on how to avoid a loop doing this?




 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building #8208
 Los Angeles, CA 90095


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Advice on large data structures

2011-09-02 Thread Alex Ruiz Euler

Along the lines of one of Jim's suggestions, if you have some
basic MySQL knowledge check out the RMySQL package. I use it to
convert / partition a matrix similar to yours to R objects and it
works fine.

Hope this helps,
A.


On Fri, 2 Sep 2011 06:33:13 -0400
Jim Holtman jholt...@gmail.com wrote:

 i would suggest that if you want to use R that you get a 64-bit version with 
 24GB of memory to start.  if your data is a numeric matrix, you will need 8GB 
 for a single copy.
 
 Do you really need it all in memory at once, or can you partition the 
 problem?  Can you use a database to access the portion you need at any time?
 
 If you only need one, or two, columns at a time, then the use of a database 
 storing the columns might work.  You probably need some more analysis on 
 exactly how you want to solve your problem understanding the limitations of 
 the system.
 
 Sent from my iPad
 
 On Sep 2, 2011, at 1:13, Worik R wor...@gmail.com wrote:
 
  Friends
  
  I am starting on a (section of the) project where I need to build a matrix
  with on the order of 5 million rows and 200 columns
  
  I am wondering if I can stay in R.
  
  I need to do rollapply type operations on the columns, including some that
  will be functions of (windows of) two columns.
  
  I have been looking at the ff and bigmemory packages but am not sure that
  they will do.
  
  Before I get too deep can some one offer some wisdom about what the best
  direction to go would be?
  
  Switching to C/C++ is definitely an option if it is all too hard
  
  cheers
  Worik
  
 [[alternative HTML version deleted]]
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hints for Data Clustering

2011-09-02 Thread Jean V Adams
Look at the function daisy in the package cluster.

require(cluster)
?daisy

Jean

Lorenzo Isella wrote on 09/02/2011 11:50:04 AM:
 
 Dear All,
 I will be confronted (relatively soon) with the following problem:
 given a set of known statistical indicators {s_i} , i=1,2...N for a N 
 countries I would like to be able to do some data clustering i.e. 
 determining the best way to partition the N countries according to their 

 known properties, encoded by the {s_i} set of indicators for those 
 countries.
 Some properties of these countries may be categorical or anyway 
 non-numerical variables (e.g. the fact of belonging/not belonging to a 
 certain group; joining/not joining a certain treaty etc...). I have seen 

 some data clustering examples, but without categorical variables and I 
 wonder if this is an inherent limitation of the methodology (on the top 
 of my head, I would not know how to define the distance between 
 non-numerical variables). Any suggestions about the general methodology 
 and R packages/code snippets is really appreciated.
 And also: do the units in which I express a statistical indicator play a 

 role? For instance: for 2 given countries I could have the average age 
 of the population, the average life expectancy and the average income 
 per year in thousands of dollars. This would give rise e.g. to 
 (40,72,26) and (44,75,36), but if I measure the average income in 
 dollars, then I would get (40,72,26000) (44,75,36000). Would the units 
 that I choose for an indicator impact on the clustering results? They 
 should not, in my view, since the income does not change whichever way I 

 express it, but I am not sure about the algorithm results.
 Many thanks
 
 Lorenzo

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding for Loop for moving average

2011-09-02 Thread Noah Silverman
Thanks Joshua,

I really like the example given in the blog post that Abhijit pointed me to.

Doing it in C++ using the Inline seems like an easy way to get a massive 
improvement in speed without the hassle of writing a package.

I'm working on coding that now.

--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building #8208
Los Angeles, CA 90095

On Sep 2, 2011, at 10:32 AM, Joshua Ulrich wrote:

 On Fri, Sep 2, 2011 at 12:06 PM, Noah Silverman noahsilver...@ucla.edu 
 wrote:
 Joshua,
 
 Thanks for the tip.
 
 I need to roll my own code on this.  But perhaps I can borrow some code 
 from the package you mentioned.
 
 Is the package just performing the loop, but in a faster language?
 
 As I said, the function is in C.  You could also use the compiler
 package to compile your pure R function for a 3-4x speedup.
 
 Best,
 --
 Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
 
 
 
 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building #8208
 Los Angeles, CA 90095
 
 On Sep 2, 2011, at 9:58 AM, Joshua Ulrich wrote:
 
 On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:
 Have you looked at SMA/EMA from the TTR package? That's a pretty quick
 implementation.
 
 runmean from caTools is even better for the SMA but I don't think there's 
 an
 easy way to turn that into an EWMA.
 
 SMA still calls Fortran code, so that's why it's slower than
 caTools::runmean.  I've moved the EMA code to C, so it's about as fast
 as it can be.
 
 Noah, use EMA's ratio argument to replicate your for loop.
 
 Hope this helps,
 
 Michael Weylandt
 
 
 Best,
 --
 Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
 
 
 
 On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman 
 noahsilver...@ucla.eduwrote:
 
 Hello,
 
 I need to calculate a moving average and an exponentially weighted moving
 average over a fairly large data set (500K rows).
 
 Doing this in a for loop works nicely, but is slow.
 
 ewma - data$col[1]
 N - dim(data)[1]
 for(i in 2:N){
data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
 }
 
 
 Since the moving average accumulates as we move through the data, I'm 
 not
 sure on the best/fastest way to do this.
 
 Does anyone have any suggestions on how to avoid a loop doing this?
 
 
 
 
 --
 Noah Silverman
 UCLA Department of Statistics
 8117 Math Sciences Building #8208
 Los Angeles, CA 90095
 
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Including only a subset of the levels of a factor XXXX

2011-09-02 Thread peter dalgaard

On Sep 1, 2011, at 21:11 , R. Michael Weylandt wrote:

 Dropping all occurences of a factor does not drop that level. This actually
 turns out to be much more useful than it first might appear, but if you
 really need to get around it, it can be done.

...most expediently by using factor(), as others have pointed out. Or 
droplevels() for data frames.

We had the converse issue just the other day (Aug 30) when someone had problems 
with showing zero frequencies in xtabs, which turned out to be caused by the 
tabulated data _not_ being factors, hence not containing information about 
which values could have been there but wasn't.

The behavior of subsetting operators is so as to make things like tables and 
barplots consistent across subsets, but there are cases where you want the 
extra levels dropped. However, the default is as it is, because it is easier to 
drop levels than to reinstate them. Neither is impossible, of course.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Døden skal tape! --- Nordahl Grieg

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding for Loop for moving average

2011-09-02 Thread Patrick Burns

The 'filter' function should be able
to do what you want efficiently.

On 02/09/2011 18:06, Noah Silverman wrote:

Joshua,

Thanks for the tip.

I need to roll my own code on this.  But perhaps I can borrow some code from 
the package you mentioned.

Is the package just performing the loop, but in a faster language?


--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building #8208
Los Angeles, CA 90095

On Sep 2, 2011, at 9:58 AM, Joshua Ulrich wrote:


On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt
michael.weyla...@gmail.com  wrote:

Have you looked at SMA/EMA from the TTR package? That's a pretty quick
implementation.

runmean from caTools is even better for the SMA but I don't think there's an
easy way to turn that into an EWMA.


SMA still calls Fortran code, so that's why it's slower than
caTools::runmean.  I've moved the EMA code to C, so it's about as fast
as it can be.

Noah, use EMA's ratio argument to replicate your for loop.


Hope this helps,

Michael Weylandt



Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com




On Fri, Sep 2, 2011 at 12:43 PM, Noah Silvermannoahsilver...@ucla.eduwrote:


Hello,

I need to calculate a moving average and an exponentially weighted moving
average over a fairly large data set (500K rows).

Doing this in a for loop works nicely, but is slow.

ewma- data$col[1]
N- dim(data)[1]
for(i in 2:N){
data$ewma- alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
}


Since the moving average accumulates as we move through the data, I'm not
sure on the best/fastest way to do this.

Does anyone have any suggestions on how to avoid a loop doing this?




--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building #8208
Los Angeles, CA 90095


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Classifying large text corpora using R

2011-09-02 Thread andy1234
Dear everyone, 

I am new to R, and I am looking at doing text classification on a huge
collection of documents (500,000) which are distributed among 300 classes
(so basically, this is my training data). Would someone please be kind
enough to let me know about the R packages to use and their scalability
(time and space)? 

I am very new to R and do not know of the right packages to use. I started
off by trying to use the tm package (http://cran.r-project.org/package=tm)
for pre-processing and FSelector
(http://cran.r-project.org/web/packages/FSelector/index.html) package for
feature selection - but both of these are incredibly slow and completely
unusable for my task. 

So the question is what are the right packages to use (for pre-processing,
feature selection, and classification)? Please consider the fact that I may
be dealing with data of millions of dimensions which may not even fit in
memory. 

I posted on this issue twice
(http://r.789695.n4.nabble.com/Entropy-based-feature-selection-in-R-td3708056.html
,
http://r.789695.n4.nabble.com/R-s-handling-of-high-dimensional-data-td3741758.html)
but did not get any response. This is a very critical piece of my research
and I have been struggling with this issue for a long time. Please consider
helping me out, directly or by pointing me to any other software/website
that you think may be more appropriate. 

Many thanks in advance.

--
View this message in context: 
http://r.789695.n4.nabble.com/Classifying-large-text-corpora-using-R-tp3786787p3786787.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Chemical Names in Data Frames

2011-09-02 Thread Durant, James T. (ATSDR/DTEM/PRMSB)
Greetings -

I am working on some data that contain chemical names with air concentrations, 
and I am creating a data frame with date/time and each chemical having its own 
column. However, these are organic chemicals (e.g. 1-butene, 
2,3,4-trimethylbenzene etc). The package I am going to be using the data with 
is openair, and many of the great functions require you to specify a column 
name which does not seem to work with improper column names- e.g. 
smoothTrend(mydata, pollutant=1-Butene and smoothTrend(mydata, 
pollutant=mydata[,1-Butene])

I was wondering if there was a function to automatically convert these chemical 
names (with all sorts of numbers and minuses in the beginning) to something 
openair can handle?  Or am I going to be stuck recoding several hundred 
chemical names in my database?

VR

Jim

James T. Durant, MSPH CIH
Emergency Response Coordinator
US Agency for Toxic Substances and Disease Registry
Atlanta, GA 30341
770-378-1695





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chemical Names in Data Frames

2011-09-02 Thread David Winsemius


On Sep 2, 2011, at 3:13 PM, Durant, James T. (ATSDR/DTEM/PRMSB) wrote:


Greetings -

I am working on some data that contain chemical names with air  
concentrations, and I am creating a data frame with date/time and  
each chemical having its own column. However, these are organic  
chemicals (e.g. 1-butene, 2,3,4-trimethylbenzene etc). The package I  
am going to be using the data with is openair, and many of the great  
functions require you to specify a column name which does not seem  
to work with improper column names- e.g. smoothTrend(mydata,  
pollutant=1-Butene and smoothTrend(mydata, pollutant=mydata[,1- 
Butene])


I was wondering if there was a function to automatically convert  
these chemical names (with all sorts of numbers and minuses in the  
beginning) to something openair can handle?  Or am I going to be  
stuck recoding several hundred chemical names in my database?




Try using back-ticks on the invalid names, rather than quotes.

--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chemical Names in Data Frames

2011-09-02 Thread Gustavo Carvalho
?make.names perhaps.

On Fri, Sep 2, 2011 at 4:13 PM, Durant, James T. (ATSDR/DTEM/PRMSB)
h...@cdc.gov wrote:
 Greetings -

 I am working on some data that contain chemical names with air 
 concentrations, and I am creating a data frame with date/time and each 
 chemical having its own column. However, these are organic chemicals (e.g. 
 1-butene, 2,3,4-trimethylbenzene etc). The package I am going to be using the 
 data with is openair, and many of the great functions require you to specify 
 a column name which does not seem to work with improper column names- e.g. 
 smoothTrend(mydata, pollutant=1-Butene and smoothTrend(mydata, 
 pollutant=mydata[,1-Butene])

 I was wondering if there was a function to automatically convert these 
 chemical names (with all sorts of numbers and minuses in the beginning) to 
 something openair can handle?  Or am I going to be stuck recoding several 
 hundred chemical names in my database?

 VR

 Jim

 James T. Durant, MSPH CIH
 Emergency Response Coordinator
 US Agency for Toxic Substances and Disease Registry
 Atlanta, GA 30341
 770-378-1695





        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chemical Names in Data Frames

2011-09-02 Thread Spencer Graves

possibly with unique = TRUE:


make.names(c(', '))

[1] X. X.

make.names(c(', '), unique=TRUE)

[1] X.   X..1




  Spencer


On 9/2/2011 12:28 PM, Gustavo Carvalho wrote:

?make.names perhaps.

On Fri, Sep 2, 2011 at 4:13 PM, Durant, James T. (ATSDR/DTEM/PRMSB)
h...@cdc.gov  wrote:

Greetings -

I am working on some data that contain chemical names with air concentrations, and I am creating a 
data frame with date/time and each chemical having its own column. However, these are organic 
chemicals (e.g. 1-butene, 2,3,4-trimethylbenzene etc). The package I am going to be using the data 
with is openair, and many of the great functions require you to specify a column name which does 
not seem to work with improper column names- e.g. smoothTrend(mydata, 
pollutant=1-Butene and smoothTrend(mydata, pollutant=mydata[,1-Butene])

I was wondering if there was a function to automatically convert these chemical 
names (with all sorts of numbers and minuses in the beginning) to something 
openair can handle?  Or am I going to be stuck recoding several hundred 
chemical names in my database?

VR

Jim

James T. Durant, MSPH CIH
Emergency Response Coordinator
US Agency for Toxic Substances and Disease Registry
Atlanta, GA 30341
770-378-1695





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to keep the same class?

2011-09-02 Thread Eduardo M. A. M.Mendes
Hello

 

Please see the example below

 

 class(testX)

[1] matrix

 class(testX[1,])

[1] numeric

 

Why not matrix?   What am I missing here?   Is there a way to keep the same
class?   

 

The reason for the question is that I want to implement a k-step ahead
prediction for my own routines and R wrecks does not seem to like [1,] as
shown below.

 

 predict(fit10,testX[1,])
Error in knnregTrain(train = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
  dims of 'test' and 'train differ
 predict(fit10,testX[1:2,])
[1] 81.00 76.36

 

Many thanks

 

Ed

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to keep the same class?

2011-09-02 Thread Henrique Dallazuanna
Try this:

predict(fit10,testX[1,,drop = FALSE])

On Fri, Sep 2, 2011 at 5:05 PM, Eduardo M. A. M.Mendes
emammen...@gmail.com wrote:
 Hello



 Please see the example below



 class(testX)

 [1] matrix

 class(testX[1,])

 [1] numeric



 Why not matrix?   What am I missing here?   Is there a way to keep the same
 class?



 The reason for the question is that I want to implement a k-step ahead
 prediction for my own routines and R wrecks does not seem to like [1,] as
 shown below.



 predict(fit10,testX[1,])
 Error in knnregTrain(train = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  :
  dims of 'test' and 'train differ
 predict(fit10,testX[1:2,])
 [1] 81.00 76.36



 Many thanks



 Ed




        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to keep the same class?

2011-09-02 Thread Marc Schwartz
On Sep 2, 2011, at 3:05 PM, Eduardo M. A. M.Mendes wrote:

 Hello
 
 
 
 Please see the example below
 
 
 
 class(testX)
 
 [1] matrix
 
 class(testX[1,])
 
 [1] numeric
 
 
 
 Why not matrix?   What am I missing here?   Is there a way to keep the same
 class?   
 
 
 
 The reason for the question is that I want to implement a k-step ahead
 prediction for my own routines and R wrecks does not seem to like [1,] as
 shown below.
 
 
 
 predict(fit10,testX[1,])
 Error in knnregTrain(train = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
  dims of 'test' and 'train differ
 predict(fit10,testX[1:2,])
 [1] 81.00 76.36
 
 
 
 Many thanks
 
 
 
 Ed


Ed,

See:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-do-my-matrices-lose-dimensions_003f

and then use:

  predict(fit10, testX[1, , drop = FALSE])

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to keep the same class?

2011-09-02 Thread Eduardo M. A. M.Mendes
Many thanks to all for the reply.

I do apologize for bothering the list with a FAQ but I have to confess that,
although I read Faq in the past, I did not remember to do it again.

Cheers

Ed


-Original Message-
From: Marc Schwartz [mailto:marc_schwa...@me.com] 
Sent: Friday, September 02, 2011 5:16 PM
To: Eduardo M. A. M.Mendes
Cc: r-help@r-project.org
Subject: Re: [R] How to keep the same class?

On Sep 2, 2011, at 3:05 PM, Eduardo M. A. M.Mendes wrote:

 Hello
 
 
 
 Please see the example below
 
 
 
 class(testX)
 
 [1] matrix
 
 class(testX[1,])
 
 [1] numeric
 
 
 
 Why not matrix?   What am I missing here?   Is there a way to keep the
same
 class?   
 
 
 
 The reason for the question is that I want to implement a k-step ahead 
 prediction for my own routines and R wrecks does not seem to like [1,] 
 as shown below.
 
 
 
 predict(fit10,testX[1,])
 Error in knnregTrain(train = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
  dims of 'test' and 'train differ
 predict(fit10,testX[1:2,])
 [1] 81.00 76.36
 
 
 
 Many thanks
 
 
 
 Ed


Ed,

See:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-do-my-matrices-lose-dimensi
ons_003f

and then use:

  predict(fit10, testX[1, , drop = FALSE])

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] conditional replacement of character strings in vectors

2011-09-02 Thread Josh Tewksbury
Hello, I have a dataframe that looks like this:

  a b  NA Honduras  China NA  NA Sudan  Japan NA  NA Mexico  NA Mexico
I would like to replace the NA values in column b with the non-NA values in
column a.  I have tried a number of techniques, (if, ifelse) but I must have
the logic wrong.

Thanks
-- 
Josh

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Hessian Matrix Issue

2011-09-02 Thread tzaihra
Dear All,

I am running a simulation to obtain coverage probability of Wald type
confidence intervals for my parameter d in a function of two parameters
(mu,d).

I am optimizing it using optim method L-BFGS-B to obtain MLE. As, I
want to invert the Hessian matrix to get Standard errors of the two
parameter estimates. However, my Hessian matrix at times becomes
non-invertible that is it is no more positive definite and I get the
following error msg:

Error in solve.default(ac$hessian) : system is computationally singular:
reciprocal condition number = 6.89585e-21
Thank you

Following is the code I am running I would really appreciate your comments
and suggestions:

#Start Code
#option to trace /recover error
#options(error = recover)

#Sample Size
n-30
mu-5
size- 2

#true values of parameter d
d.true-1+mu/size
d.true

#true value of  zero inflation index phi= 1+log(d)/(1-d)
z.true-1+(log(d.true)/(1-d.true))
z.true

# Allocating space for simulation vectors and setting counters for simulation
counter-0
iter-1
lower.d-numeric(iter)
upper.d-numeric(iter)

#set.seed(987654321)

#begining of simulation loop

for (i in 1:iter){
r.NB-rnbinom(n, mu = mu, size = size)
y-sort(r.NB)
iter.num-i
print(y)
print(iter.num)
#empirical estimates or sample moments
xbar-mean(y)
variance-(sum((y-xbar)^2))/length(y)
dbar-variance/xbar
#sample estimate of proportion of zeros and zero inflation index
pbar-length(y[y==0])/length(y)

### Simplified function #

NegBin-function(th){
mu-th[1]
d-th[2]
n-length(y)

arg1-n*mean(y)*ifelse(mu = 0, log(mu),0)
#arg1-n*mean(y)*log(mu)

#arg2-n*log(d)*((mean(y))+mu/(d-1))
arg2-n*ifelse(d=0, log(d), 0)*((mean(y))+mu/ifelse((d-1)= 0, (d-1),
0.001))

aa-numeric(length(max(y)))
a-numeric(length(y))
for (i in 1:n)
{
for (j in 1:y[i]){
aa[j]-ifelse(((j-1)*(d-1))/mu 0,log(1+((j-1)*(d-1))/mu),0)
#aa[j]-log(1+((j-1)*(d-1))/mu)
#print(aa[j])
}

a[i]-sum(aa)
#print(a[i])
}
a
arg3-sum(a)
llh-arg1+arg2+arg3
if(! is.finite(llh))
llh-1e+20
-llh
}
ac-optim(NegBin,par=c(xbar,dbar),method=L-BFGS-B,hessian=TRUE,lower=
c(0,1) )
ac
print(ac$hessian)
muhat-ac$par[1]
dhat-ac$par[2]
zhat- 1+(log(dhat)/(1-dhat))
infor-solve(ac$hessian)
var.dhat-infor[2,2]
se.dhat-sqrt(var.dhat)
var.muhat-infor[1,1]
se.muhat-sqrt(var.muhat)
var.func-dhat*muhat
var.func
d.prime-cbind(dhat,muhat)

se.var.func-d.prime%*%infor%*%t(d.prime)
se.var.func
lower.d[i]-dhat-1.96*se.dhat
upper.d[i]-dhat+1.96*se.dhat

if(lower.d[i]  = d.true  d.true= upper.d[i])
counter -counter+1
}
counter
covg.prob-counter/iter
covg.prob

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Platform of image

2011-09-02 Thread John Welsh
   Dear R users,

   When I Save Workspace... and then reopen it, my platform switches
from 64-bit to 32-bit, i.e. the Gui switches between these:

   R version 2.13.1 (2011-07-08)
   Copyright (C) 2011 The R Foundation for Statistical Computing
   ISBN 3-900051-07-0
   Platform: x86_64-pc-mingw32/x64 (64-bit)

   R version 2.13.1 (2011-07-08)
   Copyright (C) 2011 The R Foundation for Statistical Computing
   ISBN 3-900051-07-0
   Platform: i386-pc-mingw32/i386 (32-bit)

   How come? This is a Windows x64 system.

   John Welsh, Ph.D.
   Associate Professor
   Molecular and Cancer Biology
   Vaccine Research Institute of San Diego
   10835 Road to the Cure
   San Diego, CA 92121
   Phone: (858) 581-3960 ex.248
   Email: jwe...@sdibr.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] misclassification rate

2011-09-02 Thread Doussa
Hi users
I'm student who is struggling with basic R programming. Would you please
help me with this problem.

My english is bad I hope that my question is clear:

I have a matrix in wich there are two colmns( yp, yt)
Yp: predicted values from my model.
yt: true values ( my dependante variable y is a categorical;3 modalities
(0,1,2)
I don't know how to procede to calculate the misclassification rate and the
error Types.

Thank you for answring

Doussa

--
View this message in context: 
http://r.789695.n4.nabble.com/misclassification-rate-tp3787075p3787075.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Parameters in Gamma Frailty model

2011-09-02 Thread syhuang
Dear all,

I'm new to frailty model. I have a question on the output from 'survival'
pack. Below is the output. What does gamma1,2,3  refer to? How do I
calculate joint hazard function or marginal hazard function using info
below? Many thanks!

Call:
coxph(formula = surv ~ as.factor(tibia) + frailty(as.factor(bdcat)), 
data = try)

  n=877 (1 observation deleted due to missingness)
  coef  se(coef) se2   Chisq DF   p 
as.factor(tibia)2 0.214 0.1260.125  2.89 1.00 0.0890
frailty(as.factor(bdcat))  10.24 1.65 0.0038

  exp(coef) exp(-coef) lower .95 upper .95
as.factor(tibia)2 1.238  0.808 0.968  1.58
gamma:1   0.716  1.398 0.496  1.03
gamma:2   1.036  0.965 0.750  1.43
gamma:3   1.248  0.801 0.901  1.73

Iterations: 10 outer, 27 Newton-Raphson
 Variance of random effect= 0.0648   I-likelihood = -1756.4 
Degrees of freedom for terms= 1.0 1.6 
Rsquare= 0.016   (max possible= 0.982 )
Likelihood ratio test= 14.3  on 2.64 df,   p=0.00171
Wald test= 11.5  on 2.64 df,   p=0.00668


--
View this message in context: 
http://r.789695.n4.nabble.com/Parameters-in-Gamma-Frailty-model-tp3787013p3787013.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] conditional replacement of character strings in vectors

2011-09-02 Thread R. Michael Weylandt
Your data frame didn't come across legibly, try sending it in plain text
using the dput() command.

That said, I'd guess you want something like this:

d[is.na(d$a),a] - d[is.na(d$b),b]

The idea is that is.na(d$a) selects only those rows where column a is NA
and then moves b values into a for only those rows.

Write back with the dput() data frame if this doesn't work.

Hope this helps,

Michael Weylandt

On Fri, Sep 2, 2011 at 3:51 PM, Josh Tewksbury tewk...@uw.edu wrote:

 Hello, I have a dataframe that looks like this:

  a b  NA Honduras  China NA  NA Sudan  Japan NA  NA Mexico  NA Mexico
 I would like to replace the NA values in column b with the non-NA values in
 column a.  I have tried a number of techniques, (if, ifelse) but I must
 have
 the logic wrong.

 Thanks
 --
 Josh

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] conditional replacement of character strings in vectors

2011-09-02 Thread David Winsemius


On Sep 2, 2011, at 3:51 PM, Josh Tewksbury wrote:


Hello, I have a dataframe that looks like this:

 a b  NA Honduras  China NA  NA Sudan  Japan NA  NA Mexico  NA Mexico
I would like to replace the NA values in column b with the non-NA  
values in
column a.  I have tried a number of techniques, (if, ifelse) but I  
must have

the logic wrong.


Mangled data but no matter:

dfrm$b[is.na(dfrm$b)] - dfrm$a[is.na(dfrm$b)]

(Learn to post in plain text. This is a plain text list.)


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ROCR package question for evaluating two regression models

2011-09-02 Thread Andra Isan
Hello All, 
I have used logistic regression glm in R and I am evaluating two models both 
learned with glm but with different predictors. model1 - glm (Y ~ x4+ x5+ x6+ 
x7, data = dat, family = binomial(link=logit))model2 - glm (Y~ x1 + x2 +x3 , 
data = dat, family = binomial(link=logit)) 
and I would like to compare these two models based on the prediction that I get 
from each model:
pred1 = predict(model1, test.data, type = response)pred2 = predict(model2, 
test.data, type = response)
I have used ROCR package to compare them:pr1 = prediction(pred1,test.y)pf1 = 
performance(pr1, measure = prec, x.measure = rec)  plot(pf1) which cutoff 
this plot is based on?
pr2 = prediction(pred2,test.y)pf2 = performance(pr2, measure = prec, 
x.measure = rec)pf2_roc  = performance(pr2,measure=err)plot(pf2)
First of all, I would like to use cutoff = 0.5 and plot the ROC, 
precision-recall curves based on that cutoff value. In other words, how to 
define a cut off value in performance function?For example, in pf2_roc  = 
performance(pr2,measure=err), when I do plot(pf2_roc), it plots for every 
single cutoff point. I only want to have one cut off point, is there any way to 
do that?Second, I would like to see the performance of the two models based on 
the above measures on the same plot so the comparison would be easier. In other 
words, how can I plot (pf1, pf2) and compare them together?plot(pf1, pf2) would 
give me an error as follows:Error in as.double(x) :   cannot coerce type 'S4' 
to vector of type 'double'
Could you please help me with that?
Thanks a lot,Andra



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Platform of image

2011-09-02 Thread Jeff Newmiller
You don't say how you re-open it, but I would guess you are double-clicking the 
.RData file and letting Windows look up which program to run. If so, you can 
either open the 64-bit GUI directly and use File Open Data to open your data, 
or you can change which version of the R GUI program is linked to .RData files 
in Windows.
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

John Welsh jwe...@sdibr.org wrote:

Dear R users,

When I Save Workspace... and then reopen it, my platform switches
from 64-bit to 32-bit, i.e. the Gui switches between these:

R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-mingw32/x64 (64-bit)

R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-pc-mingw32/i386 (32-bit)

How come? This is a Windows x64 system.

John Welsh, Ph.D.
Associate Professor
Molecular and Cancer Biology
Vaccine Research Institute of San Diego
10835 Road to the Cure
San Diego, CA 92121
Phone: (858) 581-3960 ex.248
Email: jwe...@sdibr.org

_

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] conditional replacement of character strings in vectors

2011-09-02 Thread David Winsemius


On Sep 2, 2011, at 7:51 PM, R. Michael Weylandt wrote:

Your data frame didn't come across legibly, try sending it in plain  
text

using the dput() command.

That said, I'd guess you want something like this:

d[is.na(d$a),a] - d[is.na(d$b),b]


One of the rare instances where I disagree with Michael. The row index  
on the right hand side must be the same as the row index on the left  
hand side.




The idea is that is.na(d$a) selects only those rows where column a  
is NA

and then moves b values into a for only those rows.


Right, that is the idea.



Write back with the dput() data frame if this doesn't work.

Hope this helps,

Michael Weylandt

On Fri, Sep 2, 2011 at 3:51 PM, Josh Tewksbury tewk...@uw.edu wrote:


Hello, I have a dataframe that looks like this:

a b  NA Honduras  China NA  NA Sudan  Japan NA  NA Mexico  NA Mexico
I would like to replace the NA values in column b with the non-NA  
values in
column a.  I have tried a number of techniques, (if, ifelse) but I  
must

have
the logic wrong.

Thanks
--
Josh


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] conditional replacement of character strings in vectors

2011-09-02 Thread R. Michael Weylandt
Y'all are both absolutely right -- I just skimmed the problem and wrote an
answer all too briefly that both didn't do what was asked and did what it
did instead entirely incorrectly. The syntax suggested by David's email of
7:52 is what I meant to get at...

My apologies for any confusion this caused to the OP or anyone else
following the thread,

Michael

On Fri, Sep 2, 2011 at 9:03 PM, David Winsemius dwinsem...@comcast.netwrote:


 On Sep 2, 2011, at 7:51 PM, R. Michael Weylandt wrote:

  Your data frame didn't come across legibly, try sending it in plain text
 using the dput() command.

 That said, I'd guess you want something like this:

 d[is.na(d$a),a] - d[is.na(d$b),b]


 One of the rare instances where I disagree with Michael. The row index on
 the right hand side must be the same as the row index on the left hand side.



 The idea is that is.na(d$a) selects only those rows where column a is
 NA
 and then moves b values into a for only those rows.


 Right, that is the idea.



 Write back with the dput() data frame if this doesn't work.

 Hope this helps,

 Michael Weylandt

 On Fri, Sep 2, 2011 at 3:51 PM, Josh Tewksbury tewk...@uw.edu wrote:

  Hello, I have a dataframe that looks like this:

 a b  NA Honduras  China NA  NA Sudan  Japan NA  NA Mexico  NA Mexico
 I would like to replace the NA values in column b with the non-NA values
 in
 column a.  I have tried a number of techniques, (if, ifelse) but I must
 have
 the logic wrong.

 Thanks
 --
 Josh


 David Winsemius, MD
 West Hartford, CT



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] confusion matrix

2011-09-02 Thread Doussa
hi users
I have a data frame in with there are two colomns real values and predicted
ones (for a dichotomic response).
How can i obtain a confusion matrix (miscalssification rat and errors)?
The costs are egal.

Thanks 


--
View this message in context: 
http://r.789695.n4.nabble.com/confusion-matrix-tp3787363p3787363.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

2011-09-02 Thread Maya Joshi
Dear R experts.

I might be missing something obvious. I have been trying to fix this problem
for some weeks. Please help.

#data
ped - c(rep(1, 4), rep(2, 3), rep(3, 3))
y - rnorm(10, 8, 2)

# variable set 1
M1a - sample (c(1, 2,3), 10, replace= T)
M1b - sample (c(1, 2,3), 10, replace= T)
M1aP1 - sample (c(1, 2,3), 10, replace= T)
M1bP2 - sample (c(1, 2,3), 10, replace= T)

# variable set 2
M2a - sample (c(1, 2,3), 10, replace= T)
M2b - sample (c(1, 2,3), 10, replace= T)
M2aP1 - sample (c(1, 2,3), 10, replace= T)
M2bP2 - sample (c(1, 2,3), 10, replace= T)

# variable set 3
M3a - sample (c(1, 2,3), 10, replace= T)
M3b - sample (c(1, 2,3), 10, replace= T)
M3aP1 - sample (c(1, 2,3), 10, replace= T)
M3bP2 - sample (c(1, 2,3), 10, replace= T)

mydf - data.frame (ped, M1a,M1b,M1aP1,M1bP2, M2a,M2b,M2aP1,M2bP2,
M3a,M3b,M3aP1,M3bP2, y)

# functions and further calculations

mmat - matrix
(c(M1a,M2a,M3a,M1b,M2b,M3b,M1aP1,M2aP1,M3aP1,
M1bP2,M2bP2,M3bP2), ncol = 4)

# first function
myfun - function(x) {
x- as.vector(x)
ot1 - ifelse(mydf[x[1]] == mydf[x[3]], 1, -1)
ot2 - ifelse(mydf[x[2]] == mydf[x[4]], 1, -1)
qt - ot1 + ot2
return(qt)
}
qt - apply(mmat, 1, myfun)
ydv - c((y - mean(y))^2)
qtd - data.frame(ped, ydv, qt)

# second function
myfun2 - function(dataframe) {
vydv - sum(ydv)*0.25
sumD - sum(ydv * qt)
Rt - vydv / sumD
return(Rt)
}

# using plyr
require(plyr)
dfsumd1 - ddply(mydf,.(mydf$ped),myfun2)

Here are 2 issues:
(1) The output just one, I need the output for all three set of variables
(as listed above)

(2)  all three values of dfsumd is returning to same for all level of ped:
1,2, 3
Means that the function is applied to whole dataset but only replicated in
output !!!

I tried with plyr not being lazy but due to my limited R knowledge, If you
have a different suggestion, you are welcome too.

Thank you in advance...

Maya

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

2011-09-02 Thread David Winsemius


On Sep 2, 2011, at 11:18 PM, Maya Joshi wrote:


Dear R experts.

I might be missing something obvious. I have been trying to fix this  
problem

for some weeks. Please help.

#data
ped - c(rep(1, 4), rep(2, 3), rep(3, 3))
y - rnorm(10, 8, 2)

# variable set 1
M1a - sample (c(1, 2,3), 10, replace= T)
M1b - sample (c(1, 2,3), 10, replace= T)
M1aP1 - sample (c(1, 2,3), 10, replace= T)
M1bP2 - sample (c(1, 2,3), 10, replace= T)

# variable set 2
M2a - sample (c(1, 2,3), 10, replace= T)
M2b - sample (c(1, 2,3), 10, replace= T)
M2aP1 - sample (c(1, 2,3), 10, replace= T)
M2bP2 - sample (c(1, 2,3), 10, replace= T)

# variable set 3
M3a - sample (c(1, 2,3), 10, replace= T)
M3b - sample (c(1, 2,3), 10, replace= T)
M3aP1 - sample (c(1, 2,3), 10, replace= T)
M3bP2 - sample (c(1, 2,3), 10, replace= T)

mydf - data.frame (ped, M1a,M1b,M1aP1,M1bP2, M2a,M2b,M2aP1,M2bP2,
M3a,M3b,M3aP1,M3bP2, y)

# functions and further calculations

mmat - matrix
(c(M1a,M2a,M3a,M1b,M2b,M3b,M1aP1,M2aP1,M3aP1,
M1bP2,M2bP2,M3bP2), ncol = 4)

# first function
myfun - function(x) {
x- as.vector(x)
ot1 - ifelse(mydf[x[1]] == mydf[x[3]], 1, -1)


You really ought to explain what you are trying to do. This code will  
compare two lists. The list from mydf[x2]] will be the column  
mydf[M2b] which will have as its first element the four element  
vector assigned above. I am guessing that was not what you wanted.  
Notice this simple case using the [ function as you are attempting  
throws an error:


 list(M2b = c(1,2,3)) == list(M2a = c(1,2,3))
Error in list(M2b = c(1, 2, 3)) == list(M2a = c(1, 2, 3)) :
  comparison of these types is not implemented'

So ... now that your code has failed to explain what you wanted why  
don't your try some natural language explanations.


Notice that the [[ function is generally what people want when  
extracting from lists:


 list(M2b = c(1,2,3))[[M2b]] == list(M2a = c(1,2,3))[[M2a]]
[1] TRUE TRUE TRUE



ot2 - ifelse(mydf[x[2]] == mydf[x[4]], 1, -1)
qt - ot1 + ot2
return(qt)
}
qt - apply(mmat, 1, myfun)
ydv - c((y - mean(y))^2)
qtd - data.frame(ped, ydv, qt)

# second function
myfun2 - function(dataframe) {
vydv - sum(ydv)*0.25
sumD - sum(ydv * qt)
Rt - vydv / sumD
return(Rt)
}

# using plyr
require(plyr)
dfsumd1 - ddply(mydf,.(mydf$ped),myfun2)

Here are 2 issues:
(1) The output just one, I need the output for all three set of  
variables

(as listed above)


An incredibly vague description.



(2)  all three values of dfsumd is returning to same for all level  
of ped:

1,2, 3


I sympathize with those forced to adopt the English language, but that  
is the standard this decade. So givne your apparent difficulties, you  
need to exert more effort at making explicit what is _supposed_ to be  
returned.


Means that the function is applied to whole dataset but only  
replicated in

output !!!

I tried with plyr not being lazy but due to my limited R knowledge,  
If you

have a different suggestion, you are welcome too.

Thank you in advance...

Maya

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Platform of image

2011-09-02 Thread Prof Brian Ripley

On Fri, 2 Sep 2011, Jeff Newmiller wrote:

You don't say how you re-open it, but I would guess you are 
double-clicking the .RData file and letting Windows look up which 
program to run. If so, you can either open the 64-bit GUI directly 
and use File Open Data to open your data, or you can change which 
version of the R GUI program is linked to .RData files in Windows.


See the rw-FAQ Q2.29 for why (and how to change it).


Jeff Newmiller

John Welsh jwe...@sdibr.org wrote:

Dear R users,

When I Save Workspace... and then reopen it, my platform switches
from 64-bit to 32-bit, i.e. the Gui switches between these:

R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-mingw32/x64 (64-bit)

R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-pc-mingw32/i386 (32-bit)

How come? This is a Windows x64 system.

John Welsh, Ph.D.
Associate Professor
Molecular and Cancer Biology
Vaccine Research Institute of San Diego
10835 Road to the Cure
San Diego, CA 92121
Phone: (858) 581-3960 ex.248
Email: jwe...@sdibr.org


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.