date:20120117

Re: [R] Correct Localized Numbers on Plots, related to glibc!

2012-01-17 Thread Majid Einian

 You need to read ?Sys.setlocale (surely part of the homework the R posting 
 guide asked of you).

I have read that. And I have used the standard way of changing locale
to fa_IR.utf8 (i.e. using
Sys.setlocale(category=LC_ALL,locale=fa_IR.utf8). Also
Sys.setlocale(category=LC_NUMERIC,locale=fa_IR.utf8) )  The output
of Sys.localeconv at the end of my email should have made this clear.


 It is vital that you give the 'at a minimum' information requested in the 
 posting guide when asking such questions, and we also absolutely need to know 
 what graphics device you are trying to use.

The result is the same on RStudioGD, X11, pdf, ps, cairo_pdf,
cairo_ps, svg, png, jpeg, bmp. (I do not know if this is what you are
asking for. if not let me what exactly you are asking and how can  I
find it please.)
My sessionInfo is as follows:

R version 2.14.1 (2011-1-22)
Platform: i486-pc-linux-gnu (32 bit)

locale:
 [1]  LC_CTYPE=fa_IR.utf8   LC_NUMERIC=fa_IR.utf8
 [3]  LC_TIME=fa_IR.utf8LC_COLLATE=fa_IR.utf8
 [5]  LC_MONETARY=fa_IR.utf8LC_MESSAGES=en_US.UTF-8
 [7]  LC_PAPER=CLC_NAME=C
 [9]  LC_ADDRESS=C  LC_TELEPHONE=C
[11]  LC_MEASURMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats   graphicsgrDevices   utils   datasetsmethods 
base

loaded via a namespace (and not attached):
[1] tools_2.14.1

 I have read the posting guide. I had not posted the sessionInfo as
the information on system and graphic device seemed irrelevant, as it
does not change in different devices and also on Microsoft Windows
the locale Persian_Iran.1256 does not change anything. I have not
talked about windows as I don't expect windows to do anything
correctly on Persian.



 On 15/01/2012 08:26, Majid Einian wrote:

 Dear R Helpers,

 I want to localize my plots, i.e. the numbers by x  y axis be

 Persian, using Persian numerals and Persian decimal separator. I
 change the locale to fa_IR.utf8, but nothing on plots change. I can


 How, precisely?  Please show us exactly what you did, with a reproducible 
 example.


Well, I assume that changing the locale should do the trick, but it
does not. As I mentioned earlier:
X11()
Sys.setlocale(category=LC_ALL,locale=fa_IR.utf8)
Sys.setlocale(category=LC_NUMERIC,locale=fa_IR.utf8)
plot(gdpr)


 change the numerals shaping to Persian ones (۱۲۳۴ instead of 1234)
 using some non-standard fonts but the decimal point is a problem. I
 asked about that in Persian-Computing mailing list and I got the
 answer that follows. I don't know how should I use this l flag
 mentioned in the answer in R plots (I'm using simple R plots, no
 special library).

 Has anybody had similar problem in any language (maybe Arabic, other
 languages I'm not sure use different numeral characters).
 Also I don't have e.g. French locale on my system to see if the
 decimal separator changes accordingly to locale for them.


 It will if they followed the documentation.


 Thanks in advance.

 -
  Roozbeh Pournaderrooz...@gmail.com  Tue, Dec 6, 2011 at 3:47 AM
 To: Majid Einianeinia...@gmail.com
 Cc: persian-comput...@googlegroups.com

 The glibc model for generating numbers is kind of complex. For using
 native digits, one is supposed to use the I flag. For example, in
 order to get ۱۲٫۳, you should do printf(%I.1f, 12.3).

 This is to make sure applications have a way to output both ASCII
 numbers, and native numbers.

 Roozbeh




 --
 Brian D. Ripley,                  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




--
Majid Einian,
PhD Candidate in Economics,
Graduate School of Management and Economics,
Sharif University of Technology,
Tehran, IRAN

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] MuMIn package, problem using model selection table from manually created list of models

2012-01-17 Thread Dunbar, Michael J.


The subject says it all really.

Question 1.
Here is some code created to illustrate my problem, can anyone spot where I'm 
going wrong?

Question 2.
The reason I'm following a manual specification of models relates to the fact 
that in reality I am using mgcv::gam, and I'm not aware that dredge is able to 
separate individual smooth terms out of say s(a,b). Hence an additional 
request, if anyone has example code for using gam in a multimodel inference 
framework, especially with bivariate smooths, I'd be most grateful.

Cheers and Thanks in Advance
Mike

require(MuMIn)
data(Cement)
# option 1, create model.selection object using dredge
fm0 - lm(y ~ ., data = Cement)
print(dd - dredge(fm0))
fm1 - lm(formula = y ~ X1 + X2, data = Cement)
fm2 - lm(formula = y ~ X1 + X2 + X4, data = Cement)
fm3 - lm(formula = y ~ X1 + X2 + X3, data = Cement)
fm4 - lm(formula = y ~ X1 + X4, data = Cement)
fm5 - lm(formula = y ~ X1 + X3 + X4, data = Cement)
# ranked with AICc by default
# obviously this works
model.avg(get.models(dd, delta  4))

#  option 2: the aim is to produce a model selection object comparable to that 
from get.models(dd, delta  4)
# but from a manually-specified list of models
my.manual.selection - mod.sel(list(fm1, fm2, fm3, fm4, fm5))
# works
model.avg(list(fm1, fm2, fm3, fm4, fm5)) # or jut model.avg(fm1, fm2, fm3, fm4, 
fm5)
# doesn't work
model.avg(my.manual.selection)
# hence this doesn't work
get.models(my.manual.selection, delta  4)


-- 
This message (and any attachments) is for the recipient ...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] MuMIn package, problem using model selection table from manually created list of models

2012-01-17 Thread Kamil Bartoń


Dnieper 2012-01-17 10:51, Dunbar, Michael J. piste:


The subject says it all really.

Question 1.
Here is some code created to illustrate my problem, can anyone spot where I'm 
going wrong?

Question 2.
The reason I'm following a manual specification of models relates to the fact 
that in reality I am using mgcv::gam, and I'm not aware that dredge is able to 
separate individual smooth terms out of say s(a,b). Hence an additional 
request, if anyone has example code for using gam in a multimodel inference 
framework, especially with bivariate smooths, I'd be most grateful.


You can model average the coefficients, but not the terms.


Cheers and Thanks in Advance
Mike

require(MuMIn)
data(Cement)
# option 1, create model.selection object using dredge
fm0- lm(y ~ ., data = Cement)
print(dd- dredge(fm0))
fm1- lm(formula = y ~ X1 + X2, data = Cement)
fm2- lm(formula = y ~ X1 + X2 + X4, data = Cement)
fm3- lm(formula = y ~ X1 + X2 + X3, data = Cement)
fm4- lm(formula = y ~ X1 + X4, data = Cement)
fm5- lm(formula = y ~ X1 + X3 + X4, data = Cement)
# ranked with AICc by default
# obviously this works
model.avg(get.models(dd, delta  4))

#  option 2: the aim is to produce a model selection object comparable to that 
from get.models(dd, delta  4)
# but from a manually-specified list of models
my.manual.selection- mod.sel(list(fm1, fm2, fm3, fm4, fm5))
# works
model.avg(list(fm1, fm2, fm3, fm4, fm5)) # or jut model.avg(fm1, fm2, fm3, fm4, 
fm5)
# doesn't work
model.avg(my.manual.selection)



# hence this doesn't work
get.models(my.manual.selection, delta  4)


There is no need to recreate the models (which is what get.models does) once you have them already 
as a list.


models - list(fm1, fm2, fm3, fm4, fm5)
my.manual.selection - mod.sel(models)
model.avg(models[ my.manual.selection$delta  4 ])

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] New PLYR issue

2012-01-17 Thread Gunnar Oehmichen

Hello everyone,

I have got the same problem, with the same error message.

Using R 2.14.1, plyr 1.7.1, R.Studio 0.94.110, Windows XP

The plyr mailing list does not provide any help until now.

 require(plyr)

 c(sample(c(1:100), 50, replace=TRUE))-V1

 c(rep( 1:5, 10))-f1 #variable to group V1

 data.frame(cbind(V1, f1))-DF

 str(DF)

 ddply(DF$V1, DF$f1, sd)
 ddply(.(DF$V1), .(DF$f1), sd)

/Error in if (empty(.data)) return(.data) : /
/missing value where TRUE/FALSE needed
/

/Thanks everyone,

/



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Display numbers on map

2012-01-17 Thread Jeffrey Joh


I have a text file with states and numbers.  I would like to display each 
number that corresponds to a state on a map.

I am trying to use the maps package, but it doesn't show Alaska or Hawaii.  Do 
you have suggestions on how to do this?

Jeffrey
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Averaging within a range of values

2012-01-17 Thread doggysaywhat

Thank you all for your help.  It's working now.  I chose to use the sqldf
method and fn$ in the gsubfn package.  I used fn$ so that I could put
variables into the sqldf statement.  This helped me to increase or decrease
the window size in the Gene dataframe if I wanted to include values in the
average both before or after the gene.  

For anyone else interested, this is what it looked like.

Avg_func-function(chr,begin,finish){
fn$sqldf(paste(select d1C,chr,.'ORF', avg(C0), avg(C1)
 from d1C,chr,, d2C,chr, 
 where d2C,chr,.Pos between d1C,chr,.Start-,begin, and
d1C,chr,.End+,finish, 
 group by d1C,chr,.'ORF',sep=))
}

I used d1C1 etc. because I had a different file for each chromosome that I
read into separate dataframes in R.  The dataframes also had ORF (Open
Reading Frame) instead of Group.  




--
View this message in context: 
http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4302910.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] meta-analysis normal quantile plot metafor

2012-01-17 Thread Michael Dewey


At 12:15 16/01/2012, Ricc wrote:

Hello,
I used the default parameters:
- envelope: default is TRUE
- level: the default is to take the value from the object (I do not
understand this very well)


When you specified your original model to rma.uni you either 
specified the level or let it default (to 95). That value is stored 
in the object returned by rma.uni and used by qqnorm.rma.uni unless 
you override it. You can see what is in the object returned by 
rma.uni by using str.



- bonferroni : no
- reps : default is 1000
- smooth: default is TRUE
- bass : default is 0

 I used no other arguments.


2012/1/13 Michael Dewey i...@aghmed.fsnet.co.uk:
 At 15:53 11/01/2012, Ricc wrote:

 Hello,

 I once used the metawin software to perform a meta-analysis (see
 metawinsoft, Rosenberg et al.) and produced normal qqplot to test for
 a potential bias in the dataset.
 I now want to re-use the same dataset with the package metafor by W.
 Viechtbauer (great package btw).

 I run the qqnorm.rma.uni function. I use standardized effect sizes as
 in metawin.


 I think it would help if you said which parameters you used to control the
 envelope. Did you smooth it? Did you use the Bonferroni correction?



 QQplot generated with metafor differs from the plot obtained with
 metawin: most of the datapoint fall outside the confidence envelope
 (using the same confidence level). I don't understand very well how
 the pseudo confidence envelope was created in metafor. Is it more
 conservative than that from metawin or created using the package
 envelope ? Unfortunately I do not have access to metawin's code so
 that I cannot compare implementations but the manual let me think that
 metawin print classical confidence interval...

 Thanks for input !
 Ricc

 More precisions:
 R version 2.13.1 (2011-07-08)
 Platform: x86_64-pc-linux-gnu (64-bit)
 metafor_1.6-0


 Michael Dewey
 i...@aghmed.fsnet.co.uk
 http://www.aghmed.fsnet.co.uk/home.html



Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Checking dates for entry errors

2012-01-17 Thread Thomas Mang


On 1/11/2012 11:07 PM, Paul Miller wrote:

Hello Everyone,

I have a question about how best to check dates for entry errors.


Try using regular expression matching and the functions grep, strsplit, 
regexpr etc.
If you are not familiar with regex: bit a bumpy road of getting into it 
but in the long term definitely worth the effort !


cheers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] meta-analysis normal quantile plot metafor

2012-01-17 Thread Ricc

@ Micheal: thanks I understand now.
@ Wolfgang: apparently, using the DL-estimator solved my issue and
leaded to a result with only a slight difference with metawin. thanks
!


2012/1/17 Michael Dewey i...@aghmed.fsnet.co.uk:
 At 12:15 16/01/2012, Ricc wrote:

 Hello,
 I used the default parameters:
 - envelope: default is TRUE
 - level: the default is to take the value from the object (I do not
 understand this very well)


 When you specified your original model to rma.uni you either specified the
 level or let it default (to 95). That value is stored in the object returned
 by rma.uni and used by qqnorm.rma.uni unless you override it. You can see
 what is in the object returned by rma.uni by using str.


 - bonferroni : no
 - reps : default is 1000
 - smooth: default is TRUE
 - bass : default is 0

  I used no other arguments.


 2012/1/13 Michael Dewey i...@aghmed.fsnet.co.uk:
  At 15:53 11/01/2012, Ricc wrote:
 
  Hello,
 
  I once used the metawin software to perform a meta-analysis (see
  metawinsoft, Rosenberg et al.) and produced normal qqplot to test for
  a potential bias in the dataset.
  I now want to re-use the same dataset with the package metafor by W.
  Viechtbauer (great package btw).
 
  I run the qqnorm.rma.uni function. I use standardized effect sizes as
  in metawin.
 
 
  I think it would help if you said which parameters you used to control
  the
  envelope. Did you smooth it? Did you use the Bonferroni correction?
 
 
 
  QQplot generated with metafor differs from the plot obtained with
  metawin: most of the datapoint fall outside the confidence envelope
  (using the same confidence level). I don't understand very well how
  the pseudo confidence envelope was created in metafor. Is it more
  conservative than that from metawin or created using the package
  envelope ? Unfortunately I do not have access to metawin's code so
  that I cannot compare implementations but the manual let me think that
  metawin print classical confidence interval...
 
  Thanks for input !
  Ricc
 
  More precisions:
  R version 2.13.1 (2011-07-08)
  Platform: x86_64-pc-linux-gnu (64-bit)
  metafor_1.6-0
 
 
  Michael Dewey
  i...@aghmed.fsnet.co.uk
  http://www.aghmed.fsnet.co.uk/home.html
 


 Michael Dewey
 i...@aghmed.fsnet.co.uk
 http://www.aghmed.fsnet.co.uk/home.html


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Error predict with lda and cross validation

2012-01-17 Thread Riccardo Romoli

Hi, I use the lda function from the MASS package to classify some  
samples  according to some chemical properties. If I run lda without  
cross validation all is ok but, if I run lda with cross validation,  
the R consol say:


resLDA - ldaRedOx - lda(Activity ~ TRedOx[,1:6], CV=TRUE,  
data=dfDataRedOx, subset=train)

predLDA - predict(resLDA, newdata=dfDataRedOx[-train,])$class

 predLDA - predict(resLDA, newdata=dfDataRedOx[-train,])$class
Error in UseMethod(predict) :
  no applicable method for 'predict' applied to an object of class  
list



How should I use predict  function with lda with the cross validation?

Best
Riccardo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Mean of simulation runs given in a table

2012-01-17 Thread Irek Szczesniak

Hi,

I have the simulation results of the following structure:

run par   measured
1   1012
2   1014
1   2020
2   2026

Where run is the simulation run number, par is the parameter of
the simulation, and measured is the value measured in the
simulation.  This is only a simple example of my results.  There are
many values measured and many parameters.  But the basic structure
stays the same: there are many runs (identified by the run number) for
the same values of the parameters with various measured values -- they
constitute a sample.

I would like to calculate the mean of the measured value for a
sample, and so I would like to obtain the output as follows:

par mean
10  13
20  23

I would appreciate it if someone could write me how to do it.


Thank you,
Irek

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] r help

2012-01-17 Thread masarimk

df-data.frame(
group=rep(c(a,b), c(7,6)), 
sales=c(3,4,6,5,1,30,8,4,6,9,10,27,9), 
turn_over=c(1.5, 2.9, 1.9,20.5, 2.3, 1.65, 0.06, 3.4, 3.5, 2.23, 0.1,
9.8,1.4)
)

Hello all,

In this data set  ı need to replace the outliers with 1.5IQR for each group
and for each variable so the final data set should look like
sales=c(3,4,6,5,1,8,8,4,6,9,10,10,9), 
turn_over=c(1.5, 2.9, 1.9,2.9, 2.3, 1.65, 0.06, 3.4, 3.5, 2.23, 0.1,
3.5,1.4)
so far 
I did try
k=boxplot(df[, c(sales,turn_over)], df[,groupID]) any help will be
appreciated.


--
View this message in context: 
http://r.789695.n4.nabble.com/r-help-tp4303025p4303025.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] result numeric(0) when using variable1[which(variable2=max(variable2)]

2012-01-17 Thread Nerak

Dear all,
I have a question about the knowing for which row I have the max value of
one of my variables.
I calculated the Rsquared for different columns and made a list to gather
them. I unlisted this list to create a vector with this values. I want to
know for which column I have the max value of Rsquared.  
The columns were always named in the same way. They always start with
results4$depth_ following by the number. The numbers are constructed as:
seq(1,10,0.1). But if the R squared values are now in 1 column, I don’t know
for which column they are calculated. So I made a new data frame with both
columns:
R2 - unlist(LIST) 
Cvalue - c(seq(1,10,0.1)) 
results5 - data.frame(Cvalue,R2) 

# I know I can calculate the max value of Rsquared by this way: 

max(results5$R2) 

# now I want to know to which Cvalue this belongs. I would write it like
this: 
results5$Cvalue[which(results5$R2 == max(results5$R2))] 
# But I always get the solution: 
numeric(0) 
# I don’t know if these Rsquared values are in a kind of format that this
doesn’t work? (I used before for similar things, and I experienced that for
example it cannot works if R recognizes the values as a date).  Maybe
because it’s with a lot of decimals? (eg 2.907530e-01) I know that
max(results5$R2) is in this example 0.6081547 and I can see that that
belongs to the Cvalue == 1.8. It works in the opposite way. 
results5$R2[which(results5$Cvalue == 1.8)] 
# But neither 
results5$Cvalue[which(results5$R2 == 0.6081547)] 
# nor 
results5$Cvalue[which(results5$R2 == max(results5$R2))] 
# works…

I hope someone can help me with this problem
Kind regards
Nerak

--
View this message in context: 
http://r.789695.n4.nabble.com/result-numeric-0-when-using-variable1-which-variable2-max-variable2-tp4302887p4302887.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] visualization for k-mean clustering

2012-01-17 Thread mukul purva

hello,
 i want a visualization of the k-mean clustering.which one method
will be best for visualization??



thnkx

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Scoring using cox model: probability of survival before time t

2012-01-17 Thread Aher

Dear Members,
I required to score probability of survival before specified time using
fitted cox model on scoring dataset.
On the training sample data I am able to get the probability of a survival
before time point(t), but on the scoring dataset, which will have only
predictor information I am facing some issues. It would be great help for me
if you tell me where am I going wrong!
Here is the sample script!

#
library(survival) 

n = 100
beta1 = 3; beta2 = -2; 
lambdaT = .01 
lambdaC = .6  
x1 = rnorm(n,0) 
x2 = rnorm(n,0) 

T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2)) 
C = rweibull(n, shape=1, scale=lambdaC)   
time = pmin(T,C)  
event = time==T   
train_sample=data.frame(time,event,x1,x2) 

rm(time,event,x1,x2)

fit_coxph - coxph(Surv(time, event)~ x1 + x2, data= train_sample,
method=breslow) 

#Save model to some directory
save(fit_coxph, file = file.path(C:/Desktop,fit_coxph.RData))

#I can get probabilities on train_sample as below:
library(peperr)
pred_train - predictProb.coxph(fit_coxph, Surv(train_sample$time,
train_sample$event),  train_sample, 0.4)
head(pred_train)

#[,1]
#[1,] 5.126281e-03
#[2,] 4.324882e-01
#[3,] 4.444506e-61
#[4,] 0.00e+00
#[5,] 0.00e+00
#[6,] 3.249947e-01


#In the same line, I need probabilities on scoring_data. Now, close the
earlier session and run the below script in the new #R session, it gives
error.

library(survival) 
library(peperr)


load(file = file.path(C:/Desktop,fit_coxph.RData))

n = 1000
set.seed(1) 
x1 = rnorm(n,0) 
x2 = rnorm(n,0) 
score_data - data.frame(x1,x2)
pred_score - predictProb.coxph(fit_coxph, Surv(time, event),  score_data,
0.04)

#Error in Surv(time, event) : Time variable is not numeric

#After  creating dummy place holder for Surv(time, event), it gives another
error:

time - rep(2, n)
event - rep(1, n)
pred_score - predictProb.coxph(fit_coxph, Surv(time, event),  score_data,
0.04)

#Error in inherits(x, data.frame) : object 'train_sample' not found  


Appreciate your help, is there any other way to get these probabilities on
newdata. 
Thanks in advance
~Aher



--
View this message in context: 
http://r.789695.n4.nabble.com/Scoring-using-cox-model-probability-of-survival-before-time-t-tp4302775p4302775.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to loop on file names

2012-01-17 Thread Bretschneider SIG-R

Dear Hélène Genet,


Re:

 Dear all,
 
 I need to do the same procedure on several files. But I don't know how to 
 refer to the file name.
 Here is an example of what I am trying to do.
 
 List of files: file1(A,B,C, D1...Dn), file2(A,B,C,E1,...,En), 
 file3(A,B,C,F1,...,Fn)
 
 Procedure I want to apply on each file:
 
 dft - melt(df,id=c('A','B','C'))
 dft$X - substr(dft$variable,1,3)
 dft$Y - substr(dft$variable,4,8)
 dft1 - cast(dft, A+B+C+X ~ Y,value=response)
 
 As you see all the files contains the same 3 variables A,B,C that I use in 
 the procedure.
 So I want to apply the procedure on all the file in a loop.
 Something like :
 
 filelist - c('file1' , 'file2' , 'file3')
 
 for (i in 1:3) {
 filename - filelist[i]
 ...
 }
 
 Any suggestion to refer to these files in this loop?
 
 Thanks you in advance,
 
 Helene
 
 -- 
 Hélène Genet, PhD
 Institute of Arctic Biology
 University of Alaska Fairbanks
 Irving I Blg, Room 402
 Fairbanks AK 99775
 
 Phone: 907-474-5472
 Cell: 907-699-4340
 Email: hge...@alaska.edu
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 



I use this procedure to point to a folder (of WAV files in this example) and 
process all those files in a loop:


#  select FOLDER w. WAV-files
fnam = dirname(file.choose())# choose any one file from the folder (= 
directory)
filist = list.files(fnam, recursive=TRUE, pattern=wav)
filist1 = paste(fnam,/,filist, sep=)
nfiles = length(filist1)
#  
#  filenames loop ===
for(i in 1:nfiles) {
inname=filist1[i]
ywave=readWave(inname)  # read the i'th file  (wav as an example, can 
be any filetype you need)
ywave2=ywave  # output is the same as input (yet)
###
###   DO ANY PROCESSING YOU WANT HERE to change ywave into ywave2
###
outname=paste(dirname(inname), /*,basename(inname), sep=)
writeWave(ywave2,outname)
}


I took these few lines out of a larger program with wave file processing not 
relevant here.
Hope I didn't forget something, and that this works for your case,


Beste wishes,



Franklin Bretschneider
Utrecht University
--
Dept Biologie
Kruytgebouw W711
Padualaan 8
3584 CH Utrecht
The Netherlands

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Saving WinBugs log file when using bugs()

2012-01-17 Thread Uwe Ligges




On 16.01.2012 18:00, chaps31 wrote:

Hi
The log file (not as .odc and as .txt file) is in R'd tempdir() after
WinBUGS returns to R.


Was this my statement?

I actually meant to write *both* rather than not above.

Best,
Uwe




Is there any way to save the .odc file?

Ruth

--
View this message in context: 
http://r.789695.n4.nabble.com/Saving-WinBugs-log-file-when-using-bugs-tp4299827p4300162.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] net classification improvement?

2012-01-17 Thread Essers, Jonah

Thanks for the reply. I think more the issue is whether it can be applied
to cross-sectional data. This I'm not sure. This method is heavily cited
in the New England Journal of Medicine, but thus far I've only seen it
used with longitudinal data.

On 1/16/12 10:23 PM, Kevin E. Thorpe kevin.tho...@utoronto.ca wrote:

On 01/16/2012 08:10 PM, Essers, Jonah wrote:
 Greetings,

 I have generated several ROC curves and would like to compare the AUCs.
 The data are cross sectional and the outcomes are binary. I am testing
 which of several models provide the best discrimination. Would it be
most
 appropriate to report AUC with 95% CI's?

 I have been looking in to the net reclassification improvement (see
 below for reference) but thus far I can only find a version in Hmisc
 package which requires survival data. Any idea what the best approach is
 for cross-sectional data?

I believe that the function in Hmisc that does this will also work on
binary data.


 Thanks

 Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Evaluating the
 added predictive ability of a new marker: from area under the ROC curve
to
 reclassification and beyond. Stat Med 2008;27:157-172



-- 
Kevin E. Thorpe
Biostatistician/Trialist,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] formula in function as text?

2012-01-17 Thread Julia Burggraaf

Hello all,

It might be a simple question, but I cannot find the solution, as I do not
know which subjects I should search on. So, much thanks for he/she we can
help me.
I am creating a function and would like to place a formula in the function,
without it being executed immediately. Like saving it temporary as 'text'.

Simplified version of what I would like to be able to do:

test-function(a,x){
if(a5){ b-3+ x[i]}
if(a5){ b- 6 + x[i]}
y-1:10
for (i in 1:10){y[i]-4 + b}
return(y)
}

In my perfect world, R will replace b in the formula y=4+b by the
appropriate b, indicated by the condition (value of a).
It now takes for 'b' only the first argument of x (+3 or 6). I know I can
solve the problem by also looping over b and turning it into a vector, but
I would like to know if it is also possible in the way stated above. If I
put 3+x[i] in  to make it a character, it will still be character at
y-4+b or when I use as.numeric, it will create NA

Thanks in advance,

Julia

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] BLAS

2012-01-17 Thread Scott Raynaud

I'm setting up an Ubuntu virtual machine that will use 4-Intel Xeon CPU x5650.  
I'd like to compile R with a BLAS but the question is whcih one.  Seems 
like the only free ones are GotoBLAS which I'm not sure is being maintained 
for newer CPUs and OpenBLAS for Loongson CPUs.  I saw a favorable report 
on OpenBLAS 
(http://www.rochester.edu/college/gradstudents/jolmsted/files/computing/BLAS_Comparison.pdf),
 
but I'm not sure it's the right thing for my CPUs.  The webpage for OpenBLAS 
says, On X86 box, compile this library for loongson3a CPU.  Any opinions 
on whether this will work?  If not, any suggestions on another free BLAS?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Plotting probability density and cumulative distribution function

2012-01-17 Thread chrisr34000

Hi!

I want to plot the probability density function and the cumulative
distribution function for the gamma, lognormal, exponential and Pareto
distribution. I want to vary the parameters and and have a plot with 2-3
different parameters in the same figure. It should look like this (example
weibull distribution):

http://en.wikipedia.org/wiki/File:Weibull_PDF.svg

library(Cairo)
CairoFonts(regular=DejaVu Sans:style=Regular)
CairoSVG(Weibull PDF.svg)
par(mar=c(3, 3, 1, 1))
x - seq(0, 2.5, length.out=1000)
plot(x, dweibull(x, .5), type=l, col=blue, xlab=, ylab=, xlim=c(0,
2.5), ylim=c(0, 2.5), xaxs=i, yaxs=i)
lines(x, dweibull(x, 1), type=l, col=red)
lines(x, dweibull(x, 1.5), type=l, col=magenta)
lines(x, dweibull(x, 5), type=l, col=green)
legend(topright, legend=paste(\u03bb = 1, k =, c(.5, 1, 1.5, 5)), lwd=1,
col=c(blue, red, magenta, green))
dev.off()

and

http://en.wikipedia.org/wiki/File:Weibull_CDF.svg

library(Cairo)
CairoFonts(regular=DejaVu Sans:style=Regular)
CairoSVG(Weibull CDF.svg)
par(mar=c(3, 3, 1, 1))
x - seq(0, 2.5, length.out=1000)
plot(x, pweibull(x, .5), type=l, col=blue, xlab=, ylab=, xlim=c(0,
2.5), ylim=c(0, 1), xaxs=i, yaxs=i)
lines(x, pweibull(x, 1), type=l, col=red)
lines(x, pweibull(x, 1.5), type=l, col=magenta)
lines(x, pweibull(x, 5), type=l, col=green)
legend(bottomright, legend=paste(\u03bb = 1, k =, c(.5, 1, 1.5, 5)),
lwd=1, col=c(blue, red, magenta, green))
dev.off()

How do I adapt the syntax to use it for the other distributions?



--
View this message in context: 
http://r.789695.n4.nabble.com/Plotting-probability-density-and-cumulative-distribution-function-tp4303198p4303198.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Mean of simulation runs given in a table

2012-01-17 Thread Uwe Ligges




On 17.01.2012 12:31, Irek Szczesniak wrote:

Hi,

I have the simulation results of the following structure:

run par   measured
1   1012
2   1014
1   2020
2   2026

Where run is the simulation run number, par is the parameter of
the simulation, and measured is the value measured in the
simulation.  This is only a simple example of my results.  There are
many values measured and many parameters.  But the basic structure
stays the same: there are many runs (identified by the run number) for
the same values of the parameters with various measured values -- they
constitute a sample.

I would like to calculate the mean of the measured value for a
sample, and so I would like to obtain the output as follows:

par mean
10  13
20  23

I would appreciate it if someone could write me how to do it.



For you data in a data.frame called dat:

aggregate(measured ~ par, dat, mean)

Uwe Ligges




Thank you,
Irek

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error predict with lda and cross validation

2012-01-17 Thread Uwe Ligges




On 17.01.2012 14:54, Riccardo Romoli wrote:

Hi, I use the lda function from the MASS package to classify some
samples according to some chemical properties. If I run lda without
cross validation all is ok but, if I run lda with cross validation, the
R consol say:

resLDA - ldaRedOx - lda(Activity ~ TRedOx[,1:6], CV=TRUE,
data=dfDataRedOx, subset=train)
predLDA - predict(resLDA, newdata=dfDataRedOx[-train,])$class

  predLDA - predict(resLDA, newdata=dfDataRedOx[-train,])$class
Error in UseMethod(predict) :
no applicable method for 'predict' applied to an object of class list
 

How should I use predict function with lda with the cross validation?


See ?lda : it has a very different output if CV=TRUE is used. Hence you 
have to prepare it with CV=FALSE in order to make predictions again.
It does not make sense to ask LDA for a cross validation an run another 
test later on, does it?


Uwe Ligges






Best
Riccardo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] result numeric(0) when using variable1[which(variable2=max(variable2)]

2012-01-17 Thread Duncan Murdoch


On 17/01/2012 5:35 AM, Nerak wrote:

Dear all,
I have a question about the knowing for which row I have the max value of
one of my variables.
I calculated the Rsquared for different columns and made a list to gather
them. I unlisted this list to create a vector with this values. I want to
know for which column I have the max value of Rsquared.
The columns were always named in the same way. They always start with
results4$depth_ following by the number. The numbers are constructed as:
seq(1,10,0.1). But if the R squared values are now in 1 column, I don’t know
for which column they are calculated. So I made a new data frame with both
columns:
R2- unlist(LIST)
Cvalue- c(seq(1,10,0.1))
results5- data.frame(Cvalue,R2)

# I know I can calculate the max value of Rsquared by this way:

max(results5$R2)

# now I want to know to which Cvalue this belongs. I would write it like
this:
results5$Cvalue[which(results5$R2 == max(results5$R2))]


Don't use quotes on the expression.  None of your R2 values are the string

max(results5$R2)

which is why you're getting numeric(0).

You can also make it simpler by using the which.max() function.

Duncan Murdoch


# But I always get the solution:
numeric(0)
# I don’t know if these Rsquared values are in a kind of format that this
doesn’t work? (I used before for similar things, and I experienced that for
example it cannot works if R recognizes the values as a date).  Maybe
because it’s with a lot of decimals? (eg 2.907530e-01) I know that
max(results5$R2) is in this example 0.6081547 and I can see that that
belongs to the Cvalue == 1.8. It works in the opposite way.
results5$R2[which(results5$Cvalue == 1.8)]
# But neither
results5$Cvalue[which(results5$R2 == 0.6081547)]
# nor
results5$Cvalue[which(results5$R2 == max(results5$R2))]
# works…

I hope someone can help me with this problem
Kind regards
Nerak

--
View this message in context: 
http://r.789695.n4.nabble.com/result-numeric-0-when-using-variable1-which-variable2-max-variable2-tp4302887p4302887.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] result numeric(0) when using variable1[which(variable2=max(variable2)]

2012-01-17 Thread Berend Hasselman


On 17-01-2012, at 11:35, Nerak wrote:

 Dear all,
 I have a question about the knowing for which row I have the max value of
 one of my variables.
 I calculated the Rsquared for different columns and made a list to gather
 them. I unlisted this list to create a vector with this values. I want to
 know for which column I have the max value of Rsquared.  
 The columns were always named in the same way. They always start with
 results4$depth_ following by the number. The numbers are constructed as:
 seq(1,10,0.1). But if the R squared values are now in 1 column, I don’t know
 for which column they are calculated. So I made a new data frame with both
 columns:
 R2 - unlist(LIST) 
 Cvalue - c(seq(1,10,0.1)) 
 results5 - data.frame(Cvalue,R2) 
 
 # I know I can calculate the max value of Rsquared by this way: 
 
 max(results5$R2) 
 
 # now I want to know to which Cvalue this belongs. I would write it like
 this: 
 results5$Cvalue[which(results5$R2 == max(results5$R2))] 
 # But I always get the solution: 
 numeric(0) 

You haven't provided a reproducible example.
So I tried this

set.seed(1)
x - round(runif(10),3)
x
which.max(x)
which(x==max(x))
x
which(x==0.945)
which(x==max(x))
which(x==max(x))
x[which(x==max(x))]

If you run this you will see that the last line results in numeric(0).

So; why are you using quotes in the which expression? Is results5$R2 a 
character string?
This should work

results5$Cvalue[which(results5$R2 == max(results5$R2))]

But this is shorter

results5$Cvalue[which.max[results5$R2)]

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] formula in function as text?

2012-01-17 Thread Berend Hasselman


On 17-01-2012, at 13:36, Julia Burggraaf wrote:

 Hello all,
 
 It might be a simple question, but I cannot find the solution, as I do not
 know which subjects I should search on. So, much thanks for he/she we can
 help me.
 I am creating a function and would like to place a formula in the function,
 without it being executed immediately. Like saving it temporary as 'text'.
 
 Simplified version of what I would like to be able to do:
 
 test-function(a,x){
 if(a5){ b-3+ x[i]}
 if(a5){ b- 6 + x[i]}
 y-1:10
 for (i in 1:10){y[i]-4 + b}
 return(y)
 }
 
 In my perfect world, R will replace b in the formula y=4+b by the
 appropriate b, indicated by the condition (value of a).
 It now takes for 'b' only the first argument of x (+3 or 6). I know I can
 solve the problem by also looping over b and turning it into a vector, but
 I would like to know if it is also possible in the way stated above. If I
 put 3+x[i] in  to make it a character, it will still be character at
 y-4+b or when I use as.numeric, it will create NA


You are complicating matters.

test - function(a,x){
if(a5){ b - 3+ x} else if(a5){ b- 6 + x} # b is now a vector
y - 4 + b
   return(y)
}

Puzzle for you to solve: what happens when a is identical to 5?

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] formula in function as text?

2012-01-17 Thread Petr PIKAL

Hi
 
 Hello all,
 
 It might be a simple question, but I cannot find the solution, as I do 
not
 know which subjects I should search on. So, much thanks for he/she we 
can
 help me.
 I am creating a function and would like to place a formula in the 
function,
 without it being executed immediately. Like saving it temporary as 
'text'.
 
 Simplified version of what I would like to be able to do:
 
 test-function(a,x){
 if(a5){ b-3+ x[i]}

What is i?

 if(a5){ b- 6 + x[i]}
 y-1:10
 for (i in 1:10){y[i]-4 + b}
 return(y)
 }
 
 In my perfect world, R will replace b in the formula y=4+b by the
 appropriate b, indicated by the condition (value of a).

If you want perfect R world you shall us R approach.

Let's suppose you have vector a with values below and above 5. and 
vector x which you want to use for computing b

set.seed(111)
a -sample(1:10, 10)
x -runif(10)

you can compute vector b according to your condition

b - (((a5)+1)*3) + x
# I included number 5 to computing 3

and based on this you can compute y

y - 4+b

You can put it in a function if you want.

test-function(a,x) {
b - (((a5)+1)*3) + x
y - 4+b
return(y)
}

Regards
Petr


 It now takes for 'b' only the first argument of x (+3 or 6). I know I 
can
 solve the problem by also looping over b and turning it into a vector, 
but
 I would like to know if it is also possible in the way stated above. If 
I
 put 3+x[i] in  to make it a character, it will still be character at
 y-4+b or when I use as.numeric, it will create NA
 
 Thanks in advance,
 
 Julia
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Display numbers on map

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 5:37 AM, Jeffrey Joh wrote:



I have a text file with states and numbers.  I would like to display  
each number that corresponds to a state on a map.


I am trying to use the maps package, but it doesn't show Alaska or  
Hawaii.  Do you have suggestions on how to do this?


This question suggests you are not yet aware of the search facility  
built into R:


RSiteSearch(maps Hawaii Alaska )
A search query has been submitted to http://search.r-project.org
The results page should open in your browser shortly

The third hit was back to the map function help page. (The examples  
should be read and executed.)


?example



[[alternative HTML version deleted]]

And the above suggests you still need to read the following:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] question: how to select a column from a dataframe in a function

2012-01-17 Thread Petr PIKAL

Hi

 
 Hi,
 
 I am creating a function and ran into the problem of selecting a column
 from a dataset. It seems as though the $ function (as in 
data$columnname)
 does not apply in the function. In simplified version:
 
 This works:
  testf2-function(data,columnnumber){print(data[,columnnumber])}
 
 But this doesn't:
 testf-function(data,column){print(data$column)}
 
 Even though the first solution works, I would like to be able to insert 
the
 columnname in the function, instead of the columnnumber. How do I do 
that?

Not sure if you get any answer yet.

testf2-function(data,columnname){print(data[,columnname])}

Petr

 
 Thank you in advance,
 
 Julia
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Checking dates for entry errors

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 8:02 AM, Thomas Mang wrote:


On 1/11/2012 11:07 PM, Paul Miller wrote:

Hello Everyone,

I have a question about how best to check dates for entry errors.


Try using regular expression matching and the functions grep,  
strsplit, regexpr etc.
If you are not familiar with regex: bit a bumpy road of getting into  
it but in the long term definitely worth the effort !


Agree on all points, but also simply converting to the Date class and  
using the ?Comparison operators is informative. And there may be NA's  
that will signal impossible month-day combinations which would be  
tedious to identify with regex.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] BLAS

2012-01-17 Thread Peter Langfelder

On Tue, Jan 17, 2012 at 6:06 AM, Scott Raynaud scott.rayn...@yahoo.com wrote:
 I'm setting up an Ubuntu virtual machine that will use 4-Intel Xeon CPU x5650.
 I'd like to compile R with a BLAS but the question is whcih one.  Seems
 like the only free ones are GotoBLAS which I'm not sure is being maintained
 for newer CPUs and OpenBLAS for Loongson CPUs.  I saw a favorable report
 on OpenBLAS 
 (http://www.rochester.edu/college/gradstudents/jolmsted/files/computing/BLAS_Comparison.pdf),
 but I'm not sure it's the right thing for my CPUs.  The webpage for OpenBLAS
 says, On X86 box, compile this library for loongson3a CPU.  Any opinions
 on whether this will work?  If not, any suggestions on another free BLAS?


I've been using ATLAS (http://math-atlas.sourceforge.net/) with good success.

Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ggplot- using geom_point and geom_line at the same time

2012-01-17 Thread Hadley Wickham

On Mon, Jan 16, 2012 at 6:05 PM, Mary Kindall mary.kind...@gmail.com wrote:
 Thanks for reply
 I wanted to have legend name with spaces. Right now I am using the
 following code but it produce two legends. I have to use Gimp to cut the
 redundant legend.

Your basic problem is that you're using the fill and colour
aesthetics, but you only need colour.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] visualization for k-mean clustering

2012-01-17 Thread R. Michael Weylandt michael.weyla...@gmail.com

Will depend heavily on the structure of your data (you haven't even told use te 
number of dimensions or the metric in question), but I'd suggest something like 
a scatterplot color coded by cluster with an additional marker for cluster 
means. 

Michael Weylandt

On Jan 17, 2012, at 3:48 AM, mukul purva mukul.pu...@gmail.com wrote:

 hello,
 i want a visualization of the k-mean clustering.which one method
 will be best for visualization??
 
 
 
 thnkx
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plotting probability density and cumulative distribution function

2012-01-17 Thread R. Michael Weylandt michael.weyla...@gmail.com

Perhaps I misunderstand you, but it sounds like all you need to do is change 
weibull to the name of another distribution 

Michael

On Jan 17, 2012, at 8:11 AM, chrisr34000 cr...@wolke7.net wrote:

 Hi!
 
 I want to plot the probability density function and the cumulative
 distribution function for the gamma, lognormal, exponential and Pareto
 distribution. I want to vary the parameters and and have a plot with 2-3
 different parameters in the same figure. It should look like this (example
 weibull distribution):
 
 http://en.wikipedia.org/wiki/File:Weibull_PDF.svg
 
 library(Cairo)
 CairoFonts(regular=DejaVu Sans:style=Regular)
 CairoSVG(Weibull PDF.svg)
 par(mar=c(3, 3, 1, 1))
 x - seq(0, 2.5, length.out=1000)
 plot(x, dweibull(x, .5), type=l, col=blue, xlab=, ylab=, xlim=c(0,
 2.5), ylim=c(0, 2.5), xaxs=i, yaxs=i)
 lines(x, dweibull(x, 1), type=l, col=red)
 lines(x, dweibull(x, 1.5), type=l, col=magenta)
 lines(x, dweibull(x, 5), type=l, col=green)
 legend(topright, legend=paste(\u03bb = 1, k =, c(.5, 1, 1.5, 5)), lwd=1,
 col=c(blue, red, magenta, green))
 dev.off()
 
 and
 
 http://en.wikipedia.org/wiki/File:Weibull_CDF.svg
 
 library(Cairo)
 CairoFonts(regular=DejaVu Sans:style=Regular)
 CairoSVG(Weibull CDF.svg)
 par(mar=c(3, 3, 1, 1))
 x - seq(0, 2.5, length.out=1000)
 plot(x, pweibull(x, .5), type=l, col=blue, xlab=, ylab=, xlim=c(0,
 2.5), ylim=c(0, 1), xaxs=i, yaxs=i)
 lines(x, pweibull(x, 1), type=l, col=red)
 lines(x, pweibull(x, 1.5), type=l, col=magenta)
 lines(x, pweibull(x, 5), type=l, col=green)
 legend(bottomright, legend=paste(\u03bb = 1, k =, c(.5, 1, 1.5, 5)),
 lwd=1, col=c(blue, red, magenta, green))
 dev.off()
 
 How do I adapt the syntax to use it for the other distributions?
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Plotting-probability-density-and-cumulative-distribution-function-tp4303198p4303198.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] visualization for k-mean clustering

2012-01-17 Thread Yihui Xie

You mean the process of clustering (the algorithm)? Have you looked at
kmeans.ani() in the animation package?

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Tue, Jan 17, 2012 at 2:48 AM, mukul purva mukul.pu...@gmail.com wrote:
 hello,
         i want a visualization of the k-mean clustering.which one method
 will be best for visualization??



 thnkx

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] boxplot with diamond shape

2012-01-17 Thread csrabak


Em 16/1/2012 08:07, David martin escreveu:

Hi,
I haven't found in R a possibility to draw a boxplot with a diamond
shape (means and CI).

David,

Perhaps, even prejudicially, as I cannot see any advantage on the 
diamond shape for displaying just two dimensions, I would recommend you 
check if plotCI from plotrix package, or even better plotmeans from 
gplots package which does it more automatically, doesn't suffice for 
your needs.


HTH,

--
Cesar Rabak

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to loop on file names

2012-01-17 Thread R. Michael Weylandt michael.weyla...@gmail.com

Inline:

Michael

On Jan 16, 2012, at 10:26 PM, HÃ©lÃ¨ne Genet hge...@alaska.edu wrote:

 Dear all,
 
 I need to do the same procedure on several files. But I don't know how to 
 refer to the file name.
 Here is an example of what I am trying to do.
 
 List of files: file1(A,B,C, D1...Dn), file2(A,B,C,E1,...,En), 
 file3(A,B,C,F1,...,Fn)
 
 Procedure I want to apply on each file:
 

 filelist - c('file1' , 'file2' , 'file3')
for(i in filellist){
df - read.csv(i)
dft - melt(df,id=c('A','B','C'))
dft$X - substr(dft$variable,1,3)
dft$Y - substr(dft$variable,4,8)
dft1 - cast(dft, A+B+C ~ Y,value=response)
write.csv(paste(done-, i, sep = ))
}
 
 As you see all the files contains the same 3 variables A,B,C that I use in 
 the procedure.
 So I want to apply the procedure on all the file in a loop.
 Something like :
 
 filelist - c('file1' , 'file2' , 'file3')
 
 for (i in 1:3) {
 filename - filelist[i]
 ...
 }
 
 Any suggestion to refer to these files in this loop?
 
 Thanks you in advance,
 
 Helene
 
 -- 
 HÃ©lÃ¨ne Genet, PhD
 Institute of Arctic Biology
 University of Alaska Fairbanks
 Irving I Blg, Room 402
 Fairbanks AK 99775
 
 Phone: 907-474-5472
 Cell: 907-699-4340
 Email: hge...@alaska.edu
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Reference for dataset colon (package survival)

2012-01-17 Thread Matthias Gondan

Dear R team, dear Prof. Therneau,

library(survival)
data(colon)
?colon

gives me only a very rudimentary source (only a name). Is there a 
possibility to get a reference to the clinical trial these data
are taken from?

Many thanks in advance. With best wishes,

Matthias Gondan
--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] breakpoints and nonlinear regression

2012-01-17 Thread crimsonengineer87

Dear Forum,

I have been wracking my head over this problem for the past few days. I have
a dataset of (x,y). I have been able to obtain a nonlinear regression line
using nls. However, we would like to do some statistical analysis. I would
like to obtain a confidence interval for the curve. We thought we could
divide up the curve into piecewise linear regressions and compute CIs from
those portions. There is a package called strucchange that seems helpful,
but I am thoroughly confused.

'breakpoints' is used to calculate the number of breaks in the data for
linear regressions.  I have the following in my script:

bp.pavlu - breakpoints(Na ~ f(yield, a, b), h=0.15, breaks=3,
data=pavludata)
plot(bp.pavlu)
breakpoints(bp.pavlu)

But I am confused as to how to graph the piecewise functions that make up
the curve. I am not even sure if I am using breakpoints correctly. Do I just
give it a linear relationhip (Na ~ yield), instead of what I have?

Is there an easier way to calculate the confidence interval for a non-linear
regression? 

I am new to R (as I've read in many questions), but I have most certainly
tried many things and am just getting frustrated with the lack of examples
for what I'd like to do with my data... I'd appreciate any insight. I can
also provide more information if I am not clear. Thanks in advance.

Julian

--
View this message in context: 
http://r.789695.n4.nabble.com/breakpoints-and-nonlinear-regression-tp4303629p4303629.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] breakpoints and nonlinear regression

2012-01-17 Thread Kenneth Frost

Hi, Julian-
I'm not sure if this will be what you want but you could start by taking a look
at:
?predict.nls
Ken

On 01/17/12, crimsonengineer87 julianjonre...@gmail.com wrote:
Dear Forum,

I have been wracking my head over this problem for the past few days. I have
a dataset of (x,y). I have been able to obtain a nonlinear regression line
using nls. However, we would like to do some statistical analysis. I would
like to obtain a confidence interval for the curve. We thought we could
divide up the curve into piecewise linear regressions and compute CIs from
those portions. There is a package called strucchange that seems helpful,
but I am thoroughly confused.

'breakpoints' is used to calculate the number of breaks in the data for
linear regressions. I have the following in my script:

bp.pavlu - breakpoints(Na ~ f(yield, a, b), h=0.15, breaks=3,
data=pavludata)
plot(bp.pavlu)
breakpoints(bp.pavlu)

But I am confused as to how to graph the piecewise functions that make up
the curve. I am not even sure if I am using breakpoints correctly. Do I just
give it a linear relationhip (Na ~ yield), instead of what I have?

Is there an easier way to calculate the confidence interval for a non-linear
regression?

I am new to R (as I've read in many questions), but I have most certainly
tried many things and am just getting frustrated with the lack of examples
for what I'd like to do with my data... I'd appreciate any insight. I can
also provide more information if I am not clear. Thanks in advance.

Julian

--
View this message in context:
http://r.789695.n4.nabble.com/breakpoints-and-nonlinear-regression-tp4303629p4303629.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] breakpoints and nonlinear regression

2012-01-17 Thread Kenneth Frost

Sorry, that wasn't to helpful...I see that the intervals and se.fit argument
are currently ignored.

On 01/17/12, crimsonengineer87 julianjonre...@gmail.com wrote:
Dear Forum,

'breakpoints' is used to calculate the number of breaks in the data for
linear regressions. I have the following in my script:

bp.pavlu - breakpoints(Na ~ f(yield, a, b), h=0.15, breaks=3,
data=pavludata)
plot(bp.pavlu)
breakpoints(bp.pavlu)

Is there an easier way to calculate the confidence interval for a non-linear
regression?

Julian

--
View this message in context:
http://r.789695.n4.nabble.com/breakpoints-and-nonlinear-regression-tp4303629p4303629.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] boxplot with diamond shape

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 10:22 AM, csrabak wrote:


Em 16/1/2012 08:07, David martin escreveu:

Hi,
I haven't found in R a possibility to draw a boxplot with a diamond
shape (means and CI).

David,

Perhaps, even prejudicially, as I cannot see any advantage on the  
diamond shape for displaying just two dimensions, I would recommend  
you check if plotCI from plotrix package, or even better plotmeans  
from gplots package which does it more automatically, doesn't  
suffice for your needs.


Along the same lines of criticizing the premise of changing boxes to  
diamonds, I thought that a violin plot might be more informative. The  
lattice implementation is quite good.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] unable to find an inherited method for function make.db.names, for signature character, missing

2012-01-17 Thread Poul Kristensen

Hi !

I am new to R and I am using Rstudio on Linux.

Have I missed some library() ?

and if so does anyone have the time to write which?

I am trying to create some PostqreSQL tables from comma separated files.

I am a bit surpriced that character is a problem. The missing value
could be NULL.


Thanks

Poul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reference for dataset colon (package survival)

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 10:48 AM, Matthias Gondan wrote:


Dear R team, dear Prof. Therneau,

library(survival)
data(colon)
?colon

gives me only a very rudimentary source (only a name). Is there a
possibility to get a reference to the clinical trial these data
are taken from?


Wouldn't this seem to be the most promising place to look?

http://www.ncbi.nlm.nih.gov/pubmed?term=lin%20d[au]%205-fu%20levamisole 
%20colon%20cancer


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] breakpoints and nonlinear regression

2012-01-17 Thread crimsonengineer87

Hi Ken,

Thx for that advice. I took a brief look at it. I already have my curve by
just using the curve() function using the parameters a and b given by the
nls. Would se.fit and interval have computed the CI?

Maybe where I'm confused is at how I can break up my curve into pieces of
linear regressions. Then doing CI's from there? 

Thanks.

Julian

--
View this message in context: 
http://r.789695.n4.nabble.com/breakpoints-and-nonlinear-regression-tp4303629p4303763.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Prediciting sports team scores

2012-01-17 Thread kerry1912

I am working on predicitng the scores for a days worth of matches of team
sports. I have already collected data for the teams for the season we are
concentrating on. 

I have been fitting poisson models for football games and have worked out
what model is best and which predictor variables are most important.

We would now like to predict the probability distribution for the scores for
each team. eg. What is the probability of Manchester United vs Chelsea
ending 1-1?  

--
View this message in context: 
http://r.789695.n4.nabble.com/Prediciting-sports-team-scores-tp4303708p4303708.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] An unsubsettable object in a mixed model

2012-01-17 Thread Sarah Jervis

I am having problems using the /lme /command to fit mixed models. I have 
a data set similar to longitudinal data, except the hypothesised 
correlation is between observations taken from different individuals in 
the same family rather than from the same individual at different times. 
As soon as I try to specify any correlation structure other than 
independent, I get the error message /Error in x$formula : object of 
type 'closure' is not subsettable/. Have you any idea what /R /is 
objecting to and how I might fix it, and if not, can you suggest h
any other way I can get R to fit a linear mixed model? The data set is 
VERY unbalanced with almost 2/3 of the families being single 
individuals, so maybe /R/ doesn't like being told to find correlation in 
univariate data subsets. ANY advice is welcome!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] breakpoints and nonlinear regression

2012-01-17 Thread Bert Gunter

On Tue, Jan 17, 2012 at 8:06 AM, Kenneth Frost kfr...@wisc.edu wrote:
Sorry, that wasn't to helpful...I see that the intervals and se.fit argument
are currently ignored.

Yes, because the fitted values are nonlinear in the parameters, which
makes finding exact confidence regions impossible. I think the usual
approach (subject to correction by experts) is to use a delta method
approximation for the fitted variances from the varcov matrix of the
parameters at the converged optimum (itself an approximation) and then
a standard t-interval based on that. However, this approximation can
be quite bad, because degrees of freedom don't mean much for
nonlinear models -- in fact, that's the essential (and huge!)
difference between linear and nonlinear models -- and the likelihood
surface may not be close enough to quadratic. So one may do better
with, e.g. a bootstrap approximation, although this can be
problematic, too, due to convergence and other issues.

What I think can be said with some certainty is that the idea of
approximating by a segmented regression and then using CI's for each
linear part in the usual way is a particularly bad one -- the CI's
will be underestimated because they don't take into account the
uncertainty in the location of the fitted breakpoints, which are
nonlinear **and** non-smooth functions of the data.

So if confidence intervals for the fitted values are really important,
I suggest that Julian work with his local statistician to come up with
the best approach for his particular situation. It's tricky.

Cheers,
Bert

On 01/17/12, crimsonengineer87 julianjonre...@gmail.com wrote:
Dear Forum,

'breakpoints' is used to calculate the number of breaks in the data for
linear regressions. I have the following in my script:

bp.pavlu - breakpoints(Na ~ f(yield, a, b), h=0.15, breaks=3,
data=pavludata)
plot(bp.pavlu)
breakpoints(bp.pavlu)

Is there an easier way to calculate the confidence interval for a non-linear
regression?

Julian

--
View this message in context:
http://r.789695.n4.nabble.com/breakpoints-and-nonlinear-regression-tp4303629p4303629.html
Sent from the R help mailing list archive at Nabble.com.

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Re: [R] Prediciting sports team scores

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 10:55 AM, kerry1912 wrote:

I am working on predicitng the scores for a days worth of matches of  
team
sports. I have already collected data for the teams for the season  
we are

concentrating on.

I have been fitting poisson models for football games and have  
worked out

what model is best and which predictor variables are most important.

We would now like to predict the probability distribution for the  
scores for

each team. eg. What is the probability of Manchester United vs Chelsea
ending 1-1?


This certainly sounds like homework. Please read the relevant section  
in the Posting Guide.




--
View this message in context: 
http://r.789695.n4.nabble.com/Prediciting-sports-team-scores-tp4303708p4303708.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Prediciting sports team scores

2012-01-17 Thread Albyn Jones


Robin Lock at St Lawrence has done this for hockey, see
http://it.stlawu.edu/~chodr/faq.html

As I recall, he has a poisson regression model with parameters for  
offense and defense, and perhaps home 'field' advantage.


I confess I am skeptical that this is the right approach for football  
- teams adjust their strategy and tactics as a function of the  
opponent and the current match score.  Teams are trying to maximize  
the probability of getting a result, not the probability of scoring  
goals.  The poisson model corresponds to a constant rate for scoring.


albyn

Quoting kerry1912 kerry1...@hotmail.com:


I am working on predicitng the scores for a days worth of matches of team
sports. I have already collected data for the teams for the season we are
concentrating on.

I have been fitting poisson models for football games and have worked out
what model is best and which predictor variables are most important.

We would now like to predict the probability distribution for the scores for
each team. eg. What is the probability of Manchester United vs Chelsea
ending 1-1?

--
View this message in context:  
http://r.789695.n4.nabble.com/Prediciting-sports-team-scores-tp4303708p4303708.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ggplot2 stacked bar - sum of values rather than count

2012-01-17 Thread J Toll

On Mon, Jan 16, 2012 at 1:13 AM, Paul p...@paulhurley.co.uk wrote:
 On 16/01/12 02:08, J Toll wrote:
 Hi,

 I'm trying to create a stacked bar plot using ggplot2. Rather than
 plotting the count of each of the 13 Bar factors on the Y axis, I
 would like to represent the sum of the Values associated with each of
 the 13 Bar factors. Is there a way to do that?  Given the following
 data, that would obviously mean that there would be some negative sums
 represented.  Here's a bit of example data along with the command I've
 been using.

 library(ggplot2)
 x
           Value Bar Segment
 1   1.10020075   1       1
 2 -1.37734577   2       1
 3   2.50702876 3       1
 4 0.58737028   3       2
 5   0.21106851   3       3
 6  -2.50119261   4       1
 7 1.34984831   5       1
 8 -0.27556149   6       1
 9 -1.54401647   6       2
 10 -2.75975562   6       3
 11 -0.09527123   6       4
 12 1.36331646   7       1
 13 -0.36051429   8       1
 14 1.36790999   9       1
 15 0.15064633   9       2
 16 0.34022421   9       3
 17 -0.64512970  10       1
 18 0.83268199  11       1
 19 -1.50117728  12       1
 20  1.09004959  13       1
 qplot(factor(Bar), data = x, geom = bar, fill = factor(Segment))
 Thanks for any suggestions you might have.

 James

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 I'm not at my usual computer, but in ggplot2 the geom_bar geom is
 designed for stats, so by default does report count.  You can change the
 stat method (ie to identity to just use the values in the column) but
 the easiest (IIRC) is to give a weight;

 #Gives count
 ||

 qplot  http://had.co.nz/ggplot2/qplot.html(color,  data=diamonds,  
 geom=bar)

 #Gives sum of carat variable
 qplot  http://had.co.nz/ggplot2/qplot.html(color,  data=diamonds,  
 geom=bar,  weight=carat,  ylab=carat)

 #just gives raw values from meanprice column
 ||

 qplot  http://had.co.nz/ggplot2/qplot.html(cut,  meanprice,  geom=bar,  
 stat=identity)

 Check out the ggplot2 help page (http://had.co.nz/ggplot2/geom_bar.html) for 
 more info.

 Regards,

 Paul.



Paul,

Thank you for the help.  Using your second example, I added weight =
Value to my previous command to get the chart I wanted.

qplot(factor(Bar), data=x, geom=bar, weight = Value, fill=factor(Segment))

ggplot2 issues a warning message with my data because it has negative values:
Warning message:
Stacking not well defined when ymin != 0 

But that's not a real concern.

Anyway, thanks again for your help.


James

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] net classification improvement?

2012-01-17 Thread Kevin E. Thorpe


On 01/17/2012 07:16 AM, Essers, Jonah wrote:

Thanks for the reply. I think more the issue is whether it can be applied
to cross-sectional data. This I'm not sure. This method is heavily cited
in the New England Journal of Medicine, but thus far I've only seen it
used with longitudinal data.


As I recall, the Pencina et al paper does not suggest it cannot be used 
outside of longitudinal data.  In fact, I don't remember them using 
longitudinal data at all.  So, unless I'm misunderstanding your 
question, I think the function in Hmisc (whose name I always forget) 
should be fine.




On 1/16/12 10:23 PM, Kevin E. Thorpekevin.tho...@utoronto.ca  wrote:


On 01/16/2012 08:10 PM, Essers, Jonah wrote:

Greetings,

I have generated several ROC curves and would like to compare the AUCs.
The data are cross sectional and the outcomes are binary. I am testing
which of several models provide the best discrimination. Would it be
most
appropriate to report AUC with 95% CI's?

I have been looking in to the net reclassification improvement (see
below for reference) but thus far I can only find a version in Hmisc
package which requires survival data. Any idea what the best approach is
for cross-sectional data?


I believe that the function in Hmisc that does this will also work on
binary data.



Thanks

Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Evaluating the
added predictive ability of a new marker: from area under the ROC curve
to
reclassification and beyond. Stat Med 2008;27:157-172






--
Kevin E. Thorpe
Biostatistician/Trialist,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] net classification improvement?

2012-01-17 Thread Essers, Jonah

Actually, I don't think I made myself clear and I wrote this late last
nightSorry. More the issue is that the raw model predictions (from 0
to 1) have no inherent clinical value to them. I.e. They aren't risk of
disease or risk of outcome. They are raw scores that are specific to
each model and are meant to discriminate one disease from another disease.
Trying to compare models is impossible because the NRI requires cutoff
values.  The cutoffs are different for each model.

So, as I've done more reading, it appears the the IRI--Integrated
Discrimination Improvement Index--which is naïve to cutoff values--may be
more what I'm looking for. Does this make sense? I guess I just need a
sanity check.

I have been toying with the PredictABEL package and this seems to like my
data inputs just fine and relies on HMISC and ROCR, both packages I know
well.

Thanks
jonah

On 1/17/12 11:49 AM, Kevin E. Thorpe kevin.tho...@utoronto.ca wrote:

On 01/17/2012 07:16 AM, Essers, Jonah wrote:
 Thanks for the reply. I think more the issue is whether it can be
applied
 to cross-sectional data. This I'm not sure. This method is heavily cited
 in the New England Journal of Medicine, but thus far I've only seen it
 used with longitudinal data.

As I recall, the Pencina et al paper does not suggest it cannot be used
outside of longitudinal data.  In fact, I don't remember them using
longitudinal data at all.  So, unless I'm misunderstanding your
question, I think the function in Hmisc (whose name I always forget)
should be fine.


 On 1/16/12 10:23 PM, Kevin E. Thorpekevin.tho...@utoronto.ca  wrote:

 On 01/16/2012 08:10 PM, Essers, Jonah wrote:
 Greetings,

 I have generated several ROC curves and would like to compare the
AUCs.
 The data are cross sectional and the outcomes are binary. I am testing
 which of several models provide the best discrimination. Would it be
 most
 appropriate to report AUC with 95% CI's?

 I have been looking in to the net reclassification improvement (see
 below for reference) but thus far I can only find a version in Hmisc
 package which requires survival data. Any idea what the best approach
is
 for cross-sectional data?

 I believe that the function in Hmisc that does this will also work on
 binary data.


 Thanks

 Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Evaluating
the
 added predictive ability of a new marker: from area under the ROC
curve
 to
 reclassification and beyond. Stat Med 2008;27:157-172




-- 
Kevin E. Thorpe
Biostatistician/Trialist,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Which date format to choose?

2012-01-17 Thread Jake Beaulieu

R offers a bewildering array of options when it comes to representing 
dates and times (e.g, as.Date, chron, strptime, zoo, etc).  Can anybody 
recommend a document that compares the relative merit of each method?  I'm 
not looking for help with any one method, but rather a guide that 
describes which method is best for a particular data analysis/plotting 
goal.

Thanks,
Jake


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] net classification improvement?

2012-01-17 Thread Kevin E. Thorpe


On 01/17/2012 11:55 AM, Essers, Jonah wrote:

Actually, I don't think I made myself clear and I wrote this late last
nightSorry. More the issue is that the raw model predictions (from 0
to 1) have no inherent clinical value to them. I.e. They aren't risk of
disease or risk of outcome. They are raw scores that are specific to
each model and are meant to discriminate one disease from another disease.
Trying to compare models is impossible because the NRI requires cutoff
values.  The cutoffs are different for each model.

So, as I've done more reading, it appears the the IRI--Integrated
Discrimination Improvement Index--which is naïve to cutoff values--may be
more what I'm looking for. Does this make sense? I guess I just need a
sanity check.


Yes, the IRI makes sense to me.



I have been toying with the PredictABEL package and this seems to like my
data inputs just fine and relies on HMISC and ROCR, both packages I know
well.

Thanks
jonah

On 1/17/12 11:49 AM, Kevin E. Thorpekevin.tho...@utoronto.ca  wrote:


On 01/17/2012 07:16 AM, Essers, Jonah wrote:

Thanks for the reply. I think more the issue is whether it can be
applied
to cross-sectional data. This I'm not sure. This method is heavily cited
in the New England Journal of Medicine, but thus far I've only seen it
used with longitudinal data.


As I recall, the Pencina et al paper does not suggest it cannot be used
outside of longitudinal data.  In fact, I don't remember them using
longitudinal data at all.  So, unless I'm misunderstanding your
question, I think the function in Hmisc (whose name I always forget)
should be fine.



On 1/16/12 10:23 PM, Kevin E. Thorpekevin.tho...@utoronto.ca   wrote:


On 01/16/2012 08:10 PM, Essers, Jonah wrote:

Greetings,

I have generated several ROC curves and would like to compare the
AUCs.
The data are cross sectional and the outcomes are binary. I am testing
which of several models provide the best discrimination. Would it be
most
appropriate to report AUC with 95% CI's?

I have been looking in to the net reclassification improvement (see
below for reference) but thus far I can only find a version in Hmisc
package which requires survival data. Any idea what the best approach
is
for cross-sectional data?


I believe that the function in Hmisc that does this will also work on
binary data.



Thanks

Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Evaluating
the
added predictive ability of a new marker: from area under the ROC
curve
to
reclassification and beyond. Stat Med 2008;27:157-172








--
Kevin E. Thorpe
Biostatistician/Trialist,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] New PLYR issue

2012-01-17 Thread Jeff Newmiller

Replying to old messages without including context (particularly old ones) 
is rather bad netiquette.


Thank you for at least providing a reproducible example. Now if you can 
figure out how to read the documentation we will really make some 
progress.


Further responses below.

On Tue, 17 Jan 2012, Gunnar Oehmichen wrote:



Hello everyone,

I have got the same problem, with the same error message.


I wasn't able to draw a comparison between the problems, though the error 
messages were the same.



Using R 2.14.1, plyr 1.7.1, R.Studio 0.94.110, Windows XP

The plyr mailing list does not provide any help until now.

require(plyr)

c(sample(c(1:100), 50, replace=TRUE))-V1


Much better to use  -  than - for clarity of code (spaces and 
direction of assignment make a difference for readability)



c(rep( 1:5, 10))-f1 #variable to group V1

data.frame(cbind(V1, f1))-DF

str(DF)

ddply(DF$V1, DF$f1, sd)
ddply(.(DF$V1), .(DF$f1), sd)

Error in if (empty(.data)) return(.data) :
missing value where TRUE/FALSE needed

Thanks everyone,


If you hand a toothpick to a mechanic you should not be surprised when he 
tells you he cannot change a tire from your car.  You are giving a vector 
where a data frame is needed, another vector where a name or vector of 
names are required, and the name of a function where an actual function 
is needed, and the function is complaining. In the face of such confusion, 
it is not surprising that people were unable to figure out where to start 
setting you straight.  However, in return for your reproducible example I 
will give it a go.


A basic unifying concept for the plyr package is that the name of the 
function tells you something about what needs to go in, and what will come 
out. ddply starts with a d so it expects a data frame as input, and 
because the second letter is also a d it will yield a data frame result 
when it is done.


Argument 1:

DF$V1 is a vector. It happens to be the the column named V1 in the data 
frame DF.  To specify a data frame, don't apply operators to it, just 
write the name of the data frame DF.


Argument 2:

This argument tells ddply what the name of the grouping columns are. Do 
not actually give the grouping columns to ddply (which $ does).  I have 
found that while the .() function seems cleaner, I find it clearer to use 
a vector of strings ... in this case, there is only one grouping column, 
so I would forego the usual c() concatenator and just give it f1.


Argument 3:

This argument is supposed to be a function that will take a data frame 
(first d) and yield a data frame (second d) for one group of rows.  ddply 
will take care of stacking them as a single data frame for the final 
result.  You have given ddply the name (first error) of a function that 
takes a vector and returns a scalar (wrong type of function is error two).


The correct documentation for all of these arguments can be found by 
typing ?ddply at the R command line (after you have loaded plyr).  It 
looks like you have been reading the documentation for ?aggregate or 
?summaryBy (doBy package) and trying to use that to inform your use of 
ddply.


So the actual call should be:


ddply(DF,f1,function(df){data.frame(sdV1=sd(df$V1))})

  f1 sdV1
1  1 19.93016
2  2 35.96356
3  3 33.30349
4  4 26.62831
5  5 25.03087

In general, to add more simultaneous calculations, you add more columns to 
the data frame produced by your function that does the calculations. If 
you want to give it a function name, don't put it in quotes:



myfunction - function(df){

+  data.frame(sdV1=sd(df$V1),meanV1=mean(df$V1))
+ }

ddply(DF,f1,myfunction)

  f1 sdV1 meanV1
1  1 19.93016   49.1
2  2 35.96356   45.6
3  3 33.30349   44.7
4  4 26.62831   72.2
5  5 25.03087   30.1

Note that although ddply does a lot for you, it doesn't reproduce all of 
your calculations on all of the data columns like summaryBy does... you 
have to explicitly create every calculated column in your function.


---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Using Aggregate() with FUN arguments, which require more than one input variables

2012-01-17 Thread RNoob

Dear all,

I am trying to apply the aggregate() function to calculate correlations for
subsets of a dataframe. My argument x is supposed to consist of 2 numerical
vectors, which represent x and y for the cor() function. 

The following error results when calling the aggregate function: Error in
FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'. I think
the subsets aggregate puts into cor() are sort of list types and therefore
can't be handled by cor().

Can anyone provide me with a solution?

Regards,
RNoob

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-Aggregate-with-FUN-arguments-which-require-more-than-one-input-variables-tp4303936p4303936.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Which date format to choose?

2012-01-17 Thread Duncan Murdoch


On 17/01/2012 12:14 PM, Jake Beaulieu wrote:

R offers a bewildering array of options when it comes to representing
dates and times (e.g, as.Date, chron, strptime, zoo, etc).  Can anybody
recommend a document that compares the relative merit of each method?  I'm
not looking for help with any one method, but rather a guide that
describes which method is best for a particular data analysis/plotting
goal.


You could try /R Help Desk: Date and Time Classes in R / by Gabor 
Grothendieck and Thomas Petzoldt in R News 4(1) 
http://CRAN.R-project.org/doc/Rnews/Rnews_2004-1.pdf, 29-32.  Here's 
the link:

http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Which date format to choose?

2012-01-17 Thread R. Michael Weylandt

R offers a bewildering array of options when it comes to representing
dates and times

Yes  no: read www.r-project.org/doc/Rnews/Rnews_2004-1.pdf (the help
desk section)

Brief summary: 3 major ways to deal with dates/times in R:

i ) the Date class from the base distribution -- no time support, but very easy
ii) the chron (package) -- no support for time zones but can do times of day
iii) POSIXt (in two forms) -- the most general -- handles time zones
and daylights savings, but has a few quirks

General rule: use the simplest one you can but no simpler.

Where things get more complicated is in the time series object itself:

zoo is very general (doesn't actually even require a time object for
the index) and its derivative xts is my personal workhorse. I can only
speak from a quant-finance perspective but, in that domain, the
decision comes down to xts from quantmod (actually standalone but
tightly integrated) vs timeDate from Rmetrics. They are both very good
-- one is S3 and one is S4 so they have different virtues; I'm not an
S4 guy myself so that drives me to the xts choice.  xts is pretty much
impossible to beat for speed though if that's a factor (it uses POSIXt
objects for the index and all sorts of great C routines) If speed
isn't a concern, I'd suggest you see whichever one has better support
for what you are trying to do and to make your decision based on that.
Rmetrics is an extensive platform for analysis and econometrics, but
the quantmod/quantstrat toolkit is more geared towards a trader (at
least, in my impression)

Others will hopefully chime in but it's going to be alot easier if you
can say a little more about your problem domain and what sort of
analysis you want to run. It also might behoove you to look at the
timeSeries CRAN task view.

Hope this helps,

Michael

On Tue, Jan 17, 2012 at 12:14 PM, Jake Beaulieu
beaulieu.j...@epamail.epa.gov wrote:
 R offers a bewildering array of options when it comes to representing
 dates and times (e.g, as.Date, chron, strptime, zoo, etc).  Can anybody
 recommend a document that compares the relative merit of each method?  I'm
 not looking for help with any one method, but rather a guide that
 describes which method is best for a particular data analysis/plotting
 goal.

 Thanks,
 Jake


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Separate ablines in lattice panels

2012-01-17 Thread Doran, Harold

Searched archives and found some old email threads on the topic. But mot 
exactly what I think I need. Suppose I have a datafile such as tmp.

tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 = gl(2, 1000))

I'd like a plot similar to the one below, but with an abline of v=0 in the 
lower panel and v=1 in the upper panel. Code below creates two lines in each 
panel, not quite sure how to separate them by panel.

densityplot(~ var1|var2, tmp,
  type = c('g', 'l'),
  layout = c(1,2),
 panel = function(x, ...){
 panel.densityplot(x, ...)
 panel.abline(v = c(0,1))
 }
)

Thank you
Harold

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] New PLYR issue

2012-01-17 Thread Hadley Wickham

 Note that although ddply does a lot for you, it doesn't reproduce all of
 your calculations on all of the data columns like summaryBy does... you have
 to explicitly create every calculated column in your function.

Well, ddply doesn't, but colwise will.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Separate ablines in lattice panels

2012-01-17 Thread Bert Gunter

?panel.number

This tells you what panel you're in and you can use that to determine
which line to draw.

-- Bert

On Tue, Jan 17, 2012 at 9:59 AM, Doran, Harold hdo...@air.org wrote:
 Searched archives and found some old email threads on the topic. But mot 
 exactly what I think I need. Suppose I have a datafile such as tmp.

 tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 = gl(2, 
 1000))

 I'd like a plot similar to the one below, but with an abline of v=0 in the 
 lower panel and v=1 in the upper panel. Code below creates two lines in each 
 panel, not quite sure how to separate them by panel.

 densityplot(~ var1|var2, tmp,
                              type = c('g', 'l'),
                              layout = c(1,2),
                                             panel = function(x, ...){
                                             panel.densityplot(x, ...)
                                             panel.abline(v = c(0,1))
                                             }
 )

 Thank you
 Harold

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] breakpoints and nonlinear regression

2012-01-17 Thread Achim Zeileis


On Tue, 17 Jan 2012, crimsonengineer87 wrote:


Dear Forum,

I have been wracking my head over this problem for the past few days. I have
a dataset of (x,y). I have been able to obtain a nonlinear regression line
using nls. However, we would like to do some statistical analysis. I would
like to obtain a confidence interval for the curve. We thought we could
divide up the curve into piecewise linear regressions and compute CIs from
those portions. There is a package called strucchange that seems helpful,
but I am thoroughly confused.

'breakpoints' is used to calculate the number of breaks in the data for
linear regressions.  I have the following in my script:

bp.pavlu - breakpoints(Na ~ f(yield, a, b), h=0.15, breaks=3,
data=pavludata)
plot(bp.pavlu)
breakpoints(bp.pavlu)

But I am confused as to how to graph the piecewise functions that make up
the curve. I am not even sure if I am using breakpoints correctly. Do I just
give it a linear relationhip (Na ~ yield), instead of what I have?


breakpoints() currently can just handle linear (in parameters) 
regressions. So unless f(., a, b) is either known or can be written as a 
linear predictor, breakpoints() cannot estimate breaks in the model of 
interest.


If you want approximate f(., a, b) by a piecewise linear function, then 
you would use breakpoints(Na ~ yield). The result however will typically 
not be continuous. To see the result fitted() can be used. See the 
references in ?breakpoints for some examples.


However, I doubt that this is a route worth pursuing given your problem 
description...



Is there an easier way to calculate the confidence interval for a non-linear
regression?


If you want to use nls(), you could use simulation techniques to obtain 
confidence intervals.


Another possible alternative would be to use a GAM formulation. See e.g. 
gam() in package mgcv.


hth,
Z


I am new to R (as I've read in many questions), but I have most certainly
tried many things and am just getting frustrated with the lack of examples
for what I'd like to do with my data... I'd appreciate any insight. I can
also provide more information if I am not clear. Thanks in advance.

Julian

--
View this message in context: 
http://r.789695.n4.nabble.com/breakpoints-and-nonlinear-regression-tp4303629p4303629.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Using Aggregate() with FUN arguments, which require more than one input variables

2012-01-17 Thread Uwe Ligges




On 17.01.2012 18:10, RNoob wrote:

Dear all,

I am trying to apply the aggregate() function to calculate correlations for
subsets of a dataframe. My argument x is supposed to consist of 2 numerical
vectors, which represent x and y for the cor() function.

The following error results when calling the aggregate function: Error in
FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'. I think
the subsets aggregate puts into cor() are sort of list types and therefore
can't be handled by cor().



as.matrix() will probably help, but since you have not specified your 
reproducible code, we cannot show how to change that.


Uwe Ligges



Can anyone provide me with a solution?

Regards,
RNoob

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-Aggregate-with-FUN-arguments-which-require-more-than-one-input-variables-tp4303936p4303936.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] breakpoints and nonlinear regression

2012-01-17 Thread Achim Zeileis

On Tue, 17 Jan 2012, Bert Gunter wrote:

On Tue, Jan 17, 2012 at 8:06 AM, Kenneth Frost kfr...@wisc.edu wrote:

Sorry, that wasn't to helpful...I see that the intervals and se.fit argument
are currently ignored.

I fully agree with Bert that, in this case, segmented regression does not
seem to be a fruitful approach and that it's best to consult a local

statistician.

However, I just wanted to clarify a theoretical detail about what
breakpoints() does. The breakpoints converge at the faster rate of n
while the parameter estimates just converge with sqrt(n). This is why in
principle, it is possible to get the usual inference from segmented
regressions. The price for this is to assume that the true model is in
fact a segmented regression (with only breakpoints/coefficients unknown).

Hence, segmented regression will be useful (in the Tukey
sense) if there are few relatively abrupt changes in a regression
relationship. On the other hand, for approximating smooth changes there
are typically better techniques available.

Best,
Z

Cheers,
Bert

On 01/17/12, crimsonengineer87 julianjonre...@gmail.com wrote:

Dear Forum,

'breakpoints' is used to calculate the number of breaks in the data for
linear regressions. I have the following in my script:

bp.pavlu - breakpoints(Na ~ f(yield, a, b), h=0.15, breaks=3,
data=pavludata)
plot(bp.pavlu)
breakpoints(bp.pavlu)

Is there an easier way to calculate the confidence interval for a non-linear
regression?

Julian

--
View this message in context:
http://r.789695.n4.nabble.com/breakpoints-and-nonlinear-regression-tp4303629p4303629.html
Sent from the R help mailing list archive at Nabble.com.

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

Re: [R] Separate ablines in lattice panels

2012-01-17 Thread Doran, Harold

Thank you, Bert. The help page doesn't have a usage example and I can't seem to 
find one via google. Do you, or anyone else, have sample code?

 -Original Message-
 From: Bert Gunter [mailto:gunter.ber...@gene.com]
 Sent: Tuesday, January 17, 2012 1:07 PM
 To: Doran, Harold
 Cc: r-help@r-project.org
 Subject: Re: [R] Separate ablines in lattice panels

 ?panel.number

 This tells you what panel you're in and you can use that to determine
 which line to draw.

 -- Bert

 On Tue, Jan 17, 2012 at 9:59 AM, Doran, Harold hdo...@air.org wrote:
  Searched archives and found some old email threads on the topic. But mot
 exactly what I think I need. Suppose I have a datafile such as tmp.

  tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 = gl(2,
 1000))

  I'd like a plot similar to the one below, but with an abline of v=0 in the
 lower panel and v=1 in the upper panel. Code below creates two lines in each
 panel, not quite sure how to separate them by panel.

  densityplot(~ var1|var2, tmp,
                               type = c('g', 'l'),
                               layout = c(1,2),
                                              panel = function(x, ...){
                                              panel.densityplot(x, ...)
                                              panel.abline(v = c(0,1))
                                              }
  )

  Thank you
  Harold

         [[alternative HTML version deleted]]

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
 biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] An unsubsettable object in a mixed model

2012-01-17 Thread Ben Bolker

Sarah Jervis sj414 at medschl.cam.ac.uk writes:

 
 I am having problems using the /lme /command to fit mixed models. I have 
 a data set similar to longitudinal data, except the hypothesised 
 correlation is between observations taken from different individuals in 
 the same family rather than from the same individual at different times. 
 As soon as I try to specify any correlation structure other than 
 independent, I get the error message /Error in x$formula : object of 
 type 'closure' is not subsettable/. Have you any idea what /R /is 
 objecting to and how I might fix it, and if not, can you suggest h
 any other way I can get R to fit a linear mixed model? The data set is 
 VERY unbalanced with almost 2/3 of the families being single 
 individuals, so maybe /R/ doesn't like being told to find correlation in 
 univariate data subsets. ANY advice is welcome!

  It would help if you could provide a reproducible example, or
a test case.  You're also probably better off posting this to
the r-sig-mixed-mo...@r-project.org mailing list.  I don't know if
lme will choke when a correlation model is fitted to a data set where
some groups have only one individual, but you could (e.g.) very easily
test if this is the problem by (1) fitting the model with only groups
with n1 and (2) adding one group with n=1 to the data set and seeing
if it chokes at that point.
  However, I can also imagine that you're mis-specifying the command
in some point, and from that point of view it would be best to have
some more detail about what you're trying to do.  See
http://tinyurl.com/reproducible-000 ...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] bayesian mixed logit

2012-01-17 Thread Carlo Fezzi (ENV)

Dear all,

I am writing an R code to fit a Bayesian mixed logit (BML) via MCMC / MH 
algorithms following Train (2009, ch. 12).

Unfortunately, after many draws the covariance matrix of the correlated random 
parameters tend to become a matrix with almost perfect correlation, so I think 
there is a bug in the code I wrote but I do not seem to be able to find it.. 
dull I know.

Has anybody written a code for BML with R and would like to share it with me or 
even take a quick look at my code? I would be extremely grateful for any help.

Many thanks to everybody!

Carlo

***
Senior Research Associate

Centre for Social and Economic Research 
on the Global Environment (CSERGE),

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Separate ablines in lattice panels

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 1:34 PM, Doran, Harold wrote:

Thank you, Bert. The help page doesn't have a usage example and I  
can't seem to find one via google. Do you, or anyone else, have  
sample code?


It did not seem particularly daring or complex when I tried this  
(which does appear to produce what was requested):


tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 = gl(2,
1000))

densityplot(~ var1|var2, tmp,
 type = c('g', 'l'),
 layout = c(1,2),
panel = function(x, ...){
panel.densityplot(x, ...)
panel.abline(v = c(0,1) 
[ panel.number() ])

}
)

--
David.


-Original Message-
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Tuesday, January 17, 2012 1:07 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Separate ablines in lattice panels

?panel.number

This tells you what panel you're in and you can use that to determine
which line to draw.

-- Bert

On Tue, Jan 17, 2012 at 9:59 AM, Doran, Harold hdo...@air.org  
wrote:
Searched archives and found some old email threads on the topic.  
But mot

exactly what I think I need. Suppose I have a datafile such as tmp.


tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 =  
gl(2,

1000))


I'd like a plot similar to the one below, but with an abline of  
v=0 in the
lower panel and v=1 in the upper panel. Code below creates two  
lines in each

panel, not quite sure how to separate them by panel.


densityplot(~ var1|var2, tmp,
 type = c('g', 'l'),
 layout = c(1,2),
panel =  
function(x, ...){
 
panel.densityplot(x, ...)

panel.abline(v = c(0,1))
}
)

Thank you
Harold

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
biostatistics/pdb-ncb-home.htm


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Separate ablines in lattice panels

2012-01-17 Thread Doran, Harold

It does indeed produce what I'm expecting. The input to panel.number seems to 
require a character string, but here the function is called with no argument. I 
am not entirely clear _why_ it works, but it does seem to. 

 -Original Message-
 From: David Winsemius [mailto:dwinsem...@comcast.net]
 Sent: Tuesday, January 17, 2012 1:46 PM
 To: Doran, Harold
 Cc: Bert Gunter; r-help@r-project.org
 Subject: Re: [R] Separate ablines in lattice panels

 On Jan 17, 2012, at 1:34 PM, Doran, Harold wrote:

  Thank you, Bert. The help page doesn't have a usage example and I
  can't seem to find one via google. Do you, or anyone else, have
  sample code?

 It did not seem particularly daring or complex when I tried this
 (which does appear to produce what was requested):

 tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 = gl(2,
 1000))

 densityplot(~ var1|var2, tmp,
   type = c('g', 'l'),
   layout = c(1,2),
  panel = function(x, ...){
  panel.densityplot(x, ...)
  panel.abline(v = c(0,1)
 [ panel.number() ])
  }
  )

 --
 David.

  -Original Message-
  From: Bert Gunter [mailto:gunter.ber...@gene.com]
  Sent: Tuesday, January 17, 2012 1:07 PM
  To: Doran, Harold
  Cc: r-help@r-project.org
  Subject: Re: [R] Separate ablines in lattice panels

  ?panel.number

  This tells you what panel you're in and you can use that to determine
  which line to draw.

  -- Bert

  On Tue, Jan 17, 2012 at 9:59 AM, Doran, Harold hdo...@air.org
  wrote:
  Searched archives and found some old email threads on the topic.
  But mot
  exactly what I think I need. Suppose I have a datafile such as tmp.

  tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 =
  gl(2,
  1000))

  I'd like a plot similar to the one below, but with an abline of
  v=0 in the
  lower panel and v=1 in the upper panel. Code below creates two
  lines in each
  panel, not quite sure how to separate them by panel.

  densityplot(~ var1|var2, tmp,
   type = c('g', 'l'),
   layout = c(1,2),
  panel =
  function(x, ...){

  panel.densityplot(x, ...)
  panel.abline(v = c(0,1))
  }
  )

  Thank you
  Harold

 [[alternative HTML version deleted]]

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
  and provide commented, minimal, self-contained, reproducible code.

  --

  Bert Gunter
  Genentech Nonclinical Biostatistics

  Internal Contact Info:
  Phone: 467-7374
  Website:
  http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
  biostatistics/pdb-ncb-home.htm

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 David Winsemius, MD
 West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Separate ablines in lattice panels

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 1:52 PM, Doran, Harold wrote:

It does indeed produce what I'm expecting. The input to panel.number  
seems to require a character string, but here the function is called  
with no argument. I am not entirely clear _why_ it works, but it  
does seem to.


?panel.numer

Says that there is a default ... the last object printed, i.e, the one  
for which you would want its number used as an idex into your vector  
of candidate values.


--
David.





-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Tuesday, January 17, 2012 1:46 PM
To: Doran, Harold
Cc: Bert Gunter; r-help@r-project.org
Subject: Re: [R] Separate ablines in lattice panels


On Jan 17, 2012, at 1:34 PM, Doran, Harold wrote:


Thank you, Bert. The help page doesn't have a usage example and I
can't seem to find one via google. Do you, or anyone else, have
sample code?


It did not seem particularly daring or complex when I tried this
(which does appear to produce what was requested):

tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 =  
gl(2,

1000))

densityplot(~ var1|var2, tmp,
 type = c('g', 'l'),
 layout = c(1,2),
panel = function(x, ...){
panel.densityplot(x, ...)
panel.abline(v = c(0,1)
[ panel.number() ])
}
)

--
David.


-Original Message-
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Tuesday, January 17, 2012 1:07 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Separate ablines in lattice panels

?panel.number

This tells you what panel you're in and you can use that to  
determine

which line to draw.

-- Bert

On Tue, Jan 17, 2012 at 9:59 AM, Doran, Harold hdo...@air.org
wrote:

Searched archives and found some old email threads on the topic.
But mot

exactly what I think I need. Suppose I have a datafile such as tmp.


tmp - data.frame(var1 = c(rnorm(1000), rnorm(1000, 1, 1)), var2 =
gl(2,

1000))


I'd like a plot similar to the one below, but with an abline of
v=0 in the

lower panel and v=1 in the upper panel. Code below creates two
lines in each
panel, not quite sure how to separate them by panel.


densityplot(~ var1|var2, tmp,
type = c('g', 'l'),
layout = c(1,2),
   panel =
function(x, ...){

panel.densityplot(x, ...)
   panel.abline(v =  
c(0,1))

   }
)

Thank you
Harold

  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-

guide.html

and provide commented, minimal, self-contained, reproducible code.




--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
biostatistics/pdb-ncb-home.htm


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT




David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] pretty(range(data$z),10) Error

2012-01-17 Thread Mario Giesel

Hello, R-List,
I'm getting error messages when adding geom_density2d() [package ggplot2]:

Fehler in pretty(range(data$z), 10) : 
  NA/NaN/Inf in externem Funktionsaufruf (arg 1)
Zusätzlich: Warnmeldungen:
1: Removed 1 rows containing missing values (stat_contour). 
2: In min(x) : kein nicht-fehlendes Argument für min; gebe Inf zurück
3: In max(x) : kein nicht-fehlendes Argument für max; gebe -Inf zurück
4: In min(x) : kein nicht-fehlendes Argument für min; gebe Inf zurück
5: In max(x) : kein nicht-fehlendes Argument für max; gebe -Inf zurück

 
I installed 2.11.1 Patched in the hope to overcome this obstacle but in vain.
I tried reading spss and csv format - no effect.
 
While this one works:
ggplot(sp2, aes(x=Qq04, y=Qq01)) + geom_point() + facet_wrap(~ segment,ncol=3)
This one fails:
ggplot(sp2, aes(x=Qq04, y=Qq01)) + geom_point() + facet_wrap(~ segment,ncol=3) 
+ geom_density2d()
 
But it only fails for 2 out of 10 x-variables, although they do not differ in 
structure.
Data structure looks like this:
 
segment Qq01 Qq02 Qq02a Qq04 Qq05 Qq07 Qq08 Qq10 Qq11 Qq15 Qq17a
A 7 5 5 5 5 4 5 5 3 7 7
A 5 4 4 3 4 3 4 4 4 5 6
B 3 3 3 3 2 3 4 2 4 3 2
B 6 4 5 3 3 3 4 4 3 6 6
C 3 3 3 3 3 4 2 3 3 4 2
C 2 1 4 3 1 4 1 1 3 4 2
D 5 3 4 3 3 3 4 3 3 6 3
D 5 3 4 2 2 4 4 3 3 4 4
E 3 3 3 2 3 4 4 3 2 3 4
E 7 5 5 5 5 4 4 5 1 7 7
F 6 5 4 3 3 3 4 3 3 6 6
F 5 3 4 3 3 4 4 3 3 4 4
G 4 3 3 3 2 1 3 3 3 4 4
G 6 5 5 5 5 4 5 5 3 7 7
H 6 5 5 4 4 4 5 4 3 5 6
H 7 5 5 5 4 4 5 5 3 7 7
I 6 5 5 4 4 4 5 4 3 6 6
I 6 3 4 2 3 4 4 3 4 6 4
J 5 3 4 1 3 4 4 3 3 5 5
J 4 2 4 3 2 2 3 2 3 4 4
K 4 4 4 2 3 4 4 4 3 5 5
K 6 4 3 3 2 3 4 3 3 6 5
L 6 4 5 3 3 4 5 4 3 6 6
L 3 1 2 2 1 3 3 1 3 4 4
...
About 1.800 cases (more than 100 per segment).
Did anybody experience similar problems?
 
Thanks for all comments,
 Mario
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] RTisean generating multivariate surrogates;

2012-01-17 Thread Burton Shank

I have a question on generating multivariate time series surrogates
using the surrogates function in the RTisean library.
The surrogate data matrices are always much shorter than the input matrices.

FYI, I'm using R version 2.12.2 on Windows XP
RTisean library v 3.0.14
Tisean algorithms v 3.0.13

Creating a surrogate univariate time series returns a time series with
the length of the original vector.

 V1=as.numeric(na.omit(filter(rnorm(103), filter=c(0.25, 0.5, 0.5, 0.25), 
 sides=2)))
 S=surrogates(series=V1); nrow(S)
[1] 100

However, multivariate surrogates are always shorter than the input matrices.

 V1=as.numeric(na.omit(filter(rnorm(103), filter=c(0.5, 0.5), sides=2)))
 V2=as.numeric(na.omit(filter(rnorm(103), filter=c(0.5, 0.5), sides=2)))
 V=cbind(V1, V2)
 S=surrogates(series=V,  m=2, c=1:2); dim(S)
[1] 16  2

One can cheat a little and repeat the datasets, returning a longer output:
 V1=as.numeric(na.omit(filter(rnorm(103), filter=c(0.5, 0.5), sides=2)))
 V2=as.numeric(na.omit(filter(rnorm(103), filter=c(0.5, 0.5), sides=2)))
 V=cbind(V1, V2, V1, V2)
 S=surrogates(series=V,  m=4, c=1:4); dim(S)
[1] 25  4

but the limit appears to be ~50% of the length of the input matrix.

There probably is a legitimate reason behind this but I can't find
mention of this either in the RTisean documentation or the TISEAN
website.
Similarly, examples in the literature from the authors (Schreiber and
Schmitz. 2000. Surrotate time series. Physica D 142.346-382) do not
appear to suffer from this truncation.

Varying the arguments for the surrogates function do not fix the
truncation and some of the arguments don't seem to be functional:

surrogates {RTisean}R Documentation

surrogates(series, n = 1, i, S = FALSE, I, l, x = 0, m, c)

Arguments:
series  a vector or a matrix.
n   number of surrogates.
i   number of iterations.
S   make spectrum exact rather than distribution.
I   seed for random numbers. (capital i)
l   number of points. (lowercase L)
x   number of values to be skipped.
m   number of columns to be read.
c   columns to be read.
---

In particular,  varying n changes the processing time but does not
change the function output, making me think that this is an issue with
the way the R function links to the TISEAN executables.

Suggestions from anyone familiar with this are greatly appreciated.

Thanks,

Burton Shank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Display numbers on map

2012-01-17 Thread Ray Brownrigg

David:

That doesn't quite answer the question about Alaska and Hawaii.

Jeffrey:

help(state, maps)
states This database produces a map of the states of the United States 
mainland ...

so you have to use:
map(world, USA)
map(state, add=T)
to get the whole lot (which I am willing to bet is not exactly what is wanted, 
even though 
it is what was stated). 

This is not the end of the story, but if you'll state exactly what you want for 
Alaska and 
Hawaii, then perhaps we can supply a solution.

Ray Brownrigg

On Wed, 18 Jan 2012, David Winsemius wrote:
 On Jan 17, 2012, at 5:37 AM, Jeffrey Joh wrote:
  I have a text file with states and numbers.  I would like to display
  each number that corresponds to a state on a map.
  
  I am trying to use the maps package, but it doesn't show Alaska or
  Hawaii.  Do you have suggestions on how to do this?
 
 This question suggests you are not yet aware of the search facility
 built into R:
 
 RSiteSearch(maps Hawaii Alaska )
 A search query has been submitted to http://search.r-project.org
 The results page should open in your browser shortly
 
 The third hit was back to the map function help page. (The examples
 should be read and executed.)
 
 ?example
 
  [[alternative HTML version deleted]]
 
 And the above suggests you still need to read the following:
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Change state names to abbreviations in an irregular, list of names, abbreviations, null values, and foreign provinces

2012-01-17 Thread David Kikuchi

Thanks!  Both solutions work well for the problem that I described, 
although I failed to mention that there are also pre-abbreviated names 
in the list that I'm working with, and the second solution returns NA's 
for these (but there are plenty of ways around this).  Sample data might 
be: State.Province - c(Cusco, TX, unknown, Texas, NA, Louisiana)


David

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] unable to find an inherited method for function make.db.names, for signature character, missing

2012-01-17 Thread Uwe Ligges

I think the amount of people on this list who understand your question 
is roughly zero. We cannot see how the subject is related to the body of 
your function nor do we see any incidence that ou followed the posting 
guide.


Uwe Ligges




On 17.01.2012 17:06, Poul Kristensen wrote:

Hi !

I am new to R and I am using Rstudio on Linux.

Have I missed some library() ?

and if so does anyone have the time to write which?

I am trying to create some PostqreSQL tables from comma separated files.

I am a bit surpriced that character is a problem. The missing value
could be NULL.


Thanks

Poul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Using Aggregate() with FUN arguments, which require more than one input variables

2012-01-17 Thread Rui Barradas

Hello,


RNoob wrote
 
 Dear all,
 
 I am trying to apply the aggregate() function to calculate correlations
 for subsets of a dataframe. My argument x is supposed to consist of 2
 numerical vectors, which represent x and y for the cor() function. 
 
 The following error results when calling the aggregate function: Error in
 FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'. I think
 the subsets aggregate puts into cor() are sort of list types and therefore
 can't be handled by cor().
 
 Can anyone provide me with a solution?
 
 Regards,
 RNoob
 

I don't know if I'm understanding it well but it seems you're trying to
compute a correlation matrix for each group of a data.frame. The data.frame
is divided into groups by one or more factor columns.  If this is what you
want, try the function below. It doesn't use 'aggregate', it uses 'split'
and 'lapply'.


cor.groups - function(x, vars){
cols - if(is.character(vars)) names(x) else 1:ncol(x)
cols - cols %in% vars
cols - cols | sapply(x, is.factor) | sapply(x, is.character)
# transform logical to numeric index
cols - which(cols)

lapply(split(x, x[, vars]), function(grp) cor(grp[, -cols]))
}

# Sample data
N - 100
DF - data.frame(U=as.factor(sample(LETTERS[1:3], N, T)),
V=as.factor(sample(0:1, N, T)),
W=sample(letters[1:6], N, T),
x=1:N, y=sample(10, N, T), z=rnorm(N),
stringsAsFactors=FALSE)

# And test it. Note the argument 'stringsAsFactors'

cor.groups(DF, U)
cor.groups(DF, c(U, V))
cor.groups(DF, 1:3)
cor.groups(DF, c(U, x)) # look out, right result, wrong function
call


I hope it helps. (if not, be more explicit)

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Using-Aggregate-with-FUN-arguments-which-require-more-than-one-input-variables-tp4303936p4304535.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] unable to find an inherited method for function make.db.names, for signature character, missing

2012-01-17 Thread Bert Gunter

2012/1/17 Uwe Ligges lig...@statistik.tu-dortmund.de:
I think the amount of people on this list who understand your question is
roughly zero.

A Fortunes candidate?

Cheers,
Bert

We cannot see how the subject is related to the body of your
function nor do we see any incidence that you followed the posting guide.

Uwe Ligges

On 17.01.2012 17:06, Poul Kristensen wrote:

Hi !

I am new to R and I am using Rstudio on Linux.

Have I missed some library() ?

and if so does anyone have the time to write which?

I am trying to create some PostqreSQL tables from comma separated files.

I am a bit surpriced that character is a problem. The missing value
could be NULL.

Thanks

Poul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

[R] Using !is.na() in a HAVING clause in sqldf() XXXX

2012-01-17 Thread Dan Abner

Hi everyone,

I have the following:

sqldf(select Premie,count(tpounds) N,avg(tpounds) Avg_Weight,
  stddev_samp(tpounds) StdDev
  from children
  group by Premie
  having !is.na(Premie))

sqldf() does not like the !is.na(Premie) specification. How does one
exclude a missing group in an aggregated query using sqldf()?

Thanks!

Dan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Using !is.na() in a HAVING clause in sqldf() XXXX

2012-01-17 Thread Joseph Magagnoli

Did you try a where statement?

where Premie is not null

On Tue, Jan 17, 2012 at 3:03 PM, Dan Abner dan.abne...@gmail.com wrote:

 Hi everyone,

 I have the following:

 sqldf(select Premie,count(tpounds) N,avg(tpounds) Avg_Weight,
  stddev_samp(tpounds) StdDev
  from children
  group by Premie
  having !is.na(Premie))

 sqldf() does not like the !is.na(Premie) specification. How does one
 exclude a missing group in an aggregated query using sqldf()?

 Thanks!

 Dan

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joseph C. Magagnoli

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] error when extracting from a data frame

2012-01-17 Thread Suzanne.mertens

(As a noob to R, this is my first posting - yes yes, groans all around...)

I'm trying to extract certain rows from a data frame. I used the following to 
import data from a CSV txt file.

data - read.table(file=data.txt, header=TRUE)

when I do this, my attempt to extract the data rows only from where the Station 
value equals 1…

data.station1 - data[data$Station == 1]

...is giving me the following error message:

Error in `[.data.frame`(data, data
$Station == 1) : 
  undefined columns selected


Bah. 
If I use names(data) I can see Station as a column name. 
And if I use str(data), the variable Station is coming up as integers 
including the value 1. 
And if I use data$Station, I see all station values, including the 1s. 
And if I use data[,Station] I do see all the Station values
And if I instead treat the Station values as characters, by using 1, I still 
get the undefined error. 

Could someone please correct me on my syntax? Or advise if perhaps I imported 
the data the wrong way? I'm working out of A Beginner's Guide to R and also 
looked through the R manual, and even tried this from  Google search:

data.station1 - data,(Station == 1) ]

But that gave me an unwanted output: 
data frame with 0 columns and 789 rows

Almost, but not quite. Please help?


Thank you,



- Suzanne
..
suzanne.mert...@gmail.com
404-337-1533

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Using !is.na() in a HAVING clause in sqldf() XXXX

2012-01-17 Thread Phil Spector


Dan -
   Try using having Premie not null instead of 
having !is.na(Premie) .

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Tue, 17 Jan 2012, Dan Abner wrote:


Hi everyone,

I have the following:

sqldf(select Premie,count(tpounds) N,avg(tpounds) Avg_Weight,
 stddev_samp(tpounds) StdDev
 from children
 group by Premie
 having !is.na(Premie))

sqldf() does not like the !is.na(Premie) specification. How does one
exclude a missing group in an aggregated query using sqldf()?

Thanks!

Dan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem in import and export

2012-01-17 Thread Jean V Adams

mukul purva wrote on 01/16/2012 01:35:00 AM:

 hello,
i hav a prob in R lang. i want to do correlation of data frame
 22810(gene) rows and 1436 colums(experiment) when i covert this file
 in csv or xls format it reads only 1024 cloumns. n when i do
 correlation of data mean 22810 *22810 matrix made in terminal then i
 export in csv or xls by write.table it will give me only 1024 columns
 rather than 22810 columns.wat i do
 plzzz help me... for this file what should the syntax of import and
 export...


It's difficult to help you without an example of your code.  I created a 
very simple example of a data frame with 1,500 columns and had no 
difficulty saving it to a csv file and viewing the file with both a text 
editor and with MS Excel (2007) for Windows.

df - data.frame(matrix(1:3000, nrow=2))
write.csv(df, C:\\junk.csv, quote=FALSE)

You may also want to investigate the write.matrix() function in the MASS 
package.
?write.matrix

Jean

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Display numbers on map

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 2:21 PM, Ray Brownrigg wrote:


David:

That doesn't quite answer the question about Alaska and Hawaii.


Agreed. I misinterpreted the text of help(map) example to mean that HI  
and AK boundaries could be found in unemp. However, it was also  
advice to Jeffrey that searching might suggest solutions.


I found several citations on Baron's search site, including this one:

http://finzi.psych.upenn.edu/R/Rhelp02/archive/44481.html

 This is what that example lead me try (after a bit of digging for  
the right names to use for Alaska_:


grep( USA:Alas, map(world, plot=FALSE)$names, value=TRUE)


quartz();  # or whatever is appropriate in your OS to open a new  
screen device

map(state, xlim=c(-170,-60), ylim=c(15,85))
map(world, Hawaii, add=TRUE)
map(world, USA:Alaska, add=TRUE)

--
David.



Jeffrey:

help(state, maps)
states This database produces a map of the states of the United  
States mainland ...


so you have to use:
map(world, USA)
map(state, add=T)
to get the whole lot (which I am willing to bet is not exactly what  
is wanted, even though

it is what was stated).

This is not the end of the story, but if you'll state exactly what  
you want for Alaska and

Hawaii, then perhaps we can supply a solution.

Ray Brownrigg

On Wed, 18 Jan 2012, David Winsemius wrote:

On Jan 17, 2012, at 5:37 AM, Jeffrey Joh wrote:

I have a text file with states and numbers.  I would like to display
each number that corresponds to a state on a map.

I am trying to use the maps package, but it doesn't show Alaska or
Hawaii.  Do you have suggestions on how to do this?


This question suggests you are not yet aware of the search facility
built into R:

RSiteSearch(maps Hawaii Alaska )
A search query has been submitted to http://search.r-project.org
The results page should open in your browser shortly

The third hit was back to the map function help page. (The examples
should be read and executed.)

?example


[[alternative HTML version deleted]]


And the above suggests you still need to read the following:

PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html




David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] error when extracting from a data frame

2012-01-17 Thread Jean V Adams

Read the help file on how to extract from a data frame:
?[.data.frame

Then, try adding a comma inside the brackets.
data.station1 - data[data$Station==1, ] 
Before the comma, the data$Station==1 identifies what rows to select.
After the comma, the lack of specification indicates that all columns 
should be selected.

Jean


Suzanne.mertens wrote on 01/17/2012 03:17:41 PM:

 (As a noob to R, this is my first posting - yes yes, groans all 
around...)
 
 I'm trying to extract certain rows from a data frame. I used the 
 following to import data from a CSV txt file.
 
data - read.table(file=data.txt, header=TRUE)
 
 when I do this, my attempt to extract the data rows only from where 
 the Station value equals 1?
 
data.station1 - data[data$Station == 1]
 
 ...is giving me the following error message:
 
Error in `[.data.frame`(data, data
 $Station == 1) : 
  undefined columns selected
 
 
 Bah. 
 If I use names(data) I can see Station as a column name. 
 And if I use str(data), the variable Station is coming up as 
 integers including the value 1. 
 And if I use data$Station, I see all station values, including the 1s. 
 And if I use data[,Station] I do see all the Station values
 And if I instead treat the Station values as characters, by using 
 1, I still get the undefined error. 
 
 Could someone please correct me on my syntax? Or advise if perhaps I
 imported the data the wrong way? I'm working out of A Beginner's 
 Guide to R and also looked through the R manual, and even tried 
 this from  Google search:
 
data.station1 - data,(Station == 1) ]
 
 But that gave me an unwanted output: 
data frame with 0 columns and 789 rows
 
 Almost, but not quite. Please help?
 
 
 Thank you,
 
 
 
 - Suzanne
 ..
 suzanne.mert...@gmail.com
 404-337-1533

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] bayesian mixed logit

2012-01-17 Thread Ben Bolker

Carlo Fezzi (ENV C.Fezzi at uea.ac.uk writes:

 
 Dear all,
 
 I am writing an R code to fit a Bayesian mixed logit (BML) via MCMC / MH
algorithms following Train (2009, ch. 12).
 
 Unfortunately, after many draws the covariance matrix 
 of the correlated random parameters tend to become
 a matrix with almost perfect correlation, so I think
  there is a bug in the code I wrote but I do not seem to be
 able to find it.. dull I know.
 
 Has anybody written a code for BML with R and would like to share it with me
or even take a quick look at my code? I
 would be extremely grateful for any help.

(1) maybe better at r-sig-mixed-mod...@r-project.org
(2) are you trying this on real, or on simulated data? The collapse
of the covariance matrix in this way is a very common symptom of
overfitting/underidentification in mixed models.  I wouldn't say
it necessarily constitutes a bug in your code.  In principle you
should be able to get an uncorrelated answer if you use a big enough,
sufficiently well-behaved simulated data set, but not necessarily 
for real data ...
(3) have you tried the MCMCglmm package, which is a very fast and
flexible MCMC-based approach to GLMMs?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Mean of simulation runs given in a table

2012-01-17 Thread Ireneusz Szcześniak

Thank you, Uwe, for your help!  I have more measurements (m1, m2) and 
more parameters (par1, par2).  I can calculate the means of m1 and m2 
this way:


aggregate(cbind(m1, m2) ~ par1 + par2, dat, mean)

However, I also need to calculate the standard error of the mean, and 
the variance for the sample, and I would like to have them output as 
extra columns next to the column with means.


Again, I would appreciate any help!

On 17.01.2012 15:09, Uwe Ligges wrote:



On 17.01.2012 12:31, Irek Szczesniak wrote:

Hi,

I have the simulation results of the following structure:

run par measured
1 10 12
2 10 14
1 20 20
2 20 26

Where run is the simulation run number, par is the parameter of
the simulation, and measured is the value measured in the
simulation. This is only a simple example of my results. There are
many values measured and many parameters. But the basic structure
stays the same: there are many runs (identified by the run number) for
the same values of the parameters with various measured values -- they
constitute a sample.

I would like to calculate the mean of the measured value for a
sample, and so I would like to obtain the output as follows:

par mean
10 13
20 23

I would appreciate it if someone could write me how to do it.



For you data in a data.frame called dat:

aggregate(measured ~ par, dat, mean)

Uwe Ligges




Thank you,
Irek

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Ireneusz (Irek) Szczesniak
http://www.irkos.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] breakpoints and nonlinear regression

2012-01-17 Thread crimsonengineer87

Thanks for the comments everyone. I was hoping to not have to find someone in
the stats department ... well, we'll see.

So in response to Z's comment ... I have tried breakpoints(Na ~ yield) and I
did expect to get something continuous. The idea was to get two or three
linear functions making up the curve. And then from there, get a CI from
these lines. Of course, it wouldn't be good. (This is coming from a
non-stats guy ... I'm a civil engineer by degree and am now learning to be a
modeler as a grad student!). Do you know of any more examples of
breakpoints? The examples in the references are great, but I can't seem to
get it right.

Thanks again.

--
View this message in context: 
http://r.789695.n4.nabble.com/breakpoints-and-nonlinear-regression-tp4303629p4305000.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] error when extracting from a data frame

2012-01-17 Thread David Winsemius



On Jan 17, 2012, at 4:56 PM, Jean V Adams wrote:


Read the help file on how to extract from a data frame:
?[.data.frame

Then, try adding a comma inside the brackets.
data.station1 - data[data$Station==1, ]
Before the comma, the data$Station==1 identifies what rows to select.
After the comma, the lack of specification indicates that all columns
should be selected.



And if you want to avoid getting all the rows where data$Station are  
NA,  then use either of these alternatives resulting in what I expect  
and generally want to see as a result:


 data.station1 - subset( data, Station==1 )
 data.station1 - data[ which(data$Station==1) , ]

--
David.

Jean


Suzanne.mertens wrote on 01/17/2012 03:17:41 PM:


(As a noob to R, this is my first posting - yes yes, groans all

around...)


I'm trying to extract certain rows from a data frame. I used the
following to import data from a CSV txt file.

  data - read.table(file=data.txt, header=TRUE)

when I do this, my attempt to extract the data rows only from where
the Station value equals 1?

  data.station1 - data[data$Station == 1]

...is giving me the following error message:

  Error in `[.data.frame`(data, data
$Station == 1) :
undefined columns selected


Bah.
If I use names(data) I can see Station as a column name.
And if I use str(data), the variable Station is coming up as
integers including the value 1.
And if I use data$Station, I see all station values, including the  
1s.

And if I use data[,Station] I do see all the Station values
And if I instead treat the Station values as characters, by using
1, I still get the undefined error.

Could someone please correct me on my syntax? Or advise if perhaps I
imported the data the wrong way? I'm working out of A Beginner's
Guide to R and also looked through the R manual, and even tried
this from  Google search:

  data.station1 - data,(Station == 1) ]

But that gave me an unwanted output:
  data frame with 0 columns and 789 rows

Almost, but not quite. Please help?


Thank you,



- Suzanne
..
suzanne.mert...@gmail.com
404-337-1533


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Mean of simulation runs given in a table

2012-01-17 Thread William Dunlap

Try using the function in the plyr package.  E.g.,
   z - data.frame( # your toy dataset
   run = c(1, 2, 1, 2),
   par = c(10, 10, 20, 20),
   measured = c(12, 14, 20, 26))
   library(plyr)
   ddply(z, .(par), summarize, meanMeasured=mean(measured), 
sdMeasured=sd(measured))
par meanMeasured sdMeasured
  1  10   13   1.414214
  2  20   23   4.242641

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Ireneusz
 Szczesniak
 Sent: Tuesday, January 17, 2012 2:43 PM
 To: r-help@r-project.org
 Subject: Re: [R] Mean of simulation runs given in a table
 
 Thank you, Uwe, for your help!  I have more measurements (m1, m2) and
 more parameters (par1, par2).  I can calculate the means of m1 and m2
 this way:
 
 aggregate(cbind(m1, m2) ~ par1 + par2, dat, mean)
 
 However, I also need to calculate the standard error of the mean, and
 the variance for the sample, and I would like to have them output as
 extra columns next to the column with means.
 
 Again, I would appreciate any help!
 
 On 17.01.2012 15:09, Uwe Ligges wrote:
 
 
  On 17.01.2012 12:31, Irek Szczesniak wrote:
  Hi,
 
  I have the simulation results of the following structure:
 
  run par measured
  1 10 12
  2 10 14
  1 20 20
  2 20 26
 
  Where run is the simulation run number, par is the parameter of
  the simulation, and measured is the value measured in the
  simulation. This is only a simple example of my results. There are
  many values measured and many parameters. But the basic structure
  stays the same: there are many runs (identified by the run number) for
  the same values of the parameters with various measured values -- they
  constitute a sample.
 
  I would like to calculate the mean of the measured value for a
  sample, and so I would like to obtain the output as follows:
 
  par mean
  10 13
  20 23
 
  I would appreciate it if someone could write me how to do it.
 
 
  For you data in a data.frame called dat:
 
  aggregate(measured ~ par, dat, mean)
 
  Uwe Ligges
 
 
 
  Thank you,
  Irek
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 --
 Ireneusz (Irek) Szczesniak
 http://www.irkos.org
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] pscl package and hurdle model marginal effects

2012-01-17 Thread twarzin

This request is related to the following post from last year:

https://stat.ethz.ch/pipermail/r-help/2011-June/279752.html

After reading the thread, the idea is still not clear. I have fitted a model 
using HURDLE from the PSCL package. I am trying to get marginal effects / 
slopes by multiplying the coefficients by the mean of the marginal effects (I 
think this is right). To my understanding, this will require a mean for the 
binary probability model and a mean for the truncated Poisson count model. My 
guess is that I would use

mean(predict( MODELNAME, type = XXX))

where MODELNAME is the hurdle model and XXX is either RESPONSE, COUNT, or ZERO. 
Assuming the above is right (correct me if it isn't), my questions are:

1. What XXX gives me the mean of the marginal effects for the binomial 
probability model?
2. What XXX gives me the mean of the marginal effects for the count model?

Judging from my results, I would guess the answer to question 1 is COUNT, 
except max(predict(MODELNAME, type= count)) returns 4.5 and I expected it to 
be less than 1. I would also have expected COUNT to match up with the truncated 
Poisson count model. What is the intuition here? 

Also, when I try XXX = PROB, I get the following error: 

Error in matrix(NA, nrow = length(mu), ncol = nUnique) : 
  too many elements specified

So maybe there are other problems.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] arules killed

2012-01-17 Thread Patrick McCann

Hi, I recently got a bizarre message when running arules. It just said
Killed and quit. Anyone know why this might have happened? I am running R
on an AWS quad xl ubuntu instance.

Here is some information, including dataset size and the parameters:

parameter specification:
   confidence minval smax arem  aval originalSupport  support minlen
maxlen
 0.00035812510.11 none FALSETRUE 3.581251e-05  2
   4
 target   ext
  rules FALSE

algorithmic control:
 filter tree heap memopt load sort verbose
0.1 TRUE TRUE  FALSE TRUE2TRUE

apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)(c) 1996-2004   Christian Borgelt
set item appearances ...[1712 item(s)] done [0.00s].
set transactions ...[1712 item(s), 837696 transaction(s)] done [3.99s].
sorting and recoding items ... [1561 item(s)] done [1.83s].
creating transaction tree ... done [1.65s].
checking subsets of size 1 2 3Killed

Thanks,
Patrick McCann

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ggplot- using geom_point and geom_line at the same time

2012-01-17 Thread Mary Kindall

Thanks Hadley for your input.
The following code works fine now.
Thanks again


con = textConnection(inputs  var1  var2  var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32)
 data = read.table(con, header=TRUE)
 data
 data = melt(data, id=inputs)
 g - ggplot(data,aes(x=inputs, value, colour= variable, shape=variable))
 g - g + geom_line(lwd=0.8)
 g - g + geom_point()
g - g + scale_colour_discrete('my Custom Legend')
g - g + scale_shape_discrete(my Custom Legend)
g


-

On Tue, Jan 17, 2012 at 10:07 AM, Hadley Wickham had...@rice.edu wrote:

 On Mon, Jan 16, 2012 at 6:05 PM, Mary Kindall mary.kind...@gmail.com
 wrote:
  Thanks for reply
  I wanted to have legend name with spaces. Right now I am using the
  following code but it produce two legends. I have to use Gimp to cut the
  redundant legend.

 Your basic problem is that you're using the fill and colour
 aesthetics, but you only need colour.

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/




-- 
-
Mary Kindall
Yorktown Heights, NY
USA

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] howto test a package without installation

2012-01-17 Thread Jonas Stein

 Look at Hadley Wickham's devtools package.  It is designed with this
 sort of thing.  That said, it really is not too difficult to install
 as long as you have a working tool chain (which you will need to test
 it anyway).

I cound not find it with google and found no devtools on this page 
http://had.co.nz/
can you give more details please.


 R CMD INSTALL /tmp/sitools
 R
 require(sitools)

This seems not to be what i want:

$ R CMD INSTALL sitools
* installing to library ‘/usr/local/lib/R/site-library’
Error: ERROR: no permission to install to directory 
‘/usr/local/lib/R/site-library’

R tries to install something in my system. That may confuse my 
debian packagemanagement.

kind regards,

-- 
Jonas Stein n...@jonasstein.de

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] howto test a package without installation

2012-01-17 Thread Jonas Stein


 I don't believe you can. However, you need not install it into a system-wide 
 library directory... your personal library (e.g. 
 /home/jonas/R/x86_64-pc-linux-gnu-library/2.14) should be sufficient.

Finally i created a new testuser to install the library locally as you wrote.
It works. Thank you.
How can i get my R clean again afterwards to test the next version? 

kind regards,

-- 
Jonas Stein n...@jonasstein.de

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] howto test a package without installation

2012-01-17 Thread R. Michael Weylandt

http://cran.r-project.org/web/packages/devtools/index.html

Michael

On Tue, Jan 17, 2012 at 6:51 PM, Jonas Stein n...@jonasstein.de wrote:
 Look at Hadley Wickham's devtools package.  It is designed with this
 sort of thing.  That said, it really is not too difficult to install
 as long as you have a working tool chain (which you will need to test
 it anyway).

 I cound not find it with google and found no devtools on this page
 http://had.co.nz/
 can you give more details please.


 R CMD INSTALL /tmp/sitools
 R
 require(sitools)

 This seems not to be what i want:

 $ R CMD INSTALL sitools
 * installing to library ‘/usr/local/lib/R/site-library’
 Error: ERROR: no permission to install to directory 
 ‘/usr/local/lib/R/site-library’

 R tries to install something in my system. That may confuse my
 debian packagemanagement.

 kind regards,

 --
 Jonas Stein n...@jonasstein.de

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] breakpoints and nonlinear regression

2012-01-17 Thread Rolf Turner



In respect of fitting piecewise linear regressions, have you looked at
the segmented package?

cheers,

Rolf Turner

On 18/01/12 04:30, crimsonengineer87 wrote:

Dear Forum,

I have been wracking my head over this problem for the past few days. I have
a dataset of (x,y). I have been able to obtain a nonlinear regression line
using nls. However, we would like to do some statistical analysis. I would
like to obtain a confidence interval for the curve. We thought we could
divide up the curve into piecewise linear regressions and compute CIs from
those portions. There is a package called strucchange that seems helpful,
but I am thoroughly confused.

'breakpoints' is used to calculate the number of breaks in the data for
linear regressions.  I have the following in my script:

bp.pavlu- breakpoints(Na ~ f(yield, a, b), h=0.15, breaks=3,
data=pavludata)
plot(bp.pavlu)
breakpoints(bp.pavlu)

But I am confused as to how to graph the piecewise functions that make up
the curve. I am not even sure if I am using breakpoints correctly. Do I just
give it a linear relationhip (Na ~ yield), instead of what I have?

Is there an easier way to calculate the confidence interval for a non-linear
regression?

I am new to R (as I've read in many questions), but I have most certainly
tried many things and am just getting frustrated with the lack of examples
for what I'd like to do with my data... I'd appreciate any insight. I can
also provide more information if I am not clear. Thanks in advance.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R package dev: how to export constant?

2012-01-17 Thread Jonas Stein

Hi,
i create two constants kilo and milli in [1]. These should be available
after loading 

library(sitools)

How should i export them and what have i done wrong?
(Other suggestions for improving the package are welcome too)

The ready to use .tar.gz and the source can be found on github [2,3]

kind regatds,

[1] https://github.com/jonasstein/sitools/blob/master/init.R
[2] https://github.com/jonasstein/sitools/downloads
[3] https://github.com/jonasstein/sitools

-- 
Jonas Stein n...@jonasstein.de

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R package dev: how to export constant?

2012-01-17 Thread William Dunlap

Try adding
  LazyData: yes
to the DESCRIPTION file.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Jonas Stein
 Sent: Tuesday, January 17, 2012 4:41 PM
 To: r-h...@stat.math.ethz.ch
 Subject: [R] R package dev: how to export constant?
 
 Hi,
 i create two constants kilo and milli in [1]. These should be available
 after loading
 
 library(sitools)
 
 How should i export them and what have i done wrong?
 (Other suggestions for improving the package are welcome too)
 
 The ready to use .tar.gz and the source can be found on github [2,3]
 
 kind regatds,
 
 [1] https://github.com/jonasstein/sitools/blob/master/init.R
 [2] https://github.com/jonasstein/sitools/downloads
 [3] https://github.com/jonasstein/sitools
 
 --
 Jonas Stein n...@jonasstein.de
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] howto test a package without installation

2012-01-17 Thread Jeff Newmiller

Just install the updated library over the old one and start a new R session.

Devtools can be helpful for some things, but when I last looked at it I was 
having more difficulty with getting documentation right than debugging code, 
which I can do using normal function development processes, so I went back to 
the edit compile reload cycle to test the library in it's final form.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Jonas Stein n...@jonasstein.de wrote:


 I don't believe you can. However, you need not install it into a
system-wide library directory... your personal library (e.g.
/home/jonas/R/x86_64-pc-linux-gnu-library/2.14) should be sufficient.

Finally i created a new testuser to install the library locally as you
wrote.
It works. Thank you.
How can i get my R clean again afterwards to test the next version? 

kind regards,

-- 
Jonas Stein n...@jonasstein.de

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

1 2 >

1 - 100 of 109 matches

Mail list logo