date:20121011

I am not sure you have expressed what you want to do correctly. See inline:

On Wed, Oct 10, 2012 at 9:10 PM, andrewH ahoer...@rprogress.org wrote:
 I have a couple of hundred American Community Survey Summary File files
 containing rectangular arrays of data, mainly though not exclusively
 numeric.  Each file is referred to as a sequence (henceforth seq).
-- so 1 seq (terrible identifier -- see below for why) = 1 file

 From
 these files I am trying to extract particular subsets (tables) consisting of
 a set of columns.  These tables are defined by three numbers (now in
 columns in a data frame):
 1.  a file identifier (seq)
 2.  first column position numbers (startNo)
 3.  length of table (len)

So your data frame, call it yourframe, has columns named:

seq  startNo   len


 so the columns to select for one triple would consist of
 startNo:(startNo+len-1).   I am trying to create for each sequence a
 vector of all the column numbers for tables in that sequence.

So for each seq id you want to find all the column numbers, right?

sq.n <- seq_len(nrow(yourframe)) ## Just to make it easier to read
colms <- tapply(sq.n, yourframe$seq, function(x) with(yourframe[x,],
   sort(unique(do.call(c, mapply(seq, from=startNo,
   length.out=len, SIMPLIFY = FALSE))))))

## Comments
In the mapply call, seq is the R function, ?seq.  That's why using it
as a name for a file id is terrible -- it causes confusion.

In the absence of data, this is untested -- and probably not quite
right. But it should be close, I hope. The key idea is the use of
mapply to get the sequence of columns for each row in all the rows for
each seq id. The SIMPLIFY = FALSE guarantees that this yields a list
of vectors of column indices, which are then glopped together and
cleaned up by the sort(unique(do.call(  ...  stuff.

colms should then be a list giving the sorted column numbers to choose
for each seq id.
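A runnable version of the approach above, using the toy data from the original post (quoted further down). As a hypothetical rename, the file-id column is called `file` instead of `seq`, which sidesteps the name clash with the seq() function; the result should match the loop output shown in the original post.

```r
# Toy data from the original post; the id column is renamed "file"
# (a hypothetical name) so it cannot be confused with base::seq()
yourframe <- data.frame(file    = c(1, 1, 2, 2),
                        startNo = c(3, 10, 3, 15),
                        len     = c(4, 2, 5, 3))

sq.n <- seq_len(nrow(yourframe))  # row numbers, grouped by file id in tapply()
colms <- tapply(sq.n, yourframe$file, function(x) with(yourframe[x, ],
  # one run of column numbers per row, glued together, de-duplicated, sorted
  sort(unique(do.call(c, mapply(seq, from = startNo, length.out = len,
                                SIMPLIFY = FALSE))))))
colms[[1]]  # 3 4 5 6 10 11
colms[[2]]  # 3 4 5 6 7 15 16 17
```

tapply() returns a list-like array with one element per file id, so `colms[["1"]]` also works.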

I do not know whether (once cleaned up,) this is either more elegant
or more efficient than what you proposed. And I wouldn't be surprised
if someone like Bill Dunlap comes up with a lot better way, either.
But it is different -- and perhaps amusing.

... If I have properly understood what you wanted. If not, ignore all.

Cheers,
Bert


 Obviously I could do this with nested for loops,e.g..

 seq <- c(1,1,2,2)
 startNo <- c(3, 10, 3, 15)
 len <- c(4, 2, 5, 3)
 data.df <- data.frame(seq, startNo, len)

 seq.f <- factor(data.df$seq)
 data.l <- split(data.df, seq.f)
 selectColsList <- vector("list", length(levels(seq.f)))
 for (i in seq_along(levels(seq.f))){
   selectCols <- numeric()
   for (j in seq_along(data.l[[i]]$startNo)){
     selectCols <- c(selectCols,
       data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] + data.l[[i]]$len[j] - 1))
   }
   selectColsList[[i]] <- selectCols
 }
 selectColsList
 [[1]]
 [1]  3  4  5  6 10 11
 [[2]]
 [1]  3  4  5  6  7 15 16 17

 But this code strikes me as inelegant and verbose. It seems to me that there
 ought to be a way to make the outer loop, (indexed with i) into a tapply
 function (which is why I started with a split()), and the inner loop
 (indexed with j) into some cute recursive function, but I was not able to do
 so. If anyone could suggest some nicer (e.g. shorter, or faster, or just
 more sophisticated) way to do this instead, I would be most grateful.

 Sincerely, andrewH




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Connect R and Lyx in UBUNTU

2012-10-11 Thread ATANU

By "connect" I meant being able to write code chunks in LyX and
compile them within LyX (using R) to produce results along with other
content.


There are many tutorials available for doing this under Windows, but I could
not solve the problem for Linux (Ubuntu).

-Atanu





--
View this message in context: 
http://r.789695.n4.nabble.com/Connect-R-and-Lyx-in-UBUNTU-tp4645675p4645824.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Connect R and Lyx in UBUNTU

2012-10-11 Thread Yihui Xie

It is actually much easier to do it under Ubuntu; see a video here:
http://yihui.name/knitr/demo/lyx/ If you want to use Sweave instead of
knitr, there is also a module for it.

The official documentation is here:

- https://github.com/downloads/yihui/lyx/sweave.pdf
- https://github.com/downloads/yihui/lyx/knitr.pdf

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



Re: [R] Exporting summary plm results to latex

2012-10-11 Thread Duncan Mackay


Hi Sebastian

I think I found the package by accident when I searched the
CRAN package page for "latex", but I did not use it as it could not handle a
very particular problem.


If there is no other alternative, use the add.to.row argument of xtable.
A while ago I needed to add some info from the summary of a glm and I 
think I did it by using the add.to.row argument and 
\multicolumn{n}{l}{text}value
where n is the number of columns spanned by the text and value is the 
relevant element of the summary object.

Be careful with \, which has to be written as \\ in R strings, and with carriage returns.
I know it's a bit kludgy, but if I were doing more of them I would make 
a template for my text editor, which cuts down the work.
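A minimal sketch of the add.to.row idea, assuming the xtable package is installed (the print step is skipped otherwise). The coefficient matrix and the R-squared value are made-up placeholders standing in for summary(fe)$coef and whatever summary statistic you want to append; note the doubled backslashes in the R string.

```r
# Placeholder coefficient matrix (made-up numbers) standing in for summary(fe)$coef
cf <- matrix(c(0.12, 0.07, 1.78, 0.08), nrow = 1,
             dimnames = list("x", c("Estimate", "Std. Error", "t-value", "Pr(>|t|)")))
r2 <- 0.5  # placeholder summary value to append below the table body

# One extra LaTeX row spanning the row-name column plus every data column.
# In an R string "\\" is a single backslash, so "\\\\" produces LaTeX's "\\".
extra <- sprintf("\\multicolumn{%d}{l}{R-squared: %.2f} \\\\\n", ncol(cf) + 1L, r2)
cat(extra)

if (requireNamespace("xtable", quietly = TRUE)) {
  # pos = list(nrow(cf)) inserts the command after the last data row
  print(xtable::xtable(cf),
        add.to.row = list(pos = list(nrow(cf)), command = extra))
}
```

Several extra rows can be added at once by giving pos and command one entry per row.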


Regards

Duncan

At 09:45 11/10/2012, you wrote:
I am also interested in the standard errors, but beneath the point 
estimates rather than next to them, which is the default layout in xtable.
If you by any chance remember the name of the package or how to do 
it, that would be much appreciated!


Cheers,
Sebastian


On Oct 10, 2012, at 7:10 PM, Duncan Mackay mac...@northnet.com.au wrote:

 Hi

 If you just want the coefficients.

 xtable(summary(fe)$coef)
 % latex table generated in R 2.15.1 by xtable 1.7-0 package
 % Thu Oct 11 09:04:59 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rrrrr}
  \hline
  & Estimate & Std. Error & t-value & Pr($>$$|$t$|$) \\
  \hline
 x & 0.12 & 0.07 & 1.78 & 0.08 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}

 There is another package, whose name eludes me, which may help for 
tables whose output differs from that of lm etc.


 HTH

 Duncan

 Duncan Mackay
 Department of Agronomy and Soil Science
 University of New England
 Armidale NSW 2351
 Email: home: mac...@northnet.com.au



 At 05:09 11/10/2012, you wrote:
 HI,

 Maybe you can use the texreg package:

 library(plm)

 #generating some data
 x <- rnorm(270)
 y <- rnorm(270)
 t <- rep(1:3,30)
 i <- rep(1:90, each=3)

 data <- data.frame(i,t,x,y)

 fe <- plm(y~x,data=data,model="within")
 summary(fe)
 library(texreg)
 fe1 <- extract.plm(fe) #extract the plm object

 library(xtable)

 xtable(do.call(rbind,lapply(fe1,function(x) data.frame(x))))
 % latex table generated in R 2.15.0 by xtable 1.7-0 package
 % Wed Oct 10 14:59:10 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rr}
  \hline
  & x \\
  \hline
 Estimate & -0.03 \\
  Std. Error & 0.08 \\
  Pr($>$$|$t$|$) & 0.68 \\
  R\$\verb|^|2\$ & 0.00 \\
  Adj. R\$\verb|^|2\$ & 0.00 \\
  Num. obs. & 270.00 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}
 #Another example.  In this case, you can create two tables from the zz1 list

 data(Produc, package = "plm")
 zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
   data = Produc, index = c("state","year"))

 zz1 <- extract.plm(zz)


 lapply(lapply(zz1,function(x) data.frame(x)),xtable)
 [[1]]
 % latex table generated in R 2.15.0 by xtable 1.7-0 package
 % Wed Oct 10 15:08:02 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rrrr}
  \hline
  & Estimate & Std..Error & Pr...t.. \\
  \hline
 log(pcap) & -0.03 & 0.03 & 0.37 \\
  log(pc) & 0.29 & 0.03 & 0.00 \\
  log(emp) & 0.77 & 0.03 & 0.00 \\
  unemp & -0.01 & 0.00 & 0.00 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}

 [[2]]
 % latex table generated in R 2.15.0 by xtable 1.7-0 package
 % Wed Oct 10 15:08:02 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rr}
  \hline
  & x \\
  \hline
 R\$\verb|^|2\$ & 0.94 \\
  Adj. R\$\verb|^|2\$ & 0.88 \\
  Num. obs. & 816.00 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}


 Hope it helps.

 A.K.







 - Original Message -
 From: Sebastian Barfort sb3...@nyu.edu
 To: r-help@r-project.org
 Cc:
 Sent: Wednesday, October 10, 2012 1:07 PM
 Subject: [R] Exporting summary plm results to latex

 Dear all,

 I am trying to export my fixed effect results to LaTeX. I am 
using the plm package with the summary function. However, it does 
not look like apsrtable, stargazer, or any other package can 
handle plm objects.


 I am interested in a classic table with the coefficient in one 
row, followed by the standard error in parentheses in the next row, 
and stars by the coefficient to show its significance level.


 coefficient 1 xxx**
               (xxx)

 Here is a reproducible example:

 library(plm)

 #generating some data
 x <- rnorm(270)
 y <- rnorm(270)
 t <- rep(1:3,30)
 i <- rep(1:90, each=3)

 data <- data.frame(i,t,x,y)

 fe <- plm(y~x,data=data,model="within")
 summary(fe)

 If there is an alternative to using the plm package that works 
with any of the export to latex packages, I would be very 
interested to know. Otherwise, any ideas of how to solve this 
problem are very welcome. I almost exclusively use fixed effect 
panel models, and the problem of exporting results to Latex is one 
of the things preventing me from switching entirely from Stata to R.



 Kind regards,
 Sebastian
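For the coefficient-over-standard-error layout described above, a minimal base-R sketch. `stars()` and `coef_rows()` are hypothetical helpers, not functions from any package. They assume a coefficient matrix laid out like summary(fe)$coef, with the estimate and standard error in the first two columns and the p-value in the last; the numbers below are placeholders, and the 0.1/0.05/0.01 star cut-offs are an assumed convention.

```r
# Hypothetical helper: significance stars, assuming 0.1/0.05/0.01 cut-offs
stars <- function(p) {
  ifelse(p < 0.01, "***", ifelse(p < 0.05, "**", ifelse(p < 0.1, "*", "")))
}

# Hypothetical helper: two LaTeX rows per coefficient, the estimate with stars
# on the first and the standard error in parentheses on the second.
# Assumes columns 1 and 2 hold estimate and SE and the last holds the p-value.
coef_rows <- function(cf, digits = 3) {
  est <- formatC(cf[, 1], format = "f", digits = digits)
  se  <- formatC(cf[, 2], format = "f", digits = digits)
  p   <- cf[, ncol(cf)]
  top    <- sprintf("%s & %s%s \\\\", rownames(cf), est, stars(p))
  bottom <- sprintf(" & (%s) \\\\", se)
  as.vector(rbind(top, bottom))  # interleave: estimate row, then its SE row
}

# Placeholder numbers, laid out like summary(fe)$coef
cf <- cbind(Estimate = c(0.12, -0.50), `Std. Error` = c(0.07, 0.10),
            `t-value` = c(1.78, -5.00), `Pr(>|t|)` = c(0.08, 0.001))
rownames(cf) <- c("x", "z")
cat(coef_rows(cf), sep = "\n")
```

The rows can then be pasted between a tabular header and footer, or passed on with xtable's add.to.row.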



Re: [R] Connect R and Lyx in UBUNTU

2012-10-11 Thread Rainer M Krug


As I have pointed out already, this is a LyX question; please ask on their 
mailing list
(http://www.lyx.org/MailingLists#toc2 and 
http://dir.gmane.org/gmane.editors.lyx.general)

There are many users who use LyX / sweave or knitr / R under Ubuntu!

Rainer





Re: [R] How to replicate SAS by group processing in R

2012-10-11 Thread Barry Rowlingson

On Wed, Oct 10, 2012 at 7:09 PM, ramoss ramine.mossad...@finra.org wrote:

 In SAS I use the following code:

 proc sort data=upper;
 by tdate stock_symbol expire  strike;
 run;
 data upper1;
   set upper;
   by tdate stock_symbol expire  strike;
   if first.expire then output;
   rename strike=astrike;
 run;

 on the following data set:

 tdate       stock_symbol  expiration  strike
 9/11/2012   C             9/16/2012   11
 9/11/2012   C             9/16/2012   12
 9/11/2012   C             9/16/2012   13
 9/12/2012   C             9/16/2012   14
 9/12/2012   C             9/16/2012   15
 9/12/2012   C             9/16/2012   16
 9/12/2012   C             9/16/2012   17

 to get the following results:
 tdate       stock_symbol  expiration  strike
 9/11/2012   C             9/16/2012   11
 9/12/2012   C             9/16/2012   14

 How would I replicate this kind of logic in R?

 First, replicate it in some kind of universally understood language -
like English. Nearly every alien in every sci-fi film I've seen speaks
English, so that's a safe assumption :)

 What does it do? Take the first record within groups defined by
tdate? Why does your code say 'expire' but the data have 'expiration'?
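Assuming the intent is "sort, keep the first row of each tdate/stock_symbol/expiration group, then rename strike to astrike" (and that the SAS `expire` means the `expiration` column), here is a base-R sketch on the sample data. `!duplicated()` on the sorted group key plays the role of SAS's `first.` flag.

```r
upper <- data.frame(
  tdate = c("9/11/2012", "9/11/2012", "9/11/2012", "9/12/2012",
            "9/12/2012", "9/12/2012", "9/12/2012"),
  stock_symbol = "C",
  expiration = "9/16/2012",
  strike = 11:17,
  stringsAsFactors = FALSE
)

# proc sort: order by the BY variables, parsing dates so they sort chronologically
upper <- upper[order(as.Date(upper$tdate, format = "%m/%d/%Y"),
                     upper$stock_symbol,
                     as.Date(upper$expiration, format = "%m/%d/%Y"),
                     upper$strike), ]

# first.expiration: keep a row when its (tdate, symbol, expiration) key
# has not been seen before in the sorted data
key <- paste(upper$tdate, upper$stock_symbol, upper$expiration)
upper1 <- upper[!duplicated(key), ]

# rename strike=astrike
names(upper1)[names(upper1) == "strike"] <- "astrike"
upper1
```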

Barry


[R] Ptak and Candpara

2012-10-11 Thread Peppe Ricci

Hi,

I am using the PTAk package, and in particular the CANDPARA command, to
perform the PARAFAC factorization of a tensor.
The results are not as encouraging as I expected, so I'm starting to
analyse whether there are errors.
I have a question and I hope you can help me.
The command to run the factorization is:

## CANDECOMP/PARAFAC
results <- CANDPARA(data_matrix, dim=3)

summary(results)

U <- results[[1]]$v
V <- results[[2]]$v
W <- results[[3]]$v

data_matrix is a tensor of 943x1682x4.

What I want to understand is: are U, V, W really the three factors that
I should get from the factorization?
I hope someone can help me.
Thank you.
giuseppe
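One way to check whether U, V and W really are the factors is to rebuild the tensor from them and compare it with the data: in a rank-R CANDECOMP/PARAFAC model each entry is approximately sum over r of lambda_r * u_ir * v_jr * w_kr. The sketch below is base R with hypothetical helpers `cp_reconstruct()` and `rel_error()`; it assumes the factor matrices have one column per component, so you may need to transpose the `$v` elements, and it makes no assumption about where PTAk stores the component weights lambda.

```r
# Hypothetical helper: rebuild a 3-way array from CP factor matrices.
# U: I x R, V: J x R, W: K x R, lambda: length-R vector of component weights.
cp_reconstruct <- function(U, V, W, lambda = rep(1, ncol(U))) {
  X <- array(0, dim = c(nrow(U), nrow(V), nrow(W)))
  for (r in seq_len(ncol(U))) {
    # outer(outer(u, v), w) is the I x J x K rank-one array u_i * v_j * w_k
    X <- X + lambda[r] * outer(outer(U[, r], V[, r]), W[, r])
  }
  X
}

# Relative reconstruction error; values near 0 mean the factors reproduce the data
rel_error <- function(X, Xhat) sqrt(sum((X - Xhat)^2)) / sqrt(sum(X^2))

# Tiny self-check on a tensor that is exactly rank one
U <- matrix(1:2); V <- matrix(1:3); W <- matrix(1:4)
X <- cp_reconstruct(U, V, W)
dim(X)                                 # 2 3 4
rel_error(X, cp_reconstruct(U, V, W))  # 0
```

With the real results, something like `rel_error(data_matrix, cp_reconstruct(U, V, W, lambda))` should be small, after first checking that U, V and W have 943, 1682 and 4 rows respectively.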


[R] Case study in forensic computing domain

2012-10-11 Thread Ambikesh Jayal

Hi,

I am looking for case studies, possibly real world, in forensic domain that
will entice forensic computing students and demonstrate the usefulness of
machine learning in forensics. Does anyone know of any such case studies?

Students should be able to replicate the case study, so it should have some
public corpus data and R code to implement the machine learning approach.

I think a case study on determining the authorship of a document using machine
learning would be good. Another case study could be a regression model to
detect fake currency based on the size, weight and other attributes of a note.

Any pointers would be welcome.

Thanks,
Ambi.


Re: [R] GAM without intercept

2012-10-11 Thread anna freni sterrantino

Hi Sergio, 
based on my understanding (see Wood, Generalized Additive Models), the
smooth terms are subject to identifiability constraints: each smooth is
centred, so the overall level of the response is carried by the intercept,
which is included by default and which you don't need to specify. I guess
that your m2 model is simply not correct.
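A quick way to see this, using the simulated data from the question quoted below (with an added set.seed() for reproducibility). mgcv centres each smooth for identifiability, so its values at the data points sum to zero; in the model with an intercept, the intercept then carries the overall level of y (for this Gaussian fit it equals mean(y)) and the smooth only describes departures from it. This assumes the recommended mgcv package that ships with R is available.

```r
library(mgcv)
set.seed(1)  # added for reproducibility
x <- seq(0, 10, length = 100)
y <- x^2 + rnorm(100)

m1 <- gam(y ~ s(x, k = 10, bs = "cs"))

# Contribution of the smooth term at the data points
f_hat <- predict(m1, type = "terms")[, "s(x)"]
mean(f_hat)   # essentially zero: the smooth is centred
coef(m1)[1]   # the intercept ...
mean(y)       # ... matches the mean response
```

Dropping the intercept with -1 removes the term that carries this level, which is why the curve from m2 sits so far from the data.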

Hope it helps

Anna




Anna Freni Sterrantino
Department of Statistics
University of Bologna, Italy
via Belle Arti 41, 40124 BO.



 From: SAEC sergio.es...@uach.cl
To: r-help@r-project.org 
Sent: Thursday, 11 October 2012 0:22
Subject: [R] GAM without intercept

Hi everybody,

I am trying to fit a GAM model without intercept using library mgcv. 
However, the result has nothing to do with the observed data. In fact 
the predicted points are far from the predicted points obtained from the 
model with intercept. For example:

#First I generate some simulated data:

library(mgcv)
x <- seq(0,10,length=100)
y <- x^2+rnorm(100)

#then I fit a gam model with and without intercept

m1 <- gam(y~s(x,k=10,bs='cs'))
m2 <- gam(y~s(x,k=10,bs='cs')-1)

#and now I obtain predicted values for the interval 0-10

x1 <- seq(0,10,0.1)
y1 <- predict(m1,newdata=list(x=x1))
y2 <- predict(m2,newdata=list(x=x1))

#plotting predicted values

plot(x,y,ylim=c(0,100))
lines(x1,y1,lwd=4,col='red')
lines(x1,y2,lwd=4,col='blue')

In this example you can see that the red line (predicted points from the
model with intercept) fits the data pretty well, but the blue line
(without intercept) is far from the observed points.

Probably I am misunderstanding some key elements of GAM modelling or using
incorrect syntax. I don't know what the problem is. Any ideas will be
helpful.

Sergio







-- 
Sergio A. Estay
Inst. Ciencias Ambientales y Evolutivas
Universidad Austral de Chile
Casilla 567, Valdivia, Chile
Phone: 5663-293913
http://www.ciencias.uach.cl/instituto/ciencias_ambientales_evolutivas/academicos/sergio-estay.php






--
View this message in context: 
http://r.789695.n4.nabble.com/GAM-without-intercept-tp4645786.html
Sent from the R help mailing list archive at Nabble.com.

[R] Options to extend memory limit

2012-10-11 Thread jennifer.moeller-gulland

Dear All, 

at the moment I am using R for calculations on large databases. 
Unfortunately, R only manages to complete certain operations at some 
times, and not at others. I usually get the error message "cannot allocate 
vector of size XX". 

I am using the 64-bit version with Windows 7. While my computer has 8 GB of RAM, 
I have a feeling that R cannot use all of it. 

Searching online, I found that you can increase the memory with the 
options --max-mem-size / --max-ppsize or change the environment variable 
R_MAX_MEM_SIZE to allow deep recursion or large and complicated 
calculations to be done. 

Unfortunately, I am not very knowledgeable yet on how to use R and I did 
not quite manage to use the commands successfully. Could you please tell 
me whether these make sense for my case and, if so, how (and at what 
stage of the process) I can use them? 

Thank you very, very much in advance. 
Kind regards, 

Jennifer
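"cannot allocate vector of size XX" means R could not obtain one contiguous block of that size. Some rough arithmetic helps judge whether a calculation can fit at all: a numeric (double) value takes 8 bytes and an integer 4, and many operations make temporary copies. The `bytes_needed()` helper below is hypothetical; object.size() and gc() are base R. On Windows, --max-mem-size (e.g. `--max-mem-size=8000M` added to the Rgui command line) is given when R is started, while --max-ppsize sets the pointer protection stack, which matters for deep recursion rather than for large objects.

```r
# Hypothetical helper: approximate memory for an nrow x ncol table of one type
# (8 bytes per double, 4 per integer), ignoring small per-object overhead
bytes_needed <- function(nrow, ncol, bytes_per_cell = 8) nrow * ncol * bytes_per_cell

# e.g. 2 million rows x 50 numeric columns: 8e8 bytes, roughly 760 Mb,
# before counting any temporary copies made during a calculation
bytes_needed(2e6, 50) / 1024^2

x <- numeric(1e6)                     # one million doubles
print(object.size(x), units = "Mb")   # measured size, slightly above 8e6 bytes
rm(x); invisible(gc())                # free the object and let R reclaim memory
```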




PricewaterhouseCoopers Aktiengesellschaft Wirtschaftsprüfungsgesellschaft


Re: [R] multiple t-tests across similar variable names


Hello,

I have a problem: with your data example, my results are different. I 
have changed the names of two of the variables, to allow for 'pre' and 
'post' to be first in the names.


# auxiliary functions
ifswap <- function(x)
    if(x[1] %in% c("pre", "post")) x[2:1] else x

getpair <- function(i, post)
    post[ which(vmat[post, 1] == vmat[i, 1]) ]

makeLine <- function(h)
    c(MeanDiff = unname(h$estimate),
      CIlower = h$conf.int[1],
      CIupper = h$conf.int[2],
      p.value = h$p.value)

doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
    orange_post = sample(18:28,5,replace=TRUE),
    pre_banana = sample(25:35,5,replace=TRUE),  # here
    apple_post = sample(20:30,5,replace=TRUE),
    post_banana = sample(40:50,5,replace=TRUE), # and here
    orange_pre = sample(5:10,5,replace=TRUE))


#
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result


In your results I believe that the values for meandifference are the 
means of x[, 1], at least that's what I've got.

Anyway, I'll look at both versions of the code again to try to see what's going on.

Hope this helps,

Rui Barradas
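A more compact alternative, swapping in regular expressions for the strsplit()/swap step: first normalise names like "pre_banana" to "banana_pre", then run one paired t-test per variable stem. This is a sketch, not the code from either reply; it computes mean(post - pre). Because R 3.6.0 changed the default sampling algorithm behind sample(), the numbers may differ from those quoted in this thread even with the same seed.

```r
set.seed(432)
dat2 <- data.frame(apple_pre   = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana  = sample(25:35, 5, replace = TRUE),
                   apple_post  = sample(20:30, 5, replace = TRUE),
                   post_banana = sample(40:50, 5, replace = TRUE),
                   orange_pre  = sample(5:10, 5, replace = TRUE))

# Normalise "pre_banana"/"post_banana" to "banana_pre"/"banana_post"
std   <- sub("^(pre|post)_(.*)$", "\\2_\\1", names(dat2))
fruit <- sub("_(pre|post)$", "", std)   # variable stem, e.g. "apple"
when  <- sub("^.*_", "", std)           # "pre" or "post"

# One paired t-test per stem: mean of (post - pre), its 95% CI and p-value
res <- t(sapply(unique(fruit), function(f) {
  h <- t.test(dat2[[which(fruit == f & when == "post")]],
              dat2[[which(fruit == f & when == "pre")]], paired = TRUE)
  c(MeanDiff = unname(h$estimate), CIlower = h$conf.int[1],
    CIupper = h$conf.int[2], p.value = h$p.value)
}))
res
```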

On 11-10-2012 05:31, arun wrote:

HI,

If you have a lot of variables in no particular order, then it would be better to 
order the data by column names.
For example:
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
                   orange_post = sample(18:28,5,replace=TRUE),
                   banana_pre = sample(25:35,5,replace=TRUE),
                   apple_post = sample(20:30,5,replace=TRUE),
                   banana_post = sample(40:50,5,replace=TRUE),
                   orange_pre = sample(5:10,5,replace=TRUE))
dat3 <- dat2[order(colnames(dat2))] #order the columns
list3 <- list(dat3[,1:2], dat3[,3:4], dat3[,5:6])
res3 <- do.call(rbind, lapply(lapply(list3, function(x)
  t.test(x[,1], x[,2], paired=TRUE)), function(x)
  data.frame(meandifference=x$estimate, CIlow=unlist(x$conf.int)[1],
             CIhigh=unlist(x$conf.int)[2], p.value=x$p.value)))
row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3), "_"), `[`, 1)))
res3
#        meandifference     CIlow   CIhigh      p.value
#apple             12.6  8.519476 16.68052 0.0010166626
#banana            15.0 12.088040 17.91196 0.0001388506
#orange            18.2 13.604166 22.79583 0.0003888560

A.K.



- Original Message -
From: Nundy, Shantanu snu...@chicagobooth.edu
To: r-help@r-project.org r-help@r-project.org
Cc:
Sent: Wednesday, October 10, 2012 7:09 PM
Subject: Re: [R] multiple t-tests across similar variable names

Hi everyone-

I have a dataset with multiple pre and post variables I want to compare. The variables are named 
apple_pre or pre_banana with the corresponding post variables named apple_post or 
post_banana. The variables are in no particular order.

apple_pre orange_pre orange_post pre_banana apple_post post_banana
person_1
person_2
person_3
...
person_x


How do I:
1. Run a series of paired t-tests for the apple_pre variables and pre_banana 
variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
2. Print the results from these t-tests in a table with col 1=mean difference, 
col 2= 95% conf interval, col 3=p-value.

Thank you kindly,
-Shantanu

Shantanu Nundy, M.D.
University of Chicago


Re: [R] practical to loop over 2million rows?

2012-10-11 Thread S Ellison

  If I use a nested ifelse statement in a loop it takes me 13 
 minutes to get an answer on just 50,000 rows. 
 ...
 ifelse(strataID[i+1]==strataID[i], y<-x[i+1], y<-x[i-1]))

maybe take a closer look at the ifelse help page and the examples?

First, ifelse is intended to be vectorized. If you nest it in a loop, you're 
effectively nesting a loop inside a loop. And by putting ifelse inside ifelse, 
you've done that twice. And then you've run the loops on vectors of length one, 
so 'twas all in vain...
Second, the two things after the condition in ifelse are not instructions; they 
are arguments to the function. Putting y<-something in as an argument means 
'(promise to) store something in a variable called y, and then pass y to the 
function'. You probably didn't mean that.
Third, ifelse returns a vector of the results; you're not using the return 
value for anything.

For a single 'if' that takes some action, you want 'if' and 'else' 
_separately_, not 'ifelse'.
y<-length(x) #length() already returns a numeric value. So if you must do this 
with a loop, it would look more like

for (i in 2:(length(x)-1)) { #because x[i-1] and x[i+1] won't be there for all i otherwise
   if (!is.na(x[i])) y[i] <- x[i]
   if (strataID[i+1]==strataID[i]) y[i] <- x[i+1] else y[i] <- x[i]
      #I changed the second x index because I can't see why it differed from the strataID index
   #or, using the fact that 'if' also returns something:
   # y[i] <- if (strataID[i+1]==strataID[i]) x[i+1] else x[i]
}

Finally, if you don't preallocate y at the length you want, R will have to move 
the whole of y to a new memory location with one more space every time you 
append something to it. There's a section on that in 'The R Inferno'. It's a 
really good way of slowing R down.

So let's try something else.
strataID <- sample(letters[1:3], 2e6, replace=TRUE) #a nice long strata identifier with some matches likely
x <- rnorm(2e6) #some random numbers
x <- ifelse(x < -2, NA, x) #a few NA's now in x, though it does take a few seconds for the 2 million observations

i <- 1:(length(x)-1)  #A long indexing vector with space for the last x[i+1]
y <- x  #That puts all the NA's in the right place in y, allocates y and
        #happens to put all the current values of x into y too.
system.time( y[i] <- ifelse( strataID[i+1]==strataID[i], x[i+1], x[i] ) )
  #does the whole loop and stores it in the 'right' places in y -
  #though it will foul up those NA's because of your x indexing.
  #And incidentally it doesn't change the last y either.
  #On my allegedly 2GHz machine the system.time result was 2.87 seconds for the 2 million 'rows'.


#Incidentally, a look at what we ended up with:
data.frame(s=strataID, y=y)[1:30,]
#says you probably aren't getting anything useful from the exercise other than
#a feel for what can go wrong with loops.
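The ifelse() call above can also be replaced by plain logical indexing, which avoids computing both branches. A minimal sketch on a small made-up example, following the same rule as the system.time() line (take x[i+1] when the next row has the same strataID, otherwise keep x[i]), checked against an explicit loop that writes into a preallocated y:

```r
set.seed(1)
n <- 20
strataID <- sample(letters[1:3], n, replace = TRUE)
x <- rnorm(n)

# Vectorised: for i in 1..n-1, use x[i+1] when row i+1 is in the same stratum as row i
i <- seq_len(n - 1)
same <- strataID[i + 1] == strataID[i]
y_vec <- x                        # preallocated; the last element is never changed
y_vec[i[same]] <- x[i[same] + 1]

# The same rule as an explicit loop, also writing into a preallocated y
y_loop <- x
for (k in seq_len(n - 1)) {
  if (strataID[k + 1] == strataID[k]) y_loop[k] <- x[k + 1]
}

identical(y_vec, y_loop)   # TRUE
```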

 


Re: [R] lm on matrix data

2012-10-11 Thread Jean V Adams

Baoqiang,

Here's an approach that should work:
(1) Make sure that the column names of trainx and testx are the same.
(2) Combine trainy and trainx into a data frame for fitting the model.
(3) Use the newdata= argument in the predict() function.
(4) Convert testx from matrix to data frame.

# some example data
nrow <- 5
ncol <- 3
colnames <- paste("x", seq(ncol), sep="")
nrow2 <- 8
trainx <- matrix(rnorm(nrow*ncol), ncol=ncol, dimnames=list(NULL, 
colnames))
trainy <- matrix(rnorm(nrow), ncol=1, dimnames=list(NULL, "y"))
testx <- matrix(rnorm(nrow2*ncol), ncol=ncol, dimnames=list(NULL, 
colnames))

# create data frames for model fitting and prediction
traindf <- data.frame(cbind(trainy, trainx))
testdf <- data.frame(testx)

# fit the model and make predictions for new data
fit <- lm(y ~ ., data=traindf)
py <- predict(fit, newdata=testdf)

Note that the lm() model you fit to the two matrices worked just fine,
lm(trainy ~ trainx)
but the way that names are assigned to the predictor variables
(trainxx1, trainxx2, etc.)
makes it inconvenient to predict on new data.

Jean

 

Baoqiang Cao bqcaom...@gmail.com wrote on 10/10/2012 09:35:47 AM:
 
 Hi,
 
 I have a question about using lm on matrix, have to admit it is very
 trivial but I just couldn't find the answer after searched the mailing
 list and other online tutorial. It would be great if you could help.
 
 I have a matrix trainx of 492(rows) by 220(columns) that is my x,
 and trainy is 492 by 1. Also, I have the newdata testx which is 240
 (rows) by 220 (columns). Here is what I got:
 
 py <- predict(lm(trainy ~ trainx ), data.frame(testx))
 Warning message:
 'newdata' had 240 rows but variable(s) found have 492 rows
 
 The fitting formula I intended is: trainy ~ trainx[,1] + trainx[,2] +
 .. +trainx[,220].
 
 Any help, please?
 
 Best,
 Baoqiang


Re: [R] Contacting Delphi ??

2012-10-11 Thread S Ellison

  What does the sudden appearance of "Contacting Delphi
 ...the oracle is unavailable.
 We apologize for any inconvenience." mean? A bug? It appears 
 when plotting.

If you have an ordinary plot command, that is very strange indeed. It's a help 
message ... of sorts*. It should be no more likely to appear by accident in 
plotting than a manual page.

What was the plot command that caused it to appear?

S

*It helps you realise you typed too many question marks


[R] performance analytics- package

2012-10-11 Thread sheenmaria

In the PerformanceAnalytics "performance summary" session, I can't run this
code:
charts.PerformanceSummary(datafrom_table, rf = 0, main = NULL, method =
"ModifiedVaR", width = 0, event.labels = NULL, ylog = FALSE, wealth.index =
FALSE, gap = 12) 

It just returns a blank chart.

datafrom_table is read from a CSV file,
and the rest of the arguments are taken from
https://www.rmetrics.org/files/Meielisalp2007/Presentations/Peterson.pdf
but I don't get the result.

Could you please help me?



--
View this message in context: 
http://r.789695.n4.nabble.com/performance-analytics-package-tp4645834.html
Sent from the R help mailing list archive at Nabble.com.


[R] R(BCA Package)

2012-10-11 Thread kokila

Hi all,

I have tried the jack.jill dataset in the BCA package. This dataset actually contains
557 observations and 8 variables, but I got only 2 observations. Has anybody
tried this same function? Do you get the same answers as me, or the usual
values? Please reply.



by
Kokila.k



--
View this message in context: 
http://r.789695.n4.nabble.com/R-BCA-Package-tp4645835.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] nlmnib Package + Hessian Output

2012-10-11 Thread nserdar

Sorry, but I haven't modified my function to use mle2. :(

Can you give an example of how to obtain the Hessian with numDeriv?

Serdar 

# Function
Linn=function(param){

 phi1=((param[1]^2/(1+param[1]^2)))
 phi2=((param[2]^2/(1+param[2]^2)))
 phi3=((param[3]^2/(1+param[3]^2)))
 phi4=((param[4]^2/(1+param[4]^2)))
 
 sigw1=sqrt(exp(param[5]))
 sigw2=sqrt(exp(param[6]))
 sigw3=sqrt(exp(param[7]))
 sigw4=sqrt(exp(param[8]))
 
 sigv=sqrt(exp(param[9]))
 
 Betam1=((param[10]*100)/(sqrt(1+param[10]^2)))
 Betam2=((param[11]*100)/(sqrt(1+param[11]^2)))
 Betam3=((param[12]*100)/(sqrt(1+param[12]^2)))
 Betam4=((param[13]*100)/(sqrt(1+param[13]^2)))

phi=diag(c(phi1,phi2,phi3,phi4),4,4)
betam=c(Betam1,Betam2,Betam3,Betam4)
sigw=diag(c(sigw1,sigw2,sigw3,sigw4),4,4)

# ols, n, rt, rm and kfilter1() are defined elsewhere in the poster's session (not shown)
a <- 1.001
mu0=c(ols[1,1],ols[2,1],ols[3,1],ols[4,1])
sigma0=diag(c(a,a,a,a),4,4)
  
 kf=kfilter1(n,rt,rm,mu0,sigma0,phi,betam,sigw,sigv)
 
 return(kf$like)
 
 }

a <- 1.001
init.par <- c(0.5,0.5,0.5,0.5,a,a,a,a,a,ols[1,1],ols[2,1],ols[3,1],ols[4,1])

###
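A minimal sketch of getting a Hessian at the `nlminb()` optimum. `Linn()` depends on objects that are not shown (`kfilter1`, `ols`, `n`, `rt`, `rm`), so the sketch uses a toy negative log-likelihood, `nll()`, as a stand-in. Its name and data are illustrative only. Base R's `optimHess()` computes a finite-difference Hessian. If the numDeriv package is installed, `numDeriv::hessian(nll, fit$par)` is the equivalent call the question asks about.

```r
# Toy negative log-likelihood (hypothetical stand-in for Linn):
# p[1] = mean, p[2] = log standard deviation
set.seed(42)
z <- rnorm(200, mean = 3, sd = 2)
nll <- function(p) -sum(dnorm(z, mean = p[1], sd = exp(p[2]), log = TRUE))

fit <- nlminb(start = c(0, 0), objective = nll)

# Numerical Hessian at the optimum (finite differences, base R)
H <- optimHess(fit$par, nll)
# Alternative if numDeriv is installed:
# H <- numDeriv::hessian(nll, fit$par)

# nll is a *negative* log-likelihood, so H estimates the observed information
# and its inverse approximates the covariance matrix of the estimates
se <- sqrt(diag(solve(H)))
rbind(estimate = fit$par, se = se)
```

For the original problem, the same pattern would be `fit <- nlminb(init.par, Linn)` followed by `numDeriv::hessian(Linn, fit$par)`. This assumes `Linn()` returns a negative log-likelihood for `nlminb()` to minimize. The resulting standard errors refer to the unconstrained entries of `param`, not to the transformed quantities such as `phi1` or `sigw1`.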






--
View this message in context: 
http://r.789695.n4.nabble.com/nlmnib-Package-Hessian-Output-tp4645768p4645838.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] reading in a (very simple) list from a file

2012-10-11 Thread VA Smith

Brilliant!  Thank you both, this works!

Combined with the other suggestion of setting stringsAsFactors to FALSE when
reading in the data frame, I now have the behaviour I wanted.

I had been beginning to get the sense that one of the apply functions was
the solution.  I will now do some reading on split to understand precisely
what I'm doing...

Best wishes,
Anne
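Here is a small sketch of the `split()` idea. It assumes a two-column file of group/item pairs; that layout is hypothetical, since the original data are not shown here. `split()` turns one vector into a named list with one element per group.

```r
# Hypothetical two-column input; in practice: read.csv("file.csv", stringsAsFactors = FALSE)
txt <- "group,item
a,x1
a,x2
b,y1"
df <- read.csv(text = txt, stringsAsFactors = FALSE)

# One list element per group, holding that group's items
lst <- split(df$item, df$group)
str(lst)
```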






--
View this message in context: 
http://r.789695.n4.nabble.com/reading-in-a-very-simple-list-from-a-file-tp4645741p4645839.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] column width in .dbf files using write.dbf ... to be continued

2012-10-11 Thread Luiz Max Carvalho

Old topic...

An answer may be useful for someone else, though...

Just do:
  environment(write.dbfMODIF) <- environment(foreign::write.dbf)

and it should be good to go.
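The sketch below shows why this works. The names `ns`, `.helper` and `my_fun` are made up and are not taken from foreign. A function that is copied and edited at top level gets the global environment as its enclosure, so it can no longer see the unexported objects the original relied on. Giving it the original function's environment restores those lookups.

```r
# A stand-in "namespace" holding a private helper (hypothetical names)
ns <- new.env()
assign(".helper", function(x) x * 2, envir = ns)
orig_fun <- function(x) .helper(x) + 1
environment(orig_fun) <- ns

# A modified copy typed at top level lives in the global environment,
# so .helper cannot be found from it
my_fun <- function(x) .helper(x) + 100
res1 <- tryCatch(my_fun(1), error = function(e) "error: .helper not found")

# Giving the copy the original function's environment fixes the lookup
environment(my_fun) <- environment(orig_fun)
res2 <- my_fun(1)   # 1 * 2 + 100 = 102
```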

Cheers,



--
View this message in context: 
http://r.789695.n4.nabble.com/column-width-in-dbf-files-using-write-dbf-to-be-continued-tp1013017p4645841.html
Sent from the R help mailing list archive at Nabble.com.


[R] Exporting each row in the table as new table

2012-10-11 Thread kallu

Dear all,

I am new to R and familiar only with very basic stuff. I am trying to create
a text-format table from each row of my table and export each one, named after
a specific attribute in the table. I tried things after reading some forums, but
nothing worked. Can you please help me?

ex:
dataGT

ID  State  Year  Growth
1   IA     1999  25
2   IA     2000  27
3   KS     1999  35
4   KS     2000  31
5   KY     1999  14
6   KY     2000  18
7   NE     1999  34
8   NE     2000  38

I am trying to make each row of the table a new table and export
that table named after its ID.

Please help me if possible. Thank you
Kalyani
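The sketch below follows one possible reading of the request: write each row of `dataGT` to its own tab-separated text file, named after its `ID`. The data frame is rebuilt from the example above. The output folder (`tempdir()`) and the `.txt` file naming are assumptions.

```r
dataGT <- data.frame(
  ID     = 1:8,
  State  = rep(c("IA", "KS", "KY", "NE"), each = 2),
  Year   = rep(c(1999, 2000), times = 4),
  Growth = c(25, 27, 35, 31, 14, 18, 34, 38)
)

outdir <- tempdir()   # replace with the folder you want
for (i in seq_len(nrow(dataGT))) {
  # one single-row table per file, e.g. ".../1.txt"
  write.table(dataGT[i, ],
              file = file.path(outdir, paste0(dataGT$ID[i], ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
list.files(outdir, pattern = "\\.txt$")
```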




--
View this message in context: 
http://r.789695.n4.nabble.com/Exporting-each-row-in-the-table-as-new-table-tp4645844.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Exporting summary plm results to latex

Hi,

I tried this function on an example dataset and it seems to be working.
library(reshape2)  # melt()
library(plyr)      # ddply(), .()

extract.plm <- function(model) {

if (!class(model)[1] == "plm") {
stop("Internal error: Incorrect model type! Should be a plm object!")
}
zz1 <- summary(model)$coef[, 1:2]   # Estimate and Std. Error columns
zz2 <- as.data.frame(apply(zz1, 2, function(x) sprintf("%.3f", x)))
zz2[] <- sapply(zz2, function(x) as.numeric(as.character(x)))
zz3 <- data.frame(Coefficient = row.names(zz1), zz2)
zz3 <- melt(zz3, by = "Coefficient")   # long format: Coefficient, variable, value
zz4 <- within(zz3, {Coefficient <- as.character(Coefficient); variable <- as.character(variable)})
zz5 <- ddply(zz4, .(Coefficient), function(x) x)   # also sorts rows by Coefficient
pv <- summary(model)$coef[, 4]
# NB: pv is in model order, but zz5 is now sorted alphabetically, so the stars
# can land on the wrong coefficient (e.g. log(pcap) gets ** in the output below).
est <- zz5$value[zz5$variable == "Estimate"]
zz5$value[zz5$variable == "Estimate"] <- ifelse(pv < 0.05 & pv >= 0.01, gsub("(.*)", "\\1*", est),
  ifelse(pv < 0.01, gsub("(.*)", "\\1**", est), est))
zz5$value[zz5$variable == "Std..Error"] <- gsub("(.*)", "(\\1)", zz5$value[zz5$variable == "Std..Error"])
res <- zz5[, c(1, 3)]
res
}

library(plm)
data(Produc, package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc,
index = c("state","year"))

 extract.plm(zz)
#Using Coefficient as id variables
 # Coefficient    value
#1    log(emp)    0.768
#2    log(emp)   (0.03)
#3 log(pc)  0.292**
#4 log(pc)  (0.025)
#5   log(pcap) -0.026**
#6   log(pcap)  (0.029)
#7   unemp -0.005**
#8   unemp  (0.001)
library(xtable)
 xtable(extract.plm(zz))
Using Coefficient as id variables
% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Thu Oct 11 09:43:00 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rll}
  \hline
  & Coefficient & value \\ 
  \hline
1 & log(emp) & 0.768 \\ 
  2 & log(emp) & (0.03) \\ 
  3 & log(pc) & 0.292** \\ 
  4 & log(pc) & (0.025) \\ 
  5 & log(pcap) & -0.026** \\ 
  6 & log(pcap) & (0.029) \\ 
  7 & unemp & -0.005** \\ 
  8 & unemp & (0.001) \\ 
   \hline
\end{tabular}
\end{center}
\end{table}

A.K.
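Sebastian asks for standard errors beneath the estimates. The base-R sketch below produces that layout without the reshape2/plyr round trip. It computes the stars from each coefficient's own p-value, so they cannot be attached to the wrong row. The function name `stack_coefs` is hypothetical. The example uses `lm()` on the built-in `mtcars` data so it runs without plm. For a plm fit, `stack_coefs(summary(zz)$coef)` should work the same way, assuming the usual four-column coefficient matrix.

```r
# Interleave "estimate + stars" and "(SE)" rows for each coefficient
stack_coefs <- function(coefs, digits = 3) {
  est <- coefs[, 1]; se <- coefs[, 2]; p <- coefs[, 4]
  stars <- ifelse(p < 0.01, "**", ifelse(p < 0.05, "*", ""))
  fmt <- paste0("%.", digits, "f")
  data.frame(
    Coefficient = rep(rownames(coefs), each = 2),
    # rbind() gives a 2 x k matrix; as.vector() reads it column by column,
    # so each estimate is followed by its own standard error
    value = as.vector(rbind(paste0(sprintf(fmt, est), stars),
                            paste0("(", sprintf(fmt, se), ")"))),
    stringsAsFactors = FALSE
  )
}

fit <- lm(mpg ~ wt + qsec, data = mtcars)
tab <- stack_coefs(summary(fit)$coefficients)
tab
# xtable::xtable(tab) then turns this into a LaTeX table
```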





- Original Message -
From: Sebastian Barfort sb3...@nyu.edu
To: Duncan Mackay mac...@northnet.com.au
Cc: r-help-r-project.org r-help@r-project.org
Sent: Wednesday, October 10, 2012 7:45 PM
Subject: Re: [R] Exporting summary plm results to latex

I am also interested in the standard errors, but placed beneath the point
estimates rather than next to them, which is what xtable gives by default.
If you by any chance remember the name of the package or how to do it, that
would be much appreciated!

Cheers,
Sebastian


On Oct 10, 2012, at 7:10 PM, Duncan Mackay mac...@northnet.com.au wrote:

 Hi
 
 If you just want the coefficients.
 
 xtable(summary(fe)$coef)
 % latex table generated in R 2.15.1 by xtable 1.7-0 package
 % Thu Oct 11 09:04:59 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rrrrr}
  \hline
  & Estimate & Std. Error & t-value & Pr($>$$|$t$|$) \\
  \hline
 x & 0.12 & 0.07 & 1.78 & 0.08 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}
 
 There is another package, whose name eludes me, that may help with tables
 for models whose output differs from that of lm etc.
 
 HTH
 
 Duncan
 
 Duncan Mackay
 Department of Agronomy and Soil Science
 University of New England
 Armidale NSW 2351
 Email: home: mac...@northnet.com.au
 
 
 
 At 05:09 11/10/2012, you wrote:
 Hi,
 
 Maybe you can use library(texreg):
 
 library(plm)
 
 #generating some data
 x <- rnorm(270)
 y <- rnorm(270)
 t <- rep(1:3, 90)
 i <- rep(1:90, each=3)
 
 data <- data.frame(i,t,x,y)
 
 fe <- plm(y~x, data=data, model="within")
 summary(fe)
 library(texreg)
 fe1 <- extract.plm(fe) #extract the plm object
 
 library(xtable)
 
 xtable(do.call(rbind,lapply(fe1,function(x) data.frame(x))))
 % latex table generated in R 2.15.0 by xtable 1.7-0 package
 % Wed Oct 10 14:59:10 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rr}
  \hline
  & x \\
  \hline
 Estimate & -0.03 \\
  Std. Error & 0.08 \\
  Pr($>$$|$t$|$) & 0.68 \\
  R\$\verb|^|2\$ & 0.00 \\
  Adj. R\$\verb|^|2\$ & 0.00 \\
  Num. obs. & 270.00 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}
 #Another example.  In this case, you can create two tables from the zz1 list
 data(Produc, package = "plm")
 zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
           data = Produc, index = c("state","year"))
 zz1 <- extract.plm(zz)
 
 
 lapply(lapply(zz1,function(x) data.frame(x)),xtable)
 [[1]]
 % latex table generated in R 2.15.0 by xtable 1.7-0 package
 % Wed Oct 10 15:08:02 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rrrr}
  \hline
  & Estimate & Std..Error & Pr...t.. \\
  \hline
 log(pcap) & -0.03 & 0.03 & 0.37 \\
  log(pc) & 0.29 & 0.03 & 0.00 \\
  log(emp) & 0.77 & 0.03 & 0.00 \\
  unemp & -0.01 & 0.00 & 0.00 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}
 
 [[2]]
 % latex table generated in R 2.15.0 by xtable 1.7-0 package
 % Wed Oct 10 15:08:02 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rr}
  \hline
  & x \\
  \hline
 R\$\verb|^|2\$ & 0.94 \\
  Adj. R\$\verb|^|2\$ & 0.88 \\
  Num. obs. & 816.00 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}
 
 
 Hope it helps.
 
 A.K.

Re: [R] performance analytics- package

2012-10-11 Thread R. Michael Weylandt

On Thu, Oct 11, 2012 at 11:04 AM, sheenmaria sheenmar...@gmail.com wrote:
 In the PerformanceAnalytics performance summary session, I can't run the
 code:
 charts.PerformanceSummary(datafrom_table, rf = 0, main = NULL, method =
 "ModifiedVaR", width = 0, event.labels = NULL, ylog = FALSE, wealth.index =
 FALSE, gap = 12)

 It just returns a blank chart.

 datafrom_table is read from a CSV file,
 and the rest comes from the site
 https://www.rmetrics.org/files/Meielisalp2007/Presentations/Peterson.pdf
 but I don't get the result.

 Could you please help me?


charts.PerformanceSummary() is well tested, so you'll need to supply
datafrom_table (or an approximation thereof) using the dput() function
to make this problem reproducible. Note that dput(datafrom_table) will
cause R to print a lot of what might seem to you gibberish but it's
important you copy and paste it directly into your reply to allow us
to replicate your problem.

If your dataset is large, use dput(head(datafrom_table, 30)) instead.
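This example shows why `dput()` output is what helpers want: it is R code that rebuilds the object exactly, so anyone who pastes it gets the same data. The small data frame below is made up for illustration.

```r
# A made-up stand-in for datafrom_table
df <- data.frame(date = c("2012-01-31", "2012-02-29", "2012-03-30"),
                 ret  = c(0.012, -0.004, 0.021),
                 stringsAsFactors = FALSE)

dput(df)   # prints R code that recreates df; paste this into the email

# Whoever receives it can rebuild the object from that text
txt <- paste(capture.output(dput(df)), collapse = "\n")
df2 <- eval(parse(text = txt))
identical(df, df2)
```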

Finally, I note you're posting from Nabble. Please include context in
your reply -- I don't believe Nabble does this automatically, so
you'll need to manually include it. Most of the regular respondents on
this list don't use Nabble -- it is a _mailing list_ after all -- so
we don't get the forum view you do, only emails of the individual
posts. Combine that with the high volume of posts, and it's quite
difficult to trace a discussion if we all don't make sure to include
context.

Cheers,
Michael



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/performance-analytics-package-tp4645834.html
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] Exporting each row in the table as new table

2012-10-11 Thread R. Michael Weylandt

On Thu, Oct 11, 2012 at 2:04 PM, kallu kallu...@gmail.com wrote:
 Dear all,

 I am new to R and familiar only with very basic stuff. I am trying to create
 a text-format table from each row of my table and export each one, named after
 a specific attribute in the table. I tried things after reading some forums, but
 nothing worked. Can you please help me?

 ex:
 dataGT

 ID  State  Year  Growth
 1   IA     1999  25
 2   IA     2000  27
 3   KS     1999  35
 4   KS     2000  31
 5   KY     1999  14
 6   KY     2000  18
 7   NE     1999  34
 8   NE     2000  38

 I am trying to make each row of the table a new table and export
 that table named after its ID.

 Please help me if possible. Thank you
 Kalyani



Hi Kalyani,

I'm afraid I don't understand your question: what do you mean in this
context by "table"? data.frame()s? CSV files? And in either case, why
are you splitting into single-row objects? When you say "attribute", do
you mean the formal programming construct that is key to many things
in R, or something simpler?

In short, could you elaborate further?

Finally, I note you're posting from Nabble. Please include context in
your reply -- I don't believe Nabble does this automatically, so
you'll need to manually include it. Most of the regular respondents on
this list don't use Nabble -- it is a _mailing list_ after all -- so
we don't get the forum view you do, only emails of the individual
posts. Combine that with the high volume of posts, and it's quite
difficult to trace a discussion if we all don't make sure to include
context.

This might also be of help:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Cheers,
Michael


Re: [R] multiple t-tests across similar variable names

Hi Rui,

By running your code, I got the results as:
result
#   MeanDiff   CIlower    CIupper  p.value
#apple -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:
res3
#   meandifference CIlow   CIhigh  p.value
#apple    12.6  8.519476 16.68052 0.0010166626
#banana   15.0 12.088040 17.91196 0.0001388506
#orange   18.2 13.604166 22.79583 0.0003888560

There is a difference in signs.
A.K.




- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: arun smartpink...@yahoo.com; Nundy, Shantanu snu...@chicagobooth.edu
Cc: R help r-help@r-project.org
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names

Hello,

I have a problem, with your data example my results are different. I have 
changed the names of two of the variables, to allow for 'pre' and 'post' to be 
first in the names.

# auxiliary functions
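# put the 'pre'/'post' token second, so column 1 is always the stem
ifswap <- function(x)
    if(x[1] %in% c("pre", "post")) x[2:1] else x

# index of the post column whose stem matches row i (uses the global vmat)
getpair <- function(i, post)
    post[ which(vmat[post, 1] == vmat[i, 1]) ]

makeLine <- function(h)
    c(MeanDiff = unname(h$estimate),
        CIlower = h$conf.int[1],
        CIupper = h$conf.int[2],
        p.value = h$p.value)

# one paired t-test per row of Pairs (column 1 = pre, so estimates are pre - post)
doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}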

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
            orange_post = sample(18:28,5,replace=TRUE),
            pre_banana = sample(25:35,5,replace=TRUE),  # here
            apple_post = sample(20:30,5,replace=TRUE),
            post_banana = sample(40:50,5,replace=TRUE), # and here
            orange_pre = sample(5:10,5,replace=TRUE))


#
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))   # one row per name, two tokens
vmat <- t(apply(vmat, 1, ifswap))            # column 1 = stem, column 2 = pre/post
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)           # reorder post to match pre
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result


In your results I believe that the values for meandifference are the means of
x[, 1]; at least, that's what I got.
Anyway, I'll go over both pieces of code again to try to see what's going on.

Hope this helps,

Rui Barradas

On 11-10-2012 05:31, arun wrote:
 Hi,
 
 If you have a lot of variables and in no order, then it would be better to 
 order the data by column names.
 For e.g.
 set.seed(432)
 dat2 <- data.frame(apple_pre=sample(10:20,5,replace=TRUE),orange_post=sample(18:28,5,replace=TRUE),banana_pre=sample(25:35,5,replace=TRUE),apple_post=sample(20:30,5,replace=TRUE),banana_post=sample(40:50,5,replace=TRUE),orange_pre=sample(5:10,5,replace=TRUE))
 dat3 <- dat2[order(colnames(dat2))] #order the columns (*_post sorts before *_pre)
 list3 <- list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
 res3 <- do.call(rbind,lapply(lapply(list3,function(x) 
 t.test(x[,1],x[,2],paired=TRUE)),function(x) 
 data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value)))
 row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3),"_"),`[`,1)))
 res3
 #     meandifference     CIlow   CIhigh      p.value
 #apple            12.6  8.519476 16.68052 0.0010166626
 #banana           15.0 12.088040 17.91196 0.0001388506
 #orange           18.2 13.604166 22.79583 0.0003888560
 
 A.K.
 
 
 
 - Original Message -
 From: Nundy, Shantanu snu...@chicagobooth.edu
 To: r-help@r-project.org r-help@r-project.org
 Cc:
 Sent: Wednesday, October 10, 2012 7:09 PM
 Subject: Re: [R] multiple t-tests across similar variable names
 
 Hi everyone-
 
 I have a dataset with multiple pre and post variables I want to compare. 
 The variables are named "apple_pre" or "pre_banana", with the corresponding 
 post variables named "apple_post" or "post_banana". The variables are in no 
 particular order.
 
 apple_pre orange_pre orange_post pre_banana apple_post post_banana
 person_1
 person_2
 person_3
 ...
 person_x
 
 
 How do I:
 1. Run a series of paired t-tests comparing each pre variable (e.g. apple_pre,
 pre_banana) with its post counterpart? It would be great to do something like
 ttest(*.*pre*.*, *.*post*.*).
 2. Print the results from these t-tests in a table with col 1 = mean 
 difference, col 2 = 95% confidence interval, col 3 = p-value.
 
 Thank you kindly,
 -Shantanu
 
 Shantanu Nundy, M.D.
 University of Chicago
 
 

Re: [R] Options to extend memory limit

2012-10-11 Thread Ben Bolker

 jennifer.moeller-gulland at de.pwc.com writes:

 at the moment I am using R for calculations of large databases. 
 Unfortunately, R only manages to complete certain operations at some 
 times, and not at others. I usually get the error message "cannot allocate 
 vector of size XX". 
 
 I am using the 64-bit version with Windows 7. While my computer has 8 GB of RAM, 
 I do have a feeling that R cannot use all of it. 
 
 Searching online, I found that you can increase the memory with the 
 options --max-mem-size / --max-ppsize or change the environment variable 
 R_MAX_MEM_SIZE to allow deep recursion or large and complicated 
 calculations to be done.

  I believe this may be somewhat out of date (although I don't use
Windows so I'm a little rusty). If you are dealing with large databases
you should almost certainly check out the High Performance Computing
task view (you can google it), which recommends many approaches for
dealing with Big Data.


[R] dotplot in .R with lattice latticeExtra: proper visualization

2012-10-11 Thread Andres LaCortadora

Dear everyone,

I'm trying to make a dotplot with the lattice and latticeExtra packages.
However, R does not represent the values on the vertical y-axis properly.
Instead of the actual values of the numeric variable, R plots the rank of
each value. That is, there are values [375, 500, 625, 750, ..., 3000] and
R plots their ranks [1, 2, 3, 4, ..., 23] and chooses the scale accordingly.
Has anyone experienced a problem like this? How can I get a proper
representation with ticks like (0, 500, 1000, 1500, ...) on the vertical
y-axis?

Here's my data:
https://www.dropbox.com/s/egy25cj00rhum40/data.csv

And here the program code so far:

df.dose <- read.table("data.csv", sep=",", header=TRUE)
library(lattice); library(latticeExtra)

useOuterStrips(dotplot(z ~ sample.size |
as.factor(effect.size)*as.factor(true.dose),
   groups=as.factor(type), data=df.dose, as.table=TRUE))

I'd be glad for any kind of help!
Andres
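A likely explanation, hedged because the CSV is not reproduced here: lattice's `dotplot()` always treats one axis as categorical. Its `horizontal` argument defaults to `TRUE` unless the x variable is a factor or shingle. With `horizontal = TRUE`, the y variable is treated as a factor, so numeric `z` values are placed at their level positions (1, 2, 3, ...) rather than on a numeric scale. If both axes should be numeric, `xyplot()` with the same formula, conditioning and `groups` keeps `z` numeric. The data frame below is a made-up stand-in for `df.dose` that uses the same column names.

```r
library(lattice)

# Made-up stand-in for df.dose (same column names as in the question)
set.seed(1)
df.dose <- expand.grid(sample.size = c(20, 50, 100, 200),
                       effect.size = c(0.2, 0.5),
                       true.dose   = c(1, 2),
                       type        = c("A", "B"))
df.dose$z <- round(runif(nrow(df.dose), 375, 3000))

# xyplot() keeps z numeric, so the y-axis gets ordinary ticks (500, 1000, ...)
p <- xyplot(z ~ sample.size | factor(effect.size) * factor(true.dose),
            groups = type, data = df.dose, as.table = TRUE,
            auto.key = TRUE)
print(p)
# With latticeExtra installed, outer strips as in the original call:
# print(latticeExtra::useOuterStrips(p))
```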



-
Andres
--
View this message in context: 
http://r.789695.n4.nabble.com/dotplot-in-R-with-lattice-latticeExtra-proper-visualization-tp4645850.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] own function: computing time

2012-10-11 Thread Tonja Krueger


   That's perfect, thanks a lot!
   Tonja
   Sent: Wednesday, 10 October 2012, 21:37
   From: William Dunlap wdun...@tibco.com
   To: tonja.krue...@web.de, r-help@r-project.org
   Subject: RE: [R] own function: computing time
   Your original method would be the following function:
   f <- function (x, y)
   {
       xy <- cbind(x, y)
       outside <- function(z) {
           !any(x > z[1] & y > z[2])
       }
       j <- apply(xy, 1, outside)
       which(j)
   }
   and the following one quickly computes the same thing as the above
   as long as there are no repeated points (if there are repeated
   points it chooses one of them).
   f1 <- function (x, y)
   {
       o <- order(x, decreasing = TRUE)
       yo <- y[o]
       j <- logical(length(y))
       j[o] <- yo == cummax(yo)
       which(j)
   }
   Think of the problem as finding the ladder points (Feller's term)
   of a sequence of points, the places where the sequence reaches
   a new high point.
   Bill Dunlap
   Spotfire, TIBCO Software
   wdunlap tibco.com
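The sketch below makes two quick checks. First, the loop-based method and the sort-based `f1()` idea should agree on data without ties. Second, Bill's counterexample shows why the convex hull alone is not enough. `slow()` and `fast()` are local copies of the two approaches above, so the snippet runs on its own.

```r
# O(n^2): keep point i if no other point lies both above and to the right of it
slow <- function(x, y)
  which(apply(cbind(x, y), 1, function(z) !any(x > z[1] & y > z[2])))

# O(n log n): scan points from right to left and keep those that reach a
# new running maximum of y (the "ladder points")
fast <- function(x, y) {
  o <- order(x, decreasing = TRUE)
  yo <- y[o]
  j <- logical(length(y))
  j[o] <- yo == cummax(yo)
  which(j)
}

set.seed(123)
x <- runif(2000); y <- runif(2000)
identical(slow(x, y), fast(x, y))        # TRUE for untied data
system.time(fast(runif(1e6), runif(1e6)))

# Bill's counterexample: every point except the origin is wanted,
# but only some of them are vertices of the convex hull
xc <- c(0, 1:5); yc <- c(0, 1 / (1:5))
fast(xc, yc)          # 2 3 4 5 6
sort(chull(xc, yc))   # 1 2 6
```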
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of William Dunlap
Sent: Wednesday, October 10, 2012 9:52 AM
To: tonja.krue...@web.de; r-help@r-project.org
Subject: Re: [R] own function: computing time
   
No, the desired points are not a subset of the convex hull.
E.g., x=c(0,1:5), y=c(0,1/(1:5)).
   
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
   
   
 -Original Message-
 From: William Dunlap
 Sent: Wednesday, October 10, 2012 9:46 AM
 To: 'tonja.krue...@web.de'; r-help@r-project.org
 Subject: RE: [R] own function: computing time

 Are the points you are looking for (those data points with no other data
 points above or to the right of them) a subset of the convex hull of the
 data points? If so, chull(x,y) can quickly give you the points on the convex
 hull (typically a fairly small number) and you can look through them for
 the ones you want.

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
  On Behalf Of tonja.krue...@web.de
  Sent: Wednesday, October 10, 2012 3:16 AM
  To: r-help@r-project.org
  Subject: [R] own function: computing time
 
  Hi all,
 
  I wrote a function that does what I want it to do, but it tends to be very
  slow for large amounts of data. On my computer it takes 5.37 seconds for
  16000 data points and 21.95 seconds for 32000 data points. As my real data
  consist of 1800 data points, it would take ages to use the function as it
  is now. Could someone help me to speed up the calculation?
 
  Thank you, Tonja
 
  system.time({
  x <- runif(32000)
  y <- runif(32000)
 
  xy <- cbind(x,y)
 
  outer <- function(z){   # note: this name masks base::outer()
  !any(x > z[1] & y > z[2])}
  j <- apply(xy,1, outer)
 
  plot(x,y)
  points(x[j],y[j],col="green")
 
  })
 

Re: [R] multiple t-tests across similar variable names


Hello,

If that is the problem now, then change the variables' names.
In what follows, the first line is just the example you gave. In the
actual running code, uncomment the commented-out lines.


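vars <- c("red_apple_pre", "post_banana_organic")
#vars <- names(dat)
vars <- gsub("_pre", "=pre", vars)    # mark the pre/post separator with "="
vars <- gsub("_post", "=post", vars)
vars <- gsub("pre_", "pre=", vars)
vars <- gsub("post_", "post=", vars)
vars <- gsub("_", "\\.", vars)        # remaining underscores become dots
vars <- sub("=", "_", vars)           # restore the single pre/post underscore
#names(dat) <- vars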

Rui Barradas
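The alternative sketch below avoids `strsplit()` altogether. It assumes that 'pre' or 'post' always appears as a whole underscore-separated token at the start or end of a name. Regular expressions find the phase and the stem, columns with neither token are dropped, and the remaining columns are paired by stem. The data frame and its column names, including the extra `age` column, are invented for illustration. Differences are computed as post minus pre.

```r
set.seed(1)
dat <- data.frame(red_apple_pre       = rnorm(5, 10),
                  post_banana_organic = rnorm(5, 30),
                  red_apple_post      = rnorm(5, 12),
                  pre_banana_organic  = rnorm(5, 25),
                  age                 = 30:34)   # neither pre nor post: ignored

vars  <- names(dat)
phase <- ifelse(grepl("^pre_|_pre$", vars), "pre",
         ifelse(grepl("^post_|_post$", vars), "post", NA))
stem  <- sub("^(pre|post)_|_(pre|post)$", "", vars)   # name without the token

is_pre  <- phase %in% "pre"     # %in% treats NA as FALSE
is_post <- phase %in% "post"
pre_cols  <- vars[is_pre]
post_cols <- vars[is_post][match(stem[is_pre], stem[is_post])]

res <- t(mapply(function(a, b) {
  h <- t.test(dat[[b]], dat[[a]], paired = TRUE)   # estimate is mean(post - pre)
  c(MeanDiff = unname(h$estimate), CIlower = h$conf.int[1],
    CIupper = h$conf.int[2], p.value = h$p.value)
}, pre_cols, post_cols))
rownames(res) <- stem[is_pre]
res
```

If a pre column has no matching post column, `match()` returns `NA` and the test for that pair fails. Filtering with `!is.na(post_cols)` first would skip such columns.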
On 11-10-2012 15:17, Nundy, Shantanu wrote:

Actually, I see now that part of the problem is that many of the names have multiple underscores,
such as "red_apple_pre" or "post_banana_organic". I think this is causing a
problem for this line in your code:

vmat <- do.call(rbind, strsplit(vars, "_"))

Shantanu




From: Nundy, Shantanu
Sent: Thursday, October 11, 2012 9:07 AM
To: Rui Barradas
Subject: RE: [R] multiple t-tests across similar variable names

Rui,
Thank you so much for your solution. It is exactly what I was struggling with!

One small question. When I ran the code on my actual dataset I got the error 
below:


vars <- names(master)
vmat <- do.call(rbind, strsplit(vars, "_"))

Warning message:
In function (..., deparse.level = 1)  :
   number of columns of result is not a multiple of vector length (arg 1)

My guess is that the problem is that not all the variables have pre or post in them. Some of the
variables are constants that I will not do a paired t-test on. What would be the easiest way to get around this,
perhaps even by simply removing all of the variables that have neither pre nor post in them?

Thanks again,
Shantanu








From: arun [smartpink...@yahoo.com]
Sent: Thursday, October 11, 2012 8:50 AM
To: Rui Barradas
Cc: Nundy, Shantanu
Subject: Re: [R] multiple t-tests across similar variable names

Hi Rui,

  Thanks for testing the code. I will look into it later.
A.K.





Re: [R] Options to extend memory limit

2012-10-11 Thread Sebastian P. Luque

On Thu, 11 Oct 2012 14:45:16 +0200,
jennifer.moeller-gull...@de.pwc.com wrote:

 Dear All, at the moment I am using R for calculations of large
 databases.  Unfortunately, R only manages to complete certain
 operations at some times, and not at others. I usually get the error
 message "cannot allocate vector of size XX".

 I am using the 64-bit version with Windows 7. While my computer has 8 GB
 of RAM, I do have a feeling that R cannot use all of it.

 Searching online, I found that you can increase the memory with the
 options --max-mem-size / --max-ppsize or change the environment
 variable R_MAX_MEM_SIZE to allow deep recursion or large and
 complicated calculations to be done.

 Unfortunately, I am not very knowledgeable yet about how to use R, and I
 did not quite manage to use the commands successfully. Could you
 please tell me whether these make sense for my case and, if so, how
 (and at what stage of the process) I can use them?

Are you sure you're using the 64 bit R executable which comes with the R
installation?

-- 
Seb

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] multiple t-tests across similar variable names


Hello,

On 11-10-2012 15:14, arun wrote:

HI Rui,

By running your code, I got the results as:
result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is a difference in signs.


Mystery solved.
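The sign difference comes from argument order: t.test(x, y, paired = TRUE) estimates the mean of x - y. Sorting the columns alphabetically puts apple_post before apple_pre, so in arun's first version x[, 1] is the post value. Rui's pairs matrix lists the pre column first. The quick check below uses hypothetical numbers, not the thread's data, and shows that swapping the arguments flips the estimate and mirrors the confidence interval while leaving the p-value unchanged.

```r
pre  <- c(10, 12, 15, 11, 14)   # hypothetical paired measurements
post <- c(22, 25, 27, 24, 26)

a <- t.test(post, pre, paired = TRUE)   # estimate = mean(post - pre)
b <- t.test(pre, post, paired = TRUE)   # estimate = mean(pre - post)

# 12.4 and -12.4
c(post_minus_pre = unname(a$estimate), pre_minus_post = unname(b$estimate))

# the interval flips as well: b's interval is -rev(a's interval)
a$conf.int
b$conf.int
```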

Rui Barradas

A.K.




- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: arun smartpink...@yahoo.com; Nundy, Shantanu snu...@chicagobooth.edu
Cc: R help r-help@r-project.org
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names

Hello,

I have a problem: with your data example, my results are different. I have 
changed the names of two of the variables, to allow for 'pre' and 'post' to be 
first in the names.

# auxiliary functions
ifswap <- function(x)
    if(x[1] %in% c("pre", "post")) x[2:1] else x

getpair <- function(i, post)
    post[ which(vmat[post, 1] == vmat[i, 1]) ]

makeLine <- function(h)
    c(MeanDiff = unname(h$estimate),
        CIlower = h$conf.int[1],
        CIupper = h$conf.int[2],
        p.value = h$p.value)

doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
            orange_post = sample(18:28,5,replace=TRUE),
            pre_banana = sample(25:35,5,replace=TRUE),  # here
            apple_post = sample(20:30,5,replace=TRUE),
            post_banana = sample(40:50,5,replace=TRUE), # and here
            orange_pre = sample(5:10,5,replace=TRUE))


#
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result


In your results I believe that the values for meandifference are the means of 
x[, 1], at least that's what I've got.
Anyway, I'll see both codes again, to try to see what's going on.

Hope this helps,

Rui Barradas

On 11-10-2012 05:31, arun wrote:

HI,

If you have a lot of variables in no particular order, it would be better to 
order the data by column names.
For example:
set.seed(432)
dat2 <- data.frame(apple_pre=sample(10:20,5,replace=TRUE),
                   orange_post=sample(18:28,5,replace=TRUE),
                   banana_pre=sample(25:35,5,replace=TRUE),
                   apple_post=sample(20:30,5,replace=TRUE),
                   banana_post=sample(40:50,5,replace=TRUE),
                   orange_pre=sample(5:10,5,replace=TRUE))
dat3 <- dat2[order(colnames(dat2))] # order the columns
list3 <- list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
res3 <- do.call(rbind,lapply(lapply(list3,function(x) 
t.test(x[,1],x[,2],paired=TRUE)),function(x) 
data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value)))
row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3),"_"),`[`,1)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

A.K.



- Original Message -
From: Nundy, Shantanu snu...@chicagobooth.edu
To: r-help@r-project.org r-help@r-project.org
Cc:
Sent: Wednesday, October 10, 2012 7:09 PM
Subject: Re: [R] multiple t-tests across similar variable names

Hi everyone-

I have a dataset with multiple pre and post variables I want to compare. The variables are named 
apple_pre or pre_banana with the corresponding post variables named apple_post or 
post_banana. The variables are in no particular order.

apple_pre orange_pre orange_post pre_banana apple_post post_banana
person_1
person_2
person_3
...
person_x


How do I:
1. Run a series of paired t-tests for the apple_pre variables and pre_banana 
variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
2. Print the results from these t-tests in a table with col 1=mean difference, 
col 2= 95% conf interval, col 3=p-value.

Thank you kindly,
-Shantanu

Shantanu Nundy, M.D.
University of Chicago


Re: [R] Options to extend memory limit

2012-10-11 Thread Marc Schwartz


On Oct 11, 2012, at 9:55 AM, Sebastian P. Luque splu...@gmail.com wrote:

 On Thu, 11 Oct 2012 14:45:16 +0200,
 jennifer.moeller-gull...@de.pwc.com wrote:
 
 Dear All, at the moment I am using R for calculations on large
 databases.  Unfortunately, R only manages to complete certain
 operations some of the time, and not at other times. I usually get the error
 message "cannot allocate vector of size XX".
 
 I am using the 64-bit version with Windows 7. While my computer has 8 GB
 of RAM, I have a feeling that R cannot use all of it.
 
 Searching online, I found that you can increase the memory with the
 options --max-mem-size / --max-ppsize or change the environment
 variable R_MAX_MEM_SIZE to allow deep recursion or large and
 complicated calculations to be done.
 
 Unfortunately, I am not very knowledgeable yet about how to use R and I
 did not quite manage to use the commands successfully. Could you
 please tell me whether these make sense for my case and, if so, how
 (and at what stage of the process) I can use them?
 
 Are you sure you're using the 64 bit R executable which comes with the R
 installation?

Sebastian hit on my initial thought here, though depending upon how much data 
you are dealing with, 8 GB may indeed not be enough, and some of your RAM may be 
used by other processes/applications, leaving less for R.

A quick check to see which version you are running is to use:

  .Machine$sizeof.pointer

If it returns 8, you are using the 64 bit version of R. If it comes back with 
4, you are using the 32 bit version of R, which of course will be more limited 
in how much RAM it can access.

If it returns 8, then as Ben noted, you may want to evaluate some of the Large 
Memory options on the HPC task view:

  http://cran.r-project.org/web/views/HighPerformanceComputing.html

or of course install more RAM.

Regards,

Marc Schwartz


Re: [R] dotplot in .R with lattice latticeExtra: proper visualization

2012-10-11 Thread David Winsemius


On Oct 11, 2012, at 6:48 AM, Andres LaCortadora wrote:

 Dear everyone,
 
 I'm trying to do a dotplot with the packages lattice and latticeExtra.
 However, R does not represent the values on the vertical y-axis
 properly. Instead of using the actual values of the numeric variable,
 R plots the rank of each value. That is, there are values [375, 500, 625,
 750, ..., 3000] and R plots their ranks [1,2,3,4,...,23] and chooses the
 scale accordingly. Has anyone experienced a problem like this? How can I
 get a proper representation with ticks like (0, 500, 1000, 1500,
 ...) on the vertical y-scale?
 

I suspect it will be difficult with dotplot. It is expecting a factor variable 
for the y-value and appears to be coercing the LHS argument to one. Why not use 
xyplot if you are plotting numeric by numeric? If you want to add horizontal 
lines à la dotplot, you could construct a panel function with the appropriate 
commands.
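Here is a sketch of David's xyplot suggestion. It uses hypothetical data with the same column names as df.dose, because the posted data.csv is not reproduced here. Both axes stay numeric, and the panel function adds dotplot-like horizontal lines at the y tick positions. The result is an ordinary trellis object, so wrapping it in latticeExtra's useOuterStrips(), as in the original call, should still work.

```r
library(lattice)

# Hypothetical stand-in for df.dose
set.seed(1)
df.dose <- expand.grid(sample.size = c(20, 50, 100),
                       effect.size = c(0.2, 0.5),
                       true.dose   = c(1, 2),
                       type        = c("A", "B"))
df.dose$z <- round(runif(nrow(df.dose), 375, 3000))

# xyplot keeps z on a numeric scale; panel.grid(h = -1, v = 0) draws
# horizontal reference lines at the y-axis tick marks, dotplot-style
p <- xyplot(z ~ sample.size | factor(effect.size) * factor(true.dose),
            groups = factor(type), data = df.dose, as.table = TRUE,
            auto.key = TRUE,
            panel = function(...) {
              panel.grid(h = -1, v = 0)
              panel.xyplot(...)
            })
print(p)
```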
-- 
David.

 Here's my data:
 https://www.dropbox.com/s/egy25cj00rhum40/data.csv
 
 And here the program code so far:
 
 df.dose <- read.table("data.csv", sep=",", header=TRUE)
 library(lattice); library(latticeExtra)
 
 useOuterStrips(dotplot(z ~ sample.size |
 as.factor(effect.size)*as.factor(true.dose),
  groups=as.factor(type), data=df.dose, as.table=TRUE))
 
 I'd be glad for any kind of help!
 Andres
 

David Winsemius, MD
Alameda, CA, USA


Re: [R] optim and nlminb

2012-10-11 Thread John C Nash

It appears you are using the approach "throw every method at a problem and
select the answer you like". I use this quite a lot with optimx to see just
what disasters I can create, but I do so to see if the software will return
sensible error messages.

You will have to provide a reproducible example if you want useful answers
from this list (as per the posting guide). Optimization tools are like F1
racing cars -- many controls and settings, with lots of power but
difficulties in controlling it. Their users -- even if well-qualified in
other areas -- are unfortunately often those who have trouble riding a
bicycle with just one speed. There is a serious and quite involved learning
curve.

Previously you tried optimx, but seem to have misunderstood or disregarded
the answers. It is quite likely the problem you are sending to the
optimizers is ill-posed or plain wrong. Certainly it does not have a
gradient function, and supplying one is almost always a good idea. If you
prepare a reproducible example that can be run by readers of the list, you
will
  a) discover what is wrong as you prepare it, or
  b) be able to submit it and very likely get useful help.
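John's two suggestions, an analytic gradient and a short reproducible case, can be shown together. The example is hypothetical: a normal log-likelihood on simulated data, not Serdar's Linn function. It fits the same objective with optim(method = "L-BFGS-B") and with nlminb(), giving both the gradient. On a well-posed problem the two should agree closely.

Separately, per ?nlminb, nlminb's control list has no factr entry (that is an optim L-BFGS-B setting), and its hessian argument is an optional function rather than a logical switch. The nlminb call quoted below may therefore not be doing what was intended.

```r
set.seed(1)
y <- rnorm(200, mean = 3, sd = 2)   # hypothetical data

# Negative log-likelihood of a normal model; p = c(mean, log(sd)).
# The log scale keeps sd positive without box constraints.
negll <- function(p) {
  -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
}

# Analytic gradient of negll with respect to p
negll_grad <- function(p) {
  s <- exp(p[2])
  r <- y - p[1]
  c(-sum(r) / s^2,                  # d/d mean
    length(y) - sum(r^2) / s^2)     # d/d log(sd)
}

fit_optim  <- optim(c(0, 0), negll, gr = negll_grad, method = "L-BFGS-B")
fit_nlminb <- nlminb(c(0, 0), negll, gradient = negll_grad)

# Both should agree with each other and with the closed-form MLE
rbind(optim  = fit_optim$par,
      nlminb = fit_nlminb$par,
      exact  = c(mean(y), log(sqrt(mean((y - mean(y))^2)))))
```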

Indeed in several years on the list, I've never seen a query with a short,
testable case fail to get an answer very quickly.

JN


On 10/11/2012 06:00 AM, r-help-requ...@r-project.org wrote:
 Message: 92
 Date: Wed, 10 Oct 2012 13:16:38 -0700 (PDT)
 From: nserdar snes1...@hotmail.com
 To: r-help@r-project.org
 Subject: [R] optim and nlminb
 Message-ID: 1349900198210-4645772.p...@n4.nabble.com
 Content-Type: text/plain; charset=us-ascii
 
 
 #optim package
 estimate-optim(init.par,Linn,hessian=TRUE, method=c(L-BFGS-B),control =
 list(trace=1,abstol=0.001),lower=c(0,0,0,0,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf),upper=c(1,1,1,1,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf))
 
 #nlminb package
 estimate-nlminb(init.par,Linn,gr=NULL,hessian=TRUE,control =
 list(trace=1,factr=1),lower=c(0,0,0,0,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf),upper=c(1,1,1,1,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf))
 
 I did not get the same results from the two calls above. The log-likelihood
 values are close, but the parameter estimates are completely different.
 
 My expected estimates are very close to the nlminb results.
 
 Do you have any ideas or suggestions about the difference between the two?
 
 Regards,
 Serdar 



[R] Formatting data for bootstrapping for confidence intervals

2012-10-11 Thread Paul Wennekes

Hi all,

New to R, so this may be obvious to some.
I've been trying to figure this out for a while. I have a dataset, events,
that looks something like this: 

Area NAME DATE    X   Xn  Y
1   X   1/10/10 1   1   0
1   Y   1/11/10 0   0   1
1   X   1/12/10 1   0   0
1   X   1/12/10 1   0   0
1   X   1/12/10 1   0   0
2   X   2/12/10 1   1   0
2   X   2/12/10 1   0   0
2   Y   2/12/10 0   0   1
2   X   2/13/10 1   0   0
2   X   2/13/10 1   0   0
2   X   2/13/10 1   0   0
2   X   2/14/10 1   0   0
2   X   2/14/10 1   0   0
2   X   2/14/10 1   1   0
2   X   2/14/10 1   0   0
3   X   7/27/11 1   0   0
3   X   7/27/11 1   1   0
3   X   7/27/11 1   0   0
3   X   7/28/11 1   0   0
3   X   7/28/11 1   1   0
3   X   7/28/11 1   0   0
3   X   7/28/11 1   0   0
3   Y   7/28/11 0   0   1
3   X   7/28/11 1   0   0
3   X   7/28/11 1   1   0
3   Y   7/28/11 0   0   1
3   X   7/28/11 1   0   0
3   X   7/29/11 1   0   0
3   X   7/29/11 1   0   0
3   X   7/29/11 1   1   0

X and Y are events. Every row represents a single event, with a 1
indicating which one happens at that time. Xn indicates X happening at
night. I want to bootstrap these events over days, but I think I need to
summarize them first, i.e. get something that looks like this: 

Area DATE    X   Xn  Y
1   1/10/10 1   1   0
1   1/11/10 0   0   1
1   1/12/10 3   0   0
2   2/12/10 2   1   1
etc.

and then for each Area, bootstrap the data over the days. Any ideas? I've
tried using the 'reshape' package but I don't know how to sum over parts of
the columns as defined by the DATE values...

Many thanks ahead!
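Below is a minimal base-R sketch of the two steps described above. First, aggregate() collapses the events to one row per Area and DATE. Second, a simple percentile bootstrap resamples days within each Area. The data are the first eight rows of the table above. The statistic (mean number of X events per day) and the number of resamples are assumptions.

DATE is kept as text here. Converting it with as.Date(DATE, "%m/%d/%y") would make the days sort chronologically across months and years. With a single day, as for Area 2 in this subset, the interval collapses to a point.

```r
# First eight rows of the events table shown above
events <- data.frame(
  Area = c(1, 1, 1, 1, 1, 2, 2, 2),
  NAME = c("X", "Y", "X", "X", "X", "X", "X", "Y"),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10", "1/12/10",
           "2/12/10", "2/12/10", "2/12/10"),
  X  = c(1, 0, 1, 1, 1, 1, 1, 0),
  Xn = c(1, 0, 0, 0, 0, 1, 0, 0),
  Y  = c(0, 1, 0, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE)

# Step 1: one row per Area and DATE, summing the event indicators
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)
daily

# Step 2: within an Area, resample days with replacement and record the
# mean number of X events per day (the choice of statistic is an assumption)
boot_daily_mean <- function(d, B = 2000) {
  replicate(B, mean(d$X[sample.int(nrow(d), replace = TRUE)]))
}

set.seed(1)
ci <- lapply(split(daily, daily$Area), function(d)
  quantile(boot_daily_mean(d), c(0.025, 0.975)))  # percentile interval
ci
```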



--
View this message in context: 
http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] multiple t-tests across similar variable names

Hi Shantanu,

I guess the below code should solve both the issues:

set.seed(432)
dat2 <- data.frame(apple_pre=sample(10:20,5,replace=TRUE),
                   orange_post=sample(18:28,5,replace=TRUE),
                   pre_banana=sample(25:35,5,replace=TRUE),
                   post_apple=sample(20:30,5,replace=TRUE),
                   banana_post=sample(40:50,5,replace=TRUE),
                   orange_pre=sample(5:10,5,replace=TRUE))
colnames(dat2) <- gsub("^pre\\_(.*)","\\1_pre",gsub("^post\\_(.*)","\\1_post",colnames(dat2)))
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName=gsub("(.*)\\_.*","\\1",row.names(dat3)),dat3)
list3 <- lapply(split(dat3,dat3$varName),function(x) t(x[-1]))
res3 <- do.call(rbind,lapply(lapply(list3,function(x) 
t.test(x[,1],x[,2],paired=TRUE)),function(x) 
data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value)))
res3
#  meandifference CIlow   CIhigh  p.value
#apple    12.6  8.519476 16.68052 0.0010166626
#banana   15.0 12.088040 17.91196 0.0001388506
#orange   18.2 13.604166 22.79583 0.0003888560
A.K.
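Here is a compact alternative for both of Shantanu's points: names with mixed prefixes and suffixes, and many variables. It strips the pre/post tag from either end of each name to get a stem, then runs one paired t-test per stem.

The sketch assumes two things:
- every column name carries exactly one tag, as a prefix (pre_) or a suffix (_pre);
- each stem has exactly one pre column and one post column.

The estimate is mean(post - pre), matching the positive signs in arun's output. The default sample() algorithm changed in R 3.6.0, so the numbers will not match the ones quoted in this thread.

```r
set.seed(432)
dat2 <- data.frame(apple_pre   = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana  = sample(25:35, 5, replace = TRUE),
                   apple_post  = sample(20:30, 5, replace = TRUE),
                   post_banana = sample(40:50, 5, replace = TRUE),
                   orange_pre  = sample(5:10, 5, replace = TRUE))

vars  <- names(dat2)
# "pre" or "post", whether the tag is a prefix or a suffix
# (assumes every name carries one of the two tags)
phase <- ifelse(grepl("^pre_|_pre$", vars), "pre", "post")
# variable stem with the tag removed from either end
stem  <- sub("_(pre|post)$", "", sub("^(pre|post)_", "", vars))

res <- t(sapply(unique(stem), function(s) {
  tt <- t.test(dat2[[which(stem == s & phase == "post")]],
               dat2[[which(stem == s & phase == "pre")]],
               paired = TRUE)          # estimate = mean(post - pre)
  c(mean.diff = unname(tt$estimate),
    CI.low    = tt$conf.int[1],
    CI.high   = tt$conf.int[2],
    p.value   = tt$p.value)
}))
res
```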




- Original Message -
From: Nundy, Shantanu snu...@chicagobooth.edu
To: arun smartpink...@yahoo.com
Cc: 
Sent: Thursday, October 11, 2012 10:22 AM
Subject: RE: [R] multiple t-tests across similar variable names

hi Arun,
This is very helpful, thanks. 

I'm running into a couple of issues:
1. Since some of the variables are named like pre_apple and others like apple_post, 
sorting the variables doesn't completely put pre-post variables next to each 
other.
2. I have about 50 variables, so typing this line is a bit cumbersome:

 list3 <- list(dat3[,1:2],dat3[,3:4],dat3[,5:6])

Thanks,
Shantanu


From: arun [smartpink...@yahoo.com]
Sent: Thursday, October 11, 2012 9:14 AM
To: Rui Barradas
Cc: Nundy, Shantanu; R help
Subject: Re: [R] multiple t-tests across similar variable names

HI Rui,

By running your code, I got the results as:
result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is a difference in signs.
A.K.




- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: arun smartpink...@yahoo.com; Nundy, Shantanu snu...@chicagobooth.edu
Cc: R help r-help@r-project.org
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names

Hello,

I have a problem: with your data example, my results are different. I have 
changed the names of two of the variables, to allow for 'pre' and 'post' to be 
first in the names.

# auxiliary functions
ifswap <- function(x)
    if(x[1] %in% c("pre", "post")) x[2:1] else x

getpair <- function(i, post)
    post[ which(vmat[post, 1] == vmat[i, 1]) ]

makeLine <- function(h)
    c(MeanDiff = unname(h$estimate),
        CIlower = h$conf.int[1],
        CIupper = h$conf.int[2],
        p.value = h$p.value)

doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
            orange_post = sample(18:28,5,replace=TRUE),
            pre_banana = sample(25:35,5,replace=TRUE),  # here
            apple_post = sample(20:30,5,replace=TRUE),
            post_banana = sample(40:50,5,replace=TRUE), # and here
            orange_pre = sample(5:10,5,replace=TRUE))


#
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result


In your results I believe that the values for meandifference are the means of 
x[, 1], at least that's what I've got.
Anyway, I'll see both codes again, to try to see what's going on.

Hope this helps,

Rui Barradas

On 11-10-2012 05:31, arun wrote:
 HI,

 If you have a lot of variables in no particular order, it would be better to 
 order the data by column names.
 For example:
 set.seed(432)
 dat2 <- data.frame(apple_pre=sample(10:20,5,replace=TRUE),
                    orange_post=sample(18:28,5,replace=TRUE),
                    banana_pre=sample(25:35,5,replace=TRUE),
                    apple_post=sample(20:30,5,replace=TRUE),
                    banana_post=sample(40:50,5,replace=TRUE),
                    orange_pre=sample(5:10,5,replace=TRUE))
 dat3 <- dat2[order(colnames(dat2))] # order the columns
 list3 <- list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
 res3 <- do.call(rbind,lapply(lapply(list3,function(x)

[R] plots for presentation

2012-10-11 Thread mamush bukana

Dear users,
I am preparing a presentation in LaTeX (beamer). I would like to show parts
of my plots one click at a time. For example, consider two time series x and y:

x <- ts(rnorm(100), start=1900, end=1999)
y <- ts(rnorm(100), start=1900, end=1999)
plot(x)
lines(y, col=2)

Then I imported this plot into LaTeX as an .eps file. My question is: how
can I show the plot of each time series separately, in sequence (one after the
other)? I also want to show parts of the plots at different time
segments in my presentation. To be honest, I don't know whether these features
are in R or in LaTeX.
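One common approach, sketched here on the assumption that each click should reveal a complete new figure, is to do the layering in R. Write one file per step, using a fixed ylim so the axes stay identical from frame to frame. Then show the files on successive slides or with beamer's overlay-aware \includegraphics.

The file names and output folder are hypothetical. pdf() suits pdflatex; postscript() can produce EPS files if the document is built with latex and dvips.

```r
set.seed(1)
x <- ts(rnorm(100), start = 1900, end = 1999)
y <- ts(rnorm(100), start = 1900, end = 1999)
yl  <- range(x, y)    # shared y-range so successive frames line up
out <- tempdir()      # hypothetical output folder

# Frame 1: x only
pdf(file.path(out, "frame1.pdf"), width = 6, height = 4)
plot(x, ylim = yl)
dev.off()

# Frame 2: x with y overlaid
pdf(file.path(out, "frame2.pdf"), width = 6, height = 4)
plot(x, ylim = yl)
lines(y, col = 2)
dev.off()

# Frame 3: both series restricted to a sub-period
pdf(file.path(out, "frame3.pdf"), width = 6, height = 4)
plot(window(x, start = 1950, end = 1979), ylim = yl)
lines(window(y, start = 1950, end = 1979), col = 2)
dev.off()

files <- file.path(out, c("frame1.pdf", "frame2.pdf", "frame3.pdf"))
```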

Thanks in advance

M.


[R] Sorting a data frame by specifying a vector

2012-10-11 Thread LCOG1

Hello all, 
   I cannot seem to figure out this seemingly simple procedure.  

I want to sort a data frame by a specified character vector.

So for :

df.. <- data.frame(Season=rep(c("Summer","Fall","Winter","Spring"),4),Obs=
runif(length(rep(c("Summer","Fall","Winter","Spring"),4))))

I want to sort the data frame by the seasons, but in the order I specify,
since alphabetical order would not put the seasons in sequential order.

I tried the following and a few other things but no dice.  It looks like I
will have to convert to factors.  Any thoughts?  Thanks

df.. <-
df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

Josh



--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-a-data-frame-by-specifying-a-vector-tp4645867.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] optim and nlminb

2012-10-11 Thread Spencer Graves


a fortune?


On 10/11/2012 9:56 AM, John C Nash wrote:


snip



Indeed in several years on the list, I've never seen a query with a short,
testable case fail to get an answer very quickly.

JN





--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com


Re: [R] Sorting a data frame by specifying a vector

?order
df[order(yourcolumn), ]
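Applied to Josh's data, the order() suggestion might look like the sketch below. Either of two keys makes order() follow the specified sequence instead of the alphabet:
- a factor for Season with explicitly ordered levels;
- each season's position in the desired vector, obtained with match().

The custom order `wanted` is only an example. Note that match() by itself returns positions in `wanted`, not a row permutation, so it must be wrapped in order() before being used as a row index.

```r
set.seed(1)
seasons <- c("Summer", "Fall", "Winter", "Spring")
df.. <- data.frame(Season = rep(seasons, 4), Obs = runif(16))

wanted <- c("Summer", "Winter", "Fall", "Spring")   # example custom order

# Option 1: factor levels define the sort order
sorted1 <- df..[order(factor(df..$Season, levels = wanted)), ]

# Option 2: match() gives each row's rank in `wanted`; order() sorts by it
sorted2 <- df..[order(match(df..$Season, wanted)), ]

sorted1
```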

-- Bert


On Thu, Oct 11, 2012 at 10:08 AM, LCOG1 jr...@lcog.org wrote:
 Hello all,
I cannot seem to figure out this seemingly simple procedure.

 I want to sort a data frame by a specified character vector.

 So for :

 df.. <- data.frame(Season=rep(c("Summer","Fall","Winter","Spring"),4),Obs=
 runif(length(rep(c("Summer","Fall","Winter","Spring"),4))))

 I want to sort the data frame by the seasons, but in the order I specify,
 since alphabetical order would not put the seasons in sequential order.

 I tried the following and a few other things but no dice.  It looks like I
 will have to convert to factors.  Any thoughts?  Thanks

 df.. <-
 df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

 Josh






-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[R] Help on probability distribution question

2012-10-11 Thread Andras Farkas

Dear All,

I have a question I would like to ask, and I wonder if you have any 
thoughts on how to make it work in R.

1. I work in the field of medicine, where physiologic variables are often 
simulated, and they cannot have negative values. Most often the assumption is 
made to simulate these parameters with a normal distribution, but in the 
log domain, to avoid generating negative values. Since the expected 
mean and SD are usually known from the normal domain, I use the methods 
described in the Wikipedia article "Arithmetic moments" to generate μ and σ 
and simulate with rlnorm(). At times, though, the following issue comes up: I 
have the mean and SD for the parameters available from the normal domain, and 
the covariance matrix from the normal domain. Then I would like to simulate the 
values, but to avoid generating negative values I have to fall back 
on rlnorm in {compositions}. My issue is that my covariance matrix 
represents the covariance of the parameters in the normal domain, as opposed 
to the lognormal domain. Any thoughts on how to work around this?
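Suppose the available means and covariance matrix describe the variables on their natural (arithmetic) scale. One option is to convert them to the mean vector and covariance of log(Y) with the standard lognormal moment identities, then exponentiate multivariate normal draws. This is a sketch, not a vetted method for any particular application.

The numbers are hypothetical. The conversion only exists when 1 + C[i, j] / (m[i] * m[j]) > 0 for every pair and the resulting Sigma is positive definite.

```r
library(MASS)  # mvrnorm()

# Convert arithmetic (natural-scale) means m and covariance C of a
# multivariate lognormal Y into the mean mu and covariance Sigma of log(Y),
# using E[Y_i] = exp(mu_i + Sigma_ii / 2) and
# Cov(Y_i, Y_j) = E[Y_i] * E[Y_j] * (exp(Sigma_ij) - 1).
lognormal_params <- function(m, C) {
  Sigma <- log(1 + C / outer(m, m))
  mu <- log(m) - diag(Sigma) / 2
  list(mu = mu, Sigma = Sigma)
}

m <- c(10, 50)                        # hypothetical arithmetic means
C <- matrix(c(4, 6, 6, 100), 2, 2)    # hypothetical arithmetic covariance
p <- lognormal_params(m, C)

set.seed(42)
Y <- exp(mvrnorm(1e5, p$mu, p$Sigma)) # strictly positive draws
colMeans(Y)   # should be close to m
cov(Y)        # should be close to C
```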

Appreciate the help,

Andras

Re: [R] Expected number of events, Andersen-Gill model fit via coxph in package survival

2012-10-11 Thread Omar De la Cruz C.

Thank you, Dr. Therneau, that was very helpful.

Best regards,

Omar.


On Mon, Oct 8, 2012 at 9:58 AM, Terry Therneau thern...@mayo.edu wrote:

 I am interested in producing the expected number of events, in a
 recurring events setting. I am using the Andersen-Gill model, as fit
 by the function coxph in the package survival.

 I need to produce expected numbers of events for a cohort,
 cumulatively, at several fixed times. My ultimate goal is: To fit an
 AG model to a reference sample, then use that fitted model to generate
 expected numbers of events for a new cohort; then, comparing the
 expected vs. the observed numbers of events would give us some idea of
 whether the new cohort differs from the reference one.

 From my reading of the documentation and the text by Therneau and
 Grambsch, it seems that the function survexp is what I need. But
 using it I am not able to obtain expected numbers of events that match
 reasonably well the observed numbers *even for the same reference
 population.* So, I think I am misunderstanding something quite badly.


  You've hit a common confusion.  Observed versus expected events
 computations are done on the cumulative hazard scale H, not the survival
 scale S; S = exp(-H).  Relating this back to simple Poisson models, H(t)
 would be the expected number of events by time t and S(t) the probability of
 no events before time t.  G. Berry (Biometrics 1983) has a classic and
 readable article on this (especially if you ignore the proofs).
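Terry's Poisson analogy can be checked numerically. Take a Poisson process with a hypothetical constant rate. Its cumulative hazard H(t) equals the expected count, and exp(-H(t)) equals the Poisson probability of zero events by time t.

```r
lambda <- 0.3            # hypothetical constant event rate per unit time
t <- c(1, 2, 5)
H <- lambda * t          # cumulative hazard = expected number of events by t
S <- exp(-H)             # "survival" = probability of no events before t

# P(N(t) = 0) for a Poisson count with mean H(t) equals exp(-H(t))
cbind(t, H, S, p0 = dpois(0, H))

# Simulated counts at t = 5 average close to H(5) = 1.5
set.seed(1)
counts <- rpois(1e5, H[3])
mean(counts)
```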

   Using your example:

 cphfit <-
 coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)
 zz <- predict(cphfit, type='expected')
 c(sum(zz), sum(bladder2$event))
 [1] 112 112

 tdata <- bladder2[1:10, ]   #new data set (lazy way)
 predict(cphfit, type='expected', newdata=tdata)
  [1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665
  [8] 0.8864808 0.2932915 0.5190647


  You can also do this using survexp and the cohort=FALSE argument, which
 would return S(t) for each subject and we would then use -log(result) to get
 H.  This is how it was done when I wrote the book, but the newer predict
 function is easier.

 Terry Therneau




Re: [R] Sorting a data frame by specifying a vector

Hi,
In your dataset, it seems like it is already ordered in the way you want.
df.. <- data.frame(Season=rep(c("Summer","Fall","Winter","Spring"),4),Obs=
runif(length(rep(c("Summer","Fall","Winter","Spring"),4))))

#Suppose the order you want is:

 vec2 <- c("Summer","Winter","Fall","Spring")
df1 <- df..[match(df..$Season,vec2),]
 row.names(df1) <- 1:nrow(df1)
 df1
#   Season   Obs
#1  Summer 0.2141001
#2  Winter 0.9318599
#3    Fall 0.6722337
#4  Spring 0.1927715
#5  Summer 0.2141001
#6  Winter 0.9318599
#7    Fall 0.6722337
#8  Spring 0.1927715
#9  Summer 0.2141001
#10 Winter 0.9318599
#11   Fall 0.6722337
#12 Spring 0.1927715
#13 Summer 0.2141001
#14 Winter 0.9318599
#15   Fall 0.6722337
#16 Spring 0.1927715


A.K.

- Original Message -
From: LCOG1 jr...@lcog.org
To: r-help@r-project.org
Cc: 
Sent: Thursday, October 11, 2012 1:08 PM
Subject: [R] Sorting a data frame by specifying a vector

Hello all, 
   I cannot seem to figure out this seemingly simple procedure.  

I want to sort a data frame by a specified character vector.

So for :

df.. <- data.frame(Season=rep(c("Summer","Fall","Winter","Spring"),4),Obs=
runif(length(rep(c("Summer","Fall","Winter","Spring"),4))))

I want to sort the data frame by the seasons, but in the order I specify,
since alphabetical order would not put the seasons in sequential order.

I tried the following and a few other things but no dice.  It looks like I
will have to convert to factors.  Any thoughts?  Thanks

df.. <-
df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

Josh




Re: [R] Help on probability distribution question

2012-10-11 Thread Ted Harding

On 11-Oct-2012 17:22:44 Andras Farkas wrote:
 Dear All,
 I have a question I would like to ask, and I wonder if you
 have any thoughts on how to make it work in R.
 
 1. I work in the field of medicine, where physiologic variables
 are often simulated, and they cannot have negative values.
 Most often the assumption is made to simulate these parameters
 with a normal distribution, but in the log domain, to avoid
 generating negative values. Since the expected mean and SD
 are usually known from the normal domain, I use the methods described
 in the Wikipedia article "Arithmetic moments" to generate μ and σ
 and simulate with rlnorm(). At times, though, the following issue
 comes up: I have the mean and SD for the parameters available
 from the normal domain, and the covariance matrix from the normal
 domain. Then I would like to simulate the values, but to avoid
 generating negative values I have to fall back on rlnorm
 in {compositions}. My issue is that my covariance matrix
 represents the covariance of the parameters in the normal domain,
 as opposed to the lognormal domain. Any thoughts on how to work
 around this?
 
 Appreciate the help,
 Andras

If I understand your question correctly, if Y is the variable being
simulated then you know the mean (M, say) and the variance (V, say)
of log(Y). So you can simulate X from a normal distribution with
mean M and variance V = S^2 (S = SD of X), and then Y = exp(X):

  Y <- exp(rnorm(n,M,S))

where n is the number of sampled values you want.

When Y is multivariate, with M the vector of means and V the
covariance matrix of log(Y), then use a similar approach with
the function mvrnorm() from the MASS package:

  library(MASS)
  Y <- exp(mvrnorm(n,M,V))

Does this help?
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 11-Oct-2012  Time: 18:51:47
This message was sent by XFMail


Re: [R] replacing ugly for loops

Sorry, you **did** supply data and my solution **does** work (except that I
left off one closing parenthesis).

> sq.n <- seq_len(nrow(data.df))
> tapply(sq.n,data.df$seq,function(x)with(data.df[x,],
+ sort(unique(do.call(c,mapply(seq,from=startNo,length=len,SIMPLIFY=FALSE))))))
$`1`
[1]  3  4  5  6 10 11

$`2`
[1]  3  4  5  6  7 15 16 17

Cheers,
Bert
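For readers without the original data.df, here is a self-contained version of Bert's tapply()/mapply() idiom on a small hypothetical table. Following Bert's point that a column called seq is easily confused with the seq() function, it renames the file-identifier column to fileid. It also spells out length.out, which Bert's length= reaches by partial matching. Overlapping tables within a file are merged by unique().

```r
# Hypothetical table definitions: file id, first column, number of columns
tables <- data.frame(fileid  = c(1, 1, 2, 2),
                     startNo = c(2, 8, 1, 3),
                     len     = c(3, 2, 4, 3))

rows <- seq_len(nrow(tables))
colms <- tapply(rows, tables$fileid, function(i)
  with(tables[i, ],
       # one seq() per table, then merge, de-duplicate and sort per file
       sort(unique(do.call(c, mapply(seq, from = startNo, length.out = len,
                                     SIMPLIFY = FALSE))))))
colms
```

With these inputs, file 1 yields columns 2, 3, 4, 8, 9. File 2 yields columns 1 to 5, since its two tables overlap.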


On Wed, Oct 10, 2012 at 10:59 PM, Bert Gunter bgun...@gene.com wrote:
 I am not sure you have expressed what you want to do correctly. See inline:

 On Wed, Oct 10, 2012 at 9:10 PM, andrewH ahoer...@rprogress.org wrote:
 I have a couple of hundred American Community Survey Summary Files files
 containing rectangular arrays of data, mainly though not exclusively
 numeric.  Each file is referred to as a sequence (henceforth seq).
 -- so 1 seq (terrible identifier -- see below for why) = 1 file

 From these files I am trying to extract particular subsets (tables)
 consisting of sets of columns.  These tables are defined by three numbers (now in
 columns in a data frame):
 1.  a file identifier (seq)
 2.  first column position numbers (startNo)
 3.  length of table (len)

 So your data frame, call it yourframe, has columns named:

 seq  startNo   len


 so the columns to select for one triple would consist of
 startNo:(startNo+len-1).   I am trying to create for each sequence a
 vector of all the column numbers for tables in that sequence.

 So for each seq id you want to find all the column numbers, right?

 sq.n <- seq_len(nrow(yourframe)) ## Just to make it easier to read
 colms <- tapply(sq.n, yourframe$seq, function(x) with(yourframe[x, ],
   sort(unique(do.call(c, mapply(seq, from = startNo,
   length = len, SIMPLIFY = FALSE))))))

 ## Comments
 In the mapply call, seq is the R function, ?seq.  That's why using it
 as a name for a file id is terrible -- it causes confusion.

 In the absence of data, this is untested -- and probably not quite
 right. But it should be close, I hope. The key idea is the use of
 mapply to get the sequence of columns for each row in all the rows for
 each seq id. The SIMPLIFY = FALSE guarantees that this yields a list
 of vectors of column indices, which are then glopped together and
 cleaned up by the sort(unique(do.call(  ...  stuff.

 colms should then be a list giving the sorted column numbers to choose
 for each seq id.

 I do not know whether (once cleaned up,) this is either more elegant
 or more efficient than what you proposed. And I wouldn't be surprised
 if someone like Bill Dunlap comes up with a lot better way, either.
 But it is different -- and perhaps amusing.

 ... If I have properly understood what you wanted. If not, ignore all.

 Cheers,
 Bert


 Obviously I could do this with nested for loops, e.g.:

 seq <- c(1, 1, 2, 2)
 startNo <- c(3, 10, 3, 15)
 len <- c(4, 2, 5, 3)
 data.df <- data.frame(seq, startNo, len)

 seq.f <- factor(data.df$seq)
 data.l <- split(data.df, seq.f)
 selectColsList <- vector("list", length(levels(seq.f)))
 for (i in seq_along(levels(seq.f))) {
   selectCols <- numeric()
   for (j in seq_along(data.l[[i]]$startNo)) {
     selectCols <- c(selectCols,
       data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] +
       data.l[[i]]$len[j] - 1))
   }
   selectColsList[[i]] <- selectCols
 }
 selectColsList
 [[1]]
 [1]  3  4  5  6 10 11
 [[2]]
 [1]  3  4  5  6  7 15 16 17

 But this code strikes me as inelegant and verbose. It seems to me that there
 ought to be a way to make the outer loop, (indexed with i) into a tapply
 function (which is why I started with a split()), and the inner loop
 (indexed with j) into some cute recursive function, but I was not able to do
 so. If anyone could suggest some nicer (e.g. shorter, or faster, or just
 more sophisticated) way to do this instead, I would be most grateful.

 Sincerely, andrewH




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821.html
 Sent from the R help mailing list archive at Nabble.com.




 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm





Re: [R] Sorting a data frame by specifying a vector

2012-10-11 Thread ROLL Josh F

Sorry if I wasn't clear but the result I am looking for is as follows
#   Season   Obs
#1  Summer 0.2141001
#5  Summer 0.2141001
#9  Summer 0.2141001
#13 Summer 0.2141001
#3    Fall 0.6722337
#7    Fall 0.6722337
#11   Fall 0.6722337
#15   Fall 0.6722337
#2  Winter 0.9318599
#6  Winter 0.9318599
#10 Winter 0.9318599
#14 Winter 0.9318599
#4  Spring 0.1927715
#8  Spring 0.1927715
#12 Spring 0.1927715
#16 Spring 0.1927715

The process you describe does not get me there

Any other recommendations?

-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: Thursday, October 11, 2012 10:33 AM
To: ROLL Josh F
Cc: R help
Subject: Re: [R] Sorting a data frame by specifying a vector

Hi,
In your dataset, it seems like it is already ordered in the way you wanted to.
df.. <- data.frame(Season = rep(c("Summer","Fall","Winter","Spring"), 4), Obs =
runif(length(rep(c("Summer","Fall","Winter","Spring"), 4))))

#Suppose the order you want is:

vec2 <- c("Summer","Winter","Fall","Spring")
df1 <- df..[match(df..$Season, vec2), ]
row.names(df1) <- 1:nrow(df1)
df1
#   Season   Obs
#1  Summer 0.2141001
#2  Winter 0.9318599
#3    Fall 0.6722337
#4  Spring 0.1927715
#5  Summer 0.2141001
#6  Winter 0.9318599
#7    Fall 0.6722337
#8  Spring 0.1927715
#9  Summer 0.2141001
#10 Winter 0.9318599
#11   Fall 0.6722337
#12 Spring 0.1927715
#13 Summer 0.2141001
#14 Winter 0.9318599
#15   Fall 0.6722337
#16 Spring 0.1927715
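One caution about this approach: match(df..$Season, vec2) returns each row's position in vec2 (a number from 1 to 4). Indexing the rows by those numbers therefore selects rows 1 to 4 over and over instead of reordering all 16 rows. That is why every season above shows a single repeated Obs value. Wrapping match() in order() turns those positions into a permutation that keeps each row exactly once. Here is a minimal sketch on freshly seeded data (the seed is arbitrary):

```r
set.seed(1)  # arbitrary seed, so the example is reproducible
df.. <- data.frame(Season = rep(c("Summer", "Fall", "Winter", "Spring"), 4),
                   Obs = runif(16))

vec2 <- c("Summer", "Winter", "Fall", "Spring")  # the order you want

# match() maps each row to its season's position in vec2;
# order() turns those positions into a permutation of all 16 rows
df1 <- df..[order(match(df..$Season, vec2)), ]
row.names(df1) <- NULL  # renumber rows 1..16
df1
```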


A.K.

- Original Message -
From: LCOG1 jr...@lcog.org
To: r-help@r-project.org
Cc: 
Sent: Thursday, October 11, 2012 1:08 PM
Subject: [R] Sorting a data frame by specifying a vector

Hello all,
   I cannot seem to figure out this seemingly simple procedure.  

I want to sort a data frame by a specified character vector.

So for :

df.. <- data.frame(Season = rep(c("Summer","Fall","Winter","Spring"), 4), Obs =
runif(length(rep(c("Summer","Fall","Winter","Spring"), 4))))

I want to sort the data frame by the seasons, but in the order I specify, since 
alphabetical order would not put the seasons in sequential order.

I tried the following and a few other things but no dice.  It looks like I will 
have to convert to factors.  Any thoughts?  Thanks

df.. <-
df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

Josh



--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-a-data-frame-by-specifying-a-vector-tp4645867.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] plots for presentation

2012-10-11 Thread Duncan Murdoch


On 11/10/2012 1:08 PM, mamush bukana wrote:

Dear users,
I am preparing a presentation in LaTeX (beamer). I would like to show parts
of my plots one click at a time. For example, consider I have two time series x and y:

x <- ts(rnorm(100), start = 1900, end = 1999)
y <- ts(rnorm(100), start = 1900, end = 1999)
plot(x)
lines(y, col = 2)

Then I imported this plot into LaTeX as an .eps file. My question is, how
can I show the plot of each time series separately in sequence (one after the
other)? I also want to show parts of the plots at different time
segments in my presentation. To be honest, I don't know if these features
are in R or in LaTeX.


Mostly Latex/Beamer.  Draw the two versions of the plot, and tell beamer 
to show the first one only on overlay 1, the second only on overlay 2.


This is particularly easy using Sweave, because you can save the code 
that drew the first plot and re-use it in the second.
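On the R side, one way to prepare the overlays is to write each step to its own PDF with a fixed y-axis range, so that the two images line up exactly when beamer swaps them. This is a minimal sketch; the file names and the tempdir() location are placeholders:

```r
set.seed(1)
x <- ts(rnorm(100), start = 1900, end = 1999)
y <- ts(rnorm(100), start = 1900, end = 1999)
yl <- range(x, y)  # common y-axis so both overlays align exactly

out <- tempdir()   # placeholder: use the directory of your .tex file
f1 <- file.path(out, "plot-step1.pdf")
f2 <- file.path(out, "plot-step2.pdf")

# Overlay 1: x only
pdf(f1, width = 6, height = 4)
plot(x, ylim = yl)
invisible(dev.off())

# Overlay 2: the same plot with y added
pdf(f2, width = 6, height = 4)
plot(x, ylim = yl)
lines(y, col = 2)
invisible(dev.off())
```

In the slide, something like \only<1>{\includegraphics{plot-step1}}\only<2>{\includegraphics{plot-step2}} would then show one file per overlay. A time segment can be prepared the same way by plotting, for example, window(x, start = 1900, end = 1949).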


Duncan Murdoch


[R] survey package question

2012-10-11 Thread Sebastián Daza

Hello,

I have got a cluster sample using an election dataset where I already
had the final results of a county-specific election. I am trying to
figure out what would be the best sampling design for my data.

The  structure of the dataset is:

1) polling station (in general schools where people vote, for a
county, for example, there are 15 polling stations)
2) inside each polling station, there are voting units, where people
actually vote (on average there are about 40 voting units for polling
station)
3) for each voting unit I have the total votes by candidate (e.g.,
candidate 1 =322, candidate 2=122, candidate 3= 89)

The initial sampling design is:
1) selection of 5 polling stations PPS (based on number of voters)
2) selection of 10 voting units (SRS)

I am interested in estimating the proportion of votes by candidate
(let's assume we have 3 candidates). My naive estimate would be:

votes for candidate 1 / all valid votes = proportion

e.g.

candidate 1= 2132 / 10874= .1961
candidate 2= 5323 / 10874= .4895
candidate 3= 3419 / 10874= .3144

In this case, the unit of analysis is voters (or votes).

 If I specify the sampling design using the survey package in this way...

design <- svydesign(id = ~station + unit, fpc = ~probstation + probunit,
data = sample, pps = "brewer")

svyciprop(~I(candidate1/totalVotes), design)

... I am assuming that the unit of analysis is the voting unit, right?
And that I am estimating an average among voting units?

Should I expand my database to the individual level (voters), or do I just
have to include a unit weight according to the number of voters per
voting unit? In other words, is there a way to estimate, for instance,
votes for candidate 1 / all valid votes = proportion, directly with
the survey package, or do I have to expand the database to the people level
(voters) and then estimate the proportion using svymean and the
respective design?
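The distinction being asked about shows up on toy numbers. The mean of the voting-unit proportions weights every voting unit equally. The ratio of the vote totals (the naive estimate above) weights every vote equally. Here is a base-R sketch with three made-up voting units:

```r
# Three made-up voting units: votes for candidate 1 and all valid votes
units <- data.frame(candidate1 = c(40, 10, 5),
                    totalVotes = c(100, 200, 50))

# Average of the voting-unit proportions: each voting unit counts equally
p_units <- mean(units$candidate1 / units$totalVotes)      # (0.40 + 0.05 + 0.10) / 3

# Ratio of the totals: each vote counts equally (the naive estimate)
p_votes <- sum(units$candidate1) / sum(units$totalVotes)  # 55 / 350

c(mean_of_unit_proportions = p_units, ratio_of_totals = p_votes)
```

Within the survey package, the design-based counterpart of the ratio of totals would be svyratio(~candidate1, ~totalVotes, design). That estimates the ratio directly from the voting-unit data without expanding to one row per voter. Treat this as a pointer to check against the package documentation rather than as tested advice.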

I would appreciate any advice or help.

Sebastian


Re: [R] Sorting a data frame by specifying a vector

On Thu, Oct 11, 2012 at 10:43 AM, ROLL Josh F jr...@lcog.org wrote:
 Sorry if I wasn't clear.
Actually, my bad -- I didn't read carefully enough.

But the answer is still essentially correct -- just change the
ordering of the levels of Season, which, by default, is alphabetic.

df$Season <- factor(df$Season, levels = c("Summer","Fall","Winter","Spring"))

df <- df[order(df$Season),]
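Here is a self-contained version of the two lines above. It uses seeded random data (the seed is arbitrary), so the result can be checked directly:

```r
set.seed(1)  # arbitrary seed
df <- data.frame(Season = rep(c("Summer", "Fall", "Winter", "Spring"), 4),
                 Obs = runif(16))

# Put the levels in the order wanted instead of the default alphabetical one
df$Season <- factor(df$Season, levels = c("Summer", "Fall", "Winter", "Spring"))

# order() on a factor sorts by level position, not alphabetically
df <- df[order(df$Season), ]
df
```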

Learn about factors (Read the Intro to R tutorial if you haven't
already). They are very handy (and much despised by some).

-- Bert


The result I am looking for would be something like:

 #   Season   Obs
 #1  Summer 0.2141001
 #5  Summer 0.2141001
 #9  Summer 0.2141001
 #13 Summer 0.2141001
 #3    Fall 0.6722337
 #7    Fall 0.6722337
 #11   Fall 0.6722337
 #15   Fall 0.6722337
 #2  Winter 0.9318599
 #6  Winter 0.9318599
 #10 Winter 0.9318599
 #14 Winter 0.9318599
 #4  Spring 0.1927715
 #8  Spring 0.1927715
 #12 Spring 0.1927715
 #16 Spring 0.1927715

 Any other thoughts?

 JR


 -Original Message-
 From: Bert Gunter [mailto:gunter.ber...@gene.com]
 Sent: Thursday, October 11, 2012 10:19 AM
 To: ROLL Josh F
 Cc: r-help@r-project.org
 Subject: Re: [R] Sorting a data frame by specifying a vector

 ?order
 df[order(yourcolumn), ]

 -- Bert


 On Thu, Oct 11, 2012 at 10:08 AM, LCOG1 jr...@lcog.org wrote:
 Hello all,
I cannot seem to figure out this seemingly simple procedure.

 I want to sort a data frame by a specified character vector.

 So for :

 df.. <-
 data.frame(Season = rep(c("Summer","Fall","Winter","Spring"), 4), Obs =
 runif(length(rep(c("Summer","Fall","Winter","Spring"), 4))))

 I want to sort the data frame by the seasons, but in the order I
 specify, since alphabetical order would not put the seasons in sequential
 order.

 I tried the following and a few other things but no dice.  It looks
 like I will have to convert to factors.  Any thoughts?  Thanks

 df.. <-
 df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

 Josh



 --
 View this message in context:
 http://r.789695.n4.nabble.com/Sorting-a-data-frame-by-specifying-a-vector-tp4645867.html
 Sent from the R help mailing list archive at Nabble.com.









Re: [R] Sorting a data frame by specifying a vector

Hi,
In this case, specifying the factor levels would be easier.
Try this:
set.seed(1)
df <- data.frame(Season = rep(c("Summer","Fall","Winter","Spring"), 4), Obs =
runif(length(rep(c("Summer","Fall","Winter","Spring"), 4))))
df1 <- within(df, {Season <- factor(Season, levels = c("Summer","Fall","Winter","Spring"))})
library(plyr)
df2 <- ddply(df1, .(Season), function(x) x)
df2
#   Season    Obs
#1  Summer 0.26550866
#2  Summer 0.20168193
#3  Summer 0.62911404
#4  Summer 0.68702285
#5    Fall 0.37212390
#6    Fall 0.89838968
#7    Fall 0.06178627
#8    Fall 0.38410372
#9  Winter 0.57285336
#10 Winter 0.94467527
#11 Winter 0.20597457
#12 Winter 0.76984142
#13 Spring 0.90820779
#14 Spring 0.66079779
#15 Spring 0.17655675
#16 Spring 0.49769924


Just curious, in your reply, the Obs column has only 4 values.  Do you want to 
get the means???


A.K.




Re: [R] Sorting a data frame by specifying a vector

2012-10-11 Thread Sarah Goslee

I'm pretty sure you were already given the answer: order() in
conjunction with a factor whose levels are in the order you specify.


mydf$Season <- factor(mydf$Season, levels = c("Summer","Fall","Winter","Spring"))

mydf[order(mydf$Season),]

Thanks for making sure to include the context in your replies.

Sarah

On Thu, Oct 11, 2012 at 1:42 PM, ROLL Josh F jr...@lcog.org wrote:
 Sorry if I wasn't clear but the result I am looking for is as follows
 #   Season   Obs
 #1  Summer 0.2141001
 #5  Summer 0.2141001
 #9  Summer 0.2141001
 #13 Summer 0.2141001
 #3    Fall 0.6722337
 #7    Fall 0.6722337
 #11   Fall 0.6722337
 #15   Fall 0.6722337
 #2  Winter 0.9318599
 #6  Winter 0.9318599
 #10 Winter 0.9318599
 #14 Winter 0.9318599
 #4  Spring 0.1927715
 #8  Spring 0.1927715
 #12 Spring 0.1927715
 #16 Spring 0.1927715

 The process you describe does not get me there

 Any other recommendations?

 -Original Message-
 From: arun [mailto:smartpink...@yahoo.com]
 Sent: Thursday, October 11, 2012 10:33 AM
 To: ROLL Josh F
 Cc: R help
 Subject: Re: [R] Sorting a data frame by specifying a vector

 Hi,
 In your dataset, it seems like it is already ordered in the way you wanted to.
 df.. <- data.frame(Season = rep(c("Summer","Fall","Winter","Spring"), 4), Obs =
 runif(length(rep(c("Summer","Fall","Winter","Spring"), 4))))

 #Suppose the order you want is:

 vec2 <- c("Summer","Winter","Fall","Spring")
 df1 <- df..[match(df..$Season, vec2), ]
 row.names(df1) <- 1:nrow(df1)
  df1
 #   Season   Obs
 #1  Summer 0.2141001
 #2  Winter 0.9318599
 #3    Fall 0.6722337
 #4  Spring 0.1927715
 #5  Summer 0.2141001
 #6  Winter 0.9318599
 #7    Fall 0.6722337
 #8  Spring 0.1927715
 #9  Summer 0.2141001
 #10 Winter 0.9318599
 #11   Fall 0.6722337
 #12 Spring 0.1927715
 #13 Summer 0.2141001
 #14 Winter 0.9318599
 #15   Fall 0.6722337
 #16 Spring 0.1927715


 A.K.

 - Original Message -
 From: LCOG1 jr...@lcog.org
 To: r-help@r-project.org
 Cc:
 Sent: Thursday, October 11, 2012 1:08 PM
 Subject: [R] Sorting a data frame by specifying a vector

 Hello all,
I cannot seem to figure out this seemingly simple procedure.

 I want to sort a data frame by a specified character vector.

 So for :

 df.. <- data.frame(Season = rep(c("Summer","Fall","Winter","Spring"), 4), Obs =
 runif(length(rep(c("Summer","Fall","Winter","Spring"), 4))))

 I want to sort the data frame by the seasons, but in the order I specify, since
 alphabetical order would not put the seasons in sequential order.

 I tried the following and a few other things but no dice.  It looks like I
 will have to convert to factors.  Any thoughts?  Thanks

 df.. <-
 df..[sort(as.factor(Df..$Season,levels=c("Summer","Fall","Winter","Spring"))),]

 Josh



-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] Friedman test for replicated blocked data

2012-10-11 Thread kolassa

It looks like friedman() in the agricolae package handles replicates by
averaging and then doing the unreplicated Friedman analysis.  Any pointers
to the fully replicated analysis, given, for example, in Conover, Practical
Nonparametric Statistics (3rd Edn.), pp 383f?  Thanks, John



--
View this message in context: 
http://r.789695.n4.nabble.com/Friedman-test-for-replicated-blocked-data-tp798293p4645875.html
Sent from the R help mailing list archive at Nabble.com.


[R] bug tracker broken

2012-10-11 Thread Antonio Piccolboni

Hi,
I get a 404 (page not found) on the root. There is no webmaster link on
r-project.org that I can see. Whom should I contact? Thanks


Antonio

PS: Yes, I was trying to report my first bug. It's a conspiracy with p < 0.01.

[[alternative HTML version deleted]]


[R] Repeating a series of commands

2012-10-11 Thread KoopaTrooper

I'm trying to figure out how to repeat a series of commands in R and have the
outputs added to a dataframe after each iteration.

My code starts this way...

a <- read.csv("File1.csv")
b <- read.csv("File2.csv")

a$Z <- ifelse(a$Z == "L", sample(1:4, length(a$Z), replace = TRUE),
       ifelse(a$Z == "M", sample(5:8, length(a$Z), replace = TRUE),
       ifelse(a$Z == "U", sample(9:10, length(a$Z), replace = TRUE), "")))
a$Z <- as.numeric(a$Z)
b$Z <- ifelse(b$Z == "L", sample(1:4, length(b$Z), replace = TRUE),
       ifelse(b$Z == "M", sample(5:8, length(b$Z), replace = TRUE),
       ifelse(b$Z == "U", sample(9:10, length(b$Z), replace = TRUE), "")))
b$Z <- as.numeric(b$Z)

This is basically just starting off with a new and partially random data set
every time that then goes through a bunch of other commands (not shown) and
ends with the following outputs saved.

Output1, Output2, Output3, Output4

where each of these is just a single number. My question is:

1. How do I repeat the entire series of commands x number of times and save
each of the outputs into a structure like this:
  Output1  Output2 Output3 Output4
Iteration 1
Iteration 2
Iteration 3
etc.

Not even sure where to start. Are loops the answer? Thanks,





--
View this message in context: 
http://r.789695.n4.nabble.com/Repeating-a-series-of-commands-tp4645881.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Formatting data for bootstrapping for confidence intervals


Hello,

To aggregate the data use (yes, it exists) the function aggregate.

with(dat, aggregate(cbind(X, Xn, Y), list(Area, DATE), FUN = sum))
# output
  Group.1 Group.2 X Xn Y
1   1 1/10/10 1  1 0
2   1 1/11/10 0  0 1
3   1 1/12/10 3  0 0
4   2 2/12/10 2  1 1
5   2 2/13/10 3  0 0
6   2 2/14/10 4  1 0
7   3 7/27/11 3  1 0
8   3 7/28/11 7  2 2
9   3 7/29/11 3  1 0

And take a look at package boot. Maybe you'll find something there.
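Here is a base-R sketch that puts both steps together:

1. The event rows are transcribed from Paul's table.
2. aggregate() with a formula gives the same daily totals as above.
3. A simple percentile bootstrap then resamples days within each Area.

With only three days per Area, the intervals are purely illustrative. The boot package's boot() and boot.ci() are the more complete tools.

```r
# Event rows transcribed from the table below; each row is one event,
# so in this example Y is simply 1 - X
events <- data.frame(
  Area = rep(1:3, times = c(5, 10, 15)),
  DATE = rep(c("1/10/10", "1/11/10", "1/12/10", "2/12/10", "2/13/10",
               "2/14/10", "7/27/11", "7/28/11", "7/29/11"),
             times = c(1, 1, 3, 3, 3, 4, 3, 9, 3)),
  X  = c(1, 0, 1, 1, 1,  1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1),
  Xn = c(1, 0, 0, 0, 0,  1, 0, 0, 0, 0, 0, 0, 0, 1, 0,
         0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1))
events$Y <- 1 - events$X

# Step 1: daily totals per Area (formula interface of aggregate())
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)
daily

# Step 2: percentile bootstrap of the mean daily X count within each Area,
# resampling whole days with replacement
set.seed(123)
B <- 2000
boot_ci <- sapply(split(daily$X, daily$Area), function(x) {
  means <- replicate(B, mean(x[sample.int(length(x), replace = TRUE)]))
  quantile(means, c(0.025, 0.975))
})
boot_ci
```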

Hope this helps,

Rui Barradas


On 11-10-2012 16:55, Paul Wennekes wrote:

Hi all,

New to R, so this may be obvious to some.
I've been trying to figure this out for a while, I have a dataset events
that looks something like this:

Area  NAME  DATE     X   Xn  Y
1   X   1/10/10 1   1   0
1   Y   1/11/10 0   0   1
1   X   1/12/10 1   0   0
1   X   1/12/10 1   0   0
1   X   1/12/10 1   0   0
2   X   2/12/10 1   1   0
2   X   2/12/10 1   0   0
2   Y   2/12/10 0   0   1
2   X   2/13/10 1   0   0
2   X   2/13/10 1   0   0
2   X   2/13/10 1   0   0
2   X   2/14/10 1   0   0
2   X   2/14/10 1   0   0
2   X   2/14/10 1   1   0
2   X   2/14/10 1   0   0
3   X   7/27/11 1   0   0
3   X   7/27/11 1   1   0
3   X   7/27/11 1   0   0
3   X   7/28/11 1   0   0
3   X   7/28/11 1   1   0
3   X   7/28/11 1   0   0
3   X   7/28/11 1   0   0
3   Y   7/28/11 0   0   1
3   X   7/28/11 1   0   0
3   X   7/28/11 1   1   0
3   Y   7/28/11 0   0   1
3   X   7/28/11 1   0   0
3   X   7/29/11 1   0   0
3   X   7/29/11 1   0   0
3   X   7/29/11 1   1   0

X and Y are events. Every row represents a single event happening, with a 1
indicating which one happens at that time. Xn indicates X happening at
night. I want to bootstrap these events over days but I think I need to
summarize them first, ie. get something that looks like this:

Area  DATE     X   Xn  Y
1   1/10/10 1   1   0
1   1/11/10 0   0   1
1   1/12/10 3   0   0
2   2/12/10 2   1   1
etc.

and then for each Area, bootstrap the data over the days. Any ideas? I've
tried using the 'reshape' package but I don't know how to sum over parts of
the columns as defined by the DATE values...

Many thanks ahead!



--
View this message in context: 
http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Repeating a series of commands

encapsulate them into a function and call the function ??

-- Bert
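Following the suggestion above, here is a minimal sketch. One complete run is wrapped in a function that returns the four numbers. replicate() then calls it repeatedly and stacks the results one row per iteration. The helper names (recode_LMU, one_run), the toy a and b data, and the four summary statistics are hypothetical stand-ins for the real files and for the commands not shown in the post. The recoding helper draws from the same ranges as the nested ifelse() but keeps the column numeric.

```r
# Hypothetical helper: replace L/M/U codes by random integers from 1:4, 5:8, 9:10
recode_LMU <- function(z) {
  z <- as.character(z)
  out <- rep(NA_real_, length(z))         # codes other than L/M/U stay NA
  for (code in c("L", "M", "U")) {
    idx  <- which(z == code)
    pool <- switch(code, L = 1:4, M = 5:8, U = 9:10)
    out[idx] <- pool[sample.int(length(pool), length(idx), replace = TRUE)]
  }
  out
}

# Hypothetical stand-in for one full pass of the analysis
one_run <- function(a, b) {
  a$Z <- recode_LMU(a$Z)
  b$Z <- recode_LMU(b$Z)
  # ... the other commands (not shown in the post) would go here ...
  c(Output1 = mean(a$Z), Output2 = mean(b$Z),   # placeholder summaries
    Output3 = max(a$Z),  Output4 = max(b$Z))
}

a <- data.frame(Z = c("L", "M", "U", "L"))  # in practice: read.csv("File1.csv")
b <- data.frame(Z = c("U", "U", "M"))       # in practice: read.csv("File2.csv")

set.seed(1)
res <- as.data.frame(t(replicate(100, one_run(a, b))))  # one row per iteration
head(res)
```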

On Thu, Oct 11, 2012 at 11:09 AM, KoopaTrooper ncoop...@tulane.edu wrote:
 I'm trying to figure out how to repeat a series of commands in R and have the
 outputs added to a dataframe after each iteration.

 My code starts this way...

 a <- read.csv("File1.csv")
 b <- read.csv("File2.csv")

 a$Z <- ifelse(a$Z == "L", sample(1:4, length(a$Z), replace = TRUE),
        ifelse(a$Z == "M", sample(5:8, length(a$Z), replace = TRUE),
        ifelse(a$Z == "U", sample(9:10, length(a$Z), replace = TRUE), "")))
 a$Z <- as.numeric(a$Z)
 b$Z <- ifelse(b$Z == "L", sample(1:4, length(b$Z), replace = TRUE),
        ifelse(b$Z == "M", sample(5:8, length(b$Z), replace = TRUE),
        ifelse(b$Z == "U", sample(9:10, length(b$Z), replace = TRUE), "")))
 b$Z <- as.numeric(b$Z)

 This is basically just starting off with a new and partially random data set
 every time that then goes through a bunch of other commands (not shown) and
 ends with the following outputs saved.

 Output1, Output2, Output3, Output4

 where each of these is just a single number. My question is:

 1. How do I repeat the entire series of commands x number of times and save
 each of the outputs into a structure like this:
   Output1  Output2 Output3 Output4
 Iteration 1
 Iteration 2
 Iteration 3
 etc.

 Not even sure where to start. Are loops the answer? Thanks,





 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Repeating-a-series-of-commands-tp4645881.html
 Sent from the R help mailing list archive at Nabble.com.






Re: [R] Help on probability distribution question

2012-10-11 Thread Andras Farkas

Ted,

thanks for the answer. I actually think I have it the other way around. Let me
give you an example:

1. I know the mean parameter value of a variable (V), let's call it M, with a
value of 5, and I also know the SD, let us call it SD, with a value of 3:
#V
M <- 5
SD <- 3

2. Usually, when there is no known covariance with another parameter, and in
order to avoid generating negative values, I would do the following:
calculate mu and sigma:
mu <- log(M) - 0.5*log(1 + SD^2/(M^2))
sigma <- sqrt(log(1 + SD^2/(M^2)))

3. then I would simulate:
Y <- rlnorm(5000, mu, sigma)
then do
mean(Y)
sd(Y)

with resulting values of 4.968 for mean and 2.923 for SD, which I am reasonably
happy with.

At times, though, I have a multivariate situation on my hands where I know V with
M and SD from above and additional V1 with M1 and SD1, and V2 with M2 and SD2,
for example:

#V
M <- 5
SD <- 3

#V1
M1 <- 8
SD1 <- 4

#V2
M2 <- 12
SD2 <- 6

In addition to knowing this information I also have a covariance matrix
available for these 3 parameters. Based on my previous experience with using
mvrnorm, if I do what you suggested then I will generate negative values,
which is bad news for me. On the other hand, simply calculating mu and sigma
again and then simulating all 3 variables independently as in steps 1-3 above
would not be appropriate, because that would not take into consideration the
known covariance between parameters. I hope this example makes my question
clearer; any thoughts would be appreciated.

thanks,

Andras
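One way to keep the known covariance and still avoid negative values has three steps:

1. Convert the arithmetic-scale means and covariance into log-scale parameters by matching the lognormal moments.
2. Simulate on the log scale with mvrnorm().
3. Exponentiate.

For a single variable, the formulas reduce to the mu and sigma of step 2 above. This is a sketch under assumptions. The correlation matrix below is invented, because the actual covariance matrix was not posted. The conversion also only works when every 1 + C[i, j]/(m[i] m[j]) is positive and the resulting Sigma is positive definite, which not every covariance matrix satisfies.

```r
library(MASS)  # mvrnorm()

# Arithmetic-scale ("normal domain") means and SDs of V, V1, V2 from the post
m <- c(V = 5, V1 = 8, V2 = 12)
s <- c(3, 4, 6)

# Hypothetical correlations: the real covariance matrix was not posted
R <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.4,
              0.3, 0.4, 1.0), nrow = 3)
C <- R * outer(s, s)  # arithmetic-scale covariance matrix

# Moment matching for the multivariate lognormal:
#   Sigma[i, j] = log(1 + C[i, j] / (m[i] * m[j]))
#   mu[i]       = log(m[i]) - Sigma[i, i] / 2
Sigma <- log(1 + C / outer(m, m))
mu    <- log(m) - diag(Sigma) / 2

set.seed(1)
Y <- exp(mvrnorm(20000, mu = mu, Sigma = Sigma))  # all values positive

colMeans(Y)      # close to m
apply(Y, 2, sd)  # close to s
cor(Y)           # close to R
```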





Re: [R] Help on probability distribution question

2012-10-11 Thread Ted Harding

(I made a slip with the multivariate case below: see at [***])

On 11-Oct-2012 17:51:51 Ted Harding wrote:
 On 11-Oct-2012 17:22:44 Andras Farkas wrote:
 Dear All,
 I have a question I would like to ask, and wonder if you
 have any thoughts on how to make it work in R.
 
 1. I work in the field of medicine, where physiologic variables
 are often simulated, and they cannot have negative values.
 Most often the assumption is made to simulate these parameters
 with a normal distribution but in the log domain, to avoid
 generating negative values. Since the expected mean and SD
 are usually known from the normal domain, I use the methods described
 in the Wikipedia article (Arithmetic moments) to generate μ and σ
 and simulate with rlnorm(). At times, though, the following issue
 comes up: I have the mean and SD for the parameters available
 from the normal domain, and the covariance matrix from the normal
 domain. I would like to simulate the values, but to avoid
 generating negative values I have to fall back on rlnorm()
 in {compositions}. My issue is that my covariance matrix
 represents the covariance of the parameters in the normal domain,
 as opposed to the lognormal domain. Any thoughts on how to work
 around this?
 
 I appreciate the help,
 Andras
 
If I understand your question correctly, if Y is the variable being
simulated then you know the mean (M, say) and the variance (V, say)
of log(Y). So you can simulate X from a normal distribution with
mean M and variance V = S^2 (S = SD of X), and then Y = exp(X):

  Y <- exp(rnorm(n, M, S))

where n is the number of sampled values you want.

When Y is multivariate, with M the vector of means and V the
covariance matrix of log(Y), then use a similar approach with
the function mvrnorm() from the MASS package:
[***]
##  library(MASS)
##  Y <- mvrnorm(n,M,V)
  library(MASS)
  Y <- exp(mvrnorm(n,M,V))
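The original question has the covariance matrix on the natural (untransformed) scale, whereas the code above expects M and V to describe log(Y). A minimal sketch of converting natural-scale moments first, using the standard lognormal moment identities; m and C are hypothetical values, and the conversion only makes sense when every entry of 1 + C/(m m') is positive and the resulting log-scale matrix is positive definite:

```r
library(MASS)  # mvrnorm()

# Convert natural-scale means m and covariance C of a positive random vector
# into the mean vector and covariance matrix of log(Y), using
#   E[Y_i]        = exp(mu_i + Sigma_ii / 2)
#   Cov(Y_i, Y_j) = E[Y_i] * E[Y_j] * (exp(Sigma_ij) - 1)
lnorm_params <- function(m, C) {
  Sigma <- log(1 + C / outer(m, m))   # covariance of log(Y)
  mu    <- log(m) - diag(Sigma) / 2   # mean of log(Y)
  list(mu = mu, Sigma = Sigma)
}

# Hypothetical natural-scale moments for two parameters
m <- c(10, 5)
C <- matrix(c(4.0, 1.5,
              1.5, 1.0), nrow = 2)

p <- lnorm_params(m, C)
set.seed(1)
Y <- exp(mvrnorm(1e5, mu = p$mu, Sigma = p$Sigma))  # all draws positive
colMeans(Y)  # close to m
cov(Y)       # close to C
```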

Does this help?
Ted.

 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 11-Oct-2012  Time: 18:51:47
 This message was sent by XFMail
 

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 11-Oct-2012  Time: 19:44:02
This message was sent by XFMail


[R] Course: Data exploration, regression, GLM & GAM with R introduction

2012-10-11 Thread Highland Statistics Ltd



We would like to announce the following statistics course:
Data exploration, regression, GLM & GAM. With introduction to R
 
When: 4 - 8 February 2013.

Where: Coimbra, Portugal.

For details, see: http://www.highstat.com/statscourse.htm
Course flyer: http://www.highstat.com/Courses/Flyer2013FebCoimbra.pdf


Kind regards,

Alain Zuur


--

Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7


2. Mixed effects models and extensions in ecology with R. (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9


3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN, Meesters, EHWG. Springer
http://www.springer.com/statistics/computational/book/978-0-387-93836-3


4. Zero Inflated Models and Generalized Linear Mixed Models with R. (2012) 
Zuur, Saveliev, Ieno.
http://www.highstat.com/book4.htm

Other books: http://www.highstat.com/books.htm


Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highs...@highstat.com
URL: www.highstat.com
URL: www.brodgar.com


Re: [R] multiple t-tests across similar variable names

Hi Shantanu,

I saw your reply to Rui regarding multiple underscores in Nabble:

(Actually, I see now that part of the problem is that many of the 
names have multiple underscores such as red_apple_pre or 
post_banana_organic. I think this is causing a problem for this line 
in your code:)

I wasn't aware of that problem. In that case, try this:
set.seed(432)
dat2 <- data.frame(red_apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana_organic = sample(25:35, 5, replace = TRUE),
                   post_apple = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
nam1 <- c("apple", "orange", "banana")
nam2 <- c("pre", "post")
# keep only the fruit and pre/post tokens of each name, e.g. red_apple_pre -> apple_pre
colnames(dat2) <- unlist(lapply(lapply(strsplit(colnames(dat2), "_"), function(x)
  x[x %in% nam1 | x %in% nam2]), function(x) paste(x[1], x[2], sep = "_")))
# move a leading pre_/post_ to the end, e.g. pre_banana -> banana_pre
colnames(dat2) <- gsub("^pre\\_(.*)", "\\1_pre", gsub("^post\\_(.*)", "\\1_post", colnames(dat2)))
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName = gsub("(.*)\\_.*", "\\1", row.names(dat3)), dat3)
# one two-column matrix (post, pre) per fruit
list3 <- lapply(split(dat3, dat3$varName), function(x) t(x[-1]))
# paired t-test of post vs pre: the estimate is the mean of post - pre
res3 <- do.call(rbind, lapply(lapply(list3, function(x)
  t.test(x[, 1], x[, 2], paired = TRUE)), function(x)
  data.frame(meandifference = x$estimate, CIlow = unlist(x$conf.int)[1],
             CIhigh = unlist(x$conf.int)[2], p.value = x$p.value)))
res3
#        meandifference     CIlow   CIhigh      p.value
#apple             12.6  8.519476 16.68052 0.0010166626
#banana            15.0 12.088040 17.91196 0.0001388506
#orange            18.2 13.604166 22.79583 0.0003888560


I hope this works.
A.K.






- Original Message -
From: Nundy, Shantanu snu...@chicagobooth.edu
To: arun smartpink...@yahoo.com
Cc: 
Sent: Thursday, October 11, 2012 10:22 AM
Subject: RE: [R] multiple t-tests across similar variable names

Hi Arun,
This is very helpful, thanks.

I'm running into a couple of issues:
1. Since some of the variables are named like pre_apple and others like apple_post,
sorting the variables doesn't always put the pre and post variables next to each
other.
2. I have about 50 variables, so typing this line is a bit cumbersome:

 list3 <- list(dat3[,1:2],dat3[,3:4],dat3[,5:6])

Thanks,
Shantanu


From: arun [smartpink...@yahoo.com]
Sent: Thursday, October 11, 2012 9:14 AM
To: Rui Barradas
Cc: Nundy, Shantanu; R help
Subject: Re: [R] multiple t-tests across similar variable names

Hi Rui,

By running your code, I got the results as:
result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is a difference in the signs.
A.K.
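The sign difference comes from the order of the arguments: in t.test(x, y, paired = TRUE) the estimate is the mean of x - y. arun's code passes (post, pre), because the sorted names put *_post before *_pre, while Rui's doTests() passes (pre, post). A minimal check with made-up numbers:

```r
# Hypothetical paired measurements for one fruit
pre  <- c(10, 12, 9, 14, 11)
post <- c(20, 25, 21, 30, 24)

a <- t.test(post, pre, paired = TRUE)  # estimate = mean(post - pre)
b <- t.test(pre, post, paired = TRUE)  # estimate = mean(pre - post)

unname(a$estimate)  # 12.8
unname(b$estimate)  # -12.8
```

Swapping the arguments flips the sign of the estimate and of the confidence limits (which also swap ends); the p-value is unchanged.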




- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: arun smartpink...@yahoo.com; Nundy, Shantanu snu...@chicagobooth.edu
Cc: R help r-help@r-project.org
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names

Hello,

I have a problem: with your data example, my results are different. I have
changed the names of two of the variables so that 'pre' and 'post' come
first in those names.

# auxiliary functions
ifswap <- function(x)
    if(x[1] %in% c("pre", "post")) x[2:1] else x

# for row i of vmat, find the 'post' row with the same fruit name
# (uses the global matrix vmat created below)
getpair <- function(i, post)
    post[ which(vmat[post, 1] == vmat[i, 1]) ]

makeLine <- function(h)
    c(MeanDiff = unname(h$estimate),
        CIlower = h$conf.int[1],
        CIupper = h$conf.int[2],
        p.value = h$p.value)

# paired t-test for each row of Pairs: first column (pre) minus second (post)
doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
            orange_post = sample(18:28,5,replace=TRUE),
            pre_banana = sample(25:35,5,replace=TRUE),  # here
            apple_post = sample(20:30,5,replace=TRUE),
            post_banana = sample(40:50,5,replace=TRUE), # and here
            orange_pre = sample(5:10,5,replace=TRUE))


#
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result


In your results I believe that the values for meandifference are the means of 
x[, 1], at least that's what I've got.
Anyway, I'll see both codes again, to try to see what's going on.

Hope this helps,

Rui Barradas

Em 11-10-2012 05:31, arun escreveu:
 HI,

 If you have a lot of variables and in no order, then it would

Re: [R] bug tracker broken

2012-10-11 Thread peter dalgaard


On Oct 11, 2012, at 19:56 , Antonio Piccolboni wrote:

 Hi,
 I get a 404 page not found on the root. There is no webmaster link on
 r-project.org that I can see. Whom should I contact? Thanks
 

The machine hosting the bug tracker is having some issues. Just wait for the 
dust to settle...

- Peter D.

 
 Antonio
 
 PS: Yes, I was trying to report my first bug. It's a conspiracy with p < 0.01.
 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] bug tracker broken

2012-10-11 Thread Uwe Ligges


On 11.10.2012 19:56, Antonio Piccolboni wrote:

Hi,
I get a 404 page not found on the root. There is no webmaster link on
r-project.org that I can see. Whom should I contact? Thanks


Thanks for the note.
The servers hosting the bug tracker are experiencing known problems, and
people are working on them.


Best,
Uwe Ligges






Antonio

PS: Yes, I was trying to report my first bug. It's a conspiracy with p < 0.01.


[R] struggling with R2wd or SWord? Try rtf!

2012-10-11 Thread Jean V Adams

I have been looking for a way to write R-generated reports to Microsoft 
Word documents.  In the past, I used the package R2wd, but for some reason 
I haven't been able to get it to work on my current setup:
R version 2.15.0 (64-bit)
Windows 7 Enterprise - Service Pack 1
Microsoft Office Professional Plus 2010 - Word version 
14.0.6123.5001 (32-bit)
I gave the package SWord a try, too.  Also, no luck.

But, I just recently ran across the package rtf, and it serves my needs 
quite well.  Since some of you may find yourself in a similar situation, I 
thought I'd spread the word (ha!) about rtf.

Below is some introductory code based on examples in 
http://cran.r-project.org/web/packages/rtf/vignettes/rtf.pdf

Give it a try.  You may like it!

Jean


`·.,,  (((º   `·.,,  (((º   `·.,,  (((º

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA
http://www.glsc.usgs.gov



library(rtf)
rtf <- RTF("rtf_vignette.doc", width=8.5, height=11, font.size=10,
  omi=c(1, 1, 1, 1))

addHeader(rtf, title="This text was added with the addHeader() function.",
  subtitle="So was this.")
addParagraph(rtf, "This text was added with the addParagraph() function. It is a new self-contained paragraph.  When &Alpha; is greater than &beta;, then &gamma; is equal to zero.\n")

startParagraph(rtf)
addText(rtf, "This text was added with the startParagraph() and addText() functions.  You can insert ")
addText(rtf, "styled ", bold=TRUE, italic=TRUE)
addText(rtf, "text this way.  But, you must end the paragraph manually with the endParagraph() function.\n")
endParagraph(rtf)

increaseIndent(rtf)
addParagraph(rtf, paste(rep("You can indent text with the increaseIndent() function.", 4), collapse=" "))

addNewLine(rtf)

decreaseIndent(rtf)
addParagraph(rtf, paste(rep("And remove the indent with the decreaseIndent() function.", 4), collapse=" "))

addNewLine(rtf)
addNewLine(rtf)

addParagraph(rtf, "Table 1.  Table of the iris data using the addTable() function.\n")
tab <- table(iris$Species, floor(iris$Sepal.Length))
names(dimnames(tab)) <- c("Species", "Sepal Length")
# one width for the row-name column plus one per table column
addTable(rtf, tab, font.size=10, row.names=TRUE, NA.string="-",
  col.widths=c(1, 0.5, 0.5, 0.5, 0.5))

newPlot <- function() {
  par(pty="s", cex=0.7)
  plot(iris[, 1], iris[, 2])
  abline(h=2.5, v=6.0, lty=2)
}
addPageBreak(rtf)
addPlot(rtf, plot.fun=newPlot, width=5, height=5, res=300)
addNewLine(rtf)
addParagraph(rtf, "Figure 1.  Plot of the iris data using the addPlot() function.\n")

addNewLine(rtf)
addNewLine(rtf)

addSessionInfo(rtf)
done(rtf)


Re: [R] survey package question

2012-10-11 Thread Thomas Lumley

On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza
sebastian.d...@gmail.com wrote:
 Hello,

 I have got a cluster sample using an election dataset where I already
 had the final results of a county-specific election. I am trying to
 figure out what would be the best sampling design for my data.

 The  structure of the dataset is:

 1) polling station (in general schools where people vote, for a
 county, for example, there are 15 polling stations)
 2) inside each polling station, there are voting units, where people
 actually vote (on average there are about 40 voting units per polling
 station)
 3) for each voting unit I have the total votes by candidate (e.g.,
 candidate 1 =322, candidate 2=122, candidate 3= 89)

 The initial sampling design is:
 1) selection of 5 polling stations PPS (based on number of voters)
 2) selection of 10 voting units (SRS)

 I am interested in estimating the proportion of votes by candidate
 (let's assume we have 3 candidates). My naive estimate would be:

 votes for candidate 1 / all valid votes = proportion

 e.g.

 candidate 1= 2132 / 10874= .1906
 candidate 2= 5323 / 10874= .4895
 candidate 3= 3419 / 10874= .3144

 In this case, the unit of analysis is voters (or votes).

  If I specify the sampling design using the survey package in this way...

 design <- svydesign(id=~station + unit, fpc=~probstation + probunit,
 data=sample, pps="brewer")

 svyciprop(~I(candidate1/totalVotes), design)

 ... I am assuming that the unit of analysis is the voting unit, right?
 and I am estimating an average among voting units?


You want a ratio estimator

svyratio(~candidate1, ~totalVotes, design)
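To see why a ratio estimator answers a different question from the average of unit-level proportions, here is a base-R sketch of the two point estimates using made-up voting-unit data and hypothetical sampling weights (1 / inclusion probability). svyratio() estimates the first kind of quantity (a ratio of estimated totals) and also supplies design-based standard errors, which this sketch does not attempt:

```r
# Made-up sample of voting units with sampling weights (1 / inclusion probability)
units <- data.frame(candidate1 = c(120,  80,  45, 200),
                    totalVotes = c(400, 300, 250, 500),
                    w          = c( 10,  10,  20,  20))

# Ratio of estimated totals: share of all votes (unit of analysis = voters)
ratio_est <- with(units, sum(w * candidate1) / sum(w * totalVotes))

# Weighted mean of unit-level shares (unit of analysis = voting units)
mean_share <- with(units, sum(w * candidate1 / totalVotes) / sum(w))

c(ratio = ratio_est, mean_of_shares = mean_share)  # about 0.314 vs 0.288
```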


   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


[R] epiR//Incidence rate//beginner question on syntax

2012-10-11 Thread ninbut

Hi, I am new to R and would like to start by calculating an incidence rate. My
data is imported into R from a tab delimited txt file, as shown below:

     ID DATE_BIRTH   DATE_UNT EVENT TIME_EV
1  4867 08/02/1959 19/10/2001     1      31
2    52 15/07/1941 08/02/1999     1       6
3    63 02/01/1946 11/02/1999     1       6
4   710 21/10/1965 23/03/1999     0      10
5  1808 07/05/1952 18/06/1999     1       7
6   554 19/08/1947 15/03/1999     0      10
...
event (EVENT=1)
censoring  (EVENT=0)
How do I calculate the incidence rate in R for different strata of age?

1) number of events (EVENT) / person-months to event or censoring (TIME_EV)
2) number of events (EVENT) / 12 person-months to event or censoring
(TIME_EV)

I am lost here: I did not manage to understand epiR and its option to
create a matrix for simple incidence rates, rather than incidence rate ratios.
If anyone could help me out with simple, easy-to-understand syntax to get
further, I would appreciate it very much. Thanks in advance!
nb
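A base-R sketch (not epiR) of both rates by age stratum, using only the six rows shown above. It assumes TIME_EV is months of follow-up and that age is measured at DATE_UNT (taken here as the start of follow-up); the age cut points are hypothetical:

```r
# The six example rows from the post; TIME_EV assumed to be months of follow-up
d <- data.frame(
  ID         = c(4867, 52, 63, 710, 1808, 554),
  DATE_BIRTH = c("08/02/1959", "15/07/1941", "02/01/1946",
                 "21/10/1965", "07/05/1952", "19/08/1947"),
  DATE_UNT   = c("19/10/2001", "08/02/1999", "11/02/1999",
                 "23/03/1999", "18/06/1999", "15/03/1999"),
  EVENT      = c(1, 1, 1, 0, 1, 0),
  TIME_EV    = c(31, 6, 6, 10, 7, 10),
  stringsAsFactors = FALSE
)
d$DATE_BIRTH <- as.Date(d$DATE_BIRTH, format = "%d/%m/%Y")
d$DATE_UNT   <- as.Date(d$DATE_UNT,   format = "%d/%m/%Y")

# Age in years at DATE_UNT (assumed here to be the start of follow-up)
d$age <- as.numeric(d$DATE_UNT - d$DATE_BIRTH) / 365.25
# Hypothetical age strata: [0,50), [50,55), [55,Inf)
d$age_group <- cut(d$age, breaks = c(0, 50, 55, Inf), right = FALSE)

events  <- tapply(d$EVENT,   d$age_group, sum)  # events per stratum
pmonths <- tapply(d$TIME_EV, d$age_group, sum)  # person-months per stratum
rates <- data.frame(events  = as.vector(events),
                    pmonths = as.vector(pmonths),
                    per_person_month     = as.vector(events / pmonths),
                    per_12_person_months = as.vector(12 * events / pmonths),
                    row.names = names(events))
rates

# Overall rate per person-month
overall <- sum(d$EVENT) / sum(d$TIME_EV)
```

For an exact confidence interval on a single stratum's rate, base R's poisson.test(x, T) can be used with the event count and the person-time, e.g. poisson.test(2, 48).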



--
View this message in context: 
http://r.789695.n4.nabble.com/epiR-Incidence-rate-beginner-question-on-syntax-tp4645896.html
Sent from the R help mailing list archive at Nabble.com.


[R] Problems with getURL (RCurl) to obtain list files of an ftp directory

2012-10-11 Thread Francisco Zambrano

Dear all,

I have a problem with the command 'getURL' from the RCurl package, which I
have been using to obtain an FTP directory listing for the MOD16 (ET, DSI)
products and then to download them (part of the script by Tomislav
Hengl, spatial-analyst). Instead of the list of files on the FTP server, I am
getting the complete HTML code. Does anyone know why this might happen?

These are the steps I have been taking:

 library(RCurl)
 MOD16A2.doy <- "ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/"

 items <- strsplit(getURL(MOD16A2.doy,
   .opts = curlOptions(ftplistonly = TRUE)), "\n")[[1]]

items #results

[1] !DOCTYPE HTML PUBLIC \-//W3C//DTD HTML 4.01 Transitional//EN\ \
http://www.w3.org/TR/html4/loose.dtd\;\n!-- HTML listing generated by
Squid 2.7.STABLE9 --\n!-- Wed, 10 Oct 2012 13:43:53 GMT
--\nHTMLHEADTITLE\nFTP Directory:
ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\n/TITLE\nSTYLE
type=\text/css\!--BODY{background-color:#ff;font-family:verdana,sans-serif}--/STYLE\n/HEADBODY\nH2\nFTP
Directory: A HREF=\/\ftp://ftp.ntsg.umt.edu/A/A
HREF=\/pub/\pub/A/A HREF=\/pub/MODIS/\MODIS/A/A
HREF=\/pub/MODIS/Mirror/\Mirror/A/A
HREF=\/pub/MODIS/Mirror/MOD16/\MOD16/A/A
HREF=\/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\MOD16A2.105_MERRAGMAO/A//H2\nPRE\nA
HREF=\../\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dirup.gif\;
ALT=\[DIRUP]\/A A HREF=\../\Parent Directory/A \nA
HREF=\GEOTIFF_0.05degree/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\GEOTIFF_0.05degree/\GEOTIFF_0.05degree/A
. . . . . . . Jun  3 18:00\nA HREF=\GEOTIFF_0.5degree/\IMG
border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\GEOTIFF_0.5degree/\GEOTIFF_0.5degree/A. .
. . . . . . Jun  3 18:01\nA HREF=\Y2000/\IMG border=\0\
SRC=\http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2000/\Y2000/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2001/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2001/\Y2001/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2002/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2002/\Y2002/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2003/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2003/\Y2003/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2004/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2004/\Y2004/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2005/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2005/\Y2005/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2006/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2006/\Y2006/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2007/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2007/\Y2007/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2008/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2008/\Y2008/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2009/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2009/\Y2009/A. . . . . . . . . . . . . .
Dec 23  2010\nA HREF=\Y2010/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2010/\Y2010/A. . . . . . . . . . . . . .
Feb 20  2011\nA HREF=\Y2011/\IMG border=\0\ SRC=\
http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
ALT=\[DIR] \/A A HREF=\Y2011/\Y2011/A. . . . . . . . . . . . . .
Mar 12  2012\n/PRE\nHR noshade
size=\1px\\nADDRESS\nGenerated Wed, 10 Oct 2012 13:43:53 GMT by
localhost (squid/2.7.STABLE9)\n/ADDRESS/BODY/HTML\n

The curious thing is that getURL was working well until something changed (I don't
know what), and the same command still works fine on Windows.

sessionInfo() gives the following:

R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
 [1] MODIS_0.5-8 maptools_0.8-16 lattice_0.20-0

[R] Selecting n observation

2012-10-11 Thread bibek sharma

Hello R help,
 I have a question similar to one posted before. My
problem is that, instead of the last assessment, I want to choose the last two.

I have a data set with several time assessments for each participant.
I want to select the last assessment for each participant. My dataset
looks like this:
ID  week  outcome
1   2   14
1   4   28
1   6   42
4   2   14
4   6   46
4   9   64
4   9   71
4  12   85
9   2   14
9   4   28
9   6   51
9   9   66
9  12   84

Here is one solution for choosing last assessment
do.call(rbind,
by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))
  ID week outcome
1  1    6      42
4  4   12      85
9  9   12      84


[R] simple parsing question?

2012-10-11 Thread Fuchs Ira

I am using the getQuote function in the Quantmod package to retrieve the % 
change for a stock as follows:

 getQuote("aapl", what=yahooQF(c("Change Percent (Real-time)")))
              Trade Time %Change (RT)
aapl 2012-10-11 03:41:00 N/A - -1.67%

How can I extract the numeric change % which is being returned as a factor so 
that I can use it in other calculations?

Thanks.

Re: [R] Exporting summary plm results to latex

Hi Sebastian,

Sorry, I found an error in my solution (the values and coefficients got mixed 
up in sorting).
Try this:

library(reshape)   # melt()
library(plyr)      # ddply() and .()
extract.plm <- function(model) {
if (!class(model)[1] == "plm") {
stop("Internal error: Incorrect model type! Should be a plm object!")
}
# estimate, std. error and p-value, rounded to 3 decimals
zz1 <- summary(model)$coef[,c(1,2,4)]
zz2 <- as.data.frame(apply(zz1,2,function(x) sprintf("%.3f",x)))
zz2[] <- sapply(zz2,function(x) as.numeric(as.character(x)))
zz3 <- data.frame(Coefficient=row.names(zz1),zz2)
# melt() has no 'by' argument, so Coefficient is picked as the id variable
# automatically (hence the "Using Coefficient as id variables" message)
zz3 <- melt(zz3,by="Coefficient")
zz4 <- within(zz3,{Coefficient <- as.character(Coefficient);variable <- as.character(variable)})
# sort so that the rows of each coefficient are together
zz5 <- ddply(zz4,.(Coefficient),function(x) x)
est <- zz5$variable=="Estimate"
pv <- zz5$value[zz5$variable=="Pr...t.."]
# * for 0.01 <= p < 0.05, ** for p < 0.01
zz5$value[est] <- ifelse(pv < 0.05 & pv >= 0.01, paste0(zz5$value[est],"*"),
  ifelse(pv < 0.01, paste0(zz5$value[est],"**"), zz5$value[est]))
# put the standard errors in brackets
se <- zz5$variable=="Std..Error"
zz5$value[se] <- paste0("(",zz5$value[se],")")
zz6 <- zz5[!zz5$variable=="Pr...t..",]
rownames(zz6) <- 1:nrow(zz6)
res <- zz6[,c(1,3)]
res
}
library(plm)
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc,
index = c("state","year"))

extract.plm(zz)
#Using Coefficient as id variables
#  Coefficient    value
#1    log(emp)  0.768**
#2    log(emp)   (0.03)
#3     log(pc)  0.292**
#4     log(pc)  (0.025)
#5   log(pcap)   -0.026
#6   log(pcap)  (0.029)
#7       unemp -0.005**
#8       unemp  (0.001)
 summary(zz)$coef
#              Estimate   Std. Error    t-value      Pr(>|t|)
#log(pcap) -0.026149654 0.0290015755 -0.9016632  3.675200e-01
#log(pc)    0.292006925 0.0251196728 11.6246309  7.075069e-29
#log(emp)   0.768159473 0.0300917394 25.5272539 2.021455e-104
#unemp     -0.005297741 0.0009887257 -5.3581508  1.113946e-07


library(xtable)
xtable(extract.plm(zz))
Using Coefficient as id variables
% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Thu Oct 11 15:28:12 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rll}
  \hline
 & Coefficient & value \\ 
  \hline
1 & log(emp) & 0.768** \\ 
  2 & log(emp) & (0.03) \\ 
  3 & log(pc) & 0.292** \\ 
  4 & log(pc) & (0.025) \\ 
  5 & log(pcap) & -0.026 \\ 
  6 & log(pcap) & (0.029) \\ 
  7 & unemp & -0.005** \\ 
  8 & unemp & (0.001) \\ 
   \hline
\end{tabular}
\end{center}
\end{table}


I used this example because your example is a bit restricted in the sense that 
there was only one independent variable.  In that case, some adjustments need 
to be made in the function:

#With your example dataset 

x <- rnorm(270)
y <- rnorm(270)
t <- rep(1:3,30)
i <- rep(1:90, each=3)
data <- data.frame(i,t,x,y)
fe <- plm(y~x,data=data,model="within")

extract.plm <- function(model) {

if (!class(model)[1] == "plm") {
stop("Internal error: Incorrect model type! Should be a plm object!")
}
# with one regressor this is a named vector: c(Estimate, Std. Error)
tab1 <- summary(model)$coef[,1:2]
pv <- summary(model)$coef[,4]
# * for 0.01 <= p < 0.05, ** for p < 0.01
tab1[1] <- ifelse(pv < 0.05 & pv >= 0.01, paste0(tab1[1],"*"),
  ifelse(pv < 0.01, paste0(tab1[1],"**"), tab1[1]))
tab2 <- melt(tab1)
row.names(tab2)[2] <- ""
tab2 <- within(tab2,{value=as.character(value)})
tab2[2,1] <- gsub("(.*)","(\\1)",sprintf("%.3f",as.numeric(as.character(tab2[2,1]))))
tab2
}
extract.plm(fe)
xtable(extract.plm(fe))

% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Thu Oct 11 15:56:20 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rl}
  \hline
 & value \\ 
  \hline
Estimate & -0.154513026282509* \\ 
   & (0.074) \\ 
   \hline
\end{tabular}
\end{center}
\end{table}
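The two versions above differ only because summary(model)$coef[, 1:2] drops to a vector when there is a single regressor. A shorter, hypothetical helper that keeps the coefficient matrix intact handles both cases. It uses the same star convention (* for 0.01 <= p < 0.05, ** for p < 0.01) but compares unrounded p-values, keeps trailing zeros in the formatted numbers, and keeps the model's coefficient order instead of sorting. The values below are copied from the summary(zz)$coef output above:

```r
# Hypothetical helper: interleave estimates (with stars) and bracketed SEs.
# cf is laid out like summary(model)$coef: col 1 = estimate,
# col 2 = std. error, col 4 = p-value.
stack_coefs <- function(cf, digits = 3) {
  cf    <- as.matrix(cf)
  fmt   <- paste0("%.", digits, "f")
  est   <- sprintf(fmt, cf[, 1])
  se    <- paste0("(", sprintf(fmt, cf[, 2]), ")")
  stars <- ifelse(cf[, 4] < 0.01, "**", ifelse(cf[, 4] < 0.05, "*", ""))
  data.frame(Coefficient = rep(rownames(cf), each = 2),
             value = as.vector(rbind(paste0(est, stars), se)),
             stringsAsFactors = FALSE)
}

# Values copied from the summary(zz)$coef output above
cf <- matrix(c(-0.026149654, 0.292006925, 0.768159473, -0.005297741,
               0.0290015755, 0.0251196728, 0.0300917394, 0.0009887257,
               -0.9016632, 11.6246309, 25.5272539, -5.3581508,
               3.675200e-01, 7.075069e-29, 2.021455e-104, 1.113946e-07),
             ncol = 4,
             dimnames = list(c("log(pcap)", "log(pc)", "log(emp)", "unemp"),
                             c("Estimate", "Std. Error", "t-value", "Pr(>|t|)")))
stack_coefs(cf)
```

With a fitted model, the result can be passed straight to xtable(), e.g. xtable(stack_coefs(summary(fe)$coef)) for the one-regressor case.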


I hope this helps.
A.K.




- Original Message -
From: Sebastian Barfort sb3...@nyu.edu
To: Duncan Mackay mac...@northnet.com.au
Cc: r-help-r-project.org r-help@r-project.org
Sent: Wednesday, October 10, 2012 7:45 PM
Subject: Re: [R] Exporting summary plm results to latex

I am also interested in the standard errors, but beneath the point estimates rather
than next to them, which is the standard layout in the xtable package.
If by any chance you remember the name of the package or how to do it, that
would be much appreciated!

Cheers,
Sebastian


On Oct 10, 2012, at 7:10 PM, Duncan Mackay mac...@northnet.com.au wrote:

 Hi
 
 If you just want the coefficients.
 
 xtable(summary(fe)$coef)
 % latex table generated in R 2.15.1 by xtable 1.7-0 package
 % Thu Oct 11 09:04:59 2012
 \begin{table}[ht]
 \begin{center}
 \begin{tabular}{rrrrr}
  \hline
  & Estimate & Std. Error & t-value & Pr($>$$|$t$|$) \\
  \hline
 x & 0.12 & 0.07 & 1.78 & 0.08 \\
   \hline
 \end{tabular}
 \end{center}
 \end{table}
 
 There is another package whose name eludes me which may help for tables which 
 have different outputs to the output of lm etc
 
 HTH
 
 Duncan
 
 Duncan Mackay
 Department of Agronomy and Soil Science
 University of New England
 Armidale NSW 2351
 Email: home: mac...@northnet.com.au
 
 
 
 At 05:09 11/10/2012, you wrote:
 HI,
 
 Maybe you can use library(texreg):
 
 library(plm)
 
 #generating some data
 x <- rnorm(270)
 y <- rnorm(270)
 t <- rep(1:3,30)
 i <- rep(1:90, each=3)
 
 data <-

Re: [R] Selecting n observation

2012-10-11 Thread Peter Ehlers


On 2012-10-11 12:48, bibek sharma wrote:

Hello R help,
  I have a question similar to one posted before. My
problem is that, instead of the last assessment, I want to choose the last two.

I have a data set with several time assessments for each participant.
I want to select the last assessment for each participant. My dataset
looks like this:
ID  week  outcome
1   2   14
1   4   28
1   6   42
4   2   14
4   6   46
4   9   64
4   9   71
4  12   85
9   2   14
9   4   28
9   6   51
9   9   66
9  12   84

Here is one solution for choosing last assessment
do.call(rbind,
 by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))
   ID week outcome
1  1    6      42
4  4   12      85
9  9   12      84


With the plyr package:

  library(plyr)
  ddply(df, .(ID), function(x) tail(x, 2))

or, slightly simpler:

  ddply(df, .(ID), tail, 2)
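A base-R alternative that does not need plyr, following the by() approach from the original question; it sorts by week within each ID before taking the last two rows (ties keep their original order). The data frame reproduces the example above:

```r
df <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 4, 4, 4, 9, 9, 9, 9, 9),
  week    = c(2, 4, 6, 2, 6, 9, 9, 12, 2, 4, 6, 9, 12),
  outcome = c(14, 28, 42, 14, 46, 64, 71, 85, 14, 28, 51, 66, 84)
)

# last two assessments (by week) for each participant
last2 <- do.call(rbind,
  by(df, df$ID, function(DF) tail(DF[order(DF$week), ], 2)))
last2
```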

Peter Ehlers


Re: [R] optim and nlminb

2012-10-11 Thread nserdar

I have already tried optimx, but I got this error message. How can I solve it?

fn is  Linn 
Function has  10  arguments
par[ 1 ]:  0   ? 0.5   ? 1 In Bounds   
par[ 2 ]:  0   ? 0.5   ? 1 In Bounds   In Bounds  
par[ 3 ]:  0   ? 0.5   ? 1 In Bounds   In Bounds   In Bounds 
par[ 4 ]:  -Inf   ? 1   ? Inf In Bounds   In Bounds   In Bounds   In
Bounds
par[ 5 ]:  -Inf   ? 1   ? Inf In Bounds   In Bounds   In Bounds   In
Bounds   In Bounds   
par[ 6 ]:  -Inf   ? 1   ? Inf In Bounds   In Bounds   In Bounds   In
Bounds   In Bounds   In Bounds  
par[ 7 ]:  -Inf   ? 1   ? Inf In Bounds   In Bounds   In Bounds   In
Bounds   In Bounds   In Bounds   In Bounds 
par[ 8 ]:  -Inf   ? 1   ? Inf In Bounds   In Bounds   In Bounds   In
Bounds   In Bounds   In Bounds   In Bounds   In Bounds
par[ 9 ]:  -Inf   ? 1   ? Inf In Bounds   In Bounds   In Bounds   In
Bounds   In Bounds   In Bounds   In Bounds   In Bounds   In Bounds   
par[ 10 ]:  -Inf   ? 1   ? Inf In Bounds   In Bounds   In Bounds   In
Bounds   In Bounds   In Bounds   In Bounds   In Bounds   In Bounds   In
Bounds  
Error in optimx(init.par, Linn, gr = NULL, method = L-BFGS-B, hessian =
TRUE,  : 
  Function provided is not returning a scalar number

Regards,
Serdar
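That message appears to mean that the objective function did not return a single number at the starting values, for example because it returns a vector of per-observation terms instead of their sum. Linn is not shown, so the sketch below uses a hypothetical least-squares objective and base R's optim() rather than optimx; the useful part is the check: evaluate the objective at the starting values and confirm that it returns one finite number.

```r
# Hypothetical data and objectives (Linn itself is not shown)
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.1)

obj_vector <- function(p) (y - p[1] - p[2] * x)^2       # 50 values: rejected
obj_scalar <- function(p) sum((y - p[1] - p[2] * x)^2)  # one value: accepted

init.par <- c(0, 0)
# Check before optimising: the objective must be a single finite number
is_scalar <- function(v) is.numeric(v) && length(v) == 1 && is.finite(v)
is_scalar(obj_vector(init.par))  # FALSE
is_scalar(obj_scalar(init.par))  # TRUE

fit <- optim(init.par, obj_scalar, method = "L-BFGS-B",
             lower = c(-10, -10), upper = c(10, 10), hessian = TRUE)
fit$par  # close to c(2, 3)
```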



--
View this message in context: 
http://r.789695.n4.nabble.com/optim-and-nlminb-tp4645772p4645907.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] simple parsing question?

2012-10-11 Thread William Dunlap

> qs <- getQuote(c("aapl","tibx","gm","badWolf"),what=yahooQF(c("Change Percent (Real-time)")))
> qs
                 Trade Time %Change (RT)
aapl    2012-10-11 04:00:00 N/A - -2.00%
tibx    2012-10-11 04:00:00 N/A - -0.85%
gm      2012-10-11 04:00:00 N/A - +1.77%
badWolf                NA  N/A - 0.00%
> as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(qs[[2]])))
[1] -2.00 -0.85  1.77  0.00

The \\1 in the replacement argument to sub() means the
text matched by the first parenthesized subpattern in the pattern
argument. 
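getQuote() needs a live connection to Yahoo, so here is an offline sketch of the same sub() call. It applies the pattern to the strings shown above and explains each part. The vector `chg` simply contains those printed strings typed in by hand.

```r
# The "%Change (RT)" strings printed above, typed in by hand
chg <- c("N/A - -2.00%", "N/A - -0.85%", "N/A - +1.77%", "N/A - 0.00%")

# Pattern, piece by piece:
#   "^.* "                  any leading text, then the space before the number
#   "([-+]?[[:digit:].]+)"  group 1: optional sign, then digits and decimal points
#   "%$"                    a literal percent sign at the very end
# The replacement "\\1" keeps only the text matched by group 1.
num <- as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", chg))
num
```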

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Fuchs Ira
 Sent: Thursday, October 11, 2012 12:58 PM
 To: r-help@r-project.org
 Subject: [R] simple parsing question?
 
 I am using the getQuote function in the quantmod package to retrieve the %
 change for a stock as follows:
 
 > getQuote("aapl",what=yahooQF(c("Change Percent (Real-time)")))
              Trade Time %Change (RT)
 aapl 2012-10-11 03:41:00 N/A - -1.67%
 
 How can I extract the numeric change %, which is being returned as a factor,
 so that I can use it in other calculations?
 
 Thanks.

Re: [R] simple parsing question?

2012-10-11 Thread Fuchs Ira

I'm glad I asked, as I would have thought that this was a common requirement and
quantmod itself or a simple R function would have done the conversion. You
saved me from having to master R's sub function. One remaining thing: when I use
your snippet for AAPL, I get:

> aapl=getQuote("aapl",what=yahooQF(c("Change Percent (Real-time)")))
> as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(aapl[[2]])))
[1] -2

not the -2.00 that you got. Do I have a setting that is causing it to not show 
the significant digits?

Thanks.



Re: [R] simple parsing question?

2012-10-11 Thread William Dunlap

But I thought the intention was to turn the string into a number, not
into another string. 
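Here is a small illustration of that point. as.numeric() already returns the number -2. When R prints a lone -2, it drops the trailing zeros. A vector that also contains -0.85, however, is printed with two decimals throughout. sprintf() and format() change only how a value is displayed, and both return character strings. A minimal sketch:

```r
x <- as.numeric("-2.00")       # the number -2
y <- c(-2, -0.85, 1.77, 0)     # the four values from the example above

print(x)   # a lone -2 needs no decimals, so R prints -2
print(y)   # the whole vector shares one format: -2.00 -0.85  1.77  0.00
x == y[1]  # TRUE: same value, only the printed form differs

# Formatting for display gives character strings, not numbers
s <- sprintf("%.2f", x)        # "-2.00"
class(s)                       # "character"
format(x, nsmall = 2)          # "-2.00", also a string
```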

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: arun [mailto:smartpink...@yahoo.com]
 Sent: Thursday, October 11, 2012 1:54 PM
 To: Fuchs Ira
 Cc: R help; William Dunlap
 Subject: Re: [R] simple parsing question?
 
 HI,
 Try this:
 
  sprintf("%.2f", as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(aapl[[2]]))))
 #[1] "-2.00"
 A.K.
 
 
 
 

Re: [R] simple parsing question?

HI,
Try this:

 sprintf("%.2f", as.numeric(sub("^.* ([-+]?[[:digit:].]+)%$", "\\1", as.character(aapl[[2]]))))
#[1] "-2.00"
A.K.





Re: [R] simple parsing question?

2012-10-11 Thread Fuchs Ira

Yes, in my case it would be re-learning regular expressions. Unlike riding a 
bicycle, this is something I have managed to forget (except for the simplest 
cases). I even have an old O'Reilly book on the subject that I can dust off.  
I was thinking (hoping?) that quantmod had functions to manipulate the 
information returned by Yahoo, but I guess that is not the case. Anyway, thanks 
to everyone's help, I now know how to proceed.

Best,
Ira

On Oct 11, 2012, at 4:59 PM, Bert Gunter wrote:

 Just a comment.
 
 On Thu, Oct 11, 2012 at 1:45 PM, Fuchs Ira irafu...@gmail.com wrote:
 I'm glad I asked as I would have thought that this was a common requirement 
 and quantmod itself or a simple R function would have done the conversion.
 
 **You saved me from having to master R's sub function.**
 Actually, it's not R's sub function, it's regular expressions, which
 are independent of R and used in many other languages for text
 processing. They also have an interesting history in computer science.
 You might wish to have a look at Wikipedia's (or another source's) page on
 regular expressions to get some background. Depending on the nature of
 your work, you may also wish to reconsider your avoidance of learning
 the regular expression syntax, which is, however, a chore.
 
 Best,
 Bert
 
 
 
 
 
 
 -- 
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Selecting n observation

2012-10-11 Thread David Winsemius


On Oct 11, 2012, at 12:48 PM, bibek sharma wrote:

 Hello R help,
 I have a question similar to one posted by someone before. My
 problem is that, instead of the last assessment, I want to choose the last two.
 
 I have a data set with several time assessments for each participant.
 I want to select the last assessment for each participant. My dataset
 looks like this:
 ID  week  outcome
 1   2   14
 1   4   28
 1   6   42
 4   2   14
 4   6   46
 4   9   64
 4   9   71
 4  12   85
 9   2   14
 9   4   28
 9   6   51
 9   9   66
 9  12   84
 
 Here is one solution for choosing last assessment
 do.call(rbind,
by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))

Why wouldn't the solution be something along the lines of:

do.call(rbind,
   by(df, INDICES=df$ID, FUN=function(DF) tail(DF, 2) ))


    ID week outcome
 1   1    6      42
 4   4   12      85
 9   9   12      84
 
 


David Winsemius, MD
Alameda, CA, USA


Re: [R] replacing ugly for loops

2012-10-11 Thread andrewH

Dear Bert--
I tried your function on the data that I provided (data.df) and it worked
beautifully (after I added a missing final parenthesis), producing exactly
the same output as my function.  This is an excellent example of what I was
looking for, because it is 
   (a) 50% shorter than mine, 
   (b) fully vectorized, and 
   (c) uses three functions that I have never used before: with, unique, and
do.call

I am going to spend a happy afternoon working through this command by
command, and at the end I am confident that I will have learned some valuable
new (to me) tricks. 
Thanks!
Warmest Regards, AndrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821p4645914.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] replacing ugly for loops

I hate to decline such praise, but honesty demands that I must.

In fact, my solution is **not** fully vectorized at all! The tapply()
and mapply() calls are, in some sense, hidden loops at the
interpreted level. They do have the virtue of being true to R's
functional paradigm, but they are loops nevertheless. For this
reason, they may not be more efficient than the explicit loops you've
written. But I hope the code is more transparent.
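To make the point concrete, here is a sketch with a small hypothetical data frame laid out as described earlier in the thread: file id, first column, and number of columns. It runs the tapply()/mapply() approach next to an explicit double loop, and both produce the same column lists. The column is named `fileid` here rather than `seq`, following the earlier remark that `seq` is a confusing name. The `length.out` argument to seq() is also spelled out in full.

```r
# Hypothetical data: one row per table (file id, first column, number of columns)
yourframe <- data.frame(
  fileid  = c(1, 1, 2, 2, 2),
  startNo = c(3, 10, 5, 6, 20),
  len     = c(4, 2, 3, 3, 1)
)

# Functional version: tapply() iterates over file ids, mapply() over rows
rows  <- seq_len(nrow(yourframe))
colms <- tapply(rows, yourframe$fileid, function(i)
  with(yourframe[i, ],
       sort(unique(do.call(c, mapply(seq, from = startNo, length.out = len,
                                     SIMPLIFY = FALSE))))))

# The same computation written as explicit loops
colms_loop <- list()
for (id in unique(yourframe$fileid)) {
  part <- yourframe[yourframe$fileid == id, ]
  cols <- numeric(0)
  for (r in seq_len(nrow(part))) {
    cols <- c(cols, seq(from = part$startNo[r], length.out = part$len[r]))
  }
  colms_loop[[as.character(id)]] <- sort(unique(cols))
}

colms[["1"]]       # 3 4 5 6 10 11
colms_loop[["2"]]  # 5 6 7 8 20
```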

And I did send a follow-up note to the list, both acknowledging my
erroneous accusation that you did not provide data and confirming that
my proposed solution worked with the example you did, in fact,
provide.

But thanks for the kind words anyway.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

[R] Fonts in *.Rd files.

2012-10-11 Thread Rolf Turner


I wanted to put a certain string in sans serif font in an *.Rd file
that I was writing.  I tried {\sf ...} and \textsf{...}, but both resulted
in the warning "unknown macro".  The manual "Writing R Extensions"
seems to me to imply that one should be able to invoke such LaTeX
macros (section 2.3):

 Each of the above commands takes LaTeX-like input, so other macros may 
 be used within text. 

Is there something else I need to do to get this to work?  Or some other way
to get sans serif?

I would appreciate any pointers.

 cheers,

 Rolf Turner


[R] Error in file(file, rt) : cannot open the connection

2012-10-11 Thread Navin Goyal

Hi,
I am using the R package QT, which runs along with SAS.

I get this error:   Error in file(file, "rt") : cannot open the
connection

I have tried using setwd() and running R directly from that directory, but
I still get the same error. Any help would be appreciated.

setwd("C:\\Documents and Settings\\\\two")
data = read.csv("data.csv", header = TRUE)
head(data)

info <- list(
saspath = "\"C:/Program Files/SAS/SASFoundation/9.2\"",
output = "C:\\Documents and Settings\\...\\two", device = "tiff",
...
)

Thanks

-- 
Navin Goyal
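The message comes from file(), which read.csv() calls (through read.table()) to open a file. The post does not make clear whether the error comes from the read.csv() call or from inside QT. One common cause, though, is a path that does not point to an existing file relative to R's current working directory. Below is a hedged sketch of a check that reports the resolved path and the working directory before reading. `read_csv_checked()` is a hypothetical helper, and the demo writes a temporary file so that it runs anywhere. On Windows, forward slashes in paths avoid having to double every backslash.

```r
# Hypothetical helper: fail with an informative message if the file is missing
read_csv_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", normalizePath(path, mustWork = FALSE),
         "\nWorking directory is: ", getwd())
  }
  read.csv(path, header = TRUE)
}

# Demo with a temporary file so the example is self-contained
tmp <- file.path(tempdir(), "data.csv")
write.csv(data.frame(id = 1:3, conc = c(0.5, 1.2, 2.0)), tmp, row.names = FALSE)
dat <- read_csv_checked(tmp)
head(dat)

# A misspelled file name now reports where R actually looked
res <- tryCatch(read_csv_checked(file.path(tempdir(), "dta.csv")),
                error = function(e) conditionMessage(e))
res
```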


Re: [R] Formatting data for bootstrapping for confidence intervals



Hi,
Try this:

dat1 <- read.table(text="
Area    NAME    DATE    X    Xn    Y
1    X    1/10/10    1    1    0
1    Y    1/11/10    0    0    1
1    X    1/12/10    1    0    0
1    X    1/12/10    1    0    0
1    X    1/12/10    1    0    0
2    X    2/12/10    1    1    0
2    X    2/12/10    1    0    0
2    Y    2/12/10    0    0    1
2    X    2/13/10    1    0    0
2    X    2/13/10    1    0    0
2    X    2/13/10    1    0    0
2    X    2/14/10    1    0    0
2    X    2/14/10    1    0    0
2    X    2/14/10    1    1    0
2    X    2/14/10    1    0    0
3    X    7/27/11    1    0    0
3    X    7/27/11    1    1    0
3    X    7/27/11    1    0    0
3    X    7/28/11    1    0    0
3    X    7/28/11    1    1    0
3    X    7/28/11    1    0    0
3    X    7/28/11    1    0    0
3    Y    7/28/11    0    0    1
3    X    7/28/11    1    0    0
3    X    7/28/11    1    1    0
3    Y    7/28/11    0    0    1
3    X    7/28/11    1    0    0
3    X    7/29/11    1    0    0
3    X    7/29/11    1    0    0
3    X    7/29/11    1    1    0
", sep="", header=TRUE, stringsAsFactors=FALSE)

# You can use aggregate(), ddply() from library(plyr), or library(data.table)
library(data.table)
dat2 <- data.table(dat1)
dat2[,list(X=sum(X),Xn=sum(Xn),Y=sum(Y)),list(Area,DATE)]
#   Area    DATE X Xn Y
#1:    1 1/10/10 1  1 0
#2:    1 1/11/10 0  0 1
#3:    1 1/12/10 3  0 0
#4:    2 2/12/10 2  1 1
#5:    2 2/13/10 3  0 0
#6:    2 2/14/10 4  1 0
#7:    3 7/27/11 3  1 0
#8:    3 7/28/11 7  2 2
#9:    3 7/29/11 3  1 0
library(plyr)
ddply(dat1, .(Area, DATE), colwise(sum, c("X", "Xn", "Y")))
# Area    DATE X Xn Y
#1    1 1/10/10 1  1 0
#2    1 1/11/10 0  0 1
#3    1 1/12/10 3  0 0
#4    2 2/12/10 2  1 1
#5    2 2/13/10 3  0 0
#6    2 2/14/10 4  1 0
#7    3 7/27/11 3  1 0
#8    3 7/28/11 7  2 2
#9    3 7/29/11 3  1 0

A.K.
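The aggregate() route is mentioned above but not shown, so here is a base-R sketch. To keep the block self-contained, it uses only the first eight rows of the data and drops NAME, which is not summed. The left side of the formula, `cbind(X, Xn, Y)`, sums all three columns at once.

```r
# First eight rows of the example data (NAME omitted)
dat1 <- data.frame(
  Area = c(1, 1, 1, 1, 1, 2, 2, 2),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10", "1/12/10",
           "2/12/10", "2/12/10", "2/12/10"),
  X    = c(1, 0, 1, 1, 1, 1, 1, 0),
  Xn   = c(1, 0, 0, 0, 0, 1, 0, 0),
  Y    = c(0, 1, 0, 0, 0, 0, 0, 1)
)

# Sum X, Xn and Y within each Area/DATE combination
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = dat1, FUN = sum)

# Put rows in Area, then DATE, order for reading
daily <- daily[order(daily$Area, daily$DATE), ]
daily
```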


- Original Message -
From: Paul Wennekes paul.wenne...@evobio.eu
To: r-help@r-project.org
Cc: 
Sent: Thursday, October 11, 2012 11:55 AM
Subject: [R] Formatting data for bootstrapping  for confidence intervals

Hi all,

New to R, so this may be obvious to some.
I've been trying to figure this out for a while. I have a dataset, events,
that looks something like this: 

Area  NAME  DATE     X  Xn  Y
1     X     1/10/10  1  1   0
1     Y     1/11/10  0  0   1
1     X     1/12/10  1  0   0
1     X     1/12/10  1  0   0
1     X     1/12/10  1  0   0
2     X     2/12/10  1  1   0
2     X     2/12/10  1  0   0
2     Y     2/12/10  0  0   1
2     X     2/13/10  1  0   0
2     X     2/13/10  1  0   0
2     X     2/13/10  1  0   0
2     X     2/14/10  1  0   0
2     X     2/14/10  1  0   0
2     X     2/14/10  1  1   0
2     X     2/14/10  1  0   0
3     X     7/27/11  1  0   0
3     X     7/27/11  1  1   0
3     X     7/27/11  1  0   0
3     X     7/28/11  1  0   0
3     X     7/28/11  1  1   0
3     X     7/28/11  1  0   0
3     X     7/28/11  1  0   0
3     Y     7/28/11  0  0   1
3     X     7/28/11  1  0   0
3     X     7/28/11  1  1   0
3     Y     7/28/11  0  0   1
3     X     7/28/11  1  0   0
3     X     7/29/11  1  0   0
3     X     7/29/11  1  0   0
3     X     7/29/11  1  1   0

X and Y are events. Every row represents a single event happening, with a 1
indicating which one happens at that time. Xn indicates X happening at
night. I want to bootstrap these events over days, but I think I need to
summarize them first, i.e., get something that looks like this: 

Area  DATE     X  Xn  Y
1     1/10/10  1  1   0
1     1/11/10  0  0   1
1     1/12/10  3  0   0
2     2/12/10  2  1   1
etc.

and then for each Area, bootstrap the data over the days. Any ideas? I've
tried using the 'reshape' package but I don't know how to sum over parts of
the columns as defined by the DATE values...
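For the second step (bootstrapping over days within each Area), one hedged option is a percentile bootstrap of the mean daily count. Days are resampled with replacement, separately for each Area. The `daily` data frame re-types the summarised totals shown earlier in this thread. `boot_ci()`, B = 2000 and the 95% level are illustrative choices. With only three days per Area, the intervals are necessarily very rough. The boot package offers more complete bootstrap machinery.

```r
# Daily totals per Area, re-typed from the summarised table in this thread
daily <- data.frame(
  Area = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "2/12/10", "2/13/10",
           "2/14/10", "7/27/11", "7/28/11", "7/29/11"),
  X  = c(1, 0, 3, 2, 3, 4, 3, 7, 3),
  Xn = c(1, 0, 0, 1, 0, 1, 1, 2, 1),
  Y  = c(0, 1, 0, 1, 0, 0, 0, 2, 0)
)

# Hypothetical helper: percentile bootstrap CI for the mean of x.
# Days are resampled by index, which also works when x has length 1.
boot_ci <- function(x, B = 2000, level = 0.95) {
  means <- replicate(B, mean(x[sample.int(length(x), replace = TRUE)]))
  alpha <- (1 - level) / 2
  quantile(means, c(alpha, 1 - alpha))
}

set.seed(42)
# One row per Area: lower and upper limits for the mean daily count of X
ci_X <- t(sapply(split(daily$X, daily$Area), boot_ci))
ci_X
```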

Many thanks ahead!




[R] model selection with spg and AIC (or, convert list to fitted model object)

2012-10-11 Thread Ravi Varadhan

Adam,

See the attached R code that solves your problem and beyond.  One important 
issue is that you are enforcing constraints only indirectly.  You need to make 
sure that P1, P2, and P3 (which are functions of original parameters and time) 
are all between 0 and 1.  It is not enough to impose constraints on original 
parameters p1, p2, mu1 and mu2.

I also show you how to do a likelihood ratio test with the results from spg.  
You can also do the same for nlminb or Rvmmin.
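Ravi's attached code is not reproduced here, so this is a minimal sketch of the general recipe using simulated data. Two nested models are fitted by minimising a negative log-likelihood, and AIC and a likelihood ratio test are then computed from the minimised values. The sketch uses optim(), whose result stores the minimum in `$value`; nlminb() stores it in `$objective`, and the assumption is that spg() results carry an equivalent `$value`. The models and data are hypothetical.

```r
set.seed(7)
y <- rnorm(100, mean = 0.3, sd = 1)   # simulated data, for illustration only

# Negative log-likelihoods; sd is estimated on the log scale to keep it positive
nll_full <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
nll_null <- function(p) -sum(dnorm(y, mean = 0,    sd = exp(p[1]), log = TRUE))

fit_full <- optim(c(0, 0), nll_full, method = "BFGS")
fit_null <- optim(0, nll_null, method = "BFGS")

k_full <- 2  # parameters: mean, log(sd)
k_null <- 1  # parameters: log(sd)

# AIC = 2 * (minimised negative log-likelihood) + 2 * (number of parameters)
aic <- c(null = 2 * fit_null$value + 2 * k_null,
         full = 2 * fit_full$value + 2 * k_full)

# Likelihood ratio statistic, referred to a chi-squared distribution whose
# degrees of freedom equal the difference in the number of free parameters
lrt  <- 2 * (fit_null$value - fit_full$value)
pval <- pchisq(lrt, df = k_full - k_null, lower.tail = FALSE)

aic
c(LRT = lrt, p = pval)
```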

Finally, I also show you how to use optimx to compare different algorithms. 
This shows you that in addition to spg, you also get very good results (as seen 
from objective function values and KKT conditions) with a number of other 
algorithms (e.g., Rvmmin, nlminb), many of which are much faster than spg.

This example illustrates the utility of optimx().  I am always surprised that 
more R users doing optimization are not using optimx.  It is a very 
powerful tool for benchmarking unconstrained and box-constrained optimization 
problems.  It deserves to be used widely, in my biased, but correct, opinion.

Ravi

Ravi Varadhan, Ph.D.
Assistant Professor
The Center on Aging and Health
Division of Geriatric Medicine & Gerontology
Johns Hopkins University
rvarad...@jhmi.edu
410-502-2619


[R] a question

2012-10-11 Thread mina izadi

Dear R-helpers,
I need to read some values from the output of garchFit() in fGarch.
My model is GARCH(1,1), and I want to extract the
coefficients (omega, alpha, beta), the time series (x) and the conditional
SDs (s), because I need to use them in other formulas,
for example: omega + x[1] + s[3].
I may have several simulations, so I need a general way to extract
them, rather than reading off, for example, the value of omega by eye and
substituting it into the formula.
Best.
M.Izadi


[R] In vegan package: running adonis (or similar) on a distance matrix

2012-10-11 Thread Roey Angel


Hi,
Using the vegan package, I was wondering if there's a way to use a distance 
matrix as input for adonis (or any of the other similar hypothesis 
testing functions) instead of the usual species-by-sample table.
Working in the field of microbial ecology, what I'm trying to do is to 
overcome the problem of having to use discrete units such as species or 
OTUs, which are problematic in microbial ecology (if not outright 
theoretically false).
What I have instead is a phylometric distance matrix between all my 
samples based on a phylogenetic tree.


Some people have apparently made such an implementation in Python 
(http://qiime.org/scripts/compare_categories.html), but I'd rather use R.


Thanks in advance,
Roey


[R] Changing NA to 0 in selected columns of a dataframe

2012-10-11 Thread scoyoc

I've been beating my head on the table for hours now and don't understand why
this doesn't work. I have a dataframe that I want to change NAs to 0 for
some of the columns and not others. Consider this...

#create dataframe
 A = c(1:5)
 B = c(6, 7, NA, NA, NA)
 C = c(NA, NA, 13, 14, 15)
 D = c(16:20)
 E = c(21, NA, NA, NA, 25)
 data = as.data.frame ( cbind ( A, B, C, D, E ) )
#convert NAs in columns B & C to 0
 data [ is.na ( data [ , 2:3] ) ] = 0
Error in `[<-.data.frame`(`*tmp*`, is.na(data[, 2:3]), value = 0) : 
  only logical matrix subscripts are allowed in replacement

I only want to change NA in columns B and C. When I run this I get this
error. Why can't I designate rows using is.na()?



--
View this message in context: 
http://r.789695.n4.nabble.com/Changing-NA-to-0-in-selected-columns-of-a-dataframe-tp4645917.html
Sent from the R help mailing list archive at Nabble.com.


[R] characters, mathematical expressions and computed values

2012-10-11 Thread 1Rnwb

Hello,

I have to add Age (bar(x)=14.3) as a title on a chart. I am unable to get
this to work. I have tried bquote, substitute and expression, but each
only does part of the job.

new <-
c(14.3, 18.5, 18.1, 17.7, 18, 15.9, 19.6, 17.3, 17.8, 17.5, 15.4, 
16.3, 15, 17.1, 17.1, 16.4, 15.2, 16.7, 16.7, 16.9, 14.5, 16.6, 
15.8, 15.2, 16.2, 15.6, 15, 17.1, 16.7, 15.6, 15, 15.8, 16.8, 
17, 15.2, 15.8, 15.7, 14.7, 17.3, 14.9, 16.8, 14.6, 19.3, 15.3, 
14.7, 13.3, 16.5, 16, 14.2, 16.1, 15.2, 13.4, 17.7, 15.5, 14.5, 
15.7, 13.6, 14.1, 20, 17.2, 16.5, 14.3, 13.7, 14.7, 15.4, 13.6, 
17, 17.3, 15.4, 15.5, 16.6, 15.8, 15.7, 14.7, 14.2, 14.2, 14, 
14.2, 19.1, 17.2, 18.3, 13.9, 16, 15.9, 14.9, 14.6, 15.9, 12.2, 
14.1, 12, 12.8, 17.1, 17, 15, 15.8, 15.9, 16.1, 18, 14.7, 18.9
)
hist(new, xlab='30-day Death Rate', xlim=c(7,22),
     main=expression("Heart Attack (" * bar(X) * ")=" * mean(new)))

I would appreciate any pointers on getting this correct.
Thanks
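One way to get the computed mean into the title is bquote(), which splices the value of anything wrapped in .() into a plotmath expression. expression() on its own never evaluates mean(new), which is why it does only part of the job. Here is a sketch using a handful of the values from the question, for illustration:

```r
# A few of the values from the question, for illustration only
x <- c(14.3, 18.5, 18.1, 17.7, 18)
m <- round(mean(x), 1)   # 17.3

# bquote() substitutes the value of m where .(m) appears;
# in plotmath, "==" is drawn as "=" and bar(X) as X with an overbar
ttl <- bquote("Heart Attack (" * bar(X) == .(m) * ")")

hist(x, xlab = "30-day Death Rate", main = ttl)
```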



--
View this message in context: 
http://r.789695.n4.nabble.com/characters-mathematical-expressions-and-computed-values-tp4645916.html
Sent from the R help mailing list archive at Nabble.com.


[R] Question on survival

2012-10-11 Thread lau pel

Hi,
I'm going crazy trying to plot a fairly simple graph.
I need to plot the estimated hazard rate from a Cox model.
Suppose the model is:
coxPhMod <- coxph(Surv(TIME, EV) ~ AGE + A + B + strata(C), data = data)
with 4 levels for C.
How can I obtain a graph with 4 estimated (preferably smoothed) hazard curves
(baseline hazard + 3 proportional) to highlight the effect of C?
Thanks!!
laudan
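Here is a hedged sketch that uses the lung data from the survival package as a stand-in for the poster's data. Note that with strata(C) in the formula, each level of C gets its own baseline hazard and C has no coefficient, so the stratum curves are not constrained to be proportional. basehaz() returns the estimated cumulative baseline hazard for each stratum, which can be drawn as one step curve per level. Turning these into smoothed hazard rates would require an extra smoothing step, which is not shown here.

```r
library(survival)

# Stand-in model: lung data, adjusting for age and stratifying on sex
fit <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

# Cumulative baseline hazard for each stratum (at the mean covariate values)
bh <- basehaz(fit)

# Empty plot, then one step curve per stratum
plot(hazard ~ time, data = bh, type = "n",
     xlab = "Time", ylab = "Cumulative baseline hazard")
strata_names <- unique(as.character(bh$strata))
for (k in seq_along(strata_names)) {
  s <- bh[bh$strata == strata_names[k], ]
  lines(s$time, s$hazard, type = "s", col = k)
}
legend("topleft", legend = strata_names, col = seq_along(strata_names), lty = 1)
```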


Re: [R] Changing NA to 0 in selected columns of a dataframe

2012-10-11 Thread scoyoc

Actually, what does "only logical matrix subscripts are allowed in
replacement" mean? I can designate columns using is.na().
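Here is a sketch of what the error means, assuming the goal is to set NAs to 0 in columns B and C only. is.na(data[, 2:3]) is a 5 x 2 logical matrix, but data has 5 columns. A logical matrix used to index a data frame in a replacement must have the same dimensions as that data frame, so R refuses. Indexing the two-column subset with its own NA matrix avoids the mismatch. The name `dat` is used instead of `data` only to avoid masking the base function data().

```r
A <- 1:5
B <- c(6, 7, NA, NA, NA)
C <- c(NA, NA, 13, 14, 15)
D <- 16:20
E <- c(21, NA, NA, NA, 25)
dat <- data.frame(A, B, C, D, E)

# dat[cols] is a 5 x 2 data frame, and is.na(dat[cols]) is a 5 x 2 logical
# matrix, so the dimensions match and the replacement is allowed
cols <- c("B", "C")
dat[cols][is.na(dat[cols])] <- 0
dat
```

Column E keeps its NAs, because only the B and C sub-data-frame is modified.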



--
View this message in context: 
http://r.789695.n4.nabble.com/Changing-NA-to-0-in-selected-columns-of-a-dataframe-tp4645917p4645918.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Formatting data for bootstrapping for confidence intervals

2012-10-11 Thread Paul Wennekes

Thank you! That had me stuck for quite a while and this worked like a charm!



--
View this message in context: 
http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860p4645920.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Changing NA to 0 in selected columns of a dataframe

Hi,
Try this:
 dat1 = as.data.frame ( cbind ( A, B, C, D, E ) )
dat1$B[is.na(dat1$B)] <- 0
dat1$C[is.na(dat1$C)] <- 0
 dat1
#  A B  C  D  E
#1 1 6  0 16 21
#2 2 7  0 17 NA
#3 3 0 13 18 NA
#4 4 0 14 19 NA
#5 5 0 15 20 25
A.K.




- Original Message -
From: scoyoc sco...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Thursday, October 11, 2012 6:05 PM
Subject: [R] Changing NA to 0 in selected columns of a dataframe

I've been beating my head on the table for hours now and don't understand why
this doesn't work. I have a dataframe that I want to change NAs to 0 for
some of the columns and not others. Consider this...

#create dataframe
 A = c(1:5)
 B = c(6, 7, NA, NA, NA)
 C = c(NA, NA, 13, 14, 15)
 D = c(16:20)
 E = c(21, NA, NA, NA, 25)
 data = as.data.frame ( cbind ( A, B, C, D, E ) )
#convert NAs in columns B & C to 0
 data [ is.na ( data [ , 2:3] ) ] = 0
Error in `[<-.data.frame`(`*tmp*`, is.na(data[, 2:3]), value = 0) : 
  only logical matrix subscripts are allowed in replacement

I only want to change NA in columns B and C. When I run this I get this
error. Why can't I designate rows using is.na()?



--
View this message in context: 
http://r.789695.n4.nabble.com/Changing-NA-to-0-in-selected-columns-of-a-dataframe-tp4645917.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Changing NA to 0 in selected columns of a dataframe

2012-10-11 Thread R. Michael Weylandt

On Thu, Oct 11, 2012 at 11:58 PM, arun smartpink...@yahoo.com wrote:
 Hi,
 Try this:
  dat1 = as.data.frame ( cbind ( A, B, C, D, E ) )

No. Do not try this. It is a Very Bad Thing to use

as.data.frame(cbind(...))

instead of

data.frame(...)

for reasons I've mentioned before on this list. In short, cbind()
forces all its arguments to a single mode, thereby missing the entire
point of a data frame.
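A minimal illustration of the coercion described above, using two made-up vectors. `cbind()` first builds a matrix, which holds a single mode, so the numbers become character strings before `as.data.frame()` ever sees them. `data.frame()` keeps each column's own type. `stringsAsFactors = FALSE` is set in both calls so the result does not depend on the R version's default.

```r
num <- 1:3
chr <- c("a", "b", "c")

# cbind() makes a character matrix, so 'num' is coerced to character
bad  <- as.data.frame(cbind(num, chr), stringsAsFactors = FALSE)
# data.frame() keeps each column's own type
good <- data.frame(num, chr, stringsAsFactors = FALSE)

sapply(bad, class)    # both columns "character"
sapply(good, class)   # "integer" and "character"
```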

Cheers,
Michael


Re: [R] a question


Hello,

What a terribly asked question.
Let me rephrase it. You have a time series 'x' simulated with garchSim 
from package fGarch and have fitted a model using garchFit.

1. You want to extract the coefficients.
coef(fit)

2. You want the series of observations (simulated) and of conditional sd.
x$garch
x$sigma
(These columns are present when x was simulated with garchSim(..., extended = TRUE).)

Hope this helps,

Rui Barradas
On 11-10-2012 21:29, mina izadi wrote:

Dear R-helpers,
I need to read some values from the output of garchFit in fGarch.
My model is GARCH(1,1), and I want to read the coefficients
(omega, alpha, beta), the time series (x) and the conditional SD (s),
because I need to use them in other formulas,
for example: omega + x[1] + s[3].
I may also have several simulations, so I need a general way to read
them, rather than reading, for example, the value of omega by eye and then
substituting it into the formula.
Best.
M.Izadi


Re: [R] Changing NA to 0 in selected columns of a dataframe