date:20111126

[R] Why is it happening?

2011-11-26 Thread Christofer Bogaso

Dear all, I had following calculations with R:
 x = vector(length = 4)
 x[1] = 1
 x[2] = 3
 x[3] = 123456789123456
 x[4] = -9876543219876
 as.integer(x)
[1]  1  3 NA NA
Warning message:
NAs introduced by coercion

What went wrong?

Thanks and regards,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Why is it happening?

2011-11-26 Thread Kehl Dániel

Note that current implementations of R use 32-bit integers for integer
vectors, so the range of representable integers is restricted to
about +/-2*10^9: doubles
(http://stat.ethz.ch/R-manual/R-patched/library/base/html/double.html)
can hold much larger integers exactly.
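
A minimal sketch of that limit, reusing the values from the original post. Only base R is used; the variable names (fits, xi, exact) are made up for illustration.

```r
# Values from the original post, stored as doubles (R's default numeric type).
x <- c(1, 3, 123456789123456, -9876543219876)

# Largest value an R integer vector can hold (32-bit signed).
.Machine$integer.max

# Which elements fit into that range?
fits <- abs(x) <= .Machine$integer.max

# as.integer() returns NA for the elements that do not fit
# (the coercion warning is suppressed here on purpose).
xi <- suppressWarnings(as.integer(x))

# The doubles themselves still hold the large whole numbers exactly.
exact <- x == round(x)
```

If whole numbers beyond the integer range are needed, leaving them as doubles is usually enough.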

hth
d

On 2011-11-26 13:05, Christofer Bogaso wrote:
 Dear all, I had following calculations with R:
 x = vector(length = 4)
 x[1] = 1
 x[2] = 3
 x[3] = 123456789123456
 x[4] = -9876543219876
 as.integer(x)
 [1]  1  3 NA NA
 Warning message:
 NAs introduced by coercion

 What went wrong?

 Thanks and regards,


[R] Constrained linear regression

2011-11-26 Thread Julia Lira


Dear all,
I need to run a simple linear regression of the form:
y = b0 + b1*x1 + (1-b1)*x2 + e
which I know I can fit with:
lm(y ~ I(x1 - x2) + offset(x2)).
However, I also need to restrict the coefficient b1 to lie between 0 and 1.
Is there any way to include such a restriction in the linear regression
estimation?
I saw a suggestion involving the function solve.QP, but I did not really
understand that method.
Thanks in advance,
Julia 

Re: [R] Constrained linear regression

2011-11-26 Thread Bert Gunter

Sounds like it is, or could be considered, a mixture problem. Check
out the FlexMix package, which looks like it should do exactly what
you want. (But maybe not, so look carefully.)
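
For the specific model in the question there is also a simple base-R route, sketched here under assumptions rather than taken from the thread. Only b1 is constrained and the residual sum of squares is a convex quadratic in b1, so the constrained estimate is the unconstrained lm() estimate clamped to [0, 1], with b0 then re-estimated for that b1. The data below are simulated purely for illustration.

```r
# Simulated data (hypothetical): true b0 = 0.5, true b1 = 0.3.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.5 + 0.3 * x1 + 0.7 * x2 + rnorm(n, sd = 0.1)

# Unconstrained fit of y = b0 + b1*x1 + (1 - b1)*x2 + e, as in the question.
fit    <- lm(y ~ I(x1 - x2) + offset(x2))
b1_unc <- coef(fit)[[2]]

# Clamp b1 to [0, 1]; valid here because b1 is the only constrained
# coefficient and the least-squares objective is convex in it.
b1 <- min(max(b1_unc, 0), 1)

# With b1 fixed, the least-squares intercept is the mean of what is left.
b0 <- mean(y - b1 * x1 - (1 - b1) * x2)

c(b0 = b0, b1 = b1)
```

With several constrained coefficients the clamping shortcut no longer applies, and a quadratic-programming or bounded-optimisation approach would be needed instead.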

-- Bert

On Sat, Nov 26, 2011 at 6:10 AM, Julia Lira julia.l...@hotmail.co.uk wrote:

 Dear all,
 I need to run a simple linear regression such that:
 y = b0 + b1*x1 + (1-b1)*x2 + e
 which I know I can use:
 lm(y ~ I(x1 - x2) + offset(x2)).
 However, I also need to restrict the coefficient b1 to be between 0 and 1.
 Is there any way to include such restriction in the linear regression 
 estimation?
 I saw suggestion related with the function Solve.QP, but I really did not 
 understand such method.
 Thanks in advance,
 Julia




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[R] dir.create() does not create directory

2011-11-26 Thread syrvn

Hello,

I am running Windows 7 and R-2.13 in StatET. 

When I try to create a directory it does not print any errors, but if I check
outside Eclipse whether it exists, or do a refresh in Eclipse, the directory has
not been created. The strange thing is that it happens only in some
sub-folders... On the other hand, when I use the normal Windows Explorer in
one of these sub-folders and create a folder, it works.

Any ideas? I also tried the mode=777 option but still have the same problem.


Cheers,

syrvn


Re: [R] dir.create() does not create directory

2011-11-26 Thread R. Michael Weylandt michael.weyla...@gmail.com

I'm not a Windows man, but have you tried in the R CLI or GUI rather than
Eclipse? That would help narrow down the problem.

Also, if you could provide a minimal example for those who have Windows boxes
that'd be great - though admittedly it sounds hard here.

As an outline, something like:

sessionInfo()
setwd()
list.files()
dir.create()
list.files()

should suffice.
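
A filled-in version of that outline might look like the sketch below. The folder names are hypothetical and a temporary directory stands in for the real project path. dir.create() returns its success flag invisibly, so capturing it and checking afterwards makes a silent failure visible.

```r
# Hypothetical target: a nested folder under the session's temp directory.
target <- file.path(tempdir(), "parent_folder", "new_folder")

# recursive = TRUE also creates missing parent folders;
# showWarnings = TRUE reports why creation failed, if it does.
ok <- dir.create(target, recursive = TRUE, showWarnings = TRUE)

# TRUE if the directory was created by this call.
ok

# Independent checks that the directory now exists.
exists_now <- file.exists(target)
is_dir     <- file.info(target)$isdir
list.files(dirname(target))
```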

Michael

On Nov 26, 2011, at 10:07 AM, syrvn ment...@gmx.net wrote:

Hello,

I am running Windows 7 and R-2.13 in StatET.

When I try to create a directory it does not print any errors but if I check
outside eclipse if it exists or do a refresh in Eclipse the directory is not
been created. The strange thing is that it happens only to some
sub-folders... On the other hand when I use the normal windows explorer in
one of these sub-folders and create a folder it works.

Any ideas? I also tried the mode=777 option but still the same problem

Cheers,

syrvn


[R] cumsum in 3d arrays

2011-11-26 Thread zloncaric

Hello!

Is it possible to apply cumsum() along the 3rd dimension of a 3D array?
Something like the MATLAB function cumsum(A, dim), which returns the
cumulative sum of the elements along the dimension of A specified by
the scalar dim.

Thanks in advance 

Željka




[R] Time series merge?

2011-11-26 Thread Kevin Burton

I have two time series

a <- ts(1:10, start=c(1,6), end=c(2,5), frequency=10)
b <- ts(1:5, start=c(2,1), end=c(2,5), frequency=10)

Obviously 'b' is a subset of 'a'. I want a single index value indicating
where the start of 'b' lines up with the start of 'a'. So in this simple
example I would expect an index of 5. I was playing with 'merge'. But for a
'ts' object this does not produce anything useful:

 merge(a,b)
  x
1 1
2 2
3 3
4 4
5 5

I get the same answer if I use 'merge(b,a)', so I don't know how to convert
this result to something useful. So then I decided to use 'xts'. But the
conversion fails:

 ax <- as.xts(a)
Error in as.xts.ts(a) : could not convert index to appropriate type

For this simple example I could code it myself using a simple for loop, but
if I add the capability to handle missing dates, different frequencies, etc. it
gets complicated very fast.  It seems that 'xts' has more extensive date
handling facilities than 'ts', but I am stuck since it doesn't look like I
can convert from 'ts' to 'xts'.

Thanks in advance for your suggestions.

Kevin



Re: [R] Time series merge?

2011-11-26 Thread Gabor Grothendieck

On Sat, Nov 26, 2011 at 10:55 AM, Kevin Burton rkevinbur...@charter.net wrote:
 I have two time series



 a <- ts(1:10, start=c(1,6), end=c(2,5), frequency=10)

 b <- ts(1:5, start=c(2,1), end=c(2,5), frequency=10)



 Obviously 'b' is a subset of 'a'. I want a single index value indicating
 where that start of 'b' lines up with the start of 'a'. So in this simple
 example I would expect an index of 5. I was playing with 'merge'. But, for a
 'ts' object this does not produce anything that is useful:



 merge(a,b)


Try this:

library(zoo)
m <- merge(a = as.zoo(a), b = as.zoo(b))
m

or to get a ts object back:

as.ts(m)
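
If only the alignment index is wanted, a base-R sketch (not part of the reply above, and assuming both series share the same frequency) is to compare the time stamps directly. The names idx, offset and aligned are made up for illustration.

```r
a <- ts(1:10, start = c(1, 6), end = c(2, 5), frequency = 10)
b <- ts(1:5,  start = c(2, 1), end = c(2, 5), frequency = 10)

# Position in 'a' whose time stamp equals the first time stamp of 'b'.
# A small tolerance guards against floating-point error in time().
idx <- which(abs(time(a) - tsp(b)[1]) < 1e-6)

# Number of observations in 'a' that precede the start of 'b'.
offset <- idx - 1

# ts.union() pads each series with NA so the two line up on a common index.
aligned <- ts.union(a, b)
aligned
```

Here offset corresponds to the value of 5 the original poster expected, while idx is the 1-based position in 'a'.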

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] cumsum in 3d arrays

2011-11-26 Thread Ben Bolker

zloncaric zloncaric at biologija.unios.hr writes:

 Is it posible to apply /cumsum()/ along the 3rd dimension of 3D array? 
 Something like matrlab function - /cumsum (*A*,dim)/ which returns the
 cumulative sum of the elements along the dimension of *A* specified by
 scalar dim.

  Check out the combination of apply and cumsum.


Re: [R] plot xy data

2011-11-26 Thread David Winsemius



On Nov 25, 2011, at 11:27 PM, sutada Mungpakdee wrote:


Hi,

Does anyone know how to get the correct plot?

I have used this R script (below). I expected the plot to be based
on the x axis, but the result was the opposite. Any suggestions would be great.


Your question doesn't make clear what you expected and what you are
seeing. I also do not see why you added library(IRanges) because I see
nothing from that package in the code. We cannot run it because the
data is not provided and you made no effort to construct a
data.frame that would match the attributes of the real data. It
would be better of course to call your data something other than dog.




library(IRanges)
data <- read.table(file="~/q20snpref/illusmp454merbed", sep="\t", header=F)

colnames(data) <- c("Scaffold", "sca_position", "coverage")
depth <- mean(data[, "coverage"])
# depth now has the mean (overall) coverage
# set the bin-size
window <- 10001
rangefrom <- 0
rangeto <- length(data[, "sca_position"])
data.10kb <- runmed(data[, "coverage"], k=window)
png(file="cov_10k.png", width=1000, height=1000)
plot(x=data.10kb[rangefrom:rangeto],
     y=data[rangefrom:rangeto, "sca_position"],
     pch=".", cex=1, xlab="depth", ylab="bp_position", type="p")


If you want to swap the roles of data.10kb (AKA coverage) and  
sca_position then just reverse the x and y assignments.
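
As a rough sketch of that swap, with simulated stand-in data because the real file was not posted: position goes on x and smoothed coverage on y. The data frame dat, its values, and the window of 101 are assumptions for illustration. pdf(NULL) is used only so the example writes no file; png(file = ...) as in the original script would work the same way.

```r
# Simulated stand-in for the coverage table (hypothetical values).
set.seed(42)
dat <- data.frame(sca_position = 1:2000,
                  coverage     = rpois(2000, lambda = 30))

# Running median of coverage; the window length k must be odd.
smoothed <- runmed(dat$coverage, k = 101)

# Open a graphics device that discards output, draw, then close it.
pdf(NULL)
plot(x = dat$sca_position, y = smoothed,
     pch = ".", cex = 1, type = "p",
     xlab = "bp_position", ylab = "depth")
invisible(dev.off())
```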



dev.off()

Best regards,
Sutada


David Winsemius, MD
West Hartford, CT


Re: [R] cumsum in 3d arrays

2011-11-26 Thread David Winsemius



On Nov 26, 2011, at 9:32 AM, zloncaric wrote:


Hello!

Is it posible to apply /cumsum()/ along the 3rd dimension of 3D array?
Something like matrlab function - /cumsum (*A*,dim)/ which returns the
cumulative sum of the elements along the dimension of *A* specified by
scalar dim.



`apply` lets you choose which dimension gets selected.

Perhaps:

apply(mat, 3, cumsum)

(This is pretty basic stuff so you should probably be reading or at  
least skimming somewhat more thoroughly than you have so far the  
Introduction to R document and there is also the R for Matlab document  
by Bob Muenchen ... and a compendium of equivalencies by Hiebeler at: www.math.umaine.edu/~hiebeler/comp/matlabR.html 
 )


--

David Winsemius, MD
West Hartford, CT


Re: [R] cumsum in 3d arrays

2011-11-26 Thread David Winsemius



On Nov 26, 2011, at 11:24 AM, David Winsemius wrote:



On Nov 26, 2011, at 9:32 AM, zloncaric wrote:


Hello!

Is it posible to apply /cumsum()/ along the 3rd dimension of 3D  
array?
Something like matrlab function - /cumsum (*A*,dim)/ which returns  
the
cumulative sum of the elements along the dimension of *A* specified  
by

scalar dim.



`apply` lets you choose which dimension gets selected.

Perhaps:

apply(mat, 3, cumsum)


Or perhaps

 apply(mat, 1:2, cumsum)
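
A small worked sketch of the second form, using a made-up 2 x 3 x 4 array. apply(A, 1:2, cumsum) puts the cumulated dimension first in its result, so aperm() is used here to restore the original dimension order. That extra step is an addition for illustration and was not part of the reply.

```r
# Example array (hypothetical data): 2 rows, 3 columns, 4 layers.
A <- array(1:24, dim = c(2, 3, 4))

# cumsum() along the 3rd dimension for every (row, column) pair.
# The result has dimensions c(4, 2, 3): the cumulated dimension comes first.
cs_raw <- apply(A, 1:2, cumsum)

# Move the cumulated dimension back to the 3rd position.
cs <- aperm(cs_raw, c(2, 3, 1))

dim(cs)

# The last layer of the running sum equals the plain sum over layers.
all(cs[, , 4] == apply(A, 1:2, sum))
```

By contrast, apply(A, 3, cumsum) runs cumsum() within each layer rather than across layers.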



(This is pretty basic stuff so you should probably be reading or at  
least skimming somewhat more thoroughly than you have so far the  
Introduction to R document and there is also the R for Matlab  
document by Bob Muenchen ... and a compendium of equivalencies by  
Hiebeler at: www.math.umaine.edu/~hiebeler/comp/matlabR.html )


--


David Winsemius, MD
West Hartford, CT


Re: [R] Time series merge?

2011-11-26 Thread Kevin Burton

Seems to work fine. Thank you.

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, November 26, 2011 10:11 AM
To: Kevin Burton
Cc: r-help@r-project.org
Subject: Re: [R] Time series merge?

On Sat, Nov 26, 2011 at 10:55 AM, Kevin Burton rkevinbur...@charter.net
wrote:
 I have two time series

 a <- ts(1:10, start=c(1,6), end=c(2,5), frequency=10)

 b <- ts(1:5, start=c(2,1), end=c(2,5), frequency=10)

 Obviously 'b' is a subset of 'a'. I want a single index value 
 indicating where that start of 'b' lines up with the start of 'a'. So 
 in this simple example I would expect an index of 5. I was playing 
 with 'merge'. But, for a 'ts' object this does not produce anything that
is useful:

 merge(a,b)

Try this:

library(zoo)
m <- merge(a = as.zoo(a), b = as.zoo(b))
m

or to get a ts object back:

as.ts(m)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[R] how big (in RAM and/or disk storage) is each of these objects in a list?

2011-11-26 Thread Paul Johnson

Greetings, friends (and others :) )

We generated a bunch of results and saved them in an RData file. We
can open, use, all is well, except that the size of the saved file is
quite a bit larger than we expected.  I suspect there's something
floating about in there that one of the packages we are using puts in,
such as a spare copy of a data frame that is saved in some subtle way
that has escaped my attention.

Consider a list of objects. Are there ways to do these things:

1. ask R how much memory is used by the things inside the list?

2.   Does as.expression(anObject) print everything in there? Or, is
there a better way to convert each thing to text or some other format
that you can actually read line by line to see what is in there, to
see everything?

If there's no giant hidden data frame floating about, I figure I'll
have to convert symmetric matrices to lower triangles or such to save
space.  Unless R already is automatically saving a matrix in that way
but just showing me the full matrix, which I suppose is possible. If
you have other ideas about general ways to make saved objects smaller,
I'm open for suggestions.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


[R] SPSS - R

2011-11-26 Thread Kristi Shoemaker

I'm an SPSS user trying to make the transition to R.

Can someone help me translate the following SPSS code into R?:


GLM Total_tp1 Total_tp2 WITH Age Sex
  /WSFACTOR=Time 2 Repeated
  /METHOD=SSTYPE(3)
  /CRITERIA=ALPHA(.05)
  /WSDESIGN= Time
  /DESIGN= Age Sex Age*Sex.

Also, can anyone recommend any resources to help SPSS users learn to do things in
R?

Thanks,
-kristi


Re: [R] Need some vectorizing help

2011-11-26 Thread Scott Tetrick


Thank you very much David - R is so rich, the easy way can be hard to find.

Just to close this out for others, the final solution I used was:

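Peak2Return <- function(v) {
  S <- cummax(v)                              # running maximum
  L <- which((v == S) & (diff(c(0, v)) > 0))  # new running maxima
  R <- sapply(v[L], function(x, S) { which(x < S)[1] }, S)  # next crossing
  data.frame(peak = L, Return = R)
}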

now you have L for the left index, and R for the corresponding right 
index.  If there is no right index due to the curve, the R value is NA.
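
A small worked sketch of the cummax() idea on a made-up vector. The function name peak_to_return and the test vector are hypothetical. The logic follows the description above: a left index is where the series sets a new running maximum, and the right index is the first position where the running maximum exceeds that value.

```r
peak_to_return <- function(v) {
  S <- cummax(v)                                  # running maximum
  left <- which((v == S) & (diff(c(0, v)) > 0))   # positions of new maxima
  # For each such value, the first position where the running maximum is
  # strictly larger; NA if it is never exceeded.
  right <- vapply(v[left], function(x) which(x < S)[1], integer(1))
  data.frame(peak = left, ret = right)
}

v   <- c(1, 3, 2, 5, 4, 6, 2)
out <- peak_to_return(v)
out
# peak: 1 2 4 6
# ret:  2 4 6 NA
```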


On 11/24/2011 7:35 AM, David Winsemius wrote:


On Nov 24, 2011, at 4:52 AM, Scott Tetrick wrote:

So I have a problem that I'm trying to get through, and I just can't 
seem to get it to run very fast in R.


What I'm trying to do is to find in a vector a local peak, then the 
next time that value is crossed later.  I don't care about peaks that 
may be lower than this first one - they can be ignored.  I've tried 
some sapply methods along the way, but they all are slower.  The best 
solution I have is a loop, and I just know there are smart R folks 
that could help me eliminate it.


It looks as though you are reinventing the function:

?cummax




Peak2Return <- function(v) {
 Q <- (1:m)[diff(v) < 0]   # find all the peaks
 L <- Q[c(TRUE, v[Q[-1]] > v[Q[-length(Q)]])]   # eliminate lower peaks
 R <- sapply(L, function(x, v) { ((x+1):length(v))[v[x] < v[(x+1):m]][1] }, v)   # find the next crossing

 out <- data.frame(peak=L, Return=R)
 out
}

Thanks in advance!



David Winsemius, MD
West Hartford, CT




Re: [R] computationally singular error with mice()

2011-11-26 Thread Fei

Hi Josh,

Thanks for the kind reminder to post the data frame. My data frame
contains lots of categorical variables, which seems to be problematic.  For
instance,

dobstatus edu   mrext
  married   highschool   yes, full time

Do you know how to specify the imputation methods and the visitSequence so
that those categorical variables are not involved in the imputation process?
Thank you.

Fei




Re: [R] how big (in RAM and/or disk storage) is each of these objects in a list?

2011-11-26 Thread Duncan Murdoch


On 11-11-26 1:41 PM, Paul Johnson wrote:
 We generated a bunch of results and saved them in an RData file. We
 can open, use, all is well, except that the size of the saved file is
 quite a bit larger than we expected.  I suspect there's something
 floating about in there that one of the packages we are using puts in,
 such as a spare copy of a data frame that is saved in some subtle way
 that has escaped my attention.

 Consider a list of objects. Are there ways to do these things:

 1. ask R how much memory is used by the things inside the list?

You can use object.size, but read the man page:  it is not a completely 
well-defined question.




 2.   Does as.expression(anObject) print everything in there? Or, is
 there a better way to convert each thing to text or some other format
 that you can actually read line by line to see what is in there, to
 see everything?

No, as.expression won't necessarily work.  save(..., ascii=TRUE) will 
show you everything, but it's not designed to be readable.  Probably the 
most useful function is str().



 If there's no giant hidden data frame floating about, I figure I'll
 have to convert symmetric matrices to lower triangles or such to save
 space.  Unless R already is automatically saving a matrix in that way
 but just showing me the full matrix, which I suppose is possible. If
 you have other ideas about general ways to make saved objects smaller,
 I'm open for suggestions.

You could try different compression methods (see ?save), but probably 
the best idea is to identify the things that you didn't mean to include, 
and don't include those.  A common way this happens is objects like 
functions or formulas that carry their environment with them.
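
A rough sketch of both points, using made-up objects. object.size() is applied to each element of a list, and a formula created inside a function is shown to drag that function's local variables along when serialized. The names res, make_formula and big_local are hypothetical.

```r
# 1. Approximate in-memory size of each element of a list.
res   <- list(small = 1:10, big = rnorm(1e5), txt = letters)
sizes <- sapply(res, object.size)     # bytes per element
sort(sizes, decreasing = TRUE)
str(res, max.level = 1)               # compact overview of the contents

# 2. A formula created inside a function keeps that function's environment,
#    including large local objects the caller never sees.
make_formula <- function() {
  big_local <- rnorm(1e6)   # about 8 MB of doubles
  y ~ x
}
f <- make_formula()
ls(environment(f))                    # shows the hidden local object

# serialize() is what save() relies on, so this approximates the saved size.
bytes_with_env <- length(serialize(f, NULL))

# Pointing the formula at the global environment drops the hidden data.
# This changes where variables are looked up, so use it with care.
f2 <- f
environment(f2) <- globalenv()
bytes_without_env <- length(serialize(f2, NULL))

c(with_env = bytes_with_env, without_env = bytes_without_env)
```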


Duncan Murdoch


Re: [R] SPSS - R

2011-11-26 Thread Frank Harrell

If you know SPSS already why not learn R modeling syntax and do this
yourself?

If ALPHA(.05) implies that you are using stepwise variable selection note
that this is an invalid statistical technique.

Frank

Kristi Shoemaker wrote
 
 I'm an SPSS user trying to make the transition to R.
 
 Can someone help me translate the following SPSS code into R?:
 
 
 GLM Total_tp1 Total_tp2 WITH Age Sex
   /WSFACTOR=Time 2 Repeated
   /METHOD=SSTYPE(3)
   /CRITERIA=ALPHA(.05)
   /WSDESIGN= Time
   /DESIGN= Age Sex Age*Sex.
 
 Also. can anyone recommend any resources to help SPSS users learn to
 things in R?
 
 Thanks,
 -kristi
 
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University

Re: [R] SPSS - R

2011-11-26 Thread R. Michael Weylandt michael.weyla...@gmail.com

Perhaps this website and the associated book will be of help: 
http://r4stats.com/

Michael

On Nov 26, 2011, at 11:08 AM, Kristi Shoemaker kristi.shoema...@yahoo.com 
wrote:

 I'm an SPSS user trying to make the transition to R.
 
 Can someone help me translate the following SPSS code into R?:
 
 
 GLM Total_tp1 Total_tp2 WITH Age Sex
    /WSFACTOR=Time 2 Repeated
    /METHOD=SSTYPE(3)
    /CRITERIA=ALPHA(.05)
    /WSDESIGN= Time
    /DESIGN= Age Sex Age*Sex.
 
 Also. can anyone recommend any resources to help SPSS users learn to things 
 in R?
 
 Thanks,
 -kristi
 

Re: [R] SPSS - R

2011-11-26 Thread John Fox

Dear Kristi,

I assume that this is a repeated-measures ANOVA with one within-subjects
factor (Time) and two between-subjects factors (Age and Sex, which are
crossed). If Age is numeric, and not a factor, then the type-III tests
that you requested don't test sensible hypotheses. In any event, if my guess
is right about the design, then you can use the Anova() function in the car
package for an equivalent analysis. See the repeated-measures example in
?Anova (for the O'Brien and Kaiser data). 
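
A hedged sketch of what such an analysis could look like, using simulated data because none was posted. The variable names follow the SPSS syntax, and everything about the data is an assumption. The multivariate linear model is base R. The car::Anova() call follows the pattern of the repeated-measures example in ?Anova and only runs if car is installed.

```r
# Simulated wide-format data (hypothetical): one row per subject.
set.seed(123)
n   <- 40
dat <- data.frame(Age = round(runif(n, 20, 60)),
                  Sex = factor(sample(c("F", "M"), n, replace = TRUE)))
dat$Total_tp1 <- 50 + 0.1 * dat$Age + rnorm(n, sd = 5)
dat$Total_tp2 <- dat$Total_tp1 + 2 + rnorm(n, sd = 3)

# Type-III tests need sum-to-zero contrasts to be interpretable.
op <- options(contrasts = c("contr.sum", "contr.poly"))

# Between-subjects design on both time points at once (an "mlm" fit).
mod <- lm(cbind(Total_tp1, Total_tp2) ~ Age * Sex, data = dat)

# Within-subjects design: one factor, Time, with two levels.
idata <- data.frame(Time = factor(c("tp1", "tp2")))

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(mod, idata = idata, idesign = ~Time, type = "III"))
}

options(op)   # restore the previous contrast settings
```

As noted above, with Age entered as a numeric covariate the type-III tests may not correspond to sensible hypotheses, so the output should be read with that caveat.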

You've already had an answer to the more general question.

I hope this helps,
 John


John Fox
Senator William McMaster
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox




 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Kristi Shoemaker
 Sent: November-26-11 11:08 AM
 To: r-help@r-project.org
 Subject: [R] SPSS - R
 
 I'm an SPSS user trying to make the transition to R.
 
 Can someone help me translate the following SPSS code into R?:
 
 
 GLM Total_tp1 Total_tp2 WITH Age Sex
   /WSFACTOR=Time 2 Repeated
   /METHOD=SSTYPE(3)
   /CRITERIA=ALPHA(.05)
   /WSDESIGN= Time
   /DESIGN= Age Sex Age*Sex.
 
 Also. can anyone recommend any resources to help SPSS users learn to
 things in R?
 
 Thanks,
 -kristi
 

Re: [R] how big (in RAM and/or disk storage) is each of these objects in a list?

2011-11-26 Thread John

On Sat, 26 Nov 2011 12:41:08 -0600
Paul Johnson pauljoh...@gmail.com wrote:

 Greetings, friends (and others :) )
 
 We generated a bunch of results and saved them in an RData file. We
 can open, use, all is well, except that the size of the saved file is
 quite a bit larger than we expected.  I suspect there's something
 floating about in there that one of the packages we are using puts in,
 such as a spare copy of a data frame that is saved in some subtle way
 that has escaped my attention.
 
 Consider a list of objects. Are there ways to do these things:
 
 1. ask R how much memory is used by the things inside the list?
 
 2.   Does as.expression(anObject) print everything in there? Or, is
 there a better way to convert each thing to text or some other format
 that you can actually read line by line to see what is in there, to
 see everything?
 
 If there's no giant hidden data frame floating about, I figure I'll
 have to convert symmetric matrices to lower triangles or such to save
 space.  Unless R already is automatically saving a matrix in that way
 but just showing me the full matrix, which I suppose is possible. If
 you have other ideas about general ways to make saved objects smaller,
 I'm open for suggestions.
 

As an initial step, what is the result of running ls() with your RData
file loaded?  You should get a list of what is in memory.  Using RData
files can be as space-efficient or costly as the user's habits.  Did you
use save() or the save.image() command to produce the file? The
save.image() command stashes everything that is in memory, so if you've
run a number of experimental procedures that did not pan out and you did
not discard the results with rm(), they were saved to the RData file
along with the information you did want, a procedure rather like filing
away all your work in a file drawer and then emptying the waste basket
into the drawer as well. If you save the data with ascii = TRUE as an
option, you can trawl through the file and read what you saved.
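
One way to take that inventory without cluttering the workspace is sketched below: load the file into a fresh environment and list the size of each object. The file and object names are hypothetical, and a temporary file stands in for the real RData file.

```r
# Create a small example file (stand-in for the real RData file).
rdata_file <- tempfile(fileext = ".RData")
keep    <- 1:10
scratch <- rnorm(1e5)        # a leftover the user forgot to rm()
save(keep, scratch, file = rdata_file)

# Load into a separate environment; load() returns the object names.
e      <- new.env()
loaded <- load(rdata_file, envir = e)

# Approximate in-memory size of each saved object, largest first.
sizes <- sapply(loaded, function(nm) object.size(get(nm, envir = e)))
sort(sizes, decreasing = TRUE)

# Size of the (compressed) file on disk, for comparison.
file.info(rdata_file)$size
```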

JWD


Re: [R] Missing data?

2011-11-26 Thread R. Michael Weylandt

Why do you need to use a frequency attribute for these data? The point
of the zoo/xts line of time series implementations is that the time
stamps are carried through for each observation (unlike ts) and can be
irregular. Both classes exist precisely to avoid being forced into a
frequency attribute.

As far as setting up the time elements, wouldn't this work? Change the
start date to get weeks on any desired day:

d <- seq.Date(from = as.Date("2011-11-26"), by = -7, length.out = 100)
xts(rep(NA, length(d)), d)

You can avoid the OHLC formatting of to.weekly if you want with the
OHLC = FALSE parameter. And if you want to index it by the first of
the week rather than the last, just try this:

time(x) <- time(x) - 6
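
A short sketch of the date handling involved. The base-R part always runs. The xts part is guarded and assumes the xts package is installed. The object names are made up for illustration.

```r
# Weekly time stamps counting back from a chosen day.
d <- seq(as.Date("2011-11-26"), by = "-1 week", length.out = 100)

# Daily sequences take care of leap years on their own.
days_2011 <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
days_2012 <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day")
c(length(days_2011), length(days_2012))

if (requireNamespace("xts", quietly = TRUE)) {
  # A "blank" series: one NA per time stamp, so the lengths match.
  blank <- xts::xts(rep(NA_real_, length(days_2011)), order.by = days_2011)
  blank["2011-12-24"] <- 10      # fill in a single day
  print(sum(!is.na(blank)))      # number of filled-in days
}
```

Passing one NA per time stamp avoids the "NROW(x) must match length(order.by)" error quoted below.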

Michael

On Tue, Nov 22, 2011 at 6:50 PM, Kevin Burton rkevinbur...@charter.net wrote:
 Void of any other suggestions this approach makes sense but for my case I
 think I need to use zoo objects rather than xts. If I sequence the data
 generally I don't know if there will be 365 days in the year or 366. So I
 have to sequence the dates as:

 seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day")

 If I use this sequence with xts I get:

 ds <- xts(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"))
 Error in xts(NA, seq(from = as.Date("2011-01-01"), to =
 as.Date("2011-12-31"),  :
  NROW(x) must match length(order.by)

 If I leave the 'data' empty I don't get the error but if I try to assign an
 individual item (fill as appropriate)

 ds <- xts(, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"))
 ds["2011-12-24"] <- 10
 ds
 Error in structure(coredata(x), names = x.attr$dimnames[[1]]) :
  'names' attribute [365] must be the same length as the vector [358]

 So now I need to remember that I have not filled in all of the data. Also
 simple dereferencing gives:

 ds[1]
 Error in `[.xts`(ds, 1) : subscript out of bounds

 With zoo I am able to create a time-series where all of the data is
 initially NA:

 ds <- zoo(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"))

 So I can fill the data as appropriate and the remaining slots will have NA.
 I may be new with xts but I cannot see a way of creating a useable 'blank'
 time-series.

 Also with xts it seems like the frequency is ignored.

 ds <- xts(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"), frequency=52)
 frequency(ds)
 [1] 1

 Whereas zoo remembers the frequency setting:

 ds <- zoo(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"), frequency=52)
 frequency(ds)
 [1] 52

 But since the ultimate goal is to get the time-series in a 'ts' format (as
 many functions require 'ts') it seems like even zoo has problems:

 as.ts(ds)

 Time Series:
 Start = c(14975, 1)
 End = c(15339, 1)
 Frequency = 52
    [1]   1  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA
   [42]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   2  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA
   [83]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA   3  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA
  [124]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   4  NA  NA
 NA  NA  NA  NA  NA
  [165]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA  NA  NA
  [206] . . . . . .
  So the conversion from zoo to ts maintained the frequency, but I am not sure
 how it decided on the start and end values. The conversion also seems to
 have changed the data. Notice that the original data appears only once per
 period (every 52 entries). In other words, if ds is the original zoo time
 series then ds[1] is 1, ds[2] is 2, etc. The converted time series keeps
 ds[1] but inserts 51 NAs, then adds ds[2], and so on to the end of the
 series. That is not what the initial data was. The conversion is inserting
 data of its own.

 The conversion to ts from xts seems better behaved:

 ds <- xts(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"), frequency=52)
 as.ts(ds)
 Time Series:
 Start = 1
 End = 365
 Frequency = 1
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
 18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 37  38  39  40  41  42
  [43]  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59
 60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78
 79  80  81  82  83  84
  [85]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101
 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
 121 122 123 124 125 126
 [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141

Re: [R] Time series merge?

2011-11-26 Thread Hasan Diwan

Try xts(tsObj, order.by = index(tsObj))
On Nov 26, 2011 10:57 AM, Kevin Burton rkevinbur...@charter.net wrote:

 I have two time series



 a <- ts(1:10, start=c(1,6), end=c(2,5), frequency=10)

 b <- ts(1:5, start=c(2,1), end=c(2,5), frequency=10)



 Obviously 'b' is a subset of 'a'. I want a single index value indicating
 where the start of 'b' lines up with the start of 'a'. So in this simple
 example I would expect an index of 5. I was playing with 'merge'. But, for
 a 'ts' object this does not produce anything that is useful:



  merge(a,b)

  x

 1 1

 2 2

 3 3

 4 4

 5 5



 I get the same answer if I use 'merge(b,a)' so I don't know how to convert
 this result to something useful. So then I decided to use 'xts'. But the
 conversion fails:



  ax <- as.xts(a)

 Error in as.xts.ts(a) : could not convert index to appropriate type



 For this simple example I could code it myself using a simple for loop but
 if I add capability to handle missing dates, different frequencies, etc. it
 gets complicated very fast.  It seems that 'xts' has more extensive date
 handling facilities than 'ts' but I am stuck since it doesn't look like I
 can convert from 'ts' to 'xts'.



 Thanks in advance for your suggestions.



 Kevin
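
A minimal sketch of the index lookup asked about here, using only base R.
It assumes both series share the same frequency and that 'b' starts on one
of the time stamps of 'a'. The tolerance 'tol' is an illustrative choice and
not something taken from the original question.

```r
a <- ts(1:10, start = c(1, 6), end = c(2, 5), frequency = 10)
b <- ts(1:5, start = c(2, 1), end = c(2, 5), frequency = 10)

# time() gives the time stamp of every observation as a number, so the
# first time stamp of 'b' can be matched against the time stamps of 'a'.
tol <- 1e-8  # assumed tolerance for comparing floating-point time stamps
idx <- which(abs(as.numeric(time(a)) - as.numeric(time(b))[1]) < tol)

idx      # 1-based position in 'a' where 'b' starts
idx - 1  # number of observations in 'a' that precede the start of 'b'
```

For the two series above this gives position 6, i.e. an offset of 5
observations, which matches the expected index if counting starts at zero.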



Re: [R] Missing data?

2011-11-26 Thread Gabor Grothendieck

On Tue, Nov 22, 2011 at 6:50 PM, Kevin Burton rkevinbur...@charter.net wrote:
 Void of any other suggestions this approach makes sense but for my case I
 think I need to use zoo objects rather than xts. If I sequence the data
 generally I don't know if there will be 365 days in the year or 366. So I
 have to sequence the dates as:

 seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day")

 If I use this sequence with xts I get:

 ds <- xts(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"))
 Error in xts(NA, seq(from = as.Date("2011-01-01"), to =
 as.Date("2011-12-31"),  :
  NROW(x) must match length(order.by)

 If I leave the 'data' empty I don't get the error but if I try to assign an
 individual item (fill as appropriate)

 ds <- xts(, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"))
 ds["2011-12-24"] <- 10
 ds
 Error in structure(coredata(x), names = x.attr$dimnames[[1]]) :
  'names' attribute [365] must be the same length as the vector [358]

 So now I need to remember that I have not filled in all of the data. Also
 simple dereferencing gives:

 ds[1]
 Error in `[.xts`(ds, 1) : subscript out of bounds

 With zoo I am able to create a time-series where all of the data is
 initially NA:

 ds <- zoo(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"))

 So I can fill the data as appropriate and the remaining slots will have NA.
 I may be new with xts but I cannot see a way of creating a useable 'blank'
 time-series.

 Also with xts it seems like the frequency is ignored.

 ds <- xts(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"), frequency=52)
 frequency(ds)
 [1] 1

 Whereas zoo remembers the frequency setting

 ds <- zoo(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
 by="day"), frequency=52)
 frequency(ds)
 [1] 52

 But since the ultimate goal is to get the time-series in a 'ts' format (as
 many functions require 'ts') it seems like even zoo has problems:

The problem is that you seem to want a fixed number of periods per
year, but a year contains neither exactly 52 weeks nor a constant 365 days.
You are going to have to give up something, since your apparent criteria
conflict with reality.  For example, you could use months, in which
case there are exactly 12 per year, or you could put more than 7 days into
the first or last week of the year so that there are exactly 52 weeks in a
year but they don't all have the same number of days, etc.
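
As an illustration of the monthly option, here is a minimal base R sketch.
It assumes daily values are summarised by their monthly mean; the names
'days', 'vals' and 'monthly' are made up for the example, and the data are
just the day numbers 1..365.

```r
# One value per day for 2011 (seq() handles 365 vs. 366 days by itself)
days <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
vals <- seq_along(days)

# Collapse to one value per calendar month; the "YYYY-MM" labels sort
# chronologically, so tapply() returns the months in calendar order.
monthly <- tapply(vals, format(days, "%Y-%m"), mean)

# A regular 'ts' with exactly 12 periods per year
tsm <- ts(as.numeric(monthly), start = c(2011, 1), frequency = 12)
frequency(tsm)
tsm
```

The resulting series has a constant frequency of 12 regardless of leap
years, which is what functions that require a 'ts' object expect.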

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] computationally singular error with mice()

2011-11-26 Thread Weidong Gu

Hi Fei,

I wouldn't worry too much about categorical variables for mice. Mice
would use logistic regression for binary variables and polytomous logistic
regression for categorical variables with >2 levels. However, you
should not include factors with a lot of levels, say >30, in
imputation models because they would require a lot of dummy variables.

Another thing: do not exclude variables that you will use in the
substantive analysis. Otherwise, estimation would be biased.

Weidong

On Sat, Nov 26, 2011 at 12:07 PM, Fei fayechen0...@hotmail.com wrote:
Hi Josh,

Thanks for the kind reminder of posting the dataframe on. My dataframe
contains lots of categorical variables, which seems to be problematic. For
instance,

dob status edu mrext
married highschool yes, full time

Do you know how to specify the imputation methods and the visitSequence so
that those categorical variables are not involved in the imputation process?
Thank you.

Fei

--
View this message in context:
http://r.789695.n4.nabble.com/computationally-singular-error-with-mice-tp4109583p4110776.html
Sent from the R help mailing list archive at Nabble.com.

[R] Question about randomForest

2011-11-26 Thread Matthew Francis

I've been using the R package randomForest but there is an aspect I
cannot work out the meaning of. After calling the randomForest
function, the returned object contains an element called predicted,
which is the prediction obtained using all the trees (at least that's
my understanding). I've checked that this prediction set has the error
rate as reported by err.rate.

However, if I send the training data back into the
predict.randomForest function I find I get a different result to the
stored set of predictions. This is true for both classification and
regression. I find the predictions obtained this way also have a much
lower error rate and perform very well (suspiciously well...) on
measures such as AUC.

My understanding is that the two predictions above should be the same.
Since they are not, I must be not understanding something properly.
Any ideas what's going on?


Re: [R] computationally singular error with mice()

2011-11-26 Thread Fei

Hi Weidong,

Thank you for the clear explanation. You are right, it is not the categorical
variables that are causing the trouble. It might be the relatively small
sample size that is causing the problem, given so many variables. I tried
excluding some variables that are not essential to the analyses I am
going to conduct, and the commands ran successfully. Thank you.

--
View this message in context: 
http://r.789695.n4.nabble.com/computationally-singular-error-with-mice-tp4109583p4111304.html
Sent from the R help mailing list archive at Nabble.com.


[R] simplify source code

2011-11-26 Thread Christof Kluß


Hi

I would like to shorten

mod1 <- nls(ColName2 ~ ColName1, data = table, ...)
mod2 <- nls(ColName3 ~ ColName1, data = table, ...)
mod3 <- nls(ColName4 ~ ColName1, data = table, ...)
...

is there something like

cols = c("ColName2","ColName3","ColName4",...)

for i in ...
  mod[i-1] <- nls(ColName[i] ~ ColName1, data = table, ...)

I am looking forward to help

Christof


[R] append to PDF file

2011-11-26 Thread Christof Kluß


Hi

is there a way to append a plot as PDF to an existing PDF file?
savePlot seems not to have this possibility.

Christof


Re: [R] append to PDF file

2011-11-26 Thread Ken

PDF files contain information at the end of them and so you cannot append 
without altering the file (universally true for PDF). Perhaps pdf() your plots 
and use external tools to convert the PDFs to .ps then re-merge. Might not be 
the best way, but an effective one. 
 Ken Hutchison
  

On Nov 26, 2554 BE, at 5:38 PM, Christof Kluß ckl...@email.uni-kiel.de wrote:

 Hi
 
 is there a way to append a plot as PDF to an existing PDF file?
 savePlot seems not to have this possibility.
 
 Christof
 

Re: [R] append to PDF file

2011-11-26 Thread jim holtman

There is the 'pdftk' (PDF tool kit) that you will find on the web that
will do the job.  I have used it to both combine and split out the
pages in the PDF file.
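
A minimal sketch of that workflow from within R. It assumes the new plots
are first written to their own multi-page PDF with pdf(), and that the
merge is then delegated to pdftk. The file names are hypothetical, and the
pdftk step is skipped when pdftk or the existing file is not available.

```r
# Write the new plots to their own (multi-page) PDF file
new_pdf <- file.path(tempdir(), "new_plots.pdf")   # hypothetical file name
pdf(new_pdf, onefile = TRUE)
plot(1:10, main = "First new page")
hist(c(1, 2, 2, 3, 3, 3), main = "Second new page")
invisible(dev.off())

# Merge with an already existing PDF using the external pdftk tool
existing <- file.path(tempdir(), "existing.pdf")    # hypothetical file name
merged   <- file.path(tempdir(), "merged.pdf")      # hypothetical file name
if (nzchar(Sys.which("pdftk")) && file.exists(existing)) {
  # equivalent to: pdftk existing.pdf new_plots.pdf cat output merged.pdf
  system2("pdftk", c(shQuote(existing), shQuote(new_pdf),
                     "cat", "output", shQuote(merged)))
}
```

The existing file is never modified in place; the combined document is
written to a third file, which can then replace the original if desired.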

On Sat, Nov 26, 2011 at 5:51 PM, Ken vicvoncas...@gmail.com wrote:
 PDF files contain information at the end of them and so you cannot append 
 without altering the file (universally true for PDF). Perhaps pdf() your 
 plots and use external tools to convert the PDFs to .ps then re-merge. Might 
 not be the best way, but an effective one.
     Ken Hutchison


 On Nov 26, 2554 BE, at 5:38 PM, Christof Kluß ckl...@email.uni-kiel.de 
 wrote:

 Hi

 is there a way to append a plot as PDF to an existing PDF file?
 savePlot seems not to have this possibility.

 Christof





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


[R] data.table merge equivalent for all.x

2011-11-26 Thread ONKELINX, Thierry

Dear all,

I'm trying to use data.table to summarise a table and merge it to another 
table. Here is what I would like to do, but by using data.table() in a proper 
way.

library(data.table)
tab1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = "ID")
tab2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = "ID2")
junk <- aggregate(tab2[, B], by = list(D = tab2[, D]), FUN = sum)
merge(tab1, junk, by = "D", all.x = TRUE)

This is my attempt using data.table()

junk <- tab2[, mean(B), by = D]
tab1[junk]

Best regards,

Thierry




Re: [R] computationally singular error with mice()

2011-11-26 Thread Joshua Wiley

Hi Fei,

On Sat, Nov 26, 2011 at 9:07 AM, Fei fayechen0...@hotmail.com wrote:
 Hi Josh,

 Thanks for the kind reminder of posting the dataframe on. My dataframe
 contains lots of categorical variables, which seems to be problematic.  For
 instance,

 dob        status         edu               mrext
       married       highschool   yes, full time

Still not exactly a useable dataset, but here is a snippet of code I used:

##############################
# Multiple Imputation Model  #
##############################

## specify the predictor matrix for the imputation
pred.matrix <- rbind(
  VFQRoleDifficulties1 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  MOODVision1 =  c(1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  MOODImpact1 =  c(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
[snip]
  SocialFunctioning1 =   c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1),
  RoleEmotional1 =   c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1),
  MentalHealth1 =c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0))
## set column names to the column names of the data (this is a square matrix)
colnames(pred.matrix) <- colnames(dat)

## Set the methods used to impute each variable
imp.method <- c(
  VFQRoleDifficulties1 = "pmm",
  MOODVision1 = "pmm",
  MOODImpact1 = "pmm",
[snip]
  SocialFunctioning1 = "pmm",
  RoleEmotional1 = "pmm",
  MentalHealth1 = "pmm"
)

## Create multiply imputed dataset
datimp <- mice(data = dat, m = 500, method = imp.method,
  predictorMatrix = pred.matrix,
  seed = 1, print = FALSE)

Basically you can write a k x k matrix where k is the number of
variables in your dataset.  This can control what variables are used
in the imputation model for each variable (all 0s would mean no
variables).  You can also pass a k length character vector controlling
the method used for each variable.  You can also control the order
mice goes in.
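
For a larger dataset the same two objects can be built programmatically
rather than typed out. This is a minimal base R sketch; the variable names
('age', 'status', 'edu', 'score') are hypothetical, and it assumes the goal
is to keep one variable ('status') out of the imputation entirely. In mice,
rows of the predictor matrix are the variables being imputed, columns are
the predictors, and an empty method string means "do not impute".

```r
vars <- c("age", "status", "edu", "score")   # hypothetical column names
k <- length(vars)

# Start from "every variable predicts every other variable":
# ones everywhere except on the diagonal.
pred <- 1 - diag(k)
dimnames(pred) <- list(vars, vars)

# Do not use 'status' as a predictor for any other variable
pred[, "status"] <- 0

# One imputation method per variable; "" leaves 'status' unimputed
meth <- setNames(rep("pmm", k), vars)
meth["status"] <- ""

pred
meth

# These would then be passed on as (assuming a data frame 'dat' with
# exactly these columns):
# imp <- mice(data = dat, method = meth, predictorMatrix = pred)
```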

Cheers,

Josh



 Do you know how to specify the imputation methods and the visitSequence so
 that those categorical variables are not involved in the imputation process?
 Thank you.

 Fei



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/computationally-singular-error-with-mice-tp4109583p4110776.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


Re: [R] simplify source code

2011-11-26 Thread Dennis Murphy

Hi:

Here's one way you could do it. I manufactured some fake data with a
simple model to illustrate. This assumes you are using the same model
formula with the same starting values and remaining arguments for each
response.

dg <- data.frame(x = 1:10, y1 = sort(abs(rnorm(10))),
                 y2 = sort(abs(rnorm(10))), y3 = sort(abs(rnorm(10))))

# Model: y = b0 + b1 exp(x/theta)
vars <- c('y1', 'y2', 'y3')

# Function to create the model formula by plugging in the
# response y and run the model
mfun <- function(y) {
  form <- as.formula(paste(y, 'cbind(1, exp(x/th))', sep = ' ~ '))
  nls(form, data = dg, start = list(th = 0.3), algorithm = 'plinear')
}

# Generate a list of model objects:
mlist <- lapply(vars, mfun)

# To see what they contain:
str(mlist[[1]])
str(summary(mlist[[1]]))

# Extract a few features from each:
# The first two return matrices, the third returns a list

do.call(rbind, lapply(mlist, function(m) coef(m)))
do.call(rbind, lapply(mlist, function(m) deviance(m)))
lapply(mlist, function(m) summary(m)$cov.unscaled)


To get more control over the output format, the plyr package can come
in handy. For example, to get data frames for the first two
extractions above, one would do

library('plyr')
ldply(mlist, function(m) coef(m))
ldply(mlist, function(m) deviance(m))

# ldply() means list input, data frame output (ld).
# For the third extraction, one has a list input and a list output:
llply(mlist, function(m) summary(m)$cov.unscaled)

HTH,
Dennis

On Sat, Nov 26, 2011 at 2:30 PM, Christof Kluß ckl...@email.uni-kiel.de wrote:
 Hi

 I would like to shorten

 mod1 <- nls(ColName2 ~ ColName1, data = table, ...)
 mod2 <- nls(ColName3 ~ ColName1, data = table, ...)
 mod3 <- nls(ColName4 ~ ColName1, data = table, ...)
 ...

 is there something like

 cols = c("ColName2","ColName3","ColName4",...)

 for i in ...
  mod[i-1] <- nls(ColName[i] ~ ColName1, data = table, ...)

 I am looking forward to help

 Christof


Re: [R] Question about randomForest

2011-11-26 Thread Weidong Gu

Hi Matthew,

The error rate reported by randomForest is the prediction error based
on out-of-bag (OOB) data. Therefore, it is different from the prediction
error on the original data, since each tree was built using a bootstrap
sample (containing about 63% of the distinct original observations), and
the OOB error rate is likely higher than the prediction error on the
original data, as you observed.
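
The in-bag fraction can be checked with a quick base R simulation. This is
only a sketch of the bootstrap step itself, not of randomForest's
internals; the sample size and seed are arbitrary choices.

```r
set.seed(1)
n <- 10000

# One bootstrap sample: draw n row indices with replacement
boot <- sample.int(n, n, replace = TRUE)

# Fraction of distinct observations that ended up in the sample ("in bag")
in_bag <- length(unique(boot)) / n
# The remaining observations are "out of bag" for the tree grown on it
oob <- 1 - in_bag

in_bag
oob
1 - exp(-1)   # theoretical limit of the in-bag fraction, about 0.632
```

Each observation is therefore out-of-bag for roughly a third of the trees,
and the stored predictions use only those trees, whereas predict() on the
training data uses every tree, including the ones that have already seen
the observation during fitting.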

Weidong

On Sat, Nov 26, 2011 at 3:02 PM, Matthew Francis
mattjamesfran...@gmail.com wrote:
 I've been using the R package randomForest but there is an aspect I
 cannot work out the meaning of. After calling the randomForest
 function, the returned object contains an element called predicted,
 which is the prediction obtained using all the trees (at least that's
 my understanding). I've checked that this prediction set has the error
 rate as reported by err.rate.

 However, if I send the training data back into the
 predict.randomForest function I find I get a different result to the
 stored set of predictions. This is true for both classification and
 regression. I find the predictions obtained this way also have a much
 lower error rate and perform very well (suspiciously well...) on
 measures such as AUC.

 My understanding is that the two predictions above should be the same.
 Since they are not, I must be not understanding something properly.
 Any ideas what's going on?


Re: [R] data.table merge equivalent for all.x

2011-11-26 Thread Dennis Murphy

Hi:

There may well be a more efficient way to do this, but here's one take.

library('data.table')
# Want to merge by D in the end, so set D as part of the key:
t1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = "ID,D")
t2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = "ID2,D")

# The J expression produces sums of B (the non-key variable) for each D group
# .SD denotes 'sub-data'.  The result 'junk' is a data table.
junk <- t2[, lapply(.SD, sum), by = D]

tables()   # junk has no key
# set a key for junk so that it can be merged
setkey(junk, 'D')
# t1 and junk have a common key variable D, so the left join is
merge(t1, junk, by = 'D', all.x = TRUE)

# check against
t1
junk

HTH,
Dennis


On Sat, Nov 26, 2011 at 3:59 PM, ONKELINX, Thierry
thierry.onkel...@inbo.be wrote:
 Dear all,

 I'm trying to use data.table to summarise a table and merge it to another 
 table. Here is what I would like to do, but by using data.table() in a proper 
 way.

 library(data.table)
 tab1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = "ID")
 tab2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = "ID2")
 junk <- aggregate(tab2[, B], by = list(D = tab2[, D]), FUN = sum)
 merge(tab1, junk, by = "D", all.x = TRUE)

 This is my attempt using data.table()

 junk <- tab2[, mean(B), by = D]
 tab1[junk]

 Best regards,

 Thierry




[R] Boxplot

2011-11-26 Thread Jeffrey Joh


I'm trying to do the second case among Jim's suggestions.  I used Bert's 
suggestion and it works great.

I would also like to ask if anyone is familiar with a package for making 
box-plots.  I would like to bin my datapoints at defined X intervals and 
display a boxplot for each bin on the same chart.  In Stata, there is a tool 
for making these, and it varies the width of the boxplot based on the number of 
points in each plot.  I am hoping there is a similar tool for R.

Thank you,
Jeffrey


 Date: Tue, 22 Nov 2011 18:51:05 +1100
 From: j...@bitwrit.com.au
 To: johjeff...@hotmail.com
 CC: r-help@r-project.org
 Subject: Re: [R] Binned line plot

 On 11/22/2011 04:29 PM, Jeffrey Joh wrote:
 
  I have a scatter plot with 10000 points. I would like to add a line that
  bins every 50 points and connects the average of each bin. I'm looking for
  something similar to line type m in Stata.
 
  With this dataset of 10000 points, I would also like to bin the data and
  make boxplots at certain intervals, so that I have a set of boxplots to
  represent each bin. I would also like the width of each box to be
  proportional to the number of points in each bin.
 
  How can I make these plots? Is there a simple package to use?
 
 Hi Jeffrey,
 There are three possibilities that come to mind:

 1) You want to bin the points based on their order in the data frame.

 2) You want to bin the points based on the x or y values of the coordinates.

 3) You want to bin the points based on the x _and_ y values of the
 coordinates.

 Number 1 is trivial and has already been answered (assume a two column
 data frame of coordinates named xypoints).

 #first point - set up a loop to get a vector of averages
 meanx<-rep(0,200)
 meany<-rep(0,200)
 for(index in 1:200) {
  start<-1+50*(index-1)
  meanx[index]<-mean(xypoints[start:(start+49),"x"])
  meany[index]<-mean(xypoints[start:(start+49),"y"])
 }
 plot(meanx,meany,type="l")

 Number 2 requires that you sort the pairs based on the value of the one
 you want, then apply the same process as 1 to the sorted pairs. Number 3
 is somewhat more difficult.

 I don't do this much, and some of the people who do map analysis will
 probably come up with a much better method.

 Find the most extreme point.
 Find the 49 points closest to that point to constitute group 1.
 Remove those points from the data frame.
 Go back to the first step if there are any points left.

 You will end up with 200 groups of points that are spatially grouped.
 Get the centroids and plot as above.

 Another wild guess from

 Jim
  

Re: [R] Boxplot

2011-11-26 Thread David Winsemius



On Nov 27, 2011, at 12:15 AM, Jeffrey Joh wrote:



I'm trying to do the second case among Jim's suggestions.  I used  
Bert's suggestion and it works great.


I would also like to ask if anyone is familiar with a package for  
making box-plots.  I would like to bin my datapoints at defined X  
intervals and display a boxplot for each bin on the same chart.


Combining `cut` (to define the intervals) and `boxplot` should be  
fairly straight-forward.



In Stata, there is a tool for making these, and it varies the width  
of the boxplot based on the number of points in each plot.


We have a tool for that, too. Study `quantile` a bit, to automatically  
pick cutpoints that will divide into approximately equal groups.


(I use the `cut2` function in the Hmisc package, because it is
integrated with `rms` that I use all the time, and because its
defaults for cut()-ting are more to my liking. It also has a g=
parameter that automates the cut( ..., quantile(...)) processing.)
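
A minimal base R sketch of the cut() plus boxplot() combination, on
simulated data (the data, break points and axis labels are assumptions made
for the example). The varwidth argument of boxplot() draws each box with a
width proportional to the square root of the number of points in its bin,
which is close to the Stata behaviour described in the question.

```r
set.seed(42)
x <- runif(1000, 0, 100)
y <- 2 + 0.05 * x + rnorm(1000)

# Bin x at fixed intervals of width 20
bins <- cut(x, breaks = seq(0, 100, by = 20), include.lowest = TRUE)

# One box per bin on the same chart; box width reflects the bin size
bp <- boxplot(y ~ bins, varwidth = TRUE,
              xlab = "x interval", ylab = "y")
bp$n   # number of points in each bin

# Alternative: quantile-based cut points give (nearly) equal-sized bins
qbins <- cut(x, breaks = quantile(x, probs = seq(0, 1, by = 0.2)),
             include.lowest = TRUE)
table(qbins)
```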





I am hoping there is a similar tool for R.

Thank you,
Jeffrey



Date: Tue, 22 Nov 2011 18:51:05 +1100
From: j...@bitwrit.com.au
To: johjeff...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] Binned line plot

On 11/22/2011 04:29 PM, Jeffrey Joh wrote:


I have a scatter plot with 10000 points. I would like to add a
line that bins every 50 points and connects the average of each
bin. I'm looking for something similar to line type m in Stata.


With this dataset of 10000 points, I would also like to bin the
data and make boxplots at certain intervals, so that I have a set
of boxplots to represent each bin. I would also like the width of
each box to be proportional to the number of points in each bin.


How can I make these plots? Is there a simple package to use?


Hi Jeffrey,
There are three possibilities that come to mind:

1) You want to bin the points based on their order in the data frame.

2) You want to bin the points based on the x or y values of the  
coordinates.


3) You want to bin the points based on the x _and_ y values of the
coordinates.

Number 1 is trivial and has already been answered (assume a two  
column

data frame of coordinates named xypoints).

#first point - set up a loop to get a vector of averages
meanx<-rep(0,200)
meany<-rep(0,200)
for(index in 1:200) {
 start<-1+50*(index-1)
 meanx[index]<-mean(xypoints[start:(start+49),"x"])
 meany[index]<-mean(xypoints[start:(start+49),"y"])
}
plot(meanx,meany,type="l")

Number 2 requires that you sort the pairs based on the value of the  
one
you want, then apply the same process as 1 to the sorted pairs.  
Number 3

is somewhat more difficult.

I don't do this much, and some of the people who do map analysis will
probably come up with a much better method.

Find the most extreme point.
Find the 49 points closest to that point to constitute group 1.
Remove those points from the data frame.
Go back to the first step if there are any points left.

You will end up with 200 groups of points that are spatially grouped.
Get the centroids and plot as above.

Another wild guess from

Jim




David Winsemius, MD
West Hartford, CT


[R] sqldf if iif

2011-11-26 Thread Carlos Rivera

Dear all,

 

I have problems with iif function using sqldf library.

I counted abundance (Num) of different SPECIES in two moments (esf) saving
the information in two Tables (esf50, esf100):

esf50

SAMPLE  SPECIES  Num  esf
1289    diso1    44   50
1289    diso2    5    50
1289    diso3    1    50
        diso1    44   50
        diso2    5    50
        diso3    1    50

esf100

SAMPLE  SPECIES  Num  esf
1289    diso1    82   100
1289    diso2    13   100
1289    diso3    2    100
1289    diso4    3    100
        diso1    82   100
        diso2    13   100
        diso3    2    100
        diso4    3    100

 

I would like subtract column Num between the two moments considering only
the changes, therefore I use the conditional if:

 

var100 <- sqldf("select esf100.SAMPLE, esf100.SPECIES, esf100.Num, esf100.esf,
   iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num, esf100.Num as PIPAS
   from esf100 left join esf50 on esf100.SAMPLE = esf50.SAMPLE
   and esf100.SPECIES = esf50.SPECIES")

 

I think the structure is right because the SQL query runs OK in Access. Is
the iif syntax the problem?
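
For comparison, here is a minimal base R sketch of the same logic using
merge() and ifelse() instead of SQL. It assumes the intended condition is
"difference >= 0" (the comparison operator is not fully legible in the
query above), uses only the rows for sample 1289, and treats a species
with no match in esf50 as "keep the esf100 count". The portable SQL
counterpart of Access's iif() is a CASE WHEN ... THEN ... ELSE ... END
expression, shown here only as a comment.

```r
esf50 <- data.frame(SAMPLE = 1289,
                    SPECIES = c("diso1", "diso2", "diso3"),
                    Num = c(44, 5, 1))
esf100 <- data.frame(SAMPLE = 1289,
                     SPECIES = c("diso1", "diso2", "diso3", "diso4"),
                     Num = c(82, 13, 2, 3))

# Left join: keep every row of esf100, attach the esf50 count when present
m <- merge(esf100, esf50, by = c("SAMPLE", "SPECIES"),
           all.x = TRUE, suffixes = c(".100", ".50"))

# Difference between the two moments; NA where the species is new
d <- m$Num.100 - m$Num.50

# Assumed rule: use the difference when it exists and is non-negative,
# otherwise fall back to the esf100 count
m$PIPAS <- ifelse(!is.na(d) & d >= 0, d, m$Num.100)
m

# SQL form of the same rule (not run here):
#   case when esf100.Num - esf50.Num >= 0
#        then esf100.Num - esf50.Num else esf100.Num end as PIPAS
```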

 

Thanks in advance.

 

Best wishes,

 

Carlos Rivera

 

 

 

 



[R] generating a vector of y_t = \sum_{i = 1}^t (alpha^i * x_{t - i + 1})

2011-11-26 Thread Michael Kao


Dear R-help,

I have been trying really hard to generate the following vector
efficiently, given the data (x) and parameter (alpha).


Let y be the output vector; the aim is to produce the following
vector (y) in at most half the time used by the loop example below.


y[1] = alpha * x[1]
y[2] = alpha^2 * x[1] + alpha * x[2]
y[3] = alpha^3 * x[1] + alpha^2 * x[2]  + alpha * x[3]
...

Below are the methods I have tried, which failed miserably. Some are just
totally ridiculous, so feel free to have a laugh, but I would appreciate it if
someone could give me a hint. Otherwise I guess I'll have to give Rcpp a
try.



## Benchmark the recursion functions
loopRec <- function(x, alpha){
    n <- length(x)
    y <- double(n)
    for(i in 1:n){
        ## weights alpha^1, ..., alpha^i applied to x[i], ..., x[1]
        y[i] <- sum(cumprod(rep(alpha, i)) * rev(x[1:i]))
    }
    y
}

loopRec(c(1, 2, 3), 0.5)

## This is a crazy solution, but worth giving it a try.
## It builds the upper-triangular weight matrix by pasting together R code
## as a string and evaluating it.
vecRec <- function(x, alpha){
    n <- length(x)
    exp.mat <- matrix(rep(x, each = n), nc = n, byrow = TRUE)
    up.mat <- matrix(eval(parse(text = paste("c(",
        paste(paste(paste("rep(0, ", 0:(n - 1), ")", sep = ""),
                    paste("cumprod(rep(", alpha, ",", n:1, "))", sep = ""),
                    sep = ","),
              collapse = ","),
        ")", sep = ""))), nc = n, byrow = TRUE)
    colSums(up.mat * exp.mat)
}
vecRec(c(1, 2, 3), 0.5)

## Sweep is slow, shouldn't use it.
matRec <- function(x, alpha){
    n <- length(x)
    exp.mat <- matrix(rep(x, each = n), nc = n, byrow = TRUE)
    ## entry (i, j) becomes alpha^(j - i + 1)
    up.mat <- sweep(matrix(cumprod(rep(alpha, n)), nc = n, nr = n,
                           byrow = TRUE), 1,
                    c(1, cumprod(rep(1/alpha, n - 1))), FUN = "*")
    up.mat[lower.tri(up.mat)] <- 0
    colSums(up.mat * exp.mat)
}
matRec(c(1, 2, 3), 0.5)

## Same weight matrix as matRec, built by elementwise multiplication.
matRec2 <- function(x, alpha){
    n <- length(x)
    exp.mat <- matrix(rep(x, each = n), nc = n, byrow = TRUE)
    up.mat1 <- matrix(cumprod(rep(alpha, n)), nc = n, nr = n, byrow = TRUE)
    up.mat2 <- matrix(c(1, cumprod(rep(1/alpha, n - 1))), nc = n, nr = n)
    up.mat <- up.mat1 * up.mat2
    up.mat[lower.tri(up.mat)] <- 0
    colSums(up.mat * exp.mat)
}

matRec2(c(1, 2, 3), 0.5)

## Check whether value is correct
all.equal(loopRec(1:1000, 0.5), vecRec(1:1000, 0.5))
all.equal(loopRec(1:1000, 0.5), matRec(1:1000, 0.5))
all.equal(loopRec(1:1000, 0.5), matRec2(1:1000, 0.5))

## Benchmark the functions (benchmark() is assumed to come from the rbenchmark package).
library(rbenchmark)
benchmark(loopRec(1:1000, 0.5), vecRec(1:1000, 0.5), matRec(1:1000, 0.5),
          matRec2(1:1000, 0.5), replications = 50,
          order = "relative")

Thank you very much for your help.
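
One possible shortcut, offered as a sketch rather than a tested answer from the thread: the sum defined above satisfies the recursion y[t] = alpha * x[t] + alpha * y[t - 1] (with y[0] = 0), which is the form that stats::filter() with method = "recursive" computes. The name filterRec is hypothetical, and loopRec is repeated here only so the comparison is self-contained.

```r
## Sketch only: filterRec is a hypothetical name, not from the thread.
## y[t] = alpha^t * x[1] + ... + alpha * x[t]
##      = alpha * x[t] + alpha * y[t - 1],  with y[0] = 0.
filterRec <- function(x, alpha) {
    ## method = "recursive" computes out[t] = input[t] + alpha * out[t - 1];
    ## as.numeric() drops the time-series attributes filter() adds.
    as.numeric(stats::filter(alpha * x, alpha, method = "recursive"))
}

## Reference: the explicit loop from the post.
loopRec <- function(x, alpha) {
    n <- length(x)
    y <- double(n)
    for (i in 1:n) {
        y[i] <- sum(cumprod(rep(alpha, i)) * rev(x[1:i]))
    }
    y
}

filterRec(c(1, 2, 3), 0.5)
all.equal(loopRec(1:200, 0.5), filterRec(1:200, 0.5))
```

For x = c(1, 2, 3) and alpha = 0.5 both functions give 0.5, 1.25 and 2.125. Because the recursion runs in compiled code and needs only O(n) work, it avoids both the O(n^2) loop and the n-by-n matrices. Whether it meets the "half the time" goal would still need to be benchmarked.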


[R] nnet plot

2011-11-26 Thread RMSOPS

Good evening,

Once again I am asking the community for help. As I am new to this, I have some
basic questions.

I am looking for packages for neural networks, and from what I could find,
these two seem to be the most used: neuralnet and nnet.

From what I could test, and correct me if I'm wrong, neuralnet only accepts
numeric values as input. I did a little test:

 data(iris)
 library(neuralnet)
 Species.numeric <- as.numeric(iris$Species)
 iris.df <- data.frame(iris, Species.numeric)
 net <- neuralnet(Species.numeric ~ Sepal.Width + Sepal.Length +
                  Petal.Width + Petal.Length, iris.df, hidden = 2)
 options(device = "windows")
 plot(net)
 net

I think the nnet package supports all types of data.

 library(nnet)
 RN <- nnet(Species ~ ., data = iris, size = 3, rang = 0.1, decay = 0.01,
            maxit = 20)
 plot(RN)

My question is how this package manages to accept all the input attributes,
and how I can draw a diagram of the network similar to the one neuralnet
produces, or alternatively how I can pass all the attributes to neuralnet
without first transforming them to numeric.

Is there a more effective package for neural networks?

Thank you.
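
On passing a factor to a function that expects numeric input, one common approach (a sketch, not something proposed in this thread) is to expand the factor into 0/1 indicator columns with base R's model.matrix(). Formula interfaces such as the one in nnet generally do this kind of expansion internally, which is likely why nnet appears to accept the factor directly. The names dummies and iris.num are made up for this illustration.

```r
data(iris)

## Expand the factor Species into one 0/1 indicator column per level.
## "- 1" drops the intercept so that every level gets its own column.
dummies <- model.matrix(~ Species - 1, data = iris)

## Combine the four numeric measurements with the indicator columns;
## every column of the result is numeric.
iris.num <- data.frame(iris[, 1:4], dummies)

colnames(iris.num)
sapply(iris.num, is.numeric)
```

The indicator columns can then be named explicitly in a neuralnet formula, on either side of the ~ depending on whether the species is a predictor or the response.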

--
View this message in context: 
http://r.789695.n4.nabble.com/nnet-plot-tp4111620p4111620.html

Re: [R] computationally singular error with mice()

2011-11-26 Thread Fei

Hi Josh,

You opened up the black box for me. Now I know the right way to go.
Thank you so much!

Best,
Fei

--
View this message in context: 
http://r.789695.n4.nabble.com/computationally-singular-error-with-mice-tp4109583p4111537.html

Re: [R] sqldf if iif

2011-11-26 Thread Jeff Newmiller

sqldf uses the SQLite database by default for backend processing. The iif 
function is specific to the Jet database engine syntax (which underlies MS 
Access). You could read up on SQLite syntax, or you could avoid using 
nonstandard SQL syntax, retrieve the data into a data frame, and use R code to 
do your logical merging into one column.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
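
The second suggestion above (fetch the data and do the conditional in R) might look like the following sketch. It uses only base R and toy data modelled on sample 1289 from the question. The comparison operator in the posted query did not survive, so the rule used here (take the difference when it is defined and non-negative, otherwise keep the later count) is an assumption, as are the names both and delta. On the SQL side, the standard CASE WHEN ... THEN ... ELSE ... END expression, which SQLite supports, is the usual replacement for iif.

```r
## Toy data modelled on the tables in the question (sample 1289 only).
esf50 <- data.frame(SAMPLE = 1289,
                    SPECIES = c("diso1", "diso2", "diso3"),
                    Num = c(44, 5, 1), esf = 50)
esf100 <- data.frame(SAMPLE = 1289,
                     SPECIES = c("diso1", "diso2", "diso3", "diso4"),
                     Num = c(82, 13, 2, 3), esf = 100)

## Left join: keep every row of esf100, matching on SAMPLE and SPECIES.
## The earlier count arrives as column Num.50.
both <- merge(esf100, esf50[, c("SAMPLE", "SPECIES", "Num")],
              by = c("SAMPLE", "SPECIES"), all.x = TRUE,
              suffixes = c("", ".50"))

## delta is NA for species that are absent from esf50 (here diso4).
delta <- both$Num - both$Num.50

## ASSUMPTION: the condition below stands in for the lost iif() test.
both$PIPAS <- ifelse(!is.na(delta) & delta >= 0, delta, both$Num)
both
```

With this toy data PIPAS is 38, 8 and 1 for diso1 to diso3, and 3 for diso4, which has no earlier count to subtract.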


[R] tikzDevice and sans serif

2011-11-26 Thread Thomas S. Dye

Aloha all,

I haven't been able to find how to choose the font used by tikzDevice.
My first tries have all been set with a serif font and I'd like to have
them use the sans serif font instead.  I've looked through the
documentation and googled a bit without success.  Is this possible? Can
someone point me to instructions?

All the best,
Tom

-- 
Thomas S. Dye
http://www.tsdye.com
