[R] too large for hashing

2012-04-05 Thread Adam D. I. Kramer

Hello,

I'm doing some analysis on a rather large data set. In this case,
some simple commands are failing. For example, this one:


x$eventtype <- factor(x$eventtype)

Error in unique.default(x) : length 1093574297 is too large for hashing

...I think this is a bug, because hashing should not be required for the
factor function. Am I right? The whole column does not need to be hashed,
only the unique keys. Sure, there is the potential to overflow the key
register, but this error should be thrown only if that occurs, no?

Cordially,
Adam D. I. Kramer, Ph.D.
Data Scientist, Facebook, Inc.
akra...@fb.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] too large for hashing

2012-04-05 Thread Adam D. I. Kramer

Thanks for your response, Duncan.

x$eventtype is a character vector (because the same hashing error
occurred when I tried to read.table() in the first place, specifying
colClasses = c(..., "factor", ...)).

x really is that long:


> dim(x)
[1] 1093574297         12

...the x$eventtype field has three unique values.

(I'm currently using a workaround of making a numeric column based on a
string of ifelse() calls, then setting class() <- "factor", then setting the
labels manually.)
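
Spelled out, the workaround looks roughly like this (a sketch; the three
values "a", "b", and "c" are hypothetical stand-ins for the real event
types):

codes <- ifelse(x$eventtype == "a", 1L,
         ifelse(x$eventtype == "b", 2L, 3L))
levels(codes) <- c("a", "b", "c")   # attach the labels manually
class(codes) <- "factor"            # now behaves and prints as a factor
x$eventtype <- codes

This never calls unique() or factor() on the full column, so the hashing
limit is never hit.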

--Adam

On Thu, 5 Apr 2012, Duncan Murdoch wrote:


On 05/04/2012 2:03 PM, Adam D. I. Kramer wrote:

Hello,

I'm doing some analysis on a rather large data set. In this case,
some simple commands are failing. For example, this one:

  x$eventtype <- factor(x$eventtype)
Error in unique.default(x) : length 1093574297 is too large for hashing

...I think this is a bug, because hashing should not be required for the
factor function. Am I right? The whole column does not need to be hashed,
only the unique keys. Sure, there is the potential to overflow the key
register, but this error should be thrown only if that occurs, no?


It looks as though the error is coming when unique() tries to determine the 
unique levels in the argument, but really there's no way to answer your 
question without more information.  What type of object is x$eventtype?  Is 
it really 1093574297 elements long?  How many unique values does it have?


Duncan Murdoch



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R's unfortunate treatment of X11 failures

2010-04-19 Thread Adam D. I. Kramer

Hi,

I use R in a terminal environment on a linux box, using X11 for
graphics, often tunnelled to the terminal I'm using.  When an internet
connection dies (say, if the ssh connection dies when forwarding), R usually
gives a scary warning like "Error: X11 fatal IO error: please save work and
shut down R". That's all well and good.

Then, if I ignore it and try to plot() something, R dies quite
ungracefully, producing (e.g.) something like this:

R: ../../src/xcb_io.c:385: _XAllocID: Assertion `ret != inval_id' failed.
Aborted

...and returning me to the shell. If I have not saved, I have lost my work.

Clearly, this is my fault--I ignored the warning. However, I just had the
above happen to me without my being warned...I lost some data, but not a
lot.  I am working on creating a reproducible case currently.

That said, in the meantime, I would suggest a broader fix. I'm sure we can
agree that R aborting due to failed assertions like this is pretty
unfortunate.  In the case of X11 failures like this, however, there is an
alternative: print an error message and take a preventative action.

"Error: X11 fatal IO error: X11 device disabled." ...and then call dev.off().

Indeed, if I just run dev.off() when I see this error, nothing about R seems
corrupt at all.  I can start a new X11 window with X11("localhost:10"), and
things (such as plotting) function fine.
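
The recovery sequence that has worked for me, as a sketch (assuming the
forwarded display is localhost:10, as in my setup):

dev.off()             # discard the broken device
X11("localhost:10")   # fresh device over the re-established tunnel
plot(1:10)            # plotting works again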

Cordially,
Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [R-SIG-Mac] How to interrupt an R process that hangs

2010-03-15 Thread Adam D. I. Kramer

+1--this is the single most-annoying issue with R that I know of.

My usual solution, after accomplishing nothing as R spins idly for a couple
of hours, is to kill the process and lose any unsaved work.  savehistory() is
my friend, but saving is a big delay when you work with big data sets as I
do, so I don't run it after every command.
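
What that habit looks like, as a sketch (the file names are arbitrary):

savehistory("~/.Rhistory.backup")             # the commands typed so far
save(list = ls(), file = "checkpoint.RData")  # and the objects themselves
# ...after killing a wedged R, in a fresh session:
loadhistory("~/.Rhistory.backup")
load("checkpoint.RData")

The save() call is the slow part with large objects, hence not after every
command.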

I have cc'd r-help here, however, because I experience this problem with
non-OSX R as well...when I run it in Linux or from the OSX command-line (I
compile R for Darwin without aqua/R-framework), the same thing happens.

Is there some way around this? Is this a known problem?

Google searching suggests no solution, timeline, or anything, but the
problem has been annoying users for at least twelve years:
http://tolstoy.newcastle.edu.au/R/help/9704/0151.html

Cordially,
Adam

On Mon, 15 Mar 2010, Matthew Keller wrote:


HI all,

Apologies for this question. I'm sure it's been asked many times, but
despite 20 minutes of looking, I can't find the answer. I never use
the GUI, I use emacs, but my postdoc does, so I don't know what to
tell her about the following:

Occasionally she'll mess up in her code and cause R to hang
indefinitely (e.g., R is trying to do something that will take days).
In these situations, is there an option other than killing R (and the
work you've done on your script to that point)?

Thank you,

Matthew Keller


--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

___
R-SIG-Mac mailing list
r-sig-...@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] parallel computing--help

2010-03-02 Thread Adam D. I. Kramer

Hi Sayan,

Just stumbled upon this email, three months old, since I was having
the same problem, and noticed you hadn't been answered.  I was able to solve
this problem by noting that the snow library was not installed on the
nodes--it has to be installed everywhere.

So, you need to ssh to the "ip to another computer on the network", then be
sure the same version of R and the snow package are installed on that
computer, and THEN it should work.
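
A quick pre-flight check along these lines may save some head-scratching (a
sketch; the host name "node2" is hypothetical, and passwordless ssh is
assumed):

library(snow)
system("ssh node2 'R --version'")                   # same R version there?
system("ssh node2 \"Rscript -e 'library(snow)'\"")  # errors if snow missing
cl <- makeSOCKcluster("node2")                      # should now come up
clusterCall(cl, function() R.version.string)        # quick sanity check
stopCluster(cl)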

Good luck!

--Adam


Hey R Users

I am trying to implement Linux snow socket parallelism in R; I am
pretty new to parallel computing.

Here is the problem I am facing

Whenever I do

newSOCKnode("localhost")

it works fine. But whenever I do newSOCKnode("ip to another computer on the
network"), it throws up the following error:

Fatal error: cannot open file
'/home/t136/R/i486-pc-linux-gnu-library/2.9/snow/RSOCKnode.R': No such file
or directory

R is stuck, and I have to force quit it with CTRL-C. Now t136 is the machine
I am working on, and I have checked manually that the file RSOCKnode.R does
exist in the specified directory.

SSH is working fine, as I have checked with

system("ssh -l me 'ip to another computer on the network' ls")

which works fine, and I get the list of files from the other computer on my
machine.


Please help!




Sayan Dasgupta



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Command-line arguments and --interactive

2009-11-10 Thread Adam D. I. Kramer


On Tue, 10 Nov 2009, Duncan Murdoch wrote:

--interactive tells R that there is a human producing the input stream, so it 
can ask questions and expect them to be answered.  In your experiments with 
it, your input stream was the pipe holding the output of echo, and R got 
confused because that pipe wouldn't answer its question.


I see. That makes sense, and thank you.


Your problem is that you want an input stream that starts out from your
fixed code and then switches to your shell's stdin.


Correct. Or, you might consider it an ability to add a command or two to the
effective end of .Rprofile.


I think you can do that on some systems by saving your input to a file
then concatenating it to stdin, e.g.  something like this:

echo '10*5; scan()' > test.R
cat test.R - | R --interactive

I don't know if there's a way to do this in one line, and I'd expect some 
oddities.


Creating a new file programmatically is difficult.

Another solution, suggested by peterdc in the #R irc channel, was to run R
itself and then use the system() command to complete the
prior-to-launching-R task...that is the solution I'm going with, but thanks
very much for your help!
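
In other words, start R normally and run the pre-R step from inside it. A
sketch, with a hypothetical script name:

system("sh generate-foo.sh")     # whatever used to run before launching R
foo <- read.csv("/tmp/foo.csv")  # then carry on interactively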

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hand-crafting an .RData file

2009-11-09 Thread Adam D. I. Kramer

Thanks as always for a very helpful response. I'm now loading a few million
rows in only a few seconds.

Cordially,
Adam Kramer

On Mon, 9 Nov 2009, Prof Brian Ripley wrote:

The R 'save' format (as used for the saved workspace .RData) is described in 
the 'R Internals' manual (section 1.8).  It is intended for R objects, and 
you would first have to create one[*] of those in your other application. 
That seems a lot of work.


The normal way to transfer numeric data between applications is to write a 
binary file: R can read such files with readBin(), and it also has 
wrappers/C-code to read a number of common binary data formats (e.g. those 
from SPSS).
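
For instance, a minimal round trip of the kind readBin() supports (a sketch;
the file name is arbitrary, and any program that writes native 8-byte
doubles could produce the file):

writeBin(rnorm(1e6), "col1.bin")                    # stand-in writer
x <- readBin("col1.bin", what = "double", n = 1e6)  # fast, no text parsing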


With character data there are more issues (and more formats, see also 
readChar()), but load() is not particularly fast for those.


Ultimately the R functions pay a performance price for their flexibility so 
hand-crafted C code to read the format can be worthwhile: but see the 
comments below about whether I/O speed is that important.


[*] the 'save' format is a serialization of a single R object, even if you 
save many objects, since the object(s) are combined into a pairlist.


On Sun, 8 Nov 2009, Adam D. I. Kramer wrote:


Hello,

I frequently have to export a large quantity of data from some
source (for example, a database, or a hand-written perl script) and then
read it into R.  This occasionally takes a lot of time; I'm usually using
read.table(filename, comment.char="", quote="") to read the data once it is
written to disk.


Specifying colClasses and nrows will usually help.
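
E.g., a sketch (the file name and column types here are invented):

x <- read.table("big.tsv", header = TRUE, sep = "\t",
                comment.char = "", quote = "",
                colClasses = c("integer", "numeric", "character"),
                nrows = 5000000)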

To read from a database, packages such as RODBC use binary data transfer: 
with suitable tuning this can be fast.



However, I *know* that the program that generates the data is more
or less just calling printf in a for loop to create the csv or 
tab-delimited
file, writing, then having R parse it, which is pretty inefficient. 
Instead, I am interested in figuring out how to write the data in .RData
format so that I can load() it instead of read.table() it.


Without more details it is hard to say if it is inefficient. read.table() can 
read data pretty fast (millions of items per second) if used following the 
hints in the 'R Data' manual.  See e.g.

https://stat.ethz.ch/pipermail/r-devel/2004-December/031733.html

Almost anything non-trivial one might do with such data is much slower.  The 
trend is to write richer (and slower to read) data formats.



Trolling the internet, however, has not suggested anything about the
specification for an .RData file. Could somebody link me to a specification
or some information that would instruct me on how to construct a .RData
file (either compressed or uncompressed)?

Also, I am open to other suggestions of how to get load()-like
efficiency in some other way.

Many thanks,
Adam D. I. Kramer


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Command-line arguments and --interactive

2009-11-09 Thread Adam D. I. Kramer

Hello,

I am interested in passing a command or two to R on the command
line. The desired behavior is for R to run these commands first, and then
begin an interactive session. For example:

$ R -e 'foo <- read.csv("/tmp/foo.csv")'
...which would launch R and execute that command, so when I return from
getting coffee, foo would be loaded.

I know that if I put this command into my .Rprofile, it will execute as
desired, but I'm looking for something a bit more modular.

I have tried the following:

$ echo '10*5' | R
...fails unless I specify --save, --no-save, or --vanilla. I choose the
latter; R prints 50 and quits...but I do not want it to quit!

$ echo '10*5' | R --interactive

...fails in an interesting way: I get an infinite loop of "Save workspace
image? [y/n/c]: Save workspace image? [y/n/c]:", etc. I have to kill R in
another window. This also happens with
$ echo '10*5; scan()' | R --interactive

...even though I'd expect scan() to prompt for input. If I specify

$ echo '10*5; scan()' | R --interactive --vanilla

...then R just prints 50 and quits. Is this a bug with --interactive? The
description, "Force an interactive session", is not informative enough for
me to have any further guess as to what to do here, but my expectation is
that it would make R not auto-quit.

...perhaps I don't know how to use --interactive properly? Could somebody
set me on the right track? Googling for "R --interactive" is nigh unto
useless. :-\

This behavior is present in R 2.9.2 and 2.10.0.

Many thanks!

Cordially,
Adam Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Hand-crafting an .RData file

2009-11-08 Thread Adam D. I. Kramer

Hello,

I frequently have to export a large quantity of data from some
source (for example, a database, or a hand-written perl script) and then
read it into R.  This occasionally takes a lot of time; I'm usually using
read.table(filename, comment.char="", quote="") to read the data once it is
written to disk.

However, I *know* that the program that generates the data is more
or less just calling printf in a for loop to create the csv or tab-delimited
file, writing, then having R parse it, which is pretty inefficient. 
Instead, I am interested in figuring out how to write the data in .RData
format so that I can load() it instead of read.table() it.

Trolling the internet, however, has not suggested anything about the
specification for an .RData file. Could somebody link me to a specification
or some information that would instruct me on how to construct a .RData
file (either compressed or uncompressed)?

Also, I am open to other suggestions of how to get load()-like
efficiency in some other way.

Many thanks,
Adam D. I. Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Comparing loadings (next to each other)

2009-07-18 Thread Adam D. I. Kramer

Dear colleagues,

I've been running some principal components analyses, which generate
tables of loadings that I'm interested in looking at. 
print(f1$rot$load,cutoff=.4) is what I use, and it gives me what I want.


However, I'm now interested in comparing these loadings across a few
data sets. In other words, I would like R to match the loadings on
rownames() and display them next to each other, e.g.:

Loadings:
       PC 1   PC 2   PC 1   PC 2   PC 1   PC 2
col1   0.40          0.45          0.90
col2          0.80          0.55
col3          0.77          0.70          0.42


...I could certainly just cbind them together, but then I can't class them
as "loadings":

> class(x) <- "loadings"
Error in class(x) <- "loadings" :
  cannot coerce type 'closure' to vector of type 'character'

...and I'm very interested in using the cutoff feature of print.loadings.

Any suggestions? I could go in and alter print.loadings myself, but if
there's an easier way let me know.
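
For what it's worth, classing a plain matrix does seem possible; a sketch,
assuming f1, f2, f3 are fits sharing the same variables in the same row
order:

m <- cbind(f1$rot$load[, 1:2], f2$rot$load[, 1:2], f3$rot$load[, 1:2])
class(m) <- "loadings"   # a matrix (unlike above) can be classed
print(m, cutoff = 0.4)

...though this does not do the rowname matching I described.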

Many thanks,
--
Adam D. I. Kramer
a...@uoregon.edu
Ph.D. Candidate, Social and Personality Psychology
University of Oregon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sem and nlm and ols instead of ml

2009-04-24 Thread Adam D. I. Kramer

Dear colleagues,

Has anybody any experience using the sem package to fit structural
equation models using a fitting function other than ML? I have heard tell
that OLS may provide better estimates when using standardized matrices
generated from small sample sizes, so I was interested in comparing the two
for a few models. However, ML appears to be hard-coded into the source for
sem...but maybe there is some way around this.

So, if anybody has done this, has a hint as to how to do this, or
would be able to say that this is perhaps way too much trouble to try, I
would appreciate advice on the topic.

--
Adam D. I. Kramer
Ph.D. Candidate, Social and Personality Psychology
University of Oregon
a...@uoregon.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Why na.rm=FALSE is the default

2009-03-24 Thread Adam D. I. Kramer

Dear Colleagues,

I've been searching for a post or article or something which
explains why having na.rm=FALSE or na.action=na.fail as the default is a
better choice than TRUE or na.omit.

I understand the basic argument: it does not make sense to average a
nonexistence into an aggregate, removing them implicitly leads to accidental
pairwise deletion in some cases, and sum(x) / length(x) != mean(x) under
implicit removal (which many would find disturbing)...I'm just looking for a
source to cite on this issue to support mimicking R's behavior in a database
system's aggregating functions (sum, avg, var, etc.).
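
Concretely, the disturbing identity failure:

x <- c(1, 2, NA)
sum(x, na.rm = TRUE) / length(x)   # 1.0 -- still divides by 3
mean(x, na.rm = TRUE)              # 1.5 -- divides by 2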

Cordially,
Adam Kramer
Ph.D. Candidate, Social Psychology
University of Oregon
adik at uoregon dot edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Aggregating multiple columns

2009-03-19 Thread Adam D. I. Kramer

Dear colleagues,

Consider the following data frame:

x - data.frame(y=rnorm(100),order=rep(1:10,10),subject=rep(1:10,each=10))

...it is my goal to aggregate x to compute a linear effect of order
for each subject. So, ideally, result would be a vector containing a single
number for each subject, representing the linear relationship between y and
order.

I first tried this:

result <- aggregate(x[, 1:2], list(subject = x$subject),
                    function(z) lm(y ~ order, data = z)$coefficients[2])

...because lm(y ~ order, data=x, subset=x$subject==1)$coefficients[2] would
give me the correct term for subject 1 (i.e., that is the number I am
actually looking for).

However, when used on data frames, aggregate() aggregates every
COLUMN in x _separately_ using FUN...while lm needs both columns *together.*

...I then turned to tapply, but that is useful only on atomic
objects, and not data frames.

I have two solutions, which I find inelegant and slow:

1) result <- sapply(levels(factor(x$subject)),
                    function(z) lm(y ~ order, data = x,
                                   subset = subject == z)$coefficients[2])

...this gets the job done, but is very slow.

2) f <- factor(x$subject)
   result <- c()
   for (z in 1:nlevels(f)) {
     result[z] <- lm(y ~ order, data = x,
                     subset = f == levels(f)[z])$coefficients[2]
   }

...also does the job, but is also very slow.

Is there a better solution? I miss the speed of tapply and aggregate; the
example has only 100 rows and 10 subjects, but the actual data has many more
of each.
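
For reference, a split()-based variant of (1) that subsets the frame only
once (whether it is materially faster on the real data is untested):

result <- sapply(split(x, x$subject),
                 function(d) coef(lm(y ~ order, data = d))[2])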

Cordially,
Adam D. I. Kramer
Ph.D. Candidate, Social and Personality Psychology
University of Oregon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] read.exp?

2009-03-10 Thread Adam D. I. Kramer

Dear Colleagues,

The foreign library, ever-exceptionally useful, does not seem able
to deal with this ancient SPSS ascii portable file I have with extension
'.exp'.

If anybody has familiarity with this filetype and could point me
towards an R function that would allow me to get the data from this file, I
would be much obliged! Or, even knowledge of the specification for these
filetypes would be helpful, and I'll try my own hand at writing the import
function.

Cordially,
Adam Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Issue with step(): Fails to look for object$model

2009-02-14 Thread Adam D. I. Kramer

Hi,

I'm playing around with stepwise regression, using the step
function, and believe I have found a bug (or at least, a strong case for
improvement):


ex <- data.frame(y=rnorm(100),x=rnorm(100))
l <- lm(y ~ x, data=ex)
step(l)

[output is correct]

rm(ex)
step(l)

Start:  AIC=11.79
y ~ x

       Df Sum of Sq     RSS     AIC
- x     1     0.120 108.221   9.900
<none>              108.100  11.789
Error in inherits(x, "data.frame") : object 'ex' not found

...ex is not found, so step fails. However, all of the necessary data to run
the step function is present in l$model.

I would also argue that step() *should* use l$model if at all possible, as
it seems reasonable to expect that ex may undergo changes. Further, step()
does not appear to test any other columns present in ex (even if
direction="both" is specified), unless they are specified in scope.

If I am misunderstanding step() or if there is a good reason why it operates
this way, please let me know!

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Issue with step(): Fails to look for object$model

2009-02-14 Thread Adam D. I. Kramer

A fine answer, and indeed I am aware of the option to leave the model empty.
I like the option of consistently not being protected from myself.

That said, some other R functions (predict comes to mind) use $model if it
exists (which it usually does, though needn't) but allow new (or other)
data to be provided.

So while I can't argue against trusting $call, I think there's a decent
argument that the data held within the model is the most trustworthy and
appropriate to use (though an explicitly named other data frame should
surely be useful as well).

In the meantime, I'll just modify l$call, and I thank you for that
suggestion.
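
Concretely, what I have in mind (a sketch; ex2 is a hypothetical replacement
frame):

ex2 <- data.frame(y = rnorm(100), x = rnorm(100))
l$call$data <- quote(ex2)  # point the stored call at ex2 instead of ex
step(l)                    # the update() inside step() now finds ex2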

--Adam


On Sun, 15 Feb 2009, bill.venab...@csiro.au wrote:


Without arguing your case, I would point out that the data need not be there
in l$model:


> l <- lm(y ~ x, data=ex, model = FALSE)
> l


Call:
lm(formula = y ~ x, data = ex, model = FALSE)

Coefficients:
(Intercept)x
0.1310  -0.1736


> l$model
NULL

Model objects are often huge simply because by default they squirrel away
a copy of the original data inside themselves.  One way to avoid memory
problems is to keep this backup copy within the fitted model object only
when you really need to do so, which is not all that often, in fact.

I can see why step() (and allies) do not work the way you think they
should.  These functions use the call component of the fitted model object
and modify that.  And if the call component says your data are in the data
frame 'ex', they take you seriously.  They take your word for it.  We're not
dealing with Microsoft software here, you know.

Bill Venables
http://www.cmis.csiro.au/bill.venables/


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Adam D. I. Kramer
Sent: Sunday, 15 February 2009 12:44 PM
To: r-help@r-project.org
Subject: [R] Issue with step(): Fails to look for object$model

Hi,

I'm playing around with stepwise regression, using the step
function, and believe I have found a bug (or at least, a strong case for
improvement):


ex <- data.frame(y=rnorm(100),x=rnorm(100))
l <- lm(y ~ x, data=ex)
step(l)

[output is correct]

rm(ex)
step(l)

Start:  AIC=11.79
y ~ x

       Df Sum of Sq     RSS     AIC
- x     1     0.120 108.221   9.900
<none>              108.100  11.789
Error in inherits(x, "data.frame") : object 'ex' not found

...ex is not found, so step fails. However, all of the necessary data to run
the step function is present in l$model.

I would also argue that step() *should* use l$model if at all possible, as
it seems reasonable to expect that ex may undergo changes. Further, step()
does not appear to test any other columns present in ex (even if
direction="both" is specified), unless they are specified in scope.

If I am misunderstanding step() or if there is a good reason why it operates
this way, please let me know!

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tunnelling X for R graphics

2009-02-03 Thread Adam D. I. Kramer

Thanks very much for the reassurance.

Really, I can just open a new X11 device on the same display, since the
display (localhost:10) is effectively reconnected when I ssh in again.

I'll reply again to this post if I find other parts of R working poorly
after the disconnection.

--Adam

On Tue, 3 Feb 2009, Prof Brian Ripley wrote:

To answer your basic question: you do need to shut down everything involving
X, that is, X11() devices and the X11 data editor.  If you do that (and
graphics.off() will suffice for the first), you should be able to re-open an
X11 device on another display (which is presumably what a new VNC connection
gives you).


The warning comes from any X error, and it is not possible to know how 
serious it is without external information.


On Mon, 2 Feb 2009, Adam D. I. Kramer wrote:



On Tue, 3 Feb 2009, Patrick Connolly wrote:


The problem, and maybe I'm just whining here, is that because the
data sets are large this takes several minutes where I'm basically just
sitting around.  This happens once every other day as the VPN software
I'm using times out after about 24 hours and thus the ssh session dies.


Is it possible to do anything about the VPN software?  I use tightVNC to
do something similar and it doesn't time out after 24 hours.  Even closing
the desktop machine down altogether does not lose the ssh connexion.
Restarting the desktop a week later will still find the X session without
loss.


The VPN software is managed and maintained by the company I'm doing
statistical computing work for...out of my control. Your comments about
TightVNC are pretty impressive, though--I'm not really sure how that would
work...though if you set your ssh connection to not push any data towards
your computer, I gather the server would have no reason to believe you were
unresponsive?

In any case, this sadly doesn't help me, but many thanks!

For now, I'm just trying my hardest to remember to dev.off() when I'm done
using graphics.

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tunnelling X for R graphics

2009-02-02 Thread Adam D. I. Kramer

Indeed, I am running R in screen. That is the context in which this error
occurs.

The problem is that screen passes $DISPLAY as the $DISPLAY for the actual
terminal. So when the ssh session dies, the X11 connection is broken.

The REST of R works fine...which is why I use screen in the first place. My
data is not lost, etc., however it tells me I need to save and quit
immediately.  That is my concern.

--Adam

On Sat, 31 Jan 2009, Dylan Beaudette wrote:


Try starting your R session after starting a 'screen' session. Like this:

$ screen
$ R
# do stuff, when taking a break do CTRL-A D to disconnect
# use as normal

See the man page for screen, it is basically a terminal multiplexer
that can gracefully accommodate connection failures. If you get
disconnected, re-connect, and then re-attach the screen process:
$ screen -r

and you should be ok.

Cheers,

Dylan


On 1/31/09, Adam D. I. Kramer a...@ilovebacon.org wrote:

Dear colleagues,

I run R on a few different machines, and view graphs and the like by
tunnelling X through SSH to my local machine. This is useful for me because
my local machine can't easily handle some of the data sets I work with.

However, when an ssh connection dies, the tunnelled X session also
dies, which breaks R's device connection, generating this error:


Error: X11 fatal IO error: please save work and shut down R


...that's kinda scary, so I quit(save="yes") and then run R again.

The problem, and maybe I'm just whining here, is that because the
data sets are large this takes several minutes where I'm basically just
sitting around.  This happens once every other day as the VPN software I'm
using times out after about 24 hours and thus the ssh session dies.

I can't really guess at why a broken X session would corrupt a
running session of R so severely that it would need to be completely
restarted.  Can anyone explain this to me?  Or perhaps (hopefully) someone
has enough knowledge of the X11 device to be able to tell me that I can
ignore this message, and just use dev.off() and then X11("localhost:10") to
open a new working X11 connection?

Cordially,
Adam Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems in Recommending R

2009-02-02 Thread Adam D. I. Kramer


On Tue, 3 Feb 2009, Rolf Turner wrote:


I think the R website is just fine as it is.  Effort should be put into
content and not into cosmetics.  Those (potential) users who would be
likely to be influenced by such trivialities as the appearance of the web
page are unlikely to be the sort of people who would use R anyway.


I respectfully disagree. In my repeated experience, I have seen colleagues
in industry and university simply write R off as too difficult or not
worth the effort based on purely cosmetic grounds, and then at my urging
and after some instruction embrace R as being a fantastic piece of software.

The reality of the situation is that before you read a book, you only have
its cover to judge. Suggesting that people should read every book regardless
of the cover does not make sense for people who have other things to do.

In the ecological context of open-source software, the cover or cosmetics
of a software program, its documentation, and its support structure are
actually quite correlated with overall ease of use.  And if functionality is
modeled as the factorial interaction of the information produced with the
amount of time it takes to produce that information, then functionality
correlates with ease of use, and so the appearance of the webpage is not a
triviality.

Cordially,
--
Adam D. I. Kramer
Ph.D. Student, Social Psychology
University of Oregon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power analysis for MANOVA?

2009-02-02 Thread Adam D. I. Kramer

Hi Rick,

I understand the authors' point and also agree that post-hoc power
analysis is basically not telling me anything more than the p-value and
initial statistic for the test I am interested in computing power for.

Beta is a simple function of alpha, p, and the statistic.

My intention is, as I mentioned in my response to Stephan Kolassa,
to transform my p-value and statistic into a form of effect size--the sample
size necessary to attain significance at alpha=.05. This will communicate no
more information; it is just a mathematical re-representation of my data in
a way I believe my readers will find more informative and useful. In other
words, there is no more information *encoded*, but there is more information
*communicated*, just like for any effect size measure.

If you have any suggestions on a more reliable effect size for
MANOVA which is *also* commonly known in the social psychology community
(e.g., a correlation or Cohen's d analogue), I'm interested--but the
multivariate nature of the beast makes these more or less impossible to
translate.

The poster I was asking for is now printed, and we reported the
multivariate R-squared using the techniques in Cohen (1988), though I'm
expecting to spend a lot of time explaining what that means to people in a
multivariate context, rather than describing the results of the study.

Cordially,
Adam D. I. Kramer

On Sun, 1 Feb 2009, Rick Bilonick wrote:


On Wed, 2009-01-28 at 21:21 +0100, Stephan Kolassa wrote:

Hi Adam,

first: I really don't know much about MANOVA, so I sadly can't help you
without learning about it and Pillai's V... which I would be glad to do,
but I really don't have the time right now. Sorry!

Second: you seem to be doing a kind of post-hoc power analysis: "my
result isn't significant, perhaps that's due to low power? Let's look at
the power of my experiment!" My impression is that post-hoc power
analysis and its interpretation is, shall we say, not entirely accepted
within the statistical community, see:

Hoenig, J. M.,  Heisey, D. M. (2001, February). The abuse of power: The
pervasive fallacy of power calculations for data analysis. The American
Statistician, 55 (1), 1-6

And this:
http://staff.pubhealth.ku.dk/~bxc/SDC-courses/power.pdf

However, I am sure that lots of people can discuss this more competently
than me...

Best wishes
Stephan



The point of the article was that doing a so-called retrospective
power analysis leads to logical contradictions with respect to the
confidence intervals and p-values from the analysis of the data. In
other words, DON'T DO IT! All the information is contained in the
confidence intervals which are based on the observed data - an after the
fact power analysis cannot provide any insight - it's not data
analysis.

Rick B.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tunnelling X for R graphics

2009-02-02 Thread Adam D. I. Kramer


On Tue, 3 Feb 2009, Patrick Connolly wrote:


The problem, and maybe I'm just whining here, is that because the
data sets are large this takes several minutes where I'm basically just
sitting around.  This happens once every other day as the VPN software
I'm using times out after about 24 hours and thus the ssh session dies.


Is it possible to do anything about the VPN software?  I use tightVNC to
do something similar and it doesn't time out after 24 hours.  Even closing
the desktop machine down altogether does not lose the ssh connexion.
Restarting the desktop a week later will still find the X session without
loss.


The VPN software is managed and maintained by the company I'm doing
statistical computing work for...out of my control. Your comments about
TightVNC are pretty impressive, though--I'm not really sure how that would
work...though if you set your ssh connection to not push any data towards
your computer, I gather the server would have no reason to believe you were
unresponsive?

In any case, this sadly doesn't help me, but many thanks!

For now, I'm just trying my hardest to remember to dev.off() when I'm done
using graphics.

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Boxplots by variable

2009-02-02 Thread Adam D. I. Kramer

Does boxplot(as.data.frame(final)) do what you want?

--Adam

On Mon, 2 Feb 2009, Vemuri, Aparna wrote:


Dear R users,

I have a matrix final which looks like this:

final
oSO4   oNO3   mSO4   mNO3
[1,]  3.3728 0.2110  1.9517421 1.01883602
[2,]  0.8249 0.0697  1.5970292 0.11368781
[3,]  0.2636 0.1004  0.6012445 0.24356332
[4,]  8.0072 0.3443  6.1016998 3.63207149
[5,] 13.5079 0.6593 12.4011068 1.55323386
[6,]  6.1293 0.1989  5.7620926 0.12884845
[7,]  0.6004 0.0661  0.7375408 0.17218600
[8,]  0.6912 0.1672  1.1563314 0.13469750
[9,]  1.0478 0.1504  1.5637809 0.99000758
[10,]  0.4825 0.1160  0.2297545 0.08121805

I would like to create boxplots for this matrix with data binned by
oNO3, oSO4, mNO3 and mSO4, all in the same plot.

I tried
boxplot(final), boxplot(final[,1]) etc. But all those commands create
individual plots and not what I am trying to achieve. I was wondering if
there is an R equivalent of the matlab command "hold on" or if there is
a simpler way around this.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Tunnelling X for R graphics

2009-01-31 Thread Adam D. I. Kramer

Dear colleagues,

I run R on a few different machines, and view graphs and the like by
tunnelling X through SSH to my local machine. This is useful for me because
my local machine can't easily handle some of the data sets I work with.

However, when an ssh connection dies, the tunnelled X session also
dies, which breaks R's device connection, generating this error:


Error: X11 fatal IO error: please save work and shut down R


...that's kinda scary, so I quit(save="yes") and then run R again.

The problem, and maybe I'm just whining here, is that because the
data sets are large this takes several minutes where I'm basically just
sitting around.  This happens once every other day as the VPN software I'm
using times out after about 24 hours and thus the ssh session dies.

I can't really guess at why a broken X session would corrupt a
running session of R so severely that it would need to be completely
restarted.  Can anyone explain this to me?  Or perhaps (hopefully) someone
has enough knowledge of the X11 device to be able to tell me that I can
ignore this message, and just use dev.off() and then X11("localhost:10") to
open a new working X11 connection?

Cordially,
Adam Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] FW: can't get package boot to load

2009-01-31 Thread Adam D. I. Kramer

If you check ?library, you will see that the package argument only takes one
package. So your code was like saying library(package=RODBC, help=boot),
which does not make sense...but which would indeed load RODBC and not boot.
That is why your problem occurred and why your solution worked.

--Adam

On Sat, 31 Jan 2009, Gary Smith wrote:


Changing
library(RODBC,boot)
to
library(RODBC)
library(boot)
seems to have solved the problem.

 _

_
From: Gary Smith [mailto:gary.smit...@comcast.net]
Sent: Saturday, January 31, 2009 12:55 PM
To: 'r-help@R-project.org'
Subject: can't get package boot to load

Hi,
I am new to R and I'm totally confused by this problem.  I'm trying to load
data and run a simple correlation using corr() in the boot package.
Yesterday my code worked.  Today it can't find the function corr().  I've
tried searching the web and haven't found the root of my problem.  Apologies
if I've missed the obvious.
My code is simple:

install.packages('RODBC')
install.packages('boot')
library(RODBC,boot)
setwd("C:/Documents and Settings/Gary Smith/My Documents/Blog/")
data = odbcConnectExcel("LafferCurve2.xls")
mydata = sqlFetch(data, "Data")
n=23
m=73
gnp=mydata[[3]][n:m]
incTaxFirst=mydata[[7]][n:m]
corr(cbind(gnp,incTax))

Here are the console messages:


install.packages('RODBC')

Warning: package 'RODBC' is in use and will not be installed

install.packages('boot')

trying URL
'http://cran.stat.ucla.edu/bin/windows/contrib/2.8/boot_1.2-35.zip'
Content type 'application/zip' length 779420 bytes (761 Kb)
opened URL
downloaded 761 Kb

package 'boot' successfully unpacked and MD5 sums checked

The downloaded packages are in
   C:\Documents and Settings\Gary Smith\Local
Settings\Temp\RtmpRT1RUh\downloaded_packages
updating HTML package descriptions

library(RODBC,boot)
setwd("C:/Documents and Settings/Gary Smith/My Documents/Blog/")
data = odbcConnectExcel("LafferCurve2.xls")
mydata = sqlFetch(data, "Data")
n=23
m=73
gnp=mydata[[3]][n:m]
incTaxFirst=mydata[[7]][n:m]
corr(cbind(gnp,incTax))

Error: could not find function "corr"
Using search(), I see:
[1] ".GlobalEnv"        "package:RODBC"
[3] "package:stats"     "package:graphics"
[5] "package:grDevices" "package:utils"
[7] "package:datasets"  "package:methods"
[9] "Autoloads"         "package:base"

Typing install.packages('boot') at the prompt gets me the same result.

Thanks for any help.

Gary
___
Accept no assertions without evidence.  --Goompah wisdom from Omega by
Jack McDevitt--

Nature is complete because it does not serve itself.  --Tao De Jing by
Lao Tze--




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power analysis for MANOVA?

2009-01-29 Thread Adam D. I. Kramer

Thanks for the response, Stephan.

Really, I am trying to say, "My result is insignificant, my effect sizes are
tiny, you may want to consider the possibility that there really are no
meaningful differences."  Computing post-hoc power makes a bit stronger of a
claim in this setting.

My real goal in this case was to put a single line on a poster that says,
"Significance using our estimates would require N observations, which is
larger than the population."  I am trying to solve for N.  N in this case is
a sort of effect size.  In this case, it is indeed a simple transformation
of Pillai's V and the p-value for the study, and I do not intend to suggest
that it is anything more than that.  However, I believe that the latter
effect size makes a much more compelling case, given that a lot of people
(such as yourself) don't have much experience with Pillai's V.

--Adam

On Wed, 28 Jan 2009, Stephan Kolassa wrote:


Hi Adam,

first: I really don't know much about MANOVA, so I sadly can't help you 
without learning about it and Pillai's V... which I would be glad to do, but I 
really don't have the time right now. Sorry!


Second: you seem to be doing a kind of post-hoc power analysis: "my result 
isn't significant, perhaps that's due to low power? Let's look at the power 
of my experiment!"  My impression is that post-hoc power analysis and its 
interpretation is, shall we say, not entirely accepted within the statistical 
community, see:


Hoenig, J. M.,  Heisey, D. M. (2001, February). The abuse of power: The 
pervasive fallacy of power calculations for data analysis. The American 
Statistician, 55 (1), 1-6


And this:
http://staff.pubhealth.ku.dk/~bxc/SDC-courses/power.pdf

However, I am sure that lots of people can discuss this more competently than 
me...


Best wishes
Stephan


Adam D. I. Kramer schrieb:


On Mon, 26 Jan 2009, Stephan Kolassa wrote:


My (and, judging from previous traffic on R-help about power analyses,
also some other people's) preferred approach is to simply simulate an
effect size you would like to detect a couple of thousand times, run your
proposed analysis and look how often you get significance.  In your simple
case, this should be quite easy.


I actually don't have much experience running monte-carlo designs like
this...so while I'd certainly prefer a bootstrapping method like this one,
simulating the effect size given my constraints isn't something I've done
before.

The MANOVA procedure takes 5 dependent variables, and determines what
combination of the variables best discriminates the two levels of my
independent variable...then the discrimination rate is represented in the
statistic (Pillai's V=.00019), which is then tested (F[5,18653] = 0.71).  So
coming up with a set of constraints that would produce V=.00019 given my
data set doesn't quite sound trivial...so I'll go for the par library
reference mentioned earlier before I try this.  That said, if anyone can
refer me to a tool that will help me out (or an instruction manual for RNG),
I'd also be much obliged.

Many thanks,
Adam




HTH,
Stephan


Adam D. I. Kramer schrieb:

Hello,

I have searched and failed for a program or script or method to
conduct a power analysis for a MANOVA. My interest is a fairly simple case
of 5 dependent variables and a single two-level categorical predictor
(though the categories aren't balanced).

If anybody happens to know of a script that will do this in R, I'd
love to know of it! Otherwise, I'll see about writing one myself.

What I currently see is this, from help.search("power"):

stats::power.anova.test   Power calculations for balanced one-way
                          analysis of variance tests
stats::power.prop.test    Power calculations two sample test for
                          proportions
stats::power.t.test       Power calculations for one and two sample t
                          tests

Any references on power in MANOVA would also be helpful, though of
course I will do my own lit search for them myself.

Cordially,
Adam D. I. Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.











__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] need help combining two datasets

2009-01-27 Thread Adam D. I. Kramer

You probably want the merge function.

?merge
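
A toy sketch of the join (column names invented to match your description):

a <- data.frame(date = as.Date("2009-01-01") + 0:1, stock = "IBM",
                returnA = c(0.010, -0.020))
b <- data.frame(date = as.Date("2009-01-02") + 0:1, stock = "IBM",
                returnB = c(0.030, 0.000))
merge(a, b, by = c("date", "stock"))  # inner join: overlapping rows only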

--Adam

On Wed, 28 Jan 2009, Somani, Dinesh K wrote:


Hi

I am a new R user.

I have two CSV files, one with daily stock returns using method A {date,
stock, returnA, some uninteresting columns}, and another with method B
{date, stock, returnB, more columns}.  Both have different sets of stocks.

I want to combine the two into a single data table, so that I can run some
analyses for the overlapping date ranges and stocks.  I know how to do
this using a database but is there an equivalent way to perform a similar
kind of join in R?

Data size is small - just a few years worth of daily data.

Would appreciate your help.

Thanks a lot
Dinesh

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Power analysis for MANOVA?

2009-01-26 Thread Adam D. I. Kramer

Hello,

I have searched and failed for a program or script or method to
conduct a power analysis for a MANOVA. My interest is a fairly simple case
of 5 dependent variables and a single two-level categorical predictor
(though the categories aren't balanced).

If anybody happens to know of a script that will do this in R, I'd
love to know of it! Otherwise, I'll see about writing one myself.

What I currently see is this, from help.search("power"):

stats::power.anova.test   Power calculations for balanced one-way
                          analysis of variance tests
stats::power.prop.test    Power calculations two sample test for
                          proportions
stats::power.t.test       Power calculations for one and two sample t
                          tests

Any references on power in MANOVA would also be helpful, though of
course I will do my own lit search for them myself.

Cordially,
Adam D. I. Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power analysis for MANOVA?

2009-01-26 Thread Adam D. I. Kramer


On Mon, 26 Jan 2009, Stephan Kolassa wrote:


My (and, judging from previous traffic on R-help about power analyses,
also some other people's) preferred approach is to simply simulate an
effect size you would like to detect a couple of thousand times, run your
proposed analysis and look how often you get significance.  In your simple
case, this should be quite easy.


I actually don't have much experience running monte-carlo designs like
this...so while I'd certainly prefer a bootstrapping method like this one,
simulating the effect size given my constraints isn't something I've done
before.

The MANOVA procedure takes 5 dependent variables, and determines what
combination of the variables best discriminates the two levels of my
independent variable...then the discrimination rate is represented in the
statistic (Pillai's V=.00019), which is then tested (F[5,18653] = 0.71).  So
coming up with a set of constraints that would produce V=.00019 given my
data set doesn't quite sound trivial...so I'll go for the par library
reference mentioned earlier before I try this.  That said, if anyone can
refer me to a tool that will help me out (or an instruction manual for RNG),
I'd also be much obliged.
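
For concreteness, my current reading of the suggested simulation, as a
sketch (the group sizes and the shift delta below are made up, not estimates
from my data):

set.seed(1)
power.sim <- function(n1, n2, delta, nsim = 1000) {
  mean(replicate(nsim, {
    g <- factor(rep(1:2, c(n1, n2)))
    y <- matrix(rnorm((n1 + n2) * 5), ncol = 5)
    y[g == "2", 1] <- y[g == "2", 1] + delta  # shift DV 1 in group 2
    summary(manova(y ~ g), test = "Pillai")$stats[1, "Pr(>F)"] < 0.05
  }))
}
power.sim(200, 100, delta = 0.3)  # estimated power for this toy scenario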

Many thanks,
Adam




HTH,
Stephan


Adam D. I. Kramer schrieb:

Hello,

I have searched and failed for a program or script or method to
conduct a power analysis for a MANOVA. My interest is a fairly simple case
of 5 dependent variables and a single two-level categorical predictor
(though the categories aren't balanced).

If anybody happens to know of a script that will do this in R, I'd
love to know of it! Otherwise, I'll see about writing one myself.

What I currently see is this, from help.search("power"):

stats::power.anova.test   Power calculations for balanced one-way
                          analysis of variance tests
stats::power.prop.test    Power calculations two sample test for
                          proportions
stats::power.t.test       Power calculations for one and two sample t
                          tests

Any references on power in MANOVA would also be helpful, though of
course I will do my own lit search for them myself.

Cordially,
Adam D. I. Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power analysis for MANOVA?

2009-01-26 Thread Adam D. I. Kramer


On Mon, 26 Jan 2009, Charles C. Berry wrote:



If you know what a 'general linear hypothesis test' is see

http://cran.r-project.org/src/contrib/Archive/hpower/hpower_0.1-0.tar.gz



I do, and am quite interested; however, this package will not install on R
2.8.1. First, it said that there was no maintainer in the DESCRIPTION, so
I added one (figuring that the 1991 date of the package was to blame), but
it still will not compile:

parmesan:tmp$ sudo R CMD INSTALL hpower/
* Installing to library '/usr/local/lib/R/library'
* Installing *source* package 'hpower' ...
** R
** preparing package for lazy loading
Error in parse(n = -1, file = file) : unexpected '{' at
5: ##
6: pfnc_function(q,df1,df2,lm,iprec=c(6)) {
Calls: <Anonymous> -> code2LazyLoadDB -> sys.source -> parse
Execution halted
ERROR: lazy loading failed for package 'hpower'
** Removing '/usr/local/lib/R/library/hpower'
parmesan:tmp$

...any tips?

--Adam


HTH,

Chuck

On Mon, 26 Jan 2009, Adam D. I. Kramer wrote:



On Mon, 26 Jan 2009, Stephan Kolassa wrote:


My (and, judging from previous traffic on R-help about power analyses,
also some other people's) preferred approach is to simply simulate an
effect size you would like to detect a couple of thousand times, run your
proposed analysis and look how often you get significance.  In your simple
case, this should be quite easy.


I actually don't have much experience running monte-carlo designs like
this...so while I'd certainly prefer a bootstrapping method like this one,
simulating the effect size given my constraints isn't something I've done
before.

The MANOVA procedure takes 5 dependent variables, and determines what
combination of the variables best discriminates the two levels of my
independent variable...then the discrimination rate is represented in the
statistic (Pillai's V=.00019), which is then tested (F[5,18653] = 0.71).  So
coming up with a set of constraints that would produce V=.00019 given my
data set doesn't quite sound trivial...so I'll go for the par library
reference mentioned earlier before I try this.  That said, if anyone can
refer me to a tool that will help me out (or an instruction manual for RNG),
I'd also be much obliged.

Many thanks,
Adam




 HTH,
 Stephan


Adam D. I. Kramer schrieb:

Hello,

I have searched and failed for a program or script or method to
conduct a power analysis for a MANOVA. My interest is a fairly simple case
of 5 dependent variables and a single two-level categorical predictor
(though the categories aren't balanced).

If anybody happens to know of a script that will do this in R, I'd
love to know of it! Otherwise, I'll see about writing one myself.

What I currently see is this, from help.search("power"):

stats::power.anova.test   Power calculations for balanced one-way
                          analysis of variance tests
stats::power.prop.test    Power calculations two sample test for
                          proportions
stats::power.t.test       Power calculations for one and two sample t
                          tests

Any references on power in MANOVA would also be helpful, though of
course I will do my own lit search for them myself.

Cordially,
Adam D. I. Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Print a list in columns

2008-12-20 Thread Adam D. I. Kramer

Hi Borja,

write.table doesn't work, because it's not clear how to print it in
two columns, so we can't help you yet either.  Do you want to match the
first item in v1 with the first item in v2, and have a bunch of NAs hanging
off the end of the shorter column in the output?  Or align the last items? 
Or do you want to just drop the extra items in one column?  Etc.
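
If the answer is "pad the shorter column with NAs," here is a minimal
sketch (the vectors are invented for illustration):

v1 <- c(1.1, 2.2, 3.3)
v2 <- c(4.4, 5.5)
n <- max(length(v1), length(v2))
length(v1) <- n   # assigning a longer length pads the vector with NA
length(v2) <- n
write.table(data.frame(v1, v2), "twocols.txt",
            sep = "\t", row.names = FALSE, quote = FALSE)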


On Sat, 20 Dec 2008, Borja Soto Varela wrote:


Dear R-Users

I have a list with two vectors of doubles that have different lengths. I want
to export it to a file and I also want to print it in two columns.
I tried write.table but it needs vectors of the same length.

Does anyone know how to do it?


Thanks
Borja


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Tab completion / X11 fatal IO error

2008-12-19 Thread Adam D. I. Kramer

Dear colleagues,

I have two questions about R that I cannot answer via google
searching:

1) Is there any way to speed up tab completion? This is near-instant in
bash, and occasionally takes upwards of 10 seconds in R.  When I hit tab,
I've usually typed dataframe$ab or lst$ab or something, and
length(colnames(dataframe)) or length(names(lst)) are rarely very long.

2) I often run R from the linux command-line in a screen session on a remote
computer. As such, to see graphics, I tunnel X through ssh, then run screen,
then run X11("localhost:10") which opens an X window on my screen, just as
desired. However, if I forget to dev.off() before logging out of ssh (or
before the ssh connection drops on its own, which it does with some
frequency, hence running 'screen'), I come back to an error like this one:


Error: X11 fatal IO error: please save work and shut down R


...and then I shut down R (saving my data, which takes a few minutes) and
then load R again (loading my data, which once again takes several minutes),
which is a pretty big time sink.  Is this error message quite serious?  I
know and do not mind that the X connection has died, and I can't imagine
that this would really cause too many troubles, but I'm a bit afraid to
ignore a message that says to save and shut down.
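
What I keep being tempted to try instead of the full restart is something
like the following; I have not actually tested it, so treat it as a sketch
(the display string assumes the ssh tunnel has been re-established):

try(dev.off())                           # discard the dead X11 device
Sys.setenv(DISPLAY = "localhost:10.0")   # assumption: fresh tunnel on :10
X11()                                    # open a new device on that display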


version
               _
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)


Thanks in advance,
Adam Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Difficulty understanding sem errors / failed confirmatory factor analysis

2008-09-18 Thread Adam D. I. Kramer

Hello,

I'm trying to fit a pretty simple confirmatory factor analysis using
the sem package. There's a CFA example in the examples, which is helpful,
but the output for my (failing) model is hard to understand. I'd be
interested in any other ways to do a CFA in R, if this proves troublesome.

The CFA is replicating a 5 uncorrelated-factor structure (for those
interested, it is a structure of word usage patterns in weblogs) in a
special population. The model looks like model.txt (attached as many people
hate long emails); the correlation matrix cors.txt as well.

I'm setting no overlap between factors, no correlation between
factors, and estimating a separate variance for each observed variable
(which should be everything on the right-hand side of the -> arrows), but
setting the factor variances equal to 1...pretty standard. I've ensured that
everything is typed correctly to the best I am able.

The problem:

library(sem)
model.kr <- specify.model(file="model.txt") # printing it checks out ok
correl <- read.csv("cors.csv", header=TRUE) # printing it checks out ok
kr.sem <- sem(ram=model.kr, S=correl, N=3034)
...about 10 seconds pass...
Warning message:
In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars,  :
  Could not compute QR decomposition of Hessian.
Optimization probably did not converge.

(running qr on correl works fine; randomly-generated correl matrices fail in
the same way; I do not know how to further troubleshoot this)

...and then the model itself (which is produced, as the above was just a
warning):

summary(kr.sem)
Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) :
  arguments imply differing number of rows: 47, 0

...both of these error messages are beyond my ability to troubleshoot. Any
help would be greatly appreciated. Because I am unsure what exactly the
problem with this analysis is, I can't create a simpler example for testing
purposes...but I think my model and correlation matrix are fairly simple.


unlist(R.Version())
                    platform                         arch 
  "x86_64-unknown-linux-gnu"                     "x86_64" 
                          os                       system 
                 "linux-gnu"          "x86_64, linux-gnu" 
                      status                        major 
                          ""                          "2" 
                       minor                         year 
                       "7.2"                       "2008" 
                       month                          day 
                        "08"                         "25" 
                     svn rev                     language 
                     "46428"                          "R" 
              version.string 
"R version 2.7.2 (2008-08-25)" 


...sem installed via install.packages("sem") which I assume is current.

Cordially,
Adam Kramer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difficulty understanding sem errors / failed confirmatory factor analysis

2008-09-18 Thread Adam D. I. Kramer

No new info, but the model and correlation table are pasted at the end of
this message.

--Adam

On Thu, 18 Sep 2008, Adam D. I. Kramer wrote:


Hello,

I'm trying to fit a pretty simple confirmatory factor analysis using
the sem package. There's a CFA example in the examples, which is helpful,
but the output for my (failing) model is hard to understand. I'd be
interested in any other ways to do a CFA in R, if this proves troublesome.

The CFA is replicating a 5 uncorrelated-factor structure (for those
interested, it is a structure of word usage patterns in weblogs) in a
special population. The model looks like model.txt (attached as many people
hate long emails); the correlation matrix cors.txt as well.

I'm setting no overlap between factors, no correlation between
factors, and estimating a separate variance for each observed variable
(which should be everything on the right-hand side of the -> arrows), but
setting the factor variances equal to 1...pretty standard. I've ensured that
everything is typed correctly to the best I am able.

The problem:

library(sem)
model.kr <- specify.model(file="model.txt") # printing it checks out ok
correl <- read.csv("cors.csv", header=TRUE) # printing it checks out ok
kr.sem <- sem(ram=model.kr, S=correl, N=3034)
...about 10 seconds pass...
Warning message:
In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars, :
  Could not compute QR decomposition of Hessian.
Optimization probably did not converge.

(running qr on correl works fine; randomly-generated correl matrices fail in
the same way; I do not know how to further troubleshoot this)

...and then the model itself (which is produced, as the above was just a
warning):

summary(kr.sem)
Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) :
 arguments imply differing number of rows: 47, 0

...both of these error messages are beyond my ability to troubleshoot. Any
help would be greatly appreciated. Because I am unsure what exactly the
problem with this analysis is, I can't create a simpler example for testing
purposes...but I think my model and correlation matrix are fairly simple.


unlist(R.Version())
                    platform                         arch 
  "x86_64-unknown-linux-gnu"                     "x86_64" 
                          os                       system 
                 "linux-gnu"          "x86_64, linux-gnu" 
                      status                        major 
                          ""                          "2" 
                       minor                         year 
                       "7.2"                       "2008" 
                       month                          day 
                        "08"                         "25" 
                     svn rev                     language 
                     "46428"                          "R" 
              version.string 
"R version 2.7.2 (2008-08-25)" 

...sem installed via install.packages("sem") which I assume is current.

Cordially,
Adam Kramer



model.kr <- specify.model()
Melancholy -> Affect, mel.aff, NA
Melancholy -> Negemo, mel.neg, NA
Melancholy -> Sad, mel.sad, NA
Melancholy -> Physcal, mel.phys, NA
Melancholy -> Body, mel.phys, NA
Melancholy -> Eating, mel.eat, NA
Melancholy -> Groom, NA, 1
Social -> numlines, soc.nrow, NA
Social -> Leisure, soc.leis, NA
Social -> Home, soc.home, NA
Social -> Sports, soc.sports, NA
Social -> TV, soc.tv, NA
Social -> Music, soc.mus, NA
Social -> Money, NA, 1
Rant -> Swear, rant.swear, NA
Rant -> Sexual, rant.sex, NA
Rant -> Anger, rant.anger, NA
Rant -> I, NA, 1
Metaphysical -> Metaph, met.met, NA
Metaphysical -> Relig, met.relig, NA
Metaphysical -> Death, NA, 1
Work -> Occup, work.occ, NA
Work -> School, work.school, NA
Work -> Job, NA, 1
Affect <-> Affect, Affect.var, NA
Negemo <-> Negemo, Negemo.var, NA
Sad <-> Sad, Sad.var, NA
Physcal <-> Physcal, Physcal.var, NA
Body <-> Body, Body.var, NA
Eating <-> Eating, Eating.var, NA
Groom <-> Groom, Groom.var, NA
numlines <-> numlines, numlines.var, NA
Leisure <-> Leisure, Leisure.var, NA
Home <-> Home, Home.var, NA
Sports <-> Sports, Sports.var, NA
TV <-> TV, TV.var, NA
Music <-> Music, Music.var, NA
Money <-> Money, Money.var, NA
Swear <-> Swear, Swear.var, NA
Sexual <-> Sexual, Sexual.var, NA
Anger <-> Anger, Anger.var, NA
I <-> I, I.var, NA
Metaph <-> Metaph, Metaph.var, NA
Relig <-> Relig, Relig.var, NA
Death <-> Death, Death.var, NA
Occup <-> Occup, Occup.var, NA
School <-> School, School.var, NA
Job <-> Job, Job.var, NA
Melancholy <-> Melancholy, NA, 1
Social <-> Social, NA, 1
Rant <-> Rant, NA, 1
Work <-> Work, NA, 1
Metaphysical <-> Metaphysical, NA, 1

correl <- matrix(0, nrow=24, ncol=24)
correl[lower.tri(correl, diag=TRUE)] <- c(1, 0.940530496413442,
0.765560263936915, 0.705665939921134,
0.659038546655712, 0.282665099938120, 0.234892888297051, 0.0554321360979252,
0.671137592040541, 0.54382418910777, 0.463922205203901, 0.353418190097785,
0.414864918025334, 0.436177075274485, 0.401019650838241, 0.370116202378091,
0.777297151879925, 0.676782523496444

Re: [R] Difficulty understanding sem errors / failed confirmatory factor analysis

2008-09-18 Thread Adam D. I. Kramer

Dear John,

On Thu, 18 Sep 2008, John Fox wrote:


I'm trying to fit a pretty simple confirmatory factor analysis using
the sem package. There's a CFA example in the examples, which is helpful,
but the output for my (failing) model is hard to understand. I'd be
interested in any other ways to do a CFA in R, if this proves
troublesome.

The CFA is replicating a 5 uncorrelated-factor structure (for those
interested, it is a structure of word usage patterns in weblogs) in a
special population. The model looks like model.txt (attached as many
people hate long emails); the correlation matrix cors.txt as well.


As far as I can see, the attachments aren't there. If you like, you can
send them to me privately. Without the input covariance matrix and your
model, it's very hard to tell what the source of the problem is, but one
guess (assuming that you've specified the model correctly) is that the
assumption of uncorrelated factors is too far off. Also see below.


I have pasted the matrix into another email; apologies for failing to attach
them acceptably before.

I also augmented the model to allow the factors to correlate, by adding
these lines to the model:

Melancholy <-> Social, Soc.Mel, NA
Melancholy <-> Rant, Rant.Mel, NA
Melancholy <-> Work, Work.Mel, NA
Melancholy <-> Metaphysical, Meta.Mel, NA
Social <-> Rant, Soc.Rant, NA
Social <-> Work, Soc.Work, NA
Social <-> Metaphysical, Soc.Meta, NA
Rant <-> Work, Rant.Work, NA
Rant <-> Metaphysical, Rant.Meta, NA
Work <-> Metaphysical, Work.Meta, NA

...and obtain the same errors.



I'm setting no overlap between factors, no correlation between
factors, and estimating a separate variance for each observed variable
(which should be everything on the right-hand side of the -> arrows), but
setting the factor variances equal to 1...pretty standard. I've ensured

that

everything is typed correctly to the best I am able.

The problem:

library(sem)
model.kr <- specify.model(file="model.txt") # printing it checks out ok
correl <- read.csv("cors.csv", header=TRUE) # printing it checks out ok
kr.sem <- sem(ram=model.kr, S=correl, N=3034)
...about 10 seconds pass...
Warning message:
In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars, :
  Could not compute QR decomposition of Hessian.
Optimization probably did not converge.

(running qr on correl works fine; randomly-generated correl matrices fail in
the same way; I do not know how to further troubleshoot this)


Doing a QR decomposition on the correlation matrix of the data is
essentially irrelevant. The issue is the Hessian. (The scaled inverse
Hessian is the covariance matrix of the parameter estimates, not of the
data.) That you observe similar problems for randomly generated covariance
matrices may or may not be troublesome, depending upon how you generated
them.


df <- as.data.frame(matrix(rnorm(3034*24), nrow=3034, ncol=24))
df.cor <- cor(df)
rownames(df.cor) <- colnames(df.cor) <- colnames(correl)
sem.df <- sem(model.kr, df.cor, 3034)

...which now does not throw errors with the new model, even though that syntax
was copied from my .Rhistory. I think I may have gotten unlucky with random
data the first time.

Thanks for the info on what the error message means, though--I was largely
in the dark on that.


...and then the model itself (which is produced, as the above was just a
warning):

summary(kr.sem)
Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) :
  arguments imply differing number of rows: 47, 0


If the Hessian isn't positive-definite, it won't be possible to get
estimated coefficient standard errors. I suspect that this is the source
of this error message. If so, it would be better for summary.sem() to
provide a more informative error message.


This makes sense. It may also be useful for the sem() function to throw an
error rather than a warning if the Hessian matrix cannot be decomposed,
perhaps? How often is an SEM model without estimated coefficient standard
errors desirable?

Thanks again for the assistance. I think the trouble may now be in my
correlation matrix; I will play around with my model and see whether
something else is more reasonable.

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difficulty understanding sem errors / failed confirmatory factor analysis

2008-09-18 Thread Adam D. I. Kramer

Hi John,

Thanks very much for your response. It does appear now that my
correlation matrix and not your software is the problem. I apologize for
wasting your time! I do think that a more informative error message may have
prompted me to consider this possibility more fully.

--Adam

On Thu, 18 Sep 2008, John Fox wrote:


Dear Adam,

(1) Note that your input correlation matrix appears to be numerically
singular:


solve(R)

Error in solve.default(R) :
 system is computationally singular: reciprocal condition number =
2.38183e-17

det(R)

[1] -1.753523e-25

qr(R)$rank

[1] 23

(2) In addition, you have specified what are essentially redundant
constraints on the model, fixing *both* the loading for a reference
indicator for each factor to 1 *and* fixing the factor variances to 1. One
would normally do one or the other.
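
In specify.model terms, the two standard identification choices look like
this (a sketch with one factor and one indicator, not the poster's full
model):

(a) standardize the factor: variance fixed at 1, all loadings free:

Melancholy <-> Melancholy, NA, 1
Melancholy -> Affect, mel.aff, NA

(b) or use a reference indicator: one loading fixed at 1, variance free:

Melancholy <-> Melancholy, mel.var, NA
Melancholy -> Affect, NA, 1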

I think that (1) rather than (2) is the essential source of the problem, and
fixing (2) doesn't make the problem go away.

Because sem() can't compute the covariance matrix of the estimated
parameters, this component is missing from the returned object, causing the
cryptic error message in summary.sem(). I'll provide a more informative
error or warning.

I hope this helps,
John

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adam D. I. Kramer
Behalf Of Adam D. I. Kramer
Sent: September-18-08 1:37 PM
To: r-help@r-project.org
Subject: Re: [R] Difficulty understanding sem errors / failed confirmatory
factor analysis

No new info, but the model and correlation table are pasted at the end of
this message.

--Adam

On Thu, 18 Sep 2008, Adam D. I. Kramer wrote:


Hello,

I'm trying to fit a pretty simple confirmatory factor analysis using
the sem package. There's a CFA example in the examples, which is helpful,
but the output for my (failing) model is hard to understand. I'd be
interested in any other ways to do a CFA in R, if this proves troublesome.


The CFA is replicating a 5 uncorrelated-factor structure (for those
interested, it is a structure of word usage patterns in weblogs) in a
special population. The model looks like model.txt (attached as many people
hate long emails); the correlation matrix cors.txt as well.

I'm setting no overlap between factors, no correlation between
factors, and estimating a separate variance for each observed variable
(which should be everything on the right-hand side of the -> arrows), but
setting the factor variances equal to 1...pretty standard. I've ensured that
everything is typed correctly to the best I am able.

The problem:

library(sem)
model.kr <- specify.model(file="model.txt") # printing it checks out ok
correl <- read.csv("cors.csv", header=TRUE) # printing it checks out ok
kr.sem <- sem(ram=model.kr, S=correl, N=3034)
...about 10 seconds pass...
Warning message:
In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = vars, :
  Could not compute QR decomposition of Hessian.
Optimization probably did not converge.

(running qr on correl works fine; randomly-generated correl matrices fail in
the same way; I do not know how to further troubleshoot this)

...and then the model itself (which is produced, as the above was just a
warning):

summary(kr.sem)
Error in data.frame(object$coeff, se, z, 2 * (1 - pnorm(abs(z))), par.code) :
  arguments imply differing number of rows: 47, 0

...both of these error messages are beyond my ability to troubleshoot. Any
help would be greatly appreciated. Because I am unsure what exactly the
problem with this analysis is, I can't create a simpler example for testing
purposes...but I think my model and correlation matrix are fairly simple.



unlist(R.Version())
                    platform                         arch 
  "x86_64-unknown-linux-gnu"                     "x86_64" 
                          os                       system 
                 "linux-gnu"          "x86_64, linux-gnu" 
                      status                        major 
                          ""                          "2" 
                       minor                         year 
                       "7.2"                       "2008" 
                       month                          day 
                        "08"                         "25" 
                     svn rev                     language 
                     "46428"                          "R" 
              version.string 
"R version 2.7.2 (2008-08-25)" 

...sem installed via install.packages("sem") which I assume is current.

Cordially,
Adam Kramer



model.kr <- specify.model()
Melancholy -> Affect, mel.aff, NA
Melancholy -> Negemo, mel.neg, NA
Melancholy -> Sad, mel.sad, NA
Melancholy -> Physcal, mel.phys, NA
Melancholy -> Body, mel.phys, NA
Melancholy -> Eating, mel.eat, NA
Melancholy -> Groom, NA, 1
Social -> numlines, soc.nrow, NA

[R] Trouble with difftime()

2008-09-18 Thread Adam D. I. Kramer

Hello again,

I'm interested in manipulating some date objects, and have been
playing with the difftime function. I'm not sure that this is the desired
behavior:


pa$days.before.apr30 <- difftime(may01,
    as.POSIXct(pa$training.max,
               origin=as.POSIXct("1969-12-31 16:00:00", tz="PDT")),
    units="days")
units(pa$days.before.apr30)
[1] "days"
units(pa$days.before.apr30)
[1] "days"

range(pa$days.before.apr30, na.rm=TRUE)
Time differences in secs
[1]     1817 15723146

...the oddity being that the range is displayed in seconds, while the unit
attached is "days". The ?difftime info says that, "If the units are changed,
the numerical value is scaled accordingly." This appears to be true, as the
numbers reported by range() do indeed correspond to the desired output *
60*60*24 (days converted to seconds).

Also, plotting a histogram via

hist(pa$days.before.apr30)

...produces a histogram of seconds, not days.

Using as.numeric(pa$days.before.apr30) or
as.numeric(pa$days.before.apr30, units="days") to convert the difftime object
into a number both work as expected, so I just used that in order to be
working (reliably) with days, but this seems like it might be a bug, so I
thought I would bring it to your attention.

Also, I spent a bit of time freaking out about the histogram being on a
scale of 1e7, so it might be wise to print a warning if a unit conversion
occurs implicitly even if this behavior is as intended.

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trouble with difftime()

2008-09-18 Thread Adam D. I. Kramer

I apologize, my original google searches were insufficient to turn up this
archived message:

http://tolstoy.newcastle.edu.au/R/help/06/04/24642.html

...in which my question is answered. Yes, this is the expected behavior,
because range() is designed to operate on difftime objects with any
unit...and it does so by converting to the theoretically-smallest unit.

My (and that poster's) confusion is likely due to the fact that, for cases
where seconds is not the highest common factor, the conversion is
surprising.

I think part of my surprise is also due to the fact that using the lowest
common multiple doesn't necessarily seem appropriate; why not, for instance,
convert everything to days, to weeks, or to the smallest or largest unit
passed to the summary function?

Also, please note here that I am not intending to whine or complain about
the functionality here, I'm just trying to understand it--much better to
convert all vectors to the same unit (ANY unit!). Having a better idea of
why choices like this have been made in R's user interface help me to better
guess what's up, to be able to solve my problems faster, and on my own.
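
For what it is worth, my working rule is now to drop to plain numbers in a
known unit as early as possible (a sketch using the same pa object as
above):

d <- as.numeric(pa$days.before.apr30, units = "days")
range(d, na.rm = TRUE)   # plain numbers, guaranteed to be in days
hist(d)                  # and the histogram lands on the day scale too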

--Adam

On Thu, 18 Sep 2008, Adam D. I. Kramer wrote:


Hello again,

I'm interested in manipulating some date objects, and have been
playing with the difftime function. I'm not sure that this is the desired
behavior:

pa$days.before.apr30 <- difftime(may01,
    as.POSIXct(pa$training.max,
               origin=as.POSIXct("1969-12-31 16:00:00", tz="PDT")),
    units="days")
units(pa$days.before.apr30)
[1] "days"
units(pa$days.before.apr30)
[1] "days"

range(pa$days.before.apr30, na.rm=TRUE)
Time differences in secs
[1]     1817 15723146

...the oddity being that the range is displayed in seconds, while the unit
attached is "days". The ?difftime info says that, "If the units are changed,
the numerical value is scaled accordingly." This appears to be true, as the
numbers reported by range() do indeed correspond to the desired output *
60*60*24 (days converted to seconds).

Also, plotting a histogram via

hist(pa$days.before.apr30)

...produces a histogram of seconds, not days.

Using as.numeric(pa$days.before.apr30) or
as.numeric(pa$days.before.apr30, units="days") to convert the difftime object
into a number both work as expected, so I just used that in order to be
working (reliably) with days, but this seems like it might be a bug, so I
thought I would bring it to your attention.

Also, I spent a bit of time freaking out about the histogram being on a
scale of 1e7, so it might be wise to print a warning if a unit conversion
occurs implicitly even if this behavior is as intended.

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] moving from aov() to lmer()

2008-09-14 Thread Adam D. I. Kramer


On Sat, 13 Sep 2008, roberto toro wrote:


Hello,
I've used this command to analyse changes in brain volume:

mod1 <- aov(Volume~Sex*Lobe*Tissue+Error(Subject/(Lobe*Tissue)), data.vslt)

I'm comparing males/females. For every subject I have 8 volume measurements
(4 different brain lobes and 2 different tissues (grey/white matter)).

As aov() provides only type I ANOVAs, I would like to use lmer() with type
II; however, I have struggled to find the right syntax.

How should I write the model I use with aov() using lmer()??

Specifying Subject as a random effect is straightforward

mod2 <- lmer(Volume~Sex*Lobe*Tissue+(1|Subject), data.vslt)

but I can't figure out the /(Lobe*Tissue) part...


You're trying to model a separate effect of lobe, of tissue, and of the
interaction between lobe and tissue for each subject, so you want

mod2 <- lmer(Volume~Sex*Lobe*Tissue+(Lobe*Tissue|Subject), data.vslt)

...the resulting fixed effect for Lobe, Tissue, and L:T in the summary()
then corresponds to the within-subjects effect aggregated (but not exactly
AVERAGED) across subjects. So, it's not exactly providing you a Type II
ANOVA...it's doing a mixed-effects model (or HLM, if you prefer), which as
you've written it is a Type III analysis (though once again, not an ANOVA in
the classical sense).

To get something more akin to type II using the lmer function (and I trust
someone will pipe up if there is a better way), you could first fit

mod2.additive <- lmer(Volume~Sex*Lobe+Tissue+(Lobe+Tissue|Subject), data.vslt)

...and interpret the coefficients and effects provided by it, then fit the
crossed model to get the coefficients and effects for the higher-order
terms.
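
In code, a hedged sketch of that two-step route (same data and variable
names as above; a fully additive fit first, then the crossed fit):

library(lme4)
m.add  <- lmer(Volume ~ Sex + Lobe + Tissue + (Lobe + Tissue | Subject),
               data = data.vslt)
m.full <- lmer(Volume ~ Sex * Lobe * Tissue + (Lobe * Tissue | Subject),
               data = data.vslt)
anova(m.add, m.full)   # likelihood-ratio test of the higher-order terms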

I hope this made sense and that I have understood you correctly.

--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fetching a range of columns

2008-09-14 Thread Adam D. I. Kramer

Hi Jason,

data[] is a data frame, remember--you need to specify rows AND columns. So,
data[,c(2,12,17)] is what you should be doing in the first place, and
data[,842:2411] in the second place.

Not sure if the help you needed was using the comma, or the : syntax, or if
you're trying to read only certain columns during the read.csv process
(which is in fact possible via colClasses; see the sketch below).
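
A sketch of the colClasses route, assuming the file really has 2411 columns:

cc <- rep("NULL", 2411)   # "NULL" tells read.csv to skip that column
cc[842:2411] <- NA        # NA lets read.csv guess each kept column's class
data <- read.csv(filein, header = TRUE, nrows = 320, skip = nskip,
                 colClasses = cc)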

--Adam

On Sun, 14 Sep 2008, Jason Thibodeau wrote:


Hello,

I realize that using x[x > 3 & x < 5] I can fetch all elements between 3
and 5. However, I read in from a CSV file, and I would like to fetch all
columns from within a range (842-2411). In the past, I have done this to
fetch just a select few columns:

data <- read.csv(filein, header=TRUE, nrows=320, skip=nskip)
data_filter <- data[c(2,12,17)]
write.table(data_filter, fileout, append = TRUE,
            sep = ",", row.names = FALSE, col.names = FALSE)
nskip <- nskip + 320

This time, however, instead of grabbing columns 2, 12, 17, I woudl like all
columns in the range of 842-2411. I can't seem to do this correctly. Could
somebody please provide some insight? Thanks in advance.

--
Jason Thibodeau

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] k-sample Kolmogorov-Smirnov test?

2008-09-14 Thread Adam D. I. Kramer

Maybe you should look a little harder.

help.search("Kolmogorov")


PLEASE do read the posting guide http://www.R-project.org/posting-guide.html


--Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power PC with a linux distribution and R

2008-09-12 Thread Adam D. I. Kramer

To Stephen's credit, Apple will no longer support PowerPC chips (such as his
G4) in the next operating system (Snow Leopard), so some sort of switchover
will be necessary in order to maintain SOME sort of state-of-the-art
software packaging for PPC-based Mac users in the near future.

Also, it is important to note that Leopard on a G4 iBook would probably run
disgustingly (read: unusably) slow unless the memory is upgraded: It shipped
with 256MB of RAM (and that money would be better saved for a new computer),
and Leopard (on my Macbook Pro) is currently using about 700MB for the
operating system. Switching to linux and X-windows allows for an old system
to be, in a word, functional.

While I love MacOSX and use it on the computers of mine which can run it
usefully (G5 tower, PowerBook G4 w/2GB of ram), I've been much happier with
Linux on my older macs. The machine on my desk is a G4 iMac with 256MB of
RAM which has no trouble fitting mixed models in R while I browse the web on
Firefox 3 while the computer itself serves 80% of the Psychology
Department's web surveys. Under Leopard, these programs would not even run
simultaneously, let alone usably.

Further, while some amount of hand-compiling is necessary (Debian Linux
provides almost all of the software I would need), it's also quite
helpful--using Simon Urbanek's R optimization flags for G5 and G4 (see
https://stat.ethz.ch/pipermail/r-sig-mac/2005-February/001641.html ) and my
own version of ATLAS, the performance of R on that machine is comparable to
the unoptimized internal-BLAS no-effort-necessary .pkg from CRAN running on
a G4 with 2GB of RAM and 1.5 times the processor speed under Leopard.

Sure, it took a whole night to compile/optimize ATLAS and most of the next
day to compile R with those flags, but to a grad student like myself, that's
vastly superior to waiting twice as long for my analyses to run...or to the
impossibly high cost of a new computer.

--Adam

On Fri, 12 Sep 2008, Doran, Harold wrote:


Are you aware that a BSD unix OS runs on the mac already?


-Original Message-
From: [EMAIL PROTECTED] on behalf of Dr Eberhard W Lisse
Sent: Fri 9/12/2008 4:38 PM
To: stephen sefick
Cc: R-help Mailing List; Dr Eberhard W Lisse
Subject: Re: [R] Power PC with a linux distribution and R

Yes,

I'd install the OS X Leopard (10.5.4) distribution on it :-)-O

el

On 12 Sep 2008, at 22:30 , stephen sefick wrote:


This is an operating system question, but it is with the intent of
using R on that operating system.  I have an ibook G4 Power PC that I
am going to install linux on.  Is there a better, worse, or perhaps
easier (I am a linux newby migrating from mac) distribution that I
should look at.  I appreciate your help.  I didn't post this in the
sig-mac because I don't know if it fits there better than anywhere
else.
thanks

--
Stephen Sefick


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] match and incomparables

2008-09-12 Thread Adam D. I. Kramer

I can replicate this and also do not understand it.


match(1:3,1:3,incomparables=5)

[1] NA  2  3

match(1:3,1:3,incomparables=4)

[1] 1 2 3

match(1:3,1:3,incomparables=3)

[1] 1 2 3

match(1:3,1:3,incomparables=2)

[1] 1 2 3

match(1:3,1:3,incomparables=1)

[1] NA  2  3

...every other integer value for incomparables produces 1 2 and 3 for
output. I'm using R 2.7.2, self-compiled, under linux.

--Adam

On Fri, 12 Sep 2008, McGehee, Robert wrote:


Hello,
I was playing around with the newly implemented 'incomparables' argument
in 'match' and realized the argument does not behave anything like I
expected. Can someone explain what is going on here? Sorry if I'm
misreading the documentation.


match(1:3, 1:3, incomparables=1)

[1] NA  2  3  # This seems right, the 1 in 'x' is 'incomparable'


match(1:3, 1:3, incomparables=2)

[1] 1  2  3   # Shouldn't this be 1 NA 3? Why isn't the 2 incomparable?


match(1:3, 1:3, incomparables=5)

[1] NA  2  3   # Why isn't the 5 ignored?

Note from ?match:
incomparables: a vector of values that cannot be matched. Any value in
x matching a value in this vector is assigned the nomatch value. For
historical reasons, FALSE is equivalent to NULL.

Thanks in advance!
Robert

platform   i386-pc-mingw32
arch   i386
os mingw32
system i386, mingw32
status
major  2
minor  7.2
year   2008
month  08
day25
svn rev46428
language   R
version.string R version 2.7.2 (2008-08-25)

Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Tel: 617/392-8396Fax:617/476-6389
mailto:[EMAIL PROTECTED]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power PC with a linux distribution and R

2008-09-12 Thread Adam D. I. Kramer

Hi El,

As does my PowerBook G4 with 2GB of RAM. The iBook shipped with 256MB,
however, which Leopard does not run well on. My point was more that Linux
was superior to MacOS in the case of very low memory...not that the chip was
somehow the problem.

...but now I am solidly off-topic!

Cheers,
Adam

On Sat, 13 Sep 2008, Dr Eberhard W Lisse wrote:


Adam,

I (or rather the kids and the wife) got two iMinis and one iBook
with G4. They all run 10.5.4 happily on their 1GB of RAM...

greetings, el

On 12 Sep 2008, at 23:43 , Adam D. I. Kramer wrote:

To Stephen's credit, Apple will no longer support PowerPC chips (such as his
G4) in the next operating system (Snow Leopard), so some sort of switchover
will be necessary in order to maintain SOME sort of state-of-the-art
software packaging for PPC-based Mac users in the near future.

Also, it is important to note that Leopard on a G4 iBook would probably run
disgustingly (read: unusably) slow unless the memory is upgraded: It shipped
with 256MB of RAM (and that money would be better saved for a new computer),
and Leopard (on my Macbook Pro) is currently using about 700MB for the
operating system. Switching to linux and X-windows allows for an old system
to be, in a word, functional.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] request: most repeated component of a list

2008-09-11 Thread Adam D. I. Kramer

That is indeed different from what I thought the first time.

x <- sapply(1:length(l), function(x) {
  sum(sapply(l, function(y) {
    if ( nrow(l[[x]]) != nrow(y) | ncol(l[[x]]) != ncol(y) ) FALSE
    else sum(y != l[[x]]) == 0
  }))
} )

names(x) <- names(l)

Then, x has the same names as l, and x[i] is the number of matches that
l[[i]] has...so you want the index or indices of max(x).
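
For example:

which(x == max(x))       # indices of the most-repeated component(s)
l[which(x == max(x))]    # and the component(s) themselves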

--Adam

On Thu, 11 Sep 2008, Muhammad Azam wrote:


Maybe I could not explain properly. Actually there are components of a
list, i.e. [[1]] to [[500]]. Each component contains r rows (may be
different for each [[ k ]]) and c columns (same for all). I have to
compare all the [[ k ]] components of that list and find the one
appearing the maximum number of times, e.g. from the three components
[[1]] to [[3]] given below. The most repeated is

     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    3    4    0    0
[3,]    4    3    4    0    0
[4,]    4    3    0    0    0

Please help to find it. Thanks and best regards
Muhammad Azam


- Original Message 
From: jim holtman [EMAIL PROTECTED]
To: Muhammad Azam [EMAIL PROTECTED]
Cc: R-help request [EMAIL PROTECTED]; R Help r-help@r-project.org
Sent: Wednesday, September 10, 2008 5:59:28 PM
Subject: Re: [R] request: most repeated component of a list

If want you want is the summary from all of them, then 'rbind' the
data together into one matrix and analyze it:

totalMat <- do.call(rbind, listOfMatrices)

On Wed, Sep 10, 2008 at 11:49 AM, Muhammad Azam [EMAIL PROTECTED] wrote:

Dear R community


I have stored the results of arrays in a list consist of J-components
(say 200 components). Each component containing same no of columns but
may be different no of rows. e.g

[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    3    4    0    0
[3,]    4    3    4    0    0
[4,]    4    3    0    0    0

[[2]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    3    4    0    0
[3,]    4    3    4    0    0
[4,]    4    3    0    0    0

[[3]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    4    1    0    0
[3,]    4    4    1    0    0
[4,]    4    4    0    0    0
[5,]    4    4    0    0    0



For 200 components i want to make a frequency table. How can i make a
frequency table of these components or the most repeated component out
of all? Any help in this regard will be appreciated.



Muhammad Azam




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] request: most repeated component of a list

2008-09-11 Thread Adam D. I. Kramer

Did you try my function? I don't see how it doesn't fit the need of this
reexplanation.

For this set, x would end up equal to c(2,2,1), indicating that array 1
appears twice in the set and array 2 appears twice in the set. So, array 1
and array 2 appear maximally in the set. So, look at a[[1]] or a[[2]] and
that is your answer.

--Adam

On Thu, 11 Sep 2008, Muhammad Azam wrote:


Thanks for the effort, but still we are far from the desired result. Maybe
this example will help you to understand the situation. Example:



a1=c(1:12);  a1=array(a1,dim=c(3,4));  a2=c(1:12);  a2=array(a2,dim=c(3,4));
a3=c(1:16);  a3=array(a3,dim=c(4,4));
a=list(a1,a2,a3);
a
[[1]]
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

[[2]]
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

[[3]]
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16

Here [[1]] and [[2]] are the same out of three (internal-values-wise). The
whole array [[1]] or [[2]] is in the majority, so I want to get the whole
array or component of the list which is in the majority. The result should
be like this:

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

Hope it is much clearer than before.

best regards
Muhammad Azam


- Original Message 
From: Adam D. I. Kramer [EMAIL PROTECTED]
To: Muhammad Azam [EMAIL PROTECTED]
Cc: R Help r-help@r-project.org
Sent: Thursday, September 11, 2008 9:53:40 AM
Subject: Re: [R] request: most repeated component of a list

That is indeed different from what I thought the first time.

x <- sapply(1:length(l), function(x) {
  sum(sapply(l, function(y) {
    if ( nrow(l[[x]]) != nrow(y) | ncol(l[[x]]) != ncol(y) ) FALSE
    else sum(y != l[[x]]) == 0
  }))
} )

names(x) <- names(l)

Then, x has the same names as l, and x[i] is the number of matches that
l[[i]] has...so you want the index or indices of max(x).

--Adam

On Thu, 11 Sep 2008, Muhammad Azam wrote:


May be i could not explain properly. Actually there are components of
list i.e. [[1]] to [[500]]. Each component containing r-rows (may be
different for each [[ k ]] and c-columns same for all). I have to
compare all the [[ k ]] components of that list and found the one
appearing maximum no of times. e.g. from three components [[1]] to
[[3]] given below. The most repeated is

     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    3    4    0    0
[3,]    4    3    4    0    0
[4,]    4    3    0    0    0

Please help to find it. Thanks and best regards
Muhammad Azam


- Original Message 
From: jim holtman [EMAIL PROTECTED]
To: Muhammad Azam [EMAIL PROTECTED]
Cc: R-help request [EMAIL PROTECTED]; R Help r-help@r-project.org
Sent: Wednesday, September 10, 2008 5:59:28 PM
Subject: Re: [R] request: most repeated component of a list

If want you want is the summary from all of them, then 'rbind' the
data together into one matrix and analyze it:

totalMat <- do.call(rbind, listOfMatrices)

On Wed, Sep 10, 2008 at 11:49 AM, Muhammad Azam [EMAIL PROTECTED] wrote:

Dear R community


I have stored the results of arrays in a list consist of J-components
(say 200 components). Each component containing same no of columns but
may be different no of rows. e.g

[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    3    4    0    0
[3,]    4    3    4    0    0
[4,]    4    3    0    0    0

[[2]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    3    4    0    0
[3,]    4    3    4    0    0
[4,]    4    3    0    0    0

[[3]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    4    1    0    0
[3,]    4    4    1    0    0
[4,]    4    4    0    0    0
[5,]    4    4    0    0    0



For 200 components i want to make a frequency table. How can i make a
frequency table of these components or the most repeated component out
of all? Any help in this regard will be appreciated.



Muhammad Azam




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo

Re: [R] Calculate mean/var by ID

2008-09-11 Thread Adam D. I. Kramer

aggregate(value,list(ID=ID),mean)
aggregate(value,list(ID=ID),var)
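
Or, if you prefer tables keyed by ID, tapply does the same job:

tapply(value, ID, mean)
tapply(value, ID, var)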

--Adam

On Thu, 11 Sep 2008, liujb wrote:



Hello,

I have a data set that looks like this.
IDvalue
111 5
111 6
111 2
178 7
178 3
138 3
138 8
138 7
138 6
.
.
.

I'd like to calculate the mean and var for each object identified by the ID.
I can in theory just loop through the whole thing..., but is there an easier
way/command which lets me calculate the mean/var by ID?

Thanks,
Julia
--
View this message in context: 
http://www.nabble.com/Calculate-mean-var-by-ID-tp19440461p19440461.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to calcaulate matrices for two subsets

2008-09-11 Thread Adam D. I. Kramer

Hi Bill,

Tell me more about the Obs object. The subset of an lm should be a vector
telling the lm which observations to use...if Obs[197,396] is a single
number, only one observation will be used, and chances are your model is not
what you intended.

Also, predict(result1,newdata=Obs[397,339]) needs that cell in Obs to have a
column called SP500 index, and predict.lm (check the man page for it) will
only return one number, the prediction for the given new data.

If you want it to have 3 numbers, maybe Obs[397:399] is what you want? The
colon means 397 through 399, inclusive while the comma means row 397,
column 399.  If this is your mistake, you actually want subset=197:396,
data=Obs in your call to lm.
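
Putting it together, a hedged sketch of the whole flow (the column names
are guesses, since I haven't seen Obs):

result1  <- lm(security ~ sp500, data = Obs, subset = 197:396)
pred1    <- predict(result1, newdata = Obs[397:399, ])
abnormal <- Obs$security[397:399] - pred1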

Hope this helped.

--Adam

On Thu, 11 Sep 2008, Xianchun Liao wrote:


I am an R beginner and trying to run a market model using event study in
R framework.



First, I run a market model, that is lm(stock security ~ SP500 index,
subset=Obs[197, 396]) -> result1

Then I get predict results for a new dataset using predict(result1,
newdata=Obs[397,399]) -> pred1

Pred1 should have three numbers.



Now I need to calculate abnormal return by the formula stock security
[397,399] -pred1

But it does not work after trying many times.



So, how to write a R code to implement the formula and get right
results.





Thanks,



Bill


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to load functions in R

2008-09-11 Thread Adam D. I. Kramer

source(file.path) executes the file at file.path in order, just as if you
had typed it in.

So, the source file should in fact name each function in turn:

f1 <- function(x) { ... }
f2 <- function(x) { ... }
...etc.

So a good way to debug is to just copy and paste lines from your source file
into the R command line, and see if they behave as expected.

--Adam

On Thu, 11 Sep 2008, Adaikalavan Ramasamy wrote:


Strange.

source() should read all the function in that file unless there was a syntax 
error or something else preventing the other function from being parsed 
correctly. Could you send us a simplified example that reproduces this 
problem?


Thanks.

Regards, Adai



[EMAIL PROTECTED] wrote:

 Hello,
It seems that all methods work. source(), however, loads only the last
function. With save(a,b,file=path) I can save more than one function.
Thanks a lot,


Mihai
-Original Message-
From: Yihui Xie [mailto:[EMAIL PROTECTED]
Sent: Thursday, 11 September 2008 16:48
To: [EMAIL PROTECTED]
Cc: Mirauta, Mihai; r-help@r-project.org
Subject: Re: [R] How to load functions in R

We may just read them in the R console instead of an external editor, and 
fix() or edit() them when we need to make any modifications. A trivial 
advantage of saving them as an image file in Windows is that you can 
double-click the file and R will be started with these objects loaded 
automatically. Anyway, to save the functions as ASCII files or even write 
a package are also good solutions :-)


Regards,
Yihui

On Thu, Sep 11, 2008 at 10:34 PM, Adaikalavan Ramasamy 
[EMAIL PROTECTED] wrote:


I would recommend saving the functions into a separate file and then 
using

source() as bartjoosen suggested.

I do not recommend using save() here because the output is non-readable 
(even when using ascii=TRUE option). Which means that you have to load() 
it, then copy-and-paste into an editor before making changes and then 
running it again in R and then save() again.


Another better option is to consider making your own package. It may 
sound complicated but once you mastered it, it makes your functions more 
portable and encourages you to document it. Further, the function 
package.skeleton() simplifies much of it.


Regards, Adai



Yihui Xie wrote:


Hi, you may save your functions somewhere on your disk using save()
and load them next time when you want to use them. See ?save and ?load

Yihui

On Thu, Sep 11, 2008 at 9:30 PM,  [EMAIL PROTECTED] wrote:


Hello,

I am trying to use self created functions in other scripts than the one 
where they are stored.

For the moment I am using the following structure of commands to do
that:

1. Load the text file with the functions in the current script:
x=parse(path)
2. Transform the text into a function: f1=eval(x[1]), f2=eval(x[2]) if
more than one function is stored in the text file
3. Use the functions as normal


Is there another possibility to do the same?
Thank you,

Mihai Mirauta









--
Yihui Xie [EMAIL PROTECTED]
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building, Renmin University 
of China, Beijing, 100872, China





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About Plot.new

2008-09-11 Thread Adam D. I. Kramer

You can't add=TRUE unless the graph exists in the first place. So, drop that
if you're creating the graph. Or if that's there because you want to put a
boxplot on top of a preexisting graph, make sure you have created the
preexisting graph already.
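
Concretely, something like this first (roughly the first half of the
?boxplot example; the exact arguments here are from memory, so double-check
against the help page):

boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        xlab = "Vitamin C dose (mg)", ylab = "tooth length",
        ylim = c(0, 35))

...and then the add = TRUE call below has a plot to draw on.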

--Adam

On Thu, 11 Sep 2008, cathelf wrote:



Hi, sorry for bothering your guys.
I will trying to make some nice graph using boxplot. when I check the help
file of boxplot, there is a sample code as:

boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))

But when I run it, it shows the following error:
Error in xypolygon(xx, yy, lty = "blank", col = boxfill[i]) :
   plot.new has not been called yet


What does it mean? If I first run plot.new() and then run the above
code, only the x-axis and y-axis are on the graph, no boxplot inside.

Can anyone tell me how to call  plot.new or at least how to run the above
code correctly?

Thank you very much!


--
View this message in context: 
http://www.nabble.com/About-%22Plot.new%22-tp19446258p19446258.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mixed model MANCOVA

2008-09-10 Thread Adam D. I. Kramer

Hi Erika,

I have not tried this before, and I hope that somebody will correct
me if I'm wrong, but the glmer function in the lme4 library appears to do
what you want. From example(lmer):

(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              family = binomial, data = cbpp))

...I guess that this will do what you want it to because it has multiple
variables on the LHS and both continuous and categorical variables on the
RHS, along with an explicit grouping structure.

In your case, you probably want to leave the family= argument out, as noted
in ?glmer: "If 'family' is missing then a linear mixed model is fit;
otherwise a generalized linear mixed model is fit." ...MANCOVAs tend to be
general (not generalized) linear models.

Once again, though, I have not used this system personally, haven't seen
your data, and don't know what output to expect. Hopefully somebody else can
confirm or deny this solution's efficacy.

--Adam

On Mon, 8 Sep 2008, Erika Crispo wrote:


Hello,

I need to perform a mixed-model (with nesting) MANCOVA, using Type III
sums of squares. I know how to perform each of these types of tests
individually, but I am not sure if performing a mixed-model MANCOVA is
possible. Please let me know.

Erika

  
Erika Crispo, PhD candidate
McGill University, Department of Biology
http://www.biology.mcgill.ca/grad/erika/index.htm

  


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Trouble compiling R with self-compiled LAPACK/ATLAS under Linux

2008-09-10 Thread Adam D. I. Kramer

Hello,

I just had a lot of trouble compiling a version of R which uses a
recently-compiled version of ATLAS and LAPACK. ATLAS and LAPACK compiled
correctly and installed fine and passed all of their checks (and yes, I did
compile them all with -fPIC as is required for R). This was my initial
configure line:

./configure --with-x --enable-threads=posix \
    --with-blas="-L/usr/local/lib -lptf77blas -lptcblas -latlas -lpthread" \
    --with-lapack="-L/usr/local/lib -llapack -lptf77blas -lptcblas -latlas -lpthread" \
    --prefix=/usr/local

...this configure line was chosen by examining R-admin.pdf and
./configure --help.
It configured fine, no problem. The trouble came when I attempted to make
the package:

...a lot of stuff compiles correctly...

gcc -std=gnu99 -shared -L/usr/local/lib64 -o grDevices.so chull.o devNull.o 
devPicTeX.o devPS.o devQuartz.o init.o
../../../../library/grDevices/libs/grDevices.so is unchanged
make[5]: Leaving directory `/home/akramer/R-2.7.2/src/library/grDevices/src'
make[4]: Leaving directory `/home/akramer/R-2.7.2/src/library/grDevices/src'
Warning in solve.default(rgb) :
  unable to load shared library '/home/akramer/R-2.7.2/modules//lapack.so':
  /home/akramer/R-2.7.2/modules//lapack.so: undefined symbol: cblas_izamax
Error in solve.default(rgb) : lapack routines cannot be loaded
Error: unable to load R code in package 'grDevices'
Execution halted
make[3]: *** [all] Error 1
make[3]: Leaving directory `/home/akramer/R-2.7.2/src/library/grDevices'
make[2]: *** [R] Error 1
make[2]: Leaving directory `/home/akramer/R-2.7.2/src/library'
make[1]: *** [R] Error 1
make[1]: Leaving directory `/home/akramer/R-2.7.2/src'
make: *** [R] Error 1

...So, I looked into the troublesome library:

$ nm /home/akramer/R-2.7.2/modules//lapack.so | grep cblas_izamax
 U cblas_izamax

...the function is not there. However, referring to my configure line,
the function is indeed in the libraries I passed to R:

[EMAIL PROTECTED] ~/R-2.7.2] nm /usr/local/lib/libptcblas.a | grep cblas_izamax
 T cblas_izamax

...so, it appears to me that there is a problem in R building lapack.so. So,
I looked back into my config.log file, and found this line:

configure:37951: checking for zgeev_
configure:38015: gcc -std=gnu99 -o conftest -O3 -mtune=opteron -I/usr/local/include \
  -L/usr/local/lib64 conftest.c -L/usr/local/lib -lptf77blas -lptcblas -latlas \
  -lpthread -lgfortran -lm -ldl -lm >&5

...which then failed. The key to the above line is that -llapack was *not
included*, even though zgeev_ is a LAPACK function!

Then, I added -llapack to my --with-blas line, and R compiled correctly,
tested fine, and runs nicely.
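
For the record, the working configure invocation was presumably along these
lines (the only change being -llapack added to the --with-blas value):

./configure --with-x --enable-threads=posix \
    --with-blas="-L/usr/local/lib -llapack -lptf77blas -lptcblas -latlas -lpthread" \
    --with-lapack="-L/usr/local/lib -llapack -lptf77blas -lptcblas -latlas -lpthread" \
    --prefix=/usr/local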

This information is provided because I feel like I have either done
something wrong, I am misunderstanding the process of building R, or there
is a bug in the configuration for people using --with-blas.

Cordially,
Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] request: most repeated component of a list

2008-09-10 Thread Adam D. I. Kramer

You might try

apply(t(sapply(l,apply,2,sum)),2,sum)
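
If what you want instead is a frequency table of the components themselves,
here is an untested sketch (l being your list; two components count as equal
when their serialized contents match):

keys <- sapply(l, function(m) paste(m, collapse = ","))
tab  <- table(keys)                        # frequency of each distinct component
l[[ match(names(which.max(tab)), keys) ]]  # the most repeated component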

--Adam

On Wed, 10 Sep 2008, Muhammad Azam wrote:


Dear R community



I have stored the results of arrays in a list consist of J-components (say
200 components). Each component containing same no of columns but may be
different no of rows. e.g



[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    3    4    0    0
[3,]    4    3    4    0    0
[4,]    4    3    0    0    0

[[2]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    4    3    0    0
[3,]    4    4    3    0    0
[4,]    4    4    0    0    0
[5,]    4    4    0    0    0

[[3]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    0    0    0    0
[2,]    4    4    1    0    0
[3,]    4    4    1    0    0
[4,]    4    4    0    0    0

For 200 components i want to make a frequency table. How can i make a
frequency table of these components or the most repeated component out of
all? Any help in this regard will be appreciated.


Muhammad Azam



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] comparing a list and vector and returnig the listname

2008-09-10 Thread Adam D. I. Kramer

Maybe, for vector v and list l

match(v,names(l))

?
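
Or, to pull out the matching names directly, a quick (untested) sketch:

hits <- names(l)[names(l) %in% v]   # the list names that also appear in v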

--Adam

On Wed, 10 Sep 2008, Rajasekaramya wrote:



hi,

I have a list of length 5453 and a vector of length 14318. I need to compare
the vector with the list and return the list name if matched. I am thinking
of using an lapply, but how to retrieve the list name is what I am puzzled
about. Kindly let me know how to go about it.

Ramya
--
View this message in context: 
http://www.nabble.com/comparing-a-list-and-vector-and-returnig-the-listname-tp19415038p19415038.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mixed model MANCOVA

2008-09-10 Thread Adam D. I. Kramer

Hi Erika,

As mentioned, I haven't run the model before and I don't have access
to your data set, so you might want to post your reply to the list as well
(cc'd again).

As another guessing-without-trying-anything, I'd first make sure
that in fact pop and family are factor variables with the same length as
a,b,c,treat,centroidsize.
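
For example, a sanity check along these lines (a sketch; I'm assuming your
data frame is called dat):

str(dat[, c("pop", "family")])    # both should show up as factors
dat$pop    <- factor(dat$pop)     # coerce them if they are not
dat$family <- factor(dat$family)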

Also, having not used glmer before, I'm not sure how to get the p
values...since estimates, std. errors, and t-values are reported, the df's
are likely known and probably exist in the model object you created. Of
course, your highest t-value is 1.92, so none of your fixed effects would
be significant at the .05 level (the two-tailed z cutoff is 1.96, which is
the lower limit for any t cutoff).

--Adam

On Wed, 10 Sep 2008, Erika Crispo wrote:


Thanks! I am still having some problems. I have tried the following:


model=glmer(cbind(a,b,c)~pop*treat+centroidsize+(1|pop/family))

Error: Matrices must have same number of columns in rbind2(..1, r)
In addition: Warning messages:
1: In family:pop :
numerical expression has 104 elements: only the first used
2: In family:pop :
numerical expression has 104 elements: only the first used




I don't get the error messages if I exclude the nesting (i.e. exclude pop on 
the RHS). But even then, I don't know how to interpret the output. How can I 
get P values for pop and treat? I've attached my data file.



summary(model)

Linear mixed model fit by REML
Formula: cbind(a, b, c) ~ pop * treat + centroidsize + (1 | family)
    AIC    BIC logLik deviance REMLdev
 -685.3 -656.2  353.6   -822.8  -707.3
Random effects:
 Groups   Name        Variance   Std.Dev.
 family   (Intercept) 9.8877e-13 9.9437e-07
 Residual             2.3502e-05 4.8478e-03
Number of obs: 104, groups: family, 28

Fixed effects:
                 Estimate Std. Error t value
(Intercept)     4.518e-03  4.329e-03  1.0438
popkah         -2.338e-03  1.902e-03 -1.2297
popkant        -2.328e-03  1.881e-03 -1.2380
poprwe         -3.728e-03  1.941e-03 -1.9204
treatn         -8.703e-04  1.957e-03 -0.4448
centroidsize   -1.886e-06  2.464e-06 -0.7656
popkah:treatn   3.440e-03  2.695e-03  1.2765
popkant:treatn  1.198e-03  2.699e-03  0.4439
poprwe:treatn   4.662e-03  2.746e-03  1.6976

Correlation of Fixed Effects:
             (Intr) popkah popknt poprwe treatn cntdsz ppkh:t ppknt:
popkah       -0.228
popkant      -0.335  0.507
poprwe       -0.193  0.490  0.492
treatn       -0.092  0.485  0.476  0.479
centroidsize -0.951  0.009  0.119 -0.023 -0.128
popkah:trtn   0.121 -0.705 -0.352 -0.346 -0.719  0.036
popknt:trtn   0.095 -0.352 -0.680 -0.347 -0.721  0.063  0.520
poprwe:trtn   0.117 -0.346 -0.346 -0.707 -0.706  0.037  0.510  0.511


  
Erika Crispo, PhD candidate
McGill University, Department of Biology
http://www.biology.mcgill.ca/grad/erika/index.htm

  
- Original Message - From: Adam D. I. Kramer 
[EMAIL PROTECTED]

To: Erika Crispo [EMAIL PROTECTED]
Cc: r-help@r-project.org
Sent: Wednesday, September 10, 2008 5:47 PM
Subject: Re: [R] mixed model MANCOVA



Hi Erika,

 I have not tried this before, and I hope that somebody will correct
me if I'm wrong, but the glmer function in the lme4 library appears to do
what you want. From examples(lmer):

(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              family = binomial, data = cbpp))

...I guess that this will do what you want it to because it has multiple
variables on the LHS and both continuous and categorical variables on the
RHS, along with an explicit grouping structure.

In your case, you probably want to leave the family= argument out; as noted
in ?glmer, "If 'family' is missing then a linear mixed model is fit;
otherwise a generalized linear mixed model is fit." ...MANCOVAs are
general (rather than generalized) linear models.

Once again, though, I have not used this system personally, haven't seen
your data, and don't know what output to expect. Hopefully somebody else can
confirm or deny this solution's efficacy.

--Adam

On Mon, 8 Sep 2008, Erika Crispo wrote:


Hello,

I need to perform a mixed-model (with nesting) MANCOVA, using Type III
sums of squares. I know how to perform each of these types of tests
individually, but I am not sure if performing a mixed-model MANCOVA is
possible. Please let me know.

Erika

  
Erika Crispo, PhD candidate
McGill University, Department of Biology
http://www.biology.mcgill.ca/grad/erika/index.htm

  

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.









Re: [R] Splitting Data Frame into Two Based on Source Array

2008-09-09 Thread Adam D. I. Kramer

data_main[ match(src,data_main$V1), ]

and for the complement of src (call it srcc)

data_main[ match(srcc,data_main$V1), ]

...this only works so long as each item occurs only once in V1.
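
A more robust route, which also copes with duplicated keys, is logical
indexing with %in%; a quick sketch:

keep <- data_main$V1 %in% src
data_child1 <- data_main[keep, ]
data_child2_complement <- data_main[!keep, ]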

--Adam

On Tue, 9 Sep 2008, Gundala Viswanath wrote:


Dear all,

Suppose I have this data frame:



data_main

    V1     V2
   foo   13.1
   bar   12.0
   qux   10.4
   cho  20.33
   pox   8.21

And I want to split the data into two parts: the first part contains the
rows whose V1 values appear in the source array:


src

[1] "bar" "pox"

and the other one the complement.

In the end we hope to get these two data frames:


data_child1

    V1     V2
   bar   12.0
   pox   8.21

and


data_child2_complement

    V1     V2
   foo   13.1
   qux   10.4
   cho  20.33

Is there a compact way to do it in R?




- Gundala Viswanath
Jakarta - Indonesia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] isolate elements in vector that match one of many possible values

2008-09-09 Thread Adam D. I. Kramer

Check out ?match, ?%in%


x <- c(1,2,3,4)
y <- c(1,2,4)
match(y,x)

[1] 1 2 4
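
...and the %in% analogue of MySQL's in(), which is probably the closer fit:

which(x %in% c(1, 2, 4))
[1] 1 2 4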




--Adam

On Mon, 8 Sep 2008, Andrew Barr wrote:


Hi all,

I want to get the index numbers of all elements of a vector which match any
of a long series of possible values.  Say x <- c(1,2,3,4) and I want to know
which values are equal to 1, 2 or 4.  I could do

which(x == 1 | x==2 | x==4)
[1] 1 2 4

This gets really ugly though, when the list of values of interest is really
long.  Is there a nicer way to do this?  Something akin to the MySQL
construction in(), as in

#MySQL script example
Select * from table where parameter in(x,y,z);

Thanks!

--
W. Andrew Barr
Biological Anthropology
University of Texas at Austin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] printing all rows

2008-09-09 Thread Adam D. I. Kramer

options("max.print")

$max.print
[1] 99999


options(max.print=10^5)
options("max.print")

$max.print
[1] 1e+05

...so check what your max.print is, and figure out whether you need to
set it to nrow, ncol, or nrow*ncol of your data frame...then do so...though
of course, this is a global variable, so everything you print from then on
will just keep printing and printing.

Really, though, you might get more utility out of write.table and then using
a word processor to read the data in your table.
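
For instance (a sketch, with df standing in for your data frame):

options(max.print = nrow(df) * ncol(df))   # large enough for every entry
df                                         # now prints all rows
write.table(df, file = "alldata.txt")      # or dump it to a file instead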

--Adam

On Tue, 9 Sep 2008, ANJAN PURKAYASTHA wrote:


Hi,
my data table has 38939 rows. R prints the first 11118 rows and then prints
an error message: [ reached getOption("max.print") -- omitted 27821 rows ].
Is it possible to set the max.print parameter so that R prints all the rows?

tia,
anjan

--
=
anjan purkayastha, phd
bioinformatics analyst
whitehead institute for biomedical research
nine cambridge center
cambridge, ma 02142

purkayas [at] wi [dot] mit [dot] edu
703.740.6939

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating table of averages

2008-09-09 Thread Adam D. I. Kramer

Maybe something like this:

by(df[,c(77,81,86,90,94,98,101,106)],df$category,apply,2,mean)

...which would then need to be reformatted into a data frame (there is
probably an easy way to do this which I don't know).

aggregate seems like a more reasonable choice, but the function for
aggregate must return scalars, not rows...tapply doesn't take data.frame
inputs. Maybe someone else has a suggestion?
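
One sketch that should work (untested on your data; the eight column
positions are taken from the line above):

cols  <- c(77, 81, 86, 90, 94, 98, 101, 106)
means <- t(sapply(split(df[, cols], df$category), colMeans, na.rm = TRUE))
means   # one row per category, one column per variable -- ready for barplot2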

--Adam

On Tue, 9 Sep 2008, Lawrence Hanser wrote:


Dear Colleagues,

I have a dataframe with variables:

  [1] ID             category       a11            a12            a13            a21
  [7] a22            a23            a31            a32            b11            b12
 [13] b13            b21            b31            b32            b33            b41
 [19] b42            c11            c12            c21            c22            c23
 [25] c31            c32            c33            d11            d12            d13
 [31] d14            d21            d22            d23            d24            d25
 [37] d31            d32            d33            e11            e12            e13
 [43] e21            e22            e23            e31            e32            e33
 [49] f11            f12            f13            f14            f21            f22
 [55] f23            f24            g11            g12            g13            g14
 [61] g21            g22            g23            g24            g31            g32
 [67] g33            g41            g42            g43            h11            h12
 [73] h13            h21            h22            h23            C1.Employ      SC11.Ops
 [79] SC12.Unit      SC13.Nonadvers C2.Enterprise  SC21.Structure SC22.Gov       SC23.Culture
 [85] SC24.Stratcomm C3.Manage      SC31.Resource  SC32.Change    SC33.Continue  C4.Stratthink
 [91] SC41.Vision    SC42.Decision  SC43.Adapt     C5.Lead        SC51.Develop   SC52.Care
 [97] SC53.Diversity C6.Foster      SC61.Teams     SC62.Negotiate C7.Embody      SC71.Ethical
[103] SC72.Follower  SC73.Warrior   SC74.Develop   C8.Comm        C81.Speak      C82.Listen
[109] OverallImp

The variable category has four values: Regular, CCM, CFM, and Other

I'd like to create a table like this to feed into barplot2:

row.name  C1.Employ  C2.Enterprise  C3.Manage  C4.Stratthink  C5.Lead  C6.Foster  C7.Embody  C8.Comm
Regular        3.68           4.27       3.22  etc.
CCM            4.32           4.56       etc.
CFM            etc.
Other          etc.

So far, I have been able to get this far:


mean(subset(impchiefs08, category=="Regular",
            select=c(C1.Employ, C2.Enterprise, C3.Manage, C4.Stratthink,
                     C5.Lead, C6.Foster, C7.Embody, C8.Comm)))

   C1.Employ C2.Enterprise C3.Manage C4.Stratthink   C5.Lead C6.Foster C7.Embody   C8.Comm
        3.60          3.85      4.48      4.346667  4.608889      4.44      4.60      4.49




But I am stumped as to how to get what I want.

Thanks in advance.

Larry

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test for equality of complicatedly related average correlations

2008-09-08 Thread Adam D. I. Kramer

Hi Ralph,

My approach provides the same answer by asking a different question.
Effectively, my approach tests whether the difference between timepoints is
larger for X than for Y (and also gives you the main effects of whether X
has a higher mean than Y and whether scores increase or decrease over time,
which may or may not be of interest but as control variables serve the same
purpose as using correlations rather than difference scores).

One objection to this mode of comparing test-retest reliabilities is that
one test may be more variable over time (at the item level) than the other,
even though the overall means don't differ...but the correlational approach
you asked about also suffers from this problem.

--Adam

On Sun, 7 Sep 2008, Ralph79 wrote:


I have to get a bit more familiar with the model you propose in order to
understand if it applies to my problem as well.

My question is not really "does time show a different effect" but "which
one of two measures is more reliable": My respondents have completed
exactly the same questionnaire twice (t=1 and t=2). The questionnaire
consisted of two ways of measuring attribute importance, and the better
method of measuring these importances is the one that gives the same
importances for each respondent in t=1 and t=2. In other words: I want to
examine test-retest reliability of the two measures. Naturally, if
X(t=1,t=2)-correlation is higher for a specific respondent than the
Y(t=1,t=2)-correlation, then for this respondent the method that yields
the X-importances is more reliable. All I want to do is to see if this
holds for the whole sample as well...



Anyway, thank you again, I will think of your approach.

Ralph



Adam D. I. Kramer-3 wrote:


Hi Ralph,

I had the same problem you do a few months ago, and realized that
the question I had (does time show a different effect for X than Y) was not
best modeled as differences between correlations across individuals, but as
whether time interacts with condition.

I answered this question with

library(nlme)
lme(obs ~ cond*time, random=~cond*time|subj)


...where obs is the responses on the X or Y variable, cond is a factor of
either X or Y, and subj is your subject variable. This fits a hierarchical
linear model to the data. The relationship between X and time is sig. diff.
from the relationship between Y and time if the cond:time fixed effect is
significant.

This approach makes better use of your data, because when you correlate the
observations, you're effectively losing variability (because correlations
are doubly standardized) as well as degrees of freedom (you have 9 df within
each individual, but each correlation is only one number).

--Adam

On Sat, 6 Sep 2008, Ralph79 wrote:



Dear R-Users,

I am currently looking for a way to test the equality of two correlations
that are related in a very special way. Let me describe the situation
with
an example.

- There are 100 respondents, and there are 2 points in time, t=1 and t=2.

- For each of the respondents and at each of the time points, I have
information on 10 X-variables and on 10 Y-variables.

- Based on this information, I calculate two correlations for each
respondent: cor(X[t=1],X[t=2]) and cor(Y[t=1],Y[t=2]), with X and Y being
the vectors of the corresponding 10 variables.

- Now I get the average correlations over the whole sample using Fisher's
Z-transformation, i.e. I have mean(cor(X[t=1],X[t=2])) and
mean(cor(Y[t=1],Y[t=2])) and want to know if the mean correlations are
significantly different!


I haven't found any test that deals with exactly my situation. Therefore, I
simply apply a paired t-test based on the individual z-correlations. From
my point of view this should be ok, because of the z's normality. However, I
am unsure if there is a better way to test the hypothesis that I am
interested in?

I'd be grateful for any comment or hint.

Thank you very much,

Ralph

-
Ralph Wirth
University Erlangen-Nuremberg, Chair of Statistics
GfK Group, Department of Methods and Product Development

--
View this message in context:
http://www.nabble.com/Test-for-equality-of-complicatedly-related-average-correlations-tp19346312p19346312.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





-
Ralph Wirth
University Erlangen-Nuremberg, Chair of Statistics
GfK Group, Department of Methods and Product Development

--
View this message in context: 
http://www.nabble.com/Test-for-equality

Re: [R] ON MAC, how to copy a plot on to Word document?

2008-09-08 Thread Adam D. I. Kramer

For what it's worth, copy/paste between R and Word 2008 works perfectly. It
doesn't work at all for Word 2004...so that's at least ONE improvement (the
only one I've seen) in versions of Word.

--Adam

On Sun, 7 Sep 2008, Stefan Evert wrote:



On 7 Sep 2008, at 16:57, John Kane wrote:

I think you need to save the plot and import it into Word.  AFAIK you can 
only copy and paste a plot in Windows.


In the R.app GUI, you can click the plot window and then select Copy from
the Edit menu (or press Command-C). You can now paste the plot into most Mac
applications (e.g. Apple's own iWork suite), but it doesn't work in Word 2004
-- so it's really a problem with Word.



Have a look at ?png  (There are other formats available)


Or, much better, a PDF file by selecting Save As ... from the File menu 
(Shift-Command-S). You should then be able to drag and drop the PDF file into 
Word.


HTH,
Stefan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Averaging 'blocks' of data

2008-09-08 Thread Adam D. I. Kramer

Hi Steve,

You probably want to check out ?by or ?aggregate, using something like
(row number - 1) %/% 60 and (column number - 1) %/% 60 as your index
variables; a sketch follows.
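
Here is a fuller (untested) sketch, assuming the data are in a numeric
matrix m whose dimensions are multiples of 60 (m <- as.matrix(df) if need
be):

ri <- (row(m) - 1) %/% 60      # block row index of each cell
ci <- (col(m) - 1) %/% 60      # block column index of each cell
blockmeans <- tapply(m, list(ri, ci), mean, na.rm = TRUE)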

--Adam

On Sun, 7 Sep 2008, Steve Murray wrote:




Dear all,

I have a large dataset which I hope to reduce in size, to make it more
usable. I hope to do this by taking an average of each 60 x 60 block of
values and forming a new data frame out of the averaged values.

How would I go about taking averages of 60 x 60 'blocks' in R, and cycling
through the whole dataset, recording each calculated value in a new
table/data frame?

Many thanks for any advice offered.

Steve

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to draw a vertical line from points to x-axis

2008-09-08 Thread Adam D. I. Kramer

I think you want the ?lines function.

To connect a point (x,y) to the x-axis,

lines(x=c(x,x),y=c(y,0))

...draws a line from that point to the x-axis. You may also want to specify
pch=c("?",""), type="b", where "?" is the original point type (which you
don't want to run over) and "" is the pch for the line-end on the axis.
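
For several points at once, ?segments is handy; a quick sketch with made-up
data:

xx <- c(1, 3, 5); yy <- c(2, 4, 3)
plot(xx, yy)
segments(x0 = xx, y0 = 0, x1 = xx, y1 = yy)   # vertical drops to the x-axis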

--Adam

On Sun, 7 Sep 2008, Anny Huang wrote:


Hello,

I want to know how to draw a line connecting each point to the x-axis
perpendicularly (i.e. a vertical line).
abline(v=...) seems not to work for my purpose, because it runs over the
data point. Can anyone help? Thanks.

Anny

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested ANOVA with fixed and random variables missing data

2008-09-08 Thread Adam D. I. Kramer


On Mon, 8 Sep 2008, [EMAIL PROTECTED] wrote:


I have a nested ANOVA, with a fixed factor tmt nested within site
(random). There are missing values in the data set.

aeucs, tmt and site have been defined as objects

I have tried:

model1=lme(aeucs~tmt,random=~1|tmt/site)


I think you want lme(aeucs~tmt,random=~tmt|site)

...as tmt is a fixed factor, you don't want to use it for grouping. Instead,
you want to estimate a separate tmt effect for each site...so estimate (~) a
fixed tmt effect (tmt) within (|) sites (site) = ~tmt|site.
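
Also, the error you quote is about missing values: lme defaults to
na.action = na.fail. So you will probably also need something like this
sketch (assuming your data frame is called dat):

library(nlme)
model1 <- lme(aeucs ~ tmt, random = ~ tmt | site,
              data = dat, na.action = na.omit)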

--Adam



I get the following error message

Error in na.fail.default(list(aeucs = c(0.8, 1, 1, 1, 1, 1, 0.7,  :
  missing values in object

I need to know where I am going wrong.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] extracting max row from data matrix

2008-09-08 Thread Adam D. I. Kramer

Hi Srini,

This may be as simple as tapply(weight,fruit,max)

or t(that) if you want it as you specified.
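
If you want the entire row rather than just the maximum weight, one
(untested) sketch:

do.call(rbind, lapply(split(x, x$fruit),
                      function(d) d[which.max(d$weight), ]))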

--Adam

On Sun, 7 Sep 2008, Srinivas Iyyer wrote:


dear group,
i have a data matrix with some replicate items with different values. I want to 
extract the row with max value.

for example:

x

   fruit weight
1  apple    1.3
2  apple    1.5
3  apple    1.6
4 orange    1.4
5 orange    1.6


x is a data frame.
I want to extract unique items from fruits that has max weight.

that is:

3  apple    1.6
5 orange    1.6

I want to be able to use apply functions. Could some one lend some help please.

Thanks
srini

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to preserve date format while aggregating

2008-09-08 Thread Adam D. I. Kramer

Hi Erich,

Since min() is defined for numbers and not dates, the problem is in the
min() function. min() is converting from date format to number format.

Your best bet is to make this conversion explicit...such that it is
reversable. So, convert the date into UTC, then UTC to seconds since epoch,
then take the minimum, then convert back to UTC time.

This sounds like a pain...but that's basically what a version of min()
designed to work with dates would do. The reason this is a pain is basically
due to timezones:

Consider a comparison between x = 3:54 PM September 8 in California (right
now where I am) and y = 12:54 AM September 9 in Zurich (right now where you
are). Is it earlier here than there? Yes, because it's Sept 8 to your Sept
9. Is it earlier there than here? Yes, because your day started 56 minutes
ago, mine over 15 hours ago. Is it the same time here as there? Yes,
because our UTC times are equal.

So it's not clear what min should return, so min is not defined for dates.
However, min is defined for numbers, and dates can be converted to
numbers...but what those numbers actually mean is not necessarily clear.
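
A sketch of that round trip, assuming the column has class Date (POSIXct
would need as.POSIXct with an origin instead):

mins <- aggregate(as.numeric(dataframe.xy$Date),
                  list(SubjectID = dataframe.xy$SubjectID), min)
mins$x <- as.Date(mins$x, origin = "1970-01-01")   # back to Date format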

--Adam

On Mon, 8 Sep 2008, Erich Studerus wrote:


Hi

I have a dataframe in which some subjects appear in more than one row. I
want to extract the subject-rows which have the minimum date per subject. I
tried the following aggregate function.

attach(dataframe.xy)

aggregate(Date,list(SubjectID),min)

Unfortunately, the format of the Date-column changes to numeric, when I'm
applying this function. How can I preserve the date format?

Thanks

Erich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Saving functions

2008-09-08 Thread Adam D. I. Kramer

Hi Robin,

source(filename.R)

will open filename.R from the working directory and behave as if you had
typed in its contents, line by line.

You can include a full path if you like, or use file.choose() for a file not
in your working directory.
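
In other words, the usual pattern is (a sketch, assuming filename.R defines
a function f):

source("filename.R")    # runs the file; f is now defined in your workspace
f(arg1 = 1, arg2 = 2)   # supply the arguments when you call it, as usual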

--Adam

On Mon, 8 Sep 2008, Williams, Robin wrote:


Hi,
 Apologies for the simple nature of this question; I am unable to find
the answer in the manuals (e.g. An Introduction to R).
 I have written a function in a text editor and saved it with an .R
extension. It is saved in my working directory. How can I run it, do I
need to use source? If so, how do I supply the arguments to the
function? Or does it need to be saved in a particular directory?
 Do I need a different file extension?
Many thanks for any help.


Robin Williams
Met Office summer intern - Health Forecasting
[EMAIL PROTECTED]



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about multiple regression

2008-09-08 Thread Adam D. I. Kramer

Hi Dimitri,

On Mon, 8 Sep 2008, Dimitri Liakhovitski wrote:


Dear R-list,
maybe some of you could point me in the right direction:

Are you aware of any FREE Fortran or Java libraries/actual pieces of
code that are VERY efficient (time-wise) in running the regular linear
least-squares multiple regression?


You almost certainly want the LAPACK fortran libraries, avail at
http://www.netlib.org/lapack/

...the function of interest to you is probably called
dgels:

http://www.netlib.org/lapack/explore-html/dgels.f.html

...of course, this runs faster if you have a fast BLAS library installed.
These exist in many forms, and may already be installed on your system.
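
Also, before leaving R entirely: much of lm()'s cost is formula and
model-frame handling, so for thousands of small fits, lm.fit() on a
prebuilt design matrix is often much faster. A sketch (y the response, dat
and predictors standing in for your data):

X   <- cbind(1, as.matrix(dat[, predictors]))   # design matrix with intercept
fit <- lm.fit(X, y)
fit$coefficients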

--Adam


More specifically, I have to run small regression models (between 1
and 15 predictors) on samples of up to N=700 but thousands and
thousands of them.

I am designing a simulation in R and running those regressions and R
itself is way too slow. So, I am thinking of compiling the regression
run itself in Fortran and Java and then calling it from R.

Thank you very much for any advice!

Dimitri Liakhovitski
MarketTools, Inc.
[EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test for equality of complicatedly related average correlations

2008-09-06 Thread Adam D. I. Kramer

Hi Ralph,

I had the same problem you do a few months ago, and realized that
the question I had (does time show a different effect for X than Y) was not
best modeled as differences between correlations across individuals, but as
whether time interacts with condition.

I answered this question with

library(nlme)
lme(obs ~ cond*time, random=~cond*time|subj)


...where obs is the responses on the X or Y variable, cond is a factor of
either X or Y, and subj is your subject variable. This fits a hierarchical
linear model to the data. The relationship between X and time is sig. diff.
from the relationship between Y and time if the cond:time fixed effect is
significant.

This approach makes better use of your data, because when you correlate the
observations, you're effectively losing variability (because correlations
are doubly standardized) as well as degrees of freedom (you have 9 df within
each individual, but each correlation is only one number).

--Adam

On Sat, 6 Sep 2008, Ralph79 wrote:



Dear R-Users,

I am currently looking for a way to test the equality of two correlations
that are related in a very special way. Let me describe the situation with
an example.

- There are 100 respondents, and there are 2 points in time, t=1 and t=2.

- For each of the respondents and at each of the time points, I have
information on 10 X-variables and on 10 Y-variables.

- Based on this information, I calculate two correlations for each
respondent: cor(X[t=1],X[t=2]) and cor(Y[t=1],Y[t=2]), with X and Y being
the vectors of the corresponding 10 variables.

- Now I get the average correlations over the whole sample using Fisher's
Z-transformation, i.e. I have mean(cor(X[t=1],X[t=2])) and
mean(cor(Y[t=1],Y[t=2])) and want to know if the mean correlations are
significantly different!


I haven't found any test that deals with exactly my situation. Therefore, I
simply apply a paired t-test based on the individual z-correlations. From
my point of view this should be ok, because of the z's normality. However, I
am unsure if there is a better way to test the hypothesis that I am
interested in?

I'd be grateful for any comment or hint.

Thank you very much,

Ralph

-
Ralph Wirth
University Erlangen-Nuremberg, Chair of Statistics
GfK Group, Department of Methods and Product Development

--
View this message in context: 
http://www.nabble.com/Test-for-equality-of-complicatedly-related-average-correlations-tp19346312p19346312.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lme: Testing variance components

2008-08-25 Thread Adam D. I. Kramer

Hello,

I've been making a decent amount of use of the lme function
recently, and I am aware that I can extract the variance and correlation
components with the VarCorr(model) function. However, I am interested in
testing these components as well...is there a module or function available
for testing these?

I understand there's some debate as to which test is best for a
given situation (e.g., Wald vs. likelihood ratio), as well as some good
arguments that it is perhaps ill-advised to test (variance is something to
be predicted; if there is literally no variance, then the estimate perfectly
predicts the outcome and the scientific question is basically answered; no
statistics necessary), but there are some circumstances in which a
nonsignificant variance component might be useful: For example, when
deciding (in an HLM) whether it is necessary or at all useful to include a
grouping factor.
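
For concreteness, the kind of comparison I have in mind is sketched below
(y, x, g, and dat standing in for real data), with the usual caveat that
the naive likelihood-ratio p-value is conservative because the null
variance sits on the boundary:

library(nlme)
m1 <- lme(y ~ x, random = ~ 1 | g, data = dat, method = "ML")
m0 <- gls(y ~ x, data = dat, method = "ML")  # same fixed effects, no random term
anova(m0, m1)                                # LRT for the variance component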

Thanks in advance.

Adam D. I. Kramer
Ph.D. Candidate, Social and Personality Psychology
University of Oregon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Coercing by/tapply to data.frame for more than two indices?

2008-05-03 Thread Adam D. I. Kramer

Dear Colleagues,

Apologies for a long email to ask what I feel may be a very simple
question; I figure it's better to overspecify my situation.

I was asked a question, recently, by a colleague in my department
about pre-aggregating variables, i.e., computing the mean of defined subsets
of a data frame. Naturally, I thought of the 'by' and 'tapply' functions, as
they have always been the solution for me. However, my colleague had three
indices, and as such needs to pay attention to the indices of the
output...this is to say, the "create an array" function of tapply doesn't
quite work because an array is not quite what we want.

Consider this data set:

df <- data.frame( var1 = factor(rep(rep(1:5,25*5),10)),
                  var2 = factor(rep(rep(1:5,each=25*5),10)),
                 trial = rep(rep(1:25,25),10),
                    id = factor(rep(1:10,each=5*5*25)),
                 score = rnorm(n=5*5*25*10) )

...this is to say, each of 10 ids has scores for 5 different levels of
var1 and 5 different levels of var2...across 25 trials. Basically, a
three-way crossed repeated measures design...where tapply does what I want
for a two-way design, it does not quite suit my purposes for a 3-way or
n-way for n  2.

The goal is to predict score from var1 and var2. The straightforward guess
of what to do would be to simply have the AOV function aggregate across
trials:

aov(score ~ var1*var2 + Error(id/(var1*var2)), data=df)

(or lm with defined contrasts)

...however, there are missing data on some trials for some people, which
makes this design unbalanced (i.e., it introduces a correlation between var1
and var2). Because my colleague knows (from a theoretical standpoint) that
he wants to analyze the mean, his ANOVA on the aggregated trial means WOULD
be balanced, which is to say, the analysis he wants to run would produce
different output from the above.

So, what he needs is a data frame with four variables instead of five: var1,
var2, id, and mscore (mean score), which has been averaged across trials.

Clearly (to me, it seems), the way to do this is with tapply:

x <- tapply(df$score, list(df$var1,df$var2,df$id), mean, na.rm=TRUE)

...which returns a var1*var2 matrix for each ID, when what I want is an
observation-per-row data frame.

So, my question: How do I end up with what I'm looking for?

My current process involves setting df2 <- data.frame(mscore=c(x), ...)
where ... is a bunch of factor(rep) columns that would specify the var1 var2
and id levels. My problem with this approach is that it seems like a hack;
it is not a general solution because I must use knowledge of the process by
which x was generated in order to get it right, and there's a decent
amount of room for unnoticed error on my part.

I suppose what I'm looking for is either a way to take by or tapply and have
it return a set of index variable columns based on the list of indices I
provide to it...or a way to collapse an n-way table into a single data frame
with index variables. Any suggestions?

Cordially,

Adam D. I. Kramer
Ph.D. Candidate, Social Psychology
University of Oregon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coercing by/tapply to data.frame for more than two indices?

2008-05-03 Thread Adam D. I. Kramer

Thanks very much...it is exactly what I needed, and I'm a bit embarrassed
that I couldn't find it on my own.

One might consider adding aggregate to the "See also" lines of by and
tapply. That would have prevented me from needing to email the list (which I
may have accidentally done twice; I apologize for that).

--Adam

On Sat, 3 May 2008, jim holtman wrote:


?aggregate


aggregate(df$score, list(df$var1, df$var2, df$id), mean, na.rm=TRUE)

   Group.1 Group.2 Group.3             x
1        1       1       1  0.1053576980
2        2       1       1  0.1514888520
3        3       1       1  0.1270477403
4        4       1       1 -0.0193129404
5        5       1       1  0.2574346931
6        1       2       1  0.0185013523
7        2       2       1 -0.0886420632
8        3       2       1 -0.1304342272
9        4       2       1 -0.0972963702
10       5       2       1 -0.1463502593



On Fri, May 2, 2008 at 6:43 PM, Adam D. I. Kramer [EMAIL PROTECTED] wrote:

Dear Colleagues,

   Apologies for a long email to ask what I feel may be a very simple
question; I figure it's better to overspecify my situation.

   I was asked a question, recently, by a colleague in my department
about pre-aggregating variables, i.e., computing the mean of defined subsets
of a data frame. Naturally, I thought of the 'by' and 'tapply' functions, as
they have always been the solution for me. However, my colleague had three
indices, and as such needs to pay attention to the indices of the
output...this is to say, the "create an array" function of tapply doesn't
quite work because an array is not quite what we want.

   Consider this data set:

df <- data.frame( var1 = factor(rep(rep(1:5,25*5),10)),
                  var2 = factor(rep(rep(1:5,each=25*5),10)),
                 trial = rep(rep(1:25,25),10),
                    id = factor(rep(1:10,each=5*5*25)),
                 score = rnorm(n=5*5*25*10) )

...this is to say, each of 10 ids has scores for 5 different levels of
var1 and 5 different levels of var2...across 25 trials. Basically, a
three-way crossed repeated measures design...where tapply does what I want
for a two-way design, it does not quite suit my purposes for a 3-way or
n-way for n  2.

The goal is to predict score from var1 and var2. The straightforward guess
of what to do would be to simply have the AOV function aggregate across
trials:

aov(score ~ var1*var2 + Error(id/(var1*var2)), data=df)

(or lm with defined contrasts)

...however, there are missing data on some trials for some people, which
makes this design unbalanced (i.e., it introduces a correlation between var1
and var2). Because my colleague knows (from a theoretical standpoint) that
he wants to analyze the mean, his ANOVA on the aggregated trial means WOULD
be balanced, which is to say, the analysis he wants to run would produce
different output from the above.

So, what he needs is a data frame with four variables instead of five: var1,
var2, id, and mscore (mean score), which has been averaged across trials.

Clearly (to me, it seems), the way to do this is with tapply:

x <- tapply(df$score, list(df$var1,df$var2,df$id), mean, na.rm=TRUE)

...which returns a var1*var2 matrix for each ID, when what I want is an
observation-per-row data frame.

So, my question: How do I end up with what I'm looking for?

My current process involves setting df2 <- data.frame(mscore=c(x), ...)
where ... is a bunch of factor(rep) columns that would specify the var1 var2
and id levels. My problem with this approach is that it seems like a hack;
it is not a general solution because I must use knowledge of the process by
which x was generated in order to get it right, and there's a decent
amount of room for unnoticed error on my part.

I suppose what I'm looking for is either a way to take by or tapply and have
it return a set of index variable columns based on the list of indices I
provide to it...or a way to collapse an n-way table into a single data frame
with index variables. Any suggestions?

Cordially,

Adam D. I. Kramer
Ph.D. Candidate, Social Psychology
University of Oregon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] Omnibus main effects in summary.lme?

2008-01-10 Thread Adam D. I. Kramer
Hello,

I've been running some HLMs using the lme function quite happily; it
does what I want and I'm pretty sure I understand it.

The issue is that I'm currently trying to estimate a model with a
14-level nuisance factor as an independent variable...which makes the
output quite ugly. All I'm really interested in is the question of whether
this factor as a whole (and its interactions with other factors) is
significant.

The summary.aov function provides this sort of aggregation for lm
objects, but does not run on lme objects. I've also tried estimating the
full model and restricted model, leaving out a main effect or interaction
term and then using anova.lme to compare the models, but these models appear
to be being fit differently. Say I have l2, and then

l3 <- update(l2, . ~ . - useful:nusience)
anova.lme(l3, l2)

...to see whether the interaction term is significant, produces the error
"Fitted objects with different fixed effects. REML comparisons are not
meaningful." Upon examination using summary(l3), it seems that the fixed
factors are indeed different.
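
(One workaround sketch I am considering is to refit both models with
maximum likelihood so the comparison is meaningful:

l2ml <- update(l2, method = "ML")
l3ml <- update(l2ml, . ~ . - useful:nusience)
anova(l3ml, l2ml)

...though I am not certain this is the right approach; anova(l2) on the
single model also reports per-term F tests.)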

So, my question is this: How do I estimate omnibus main effects for
multi-level factors and multi-level factor interactions in lme models?

Many thanks,
Adam D. I. Kramer
Ph.D. Student, Social and Personality Psychology
University of Oregon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.