[R] foreach + dopar: how to check progress of parallel computations?

2010-12-28 Thread Marius Hofert
Dear expeRts,

I use foreach to do parallel computations. Is it possible to have some progress 
output written while the computations are done? In the minimal example below, I 
just print a number (n) to check the progress. If you run this example with 
%do% instead of %dopar%, then the computations are done sequentially and 
the number n is printed to the console. I am looking for something similar but 
with %dopar%. In the minimal example you can see that n is not written to the 
console if the computations are done in parallel. How [with which construction] 
can I check the progress?

Cheers,

Marius 

## load packages
library(doSNOW)
library(Rmpi)
library(foreach)

## parameters
param.1 <- 1:2 #c(a1, b1)
param.2 <- 1:4 #c(a2, b2, c2, d2)

## setup cluster
cl <- makeCluster(mpi.universe.size(), type = "MPI")
registerDoSNOW(cl)

## main work
n <- 1
res <- foreach(p1 = param.1) %:% foreach(p2 = param.2) %dopar% {
    print(n)
    p1 * p2
    n <- n + 1
}

stopCluster(cl) # stop cluster

res # result
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bayesian Belief Networks in R

2010-12-28 Thread Petr Savicky
On Thu, Dec 23, 2010 at 09:12:41AM -0500, Data Analytics Corp. wrote:
> Hi,
>
> Does anyone know of a package for or any implementation of a Bayesian
> Belief Network in R?

Different types of graphical models in R including Bayesian networks are
described in CRAN Task View gR
  http://cran.at.r-project.org/web/views/gR.html

Petr Savicky.



[R] batch file output

2010-12-28 Thread Mikkel Grum
I run a batch file with the following command in Windows XP:

C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore < C:\users\me\file.R > C:\users\me\file.out 2>&1

Is there any way to get only the output of R in file.out, without getting all 
the code from file.R too?

Any help greatly appreciated,
Mikkel



Re: [R] Link prediction in social network with R

2010-12-28 Thread Gábor Csárdi
Dear Eu,

On Wed, Dec 22, 2010 at 12:00 AM, EU JIN LOK ejl...@hotmail.com wrote:

> Dear R users
>
> I'm a novice user of R and have absolutely no prior knowledge of social
> network analysis, so apologies if my question is trivial. I've spent a lot of
> time trying to solve this on my own but I really can't, so I hope someone here
> can help me out. Cheers!
>
> The dataset:
> I'm trying to predict the existence of links (True or False) in a test set
> using a training set. Both data sets are in an edgelist format, where User
> IDs represent nodes in both columns, with the 1st column directing to the 2nd
> column (see figure 1 below). Using the AUC to evaluate the performance, I am
> looking for the best algorithm to predict the existence of links in the test
> data (50% are true and the rest are false).
>
> Figure 1:
> training
> Vertices: 1133143
> Edges: 999
> Directed: TRUE
> Edges:
>
> [0]       105 ->  850956
> [1]       105 -> 1073420
> [2]       105 -> 1102667
> [3]       165 ->  888346
> [4]       165 ->  579649
> [5]       165 ->  136665
> etc..
>
> I'm having problems obtaining the probability scores for the links / edges as
> most of the scores are for the nodes. An example of this is the graph.knn and
> page.rank module in igraph.
>
> So my questions are:
> 1) What do I need to do to obtain the scores for the links instead of the
> nodes (I presume it must be a data preparation step that I must be missing
> out)?

In general, most people are interested in the nodes of the network, so
most network indices are node level. If you want edge-level indices,
you can create another graph from yours, by transforming the edges
into vertices and vice-versa. Two vertices are connected in the new
graph, if the corresponding two edges in the old graph share an
incident vertex. However, I am sure that there are some vertex
measures that don't make sense for edges at all, so you need to be
careful with this, especially with the interpretation of the results.

Another possibility is to use the few edge-level indices, e.g. edge
betweenness, or just define analog edge measures for the existing
vertex measures.

> 2) Which R package would be the best for running the various techniques -
> Jaccard index, Adamic-Adar, common neighbours, PropFlow, etc.

The first three are implemented in igraph if I remember well.
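
As a rough illustration of what edge-level scores look like, the sketch below computes common neighbours, the Jaccard index and the Adamic-Adar score for a few candidate node pairs in base R. The toy edge list, the helper names (neighbours, pair_scores) and the decision to ignore edge direction are all assumptions made for the example; this is not igraph's implementation.

```r
# Toy edge list (hypothetical data, not the poster's); direction is ignored.
edges <- data.frame(from = c(1, 1, 2, 2, 3, 4),
                    to   = c(2, 3, 3, 4, 4, 5))

# Neighbour set of node v, treating the graph as undirected.
neighbours <- function(v, el) {
  unique(c(el$to[el$from == v], el$from[el$to == v]))
}

# Edge-level scores for one candidate pair (u, v).
pair_scores <- function(u, v, el) {
  nu <- neighbours(u, el)
  nv <- neighbours(v, el)
  common <- intersect(nu, nv)
  # Adamic-Adar weights each common neighbour by 1 / log(its degree).
  deg <- vapply(common, function(w) length(neighbours(w, el)), numeric(1))
  c(common_neighbours = length(common),
    jaccard           = length(common) / length(union(nu, nv)),
    adamic_adar       = sum(1 / log(deg)))
}

# Score a few candidate links; each row is a feature vector for one pair.
candidates <- data.frame(u = c(1, 1, 3), v = c(4, 5, 5))
features <- t(mapply(pair_scores, candidates$u, candidates$v,
                     MoreArgs = list(el = edges)))
features <- cbind(candidates, features)
print(features)
```

Rows built this way, one per candidate pair and labelled with whether the link exists in the training data, are the kind of feature table a supervised method would take as input.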

> 3) How to implement a supervised learning method such as random forest (I am
> guessing I need to obtain a feature list but again, how can I get the scores
> for the edges)?

I am not an expert on this, but there are several R packages for
supervised methods, random forests as well; look around on CRAN.

I hope this helps, Best,
Gabor

> Hope I've explained my questions well but do let me know if more clarification
> is needed.
>
> Thanks in advance
> Eu Jin

-- 
Gabor Csardi gabor.csa...@unil.ch     UNIL DGM


Re: [R] batch file output

2010-12-28 Thread David Winsemius


On Dec 28, 2010, at 8:09 AM, Mikkel Grum wrote:

> I run a batch file with the following command in Windows XP:
>
> C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore < C:\users\me\file.R > C:\users\me\file.out 2>&1
>
> Is there any way to get only the output of R in file.out, without
> getting all the code from file.R too?

Putting a sink(file="C:\users\me\file2.out") in file.R would be one
way, but your general strategy looks a bit strange. One does not
generally use the interactive version of R for batch execution. See:


http://stat.ethz.ch/R-manual/R-patched/library/utils/html/BATCH.html
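
A minimal sketch of the sink() approach, assuming the aim is to collect printed results in a file. A temporary file stands in for the real output path here; in an actual script, a Windows path inside an R string needs forward slashes or doubled backslashes, for example "C:/users/me/file2.out".

```r
# Temporary file stands in for the real output path (an assumption).
out_file <- tempfile(fileext = ".out")

sink(file = out_file)   # divert printed output to the file
x <- c(2, 4, 6)
print(mean(x))          # written to the file, not to the console
sink()                  # restore normal console output

# Read the file back to see what was captured.
captured <- readLines(out_file)
print(captured)
```

Calling sink() with no arguments at the end matters: without it, later output keeps going to the file.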

--

David Winsemius, MD
West Hartford, CT



Re: [R] batch file output

2010-12-28 Thread David Winsemius


On Dec 28, 2010, at 8:27 AM, David Winsemius wrote:

> On Dec 28, 2010, at 8:09 AM, Mikkel Grum wrote:
>
>> I run a batch file with the following command in Windows XP:
>>
>> C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore < C:\users\me\file.R > C:\users\me\file.out 2>&1
>>
>> Is there any way to get only the output of R in file.out, without
>> getting all the code from file.R too?
>
> Put a sink(file="C:\users\me\file2.out")

Would probably work better to use forward slashes.

> in the file.R would be one way but your general strategy looks a bit
> strange. One does not generally use the interactive version of R for
> batch execution. See:
>
> http://stat.ethz.ch/R-manual/R-patched/library/utils/html/BATCH.html

--


David Winsemius, MD
West Hartford, CT



Re: [R] batch file output

2010-12-28 Thread Gabor Grothendieck
On Tue, Dec 28, 2010 at 8:09 AM, Mikkel Grum mi2kelg...@yahoo.com wrote:
> I run a batch file with the following command in Windows XP:
>
> C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore < C:\users\me\file.R > C:\users\me\file.out 2>&1
>
> Is there any way to get only the output of R in file.out, without getting all
> the code from file.R too?
>
> Any help greatly appreciated,
> Mikkel

Try Rscript.exe in your R distribution.

Also in the batchfiles distribution, http://batchfiles.googlecode.com,
there is a file #Rscript.bat, that can be used to turn an R script
into a Windows batch file.   #Rscript without arguments gives
instructions.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Gamma Lognormal Model

2010-12-28 Thread Michael Dewey

At 20:08 27/12/2010, Louisa wrote:


> Dear,
>
> I'm very new to R Gui and I have to make an assignment on Gamma Regressions.
> Surfing on the web doesn't help me very much so I hope this forum may be a
> step forward.

Well, since you are so honest about it being homework, try Googling for
lognormal gamma regression
The top hit from where I am sitting is an extensive set of notes with
examples in R, although beware the use of _ for <-




> The question sounds as follows:
> The data set is in the library MASS
> first install library(MASS)
> then type data(mammals)
> attach(mammals)


At this point you should complain that you are being taught poor 
practice as it is nearly always better to use the data= parameter and 
not attach data frames.
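
To illustrate the point about data=, here is a sketch of how two such models could be specified without attach(). The model formula (brain weight against log body weight) and the log link are assumptions picked for the example; they are not taken from the assignment text.

```r
library(MASS)   # provides the mammals data frame (columns body and brain)

# Gamma regression with a log link; variables are looked up in 'mammals'
# through the data= argument, so nothing needs to be attached.
fit_gamma <- glm(brain ~ log(body), family = Gamma(link = "log"),
                 data = mammals)

# Lognormal model: ordinary least squares on the log-transformed response.
fit_lnorm <- lm(log(brain) ~ log(body), data = mammals)

print(coef(fit_gamma))
print(coef(fit_lnorm))
```

Because each call names its data frame explicitly, the same code keeps working if another object called body or brain happens to exist in the workspace.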



> Assignment:
> Fit the gamma model and lognormal model for the mammals data.
>
> I appreciate any help you can provide.
>
> Best Wishes,
> Louisa
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Gamma-Lognormal-Model-tp3165408p3165408.html
>
> Sent from the R help mailing list archive at Nabble.com.


Michael Dewey
http://www.aghmed.fsnet.co.uk



[R] Jaccard dissimilarity matrix for PCA

2010-12-28 Thread Flabbergaster

Hi
I have a large dataset, containing a wide range of binary variables.
I would like first of all to compute a Jaccard matrix, then do a PCA on this
matrix, so that I can finally do a hierarchical clustering on the principal
components.
My problem is that I don't know how to compute the Jaccard dissimilarity
matrix in R. Which package to use, and so on...
Can anybody help me?
Alternatively, I'm searching for another way to explore the clusters present in
my data.
Another problem is that I have cases with missing values on different
variables.

Jacob 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Jaccard-dissimilarity-matrix-for-PCA-tp3165982p3165982.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Reading sas7bdat files into R

2010-12-28 Thread Frank Harrell

Whoops - thought I was replying to google medstats instead of r-help.
Frank


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Reading-sas7bdat-files-into-R-tp3165608p3166047.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Jaccard dissimilarity matrix for PCA

2010-12-28 Thread Marcelo Luiz de Laia
Flabbergaster jlunding at gmail.com writes:
> My problem is, that I don't know how to compute the jaccard dissimilarity
> matrix in R? Which package to use, and so on...

http://rss.acs.unt.edu/Rdoc/library/arules/html/dissimilarity.html

http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/vegdist.html



Re: [R] Jaccard dissimilarity matrix for PCA

2010-12-28 Thread David L Lorenz
Jacob,
  You might have a look at the vegan package. It might compute the Jaccard
distance and it might have some other tools that you might be interested
in.
Dave






Re: [R] batch file output

2010-12-28 Thread Mikkel Grum
Thanks. The way I run it, I can determine what version of R to run with which 
script. Don't know how to do that with R CMD BATCH.

Placing options(echo = FALSE) in the infile solves my problem. I got that from 
the page you linked to.

Mikkel



Re: [R] Jaccard dissimilarity matrix for PCA

2010-12-28 Thread Christian Hennig

jaccard in package prabclus computes a Jaccard matrix for you.

By the way, if you want to do hierarchical clustering, it doesn't seem to 
be a good idea to me to run PCA first. Why 
not cluster the dissimilarity matrix directly without information loss by 
PCA? (I should not make too general statements on this because generally 
how to cluster data always depends on the aim of clustering, the cluster 
concept you are interested in etc.)
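
As a sketch of clustering the dissimilarity matrix directly: base R's dist() with method = "binary" gives the Jaccard distance between rows of 0/1 data, and columns with a missing value in either row are left out of that pair's comparison. The toy matrix and the choice of average linkage are assumptions for illustration only; this is not the prabclus implementation.

```r
# Toy 0/1 indicator matrix: rows are subjects, columns are characteristics.
x <- rbind(s1 = c(1, 1, 0, 0, 1),
           s2 = c(1, 0, 1, 0, 1),
           s3 = c(0, 0, 1, 1, 0),
           s4 = c(0, 0, 1, 1, NA))

# Jaccard distance between rows: the share of mismatches among the
# positions where at least one of the two rows has a 1.
d <- dist(x, method = "binary")
print(round(d, 3))

# Hierarchical clustering on the distances themselves, no PCA step.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

The linkage method and the number of clusters are choices that depend on the aim of the analysis, as noted above.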


prabclus also contains clustering methods for such data; have a 
look at the functions prabclust and hprabclust (however, they are 
documented as functions for clustering species distribution ranges, so if 
your application is different, you may have to think about whether and how 
to adapt them).


Hope this helps,
Christian







*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche



Re: [R] batch file output

2010-12-28 Thread David Winsemius


On Dec 28, 2010, at 10:38 AM, Mikkel Grum wrote:

> Thanks. The way I run it, I can determine what version of R to run
> with which script. Don't know how to do that with R CMD BATCH.

Seems as though something like this (using absolute path to the
instance of R.exe) should work:

C:\R\R-2.12.1\bin\R CMD BATCH [options] infile [outfile]

At least if I remember my command line Windows conventions ... it's
been a few years.

--
David.

> Placing options(echo = FALSE) in the infile solves my problem. I got
> that from the page you linked to.
>
> Mikkel

David Winsemius, MD
West Hartford, CT


[R] Faster way to do it??...using apply?

2010-12-28 Thread M.Ribeiro

Hi, 
I have a simple task, but I am looking for a clever and fast way to do it:

I have a vector x with 0,1 or 2 and I want to create another vector y with
the same length following the rules:
If the element in x is equal to 0, the element in y is equal to 0
If the element in x is equal to 2, the element in y is equal to 1
If the element in x is equal to 1, the element in y is either 0 or 1 (sample
from c(0,1))

thus the vector
> x
     [,1]
[1,]    0
[2,]    2
[3,]    1
[4,]    2
[5,]    0
[6,]    1
[7,]    2

could produce the vector y (this is one of the possibilities, since y|x=1 is
either 0 or 1):

> y
     [,1]
[1,]    0
[2,]    1
[3,]    1
[4,]    1
[5,]    0
[6,]    0
[7,]    1


I know how to do this using for loops but I was wondering if you guys could
suggest a better way
Thanks

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Faster-way-to-do-it-using-apply-tp3166161p3166161.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Faster way to do it??...using apply?

2010-12-28 Thread Henrique Dallazuanna
Try this:

replace(replace(x, x == 1, sample(0:1, 1)), x == 2, 1)





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



[R] Problem applying McNemar's - Different values in SPSS and R

2010-12-28 Thread Manoj Aravind
Hi friends,
I get different values for McNemar's test in R and SPSS. Which one should I
rely on when the p values differ?
I came across this problem when I started learning R and seriously gave up
on SPSS or any other proprietary software.
Thank you in advance

Output in SPSS follows

*Crosstab*

                        hsc
                    ABN      NE     Total
tvs    ABN  Count    40       3       43
            Row %   93.0%    7.0%  100.0%
            Col %   78.4%   30.0%   70.5%
       NE   Count    11       7       18
            Row %   61.1%   38.9%  100.0%
            Col %   21.6%   70.0%   29.5%
Total       Count    51      10       61
            Row %   83.6%   16.4%  100.0%
            Col %  100.0%  100.0%  100.0%


*Chi-Square Tests*

                    Value    Exact Sig. (2-sided)
McNemar Test                 .057(a)
N of Valid Cases    61

a Binomial distribution used.

Output from R is as follows

> tvshsc <-
+ matrix(c(40,11,3,7),
+ nrow=2,
+ dimnames=list(TVS=c("ABN","NE"),
+ HSC=c("ABN","NE")))
> tvshsc
     HSC
TVS   ABN NE
  ABN  40  3
  NE   11  7
> mcnemar.test(tvshsc)

        McNemar's Chi-squared test with continuity correction

data:  tvshsc
McNemar's chi-squared = 3.5, df = 1, p-value = 0.06137

Regards

Dr. B Manoj Aravind



Re: [R] Problem applying McNemar's - Different values in SPSS and R

2010-12-28 Thread Marc Schwartz

On Dec 28, 2010, at 11:05 AM, Manoj Aravind wrote:

> Hi friends,
> I get different values for McNemar's test in R and SPSS. Which one should i
> rely on when the p values differ.

[...]


The SPSS test appears to be an exact test, whereas the default R function does 
not perform an exact test, so you are not comparing Apples to Apples...

Try this using the 'exact2x2' CRAN package:

> require(exact2x2)
Loading required package: exact2x2
Loading required package: exactci

> mcnemar.exact(matrix(c(40, 11, 3, 7), 2, 2))

Exact McNemar test (with central confidence intervals)

data:  matrix(c(40, 11, 3, 7), 2, 2) 
b = 3, c = 11, p-value = 0.05737
alternative hypothesis: true odds ratio is not equal to 1 
95 percent confidence interval:
 0.04885492 1.03241985 
sample estimates:
odds ratio 
 0.2727273 


HTH,

Marc Schwartz



Re: [R] Faster way to do it??...using apply?

2010-12-28 Thread M.Ribeiro

Hi Henrique,
Thanks for the fast answer,
The only problem in your code, which I think I didn't mention in my message,
is that I would like a separate random draw for each 1 in
my vector.

The way it was written, it samples only once and replaces every 1 with that value:
> x = as.matrix(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
> replace(replace(x, x == 1, sample(0:1, 1)), x == 2, 1)
      [,1]
 [1,]    1
 [2,]    1
 [3,]    1
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    1
 [8,]    1
 [9,]    1
[10,]    1
[11,]    1
[12,]    1
[13,]    1
[14,]    1
[15,]    1

Thanks

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Faster-way-to-do-it-using-apply-tp3166161p3166203.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Jaccard dissimilarity matrix for PCA

2010-12-28 Thread Flabbergaster

This sounds like something I could use.
I'm kind of new to R, meaning I'm having some minor troubles all the
time...
Say I have a range of binary(0,1) variables X1 to Xn, with missing data for
different cases.
At the moment my data is a binary indicator matrix; rows representing the i
individuals or subjects, columns representing presence(1)/absence(0) of
various characteristics. 
Actually I have 5 groups of variables (102 variables in total), describing
different aspects of the subject(s) I'm studying (people; i.e. refugees).
O - O1 to O43 
A - A1 to A38
R - R1 to R6
AP - AP1 to AP8
PT - PT1 to PT7

Can someone help me with the programming of a jaccard matrix in prabclus (or
in any other package). I'm having troubles defining the input-object to the
function, I think?
I get error messages like:
'x' must be an array of at least two dimensions
ERROR:  argument is not a matrix
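
Errors of this kind typically appear when a function that expects a numeric matrix is handed a data frame or a single column that has collapsed to a vector. Below is a sketch of the conversion, using a made-up data frame whose column names only imitate the variable groups described above. Base R's dist() is used for the Jaccard step; other functions may compare columns rather than rows (check their help pages), in which case the matrix can be transposed with t().

```r
# Made-up indicator data; the column names are hypothetical.
dat <- data.frame(O1 = c(1, 0, 1, 0),
                  O2 = c(1, 1, 0, NA),
                  A1 = c(0, 1, 1, 1),
                  R1 = c(0, 0, 1, 1))

# Select the indicator columns and convert them to a numeric matrix.
m <- as.matrix(dat[, c("O1", "O2", "A1", "R1")])
stopifnot(is.matrix(m), is.numeric(m))

# Selecting a single column drops to a plain vector unless drop = FALSE.
one_col <- as.matrix(dat[, "O1", drop = FALSE])
print(dim(one_col))

# Jaccard distances between subjects (rows); a column that is NA for
# either subject is skipped for that pair.
d <- dist(m, method = "binary")
print(d)
```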

Jacob


Christian Hennig wrote:
>
> jaccard in package prabclus computes a Jaccard matrix for you.
>

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Jaccard-dissimilarity-matrix-for-PCA-tp3165982p3166205.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Problem applying McNemar's - Different values in SPSS and R

2010-12-28 Thread Johannes Huesing
Marc Schwartz marc_schwa...@me.com [Tue, Dec 28, 2010 at 06:30:59PM CET]:
>
> On Dec 28, 2010, at 11:05 AM, Manoj Aravind wrote:
>
> > Hi friends,
> > I get different values for McNemar's test in R and SPSS. Which one should i
> > rely on when the p values differ.

[...]

> The SPSS test appears to be an exact test, whereas the default R function
> does not perform an exact test, so you are not comparing Apples to Apples...
 

Indeed, binom.test(11, 14) renders the same p-value as SPSS, whereas 
mcnemar.test() uses the approximation (|a_12 - a_21| - 1)²/(a_21 + a_12) 
with the -1 removed if correct=FALSE.
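
The two p-values can be reproduced side by side in base R. The sketch below uses the table from this thread, whose discordant counts are 3 and 11; the variable names are my own.

```r
tvshsc <- matrix(c(40, 11, 3, 7), nrow = 2,
                 dimnames = list(TVS = c("ABN", "NE"),
                                 HSC = c("ABN", "NE")))

# Discordant cells of the paired table.
n12 <- tvshsc["ABN", "NE"]   # 3
n21 <- tvshsc["NE", "ABN"]   # 11

# Exact version: binomial test on the discordant pairs.
exact <- binom.test(n21, n12 + n21, p = 0.5)

# Continuity-corrected chi-squared approximation, computed by hand ...
stat <- (abs(n12 - n21) - 1)^2 / (n12 + n21)
p_by_hand <- pchisq(stat, df = 1, lower.tail = FALSE)

# ... and the same thing through mcnemar.test().
approx <- mcnemar.test(tvshsc)

print(round(c(exact = exact$p.value,
              by_hand = p_by_hand,
              mcnemar.test = approx$p.value), 5))
```

The exact p-value rounds to the .057 reported by SPSS, and the hand-computed statistic of 3.5 matches the mcnemar.test() output quoted earlier.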

An old question of mine: Is there any reason not to use binom.test()
other than historical reasons?
-- 
Johannes Hüsing   There is something fascinating about science. 
  One gets such wholesale returns of conjecture 
mailto:johan...@huesing.name  from such a trifling investment of fact.  
  
http://derwisch.wikidot.com (Mark Twain, Life on the Mississippi)



Re: [R] Faster way to do it??...using apply?

2010-12-28 Thread Henrique Dallazuanna
Try this instead:

replace(replace(x, x == 1, sample(0:1, sum(x == 1), rep = TRUE)), x == 2, 1)
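
The difference between the two versions comes down to how many values sample() returns: a single draw is recycled across every position where x == 1, whereas drawing sum(x == 1) values gives each position its own draw. An equivalent sketch using logical indexing, with a seed added only so the example is repeatable:

```r
set.seed(42)                       # only for reproducibility
x <- c(0, 2, 1, 2, 0, 1, 2, 1, 1)

y <- x                             # zeros stay as they are
y[x == 2] <- 1                     # 2 always maps to 1
is_one <- x == 1
y[is_one] <- sample(0:1, sum(is_one), replace = TRUE)  # one draw per element

print(rbind(x = x, y = y))
```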





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Faster way to do it??...using apply?

2010-12-28 Thread Jonathan P Daily
I don't know if it's any faster, but it is also possible this way:

y <- ifelse(x == 1, round(runif(x)), sign(x))
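
For comparison, here is a minimal sketch of the two suggestions side by side. The example vector `x` and the seed are made up for illustration, and the intended coding (0 stays 0, 2 becomes 1, every 1 gets its own random 0 or 1) is only inferred from the code posted in this thread.

```r
set.seed(42)                      # only so the sketch is reproducible
x <- c(0, 1, 2, 1, 1, 0, 2, 1)    # hypothetical input

# Approach 1: nested replace(), drawing one value per 1 in x
ones <- x == 1
y1 <- replace(replace(x, ones, sample(0:1, sum(ones), replace = TRUE)),
              x == 2, 1)

# Approach 2: ifelse() with an independent uniform draw for every element;
# sign() maps 0 to 0 and 2 to 1
y2 <- ifelse(x == 1, round(runif(length(x))), sign(x))
```

Both results contain only 0s and 1s; the random draws differ between the two approaches, so `y1` and `y2` need not be equal at the positions where `x` is 1.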
--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it.
 - Jubal Early, Firefly


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in combined for() and if() code

2010-12-28 Thread Nathan Miller
Hello,

I am trying to filter a data set like below so that the peaks in the Phase
value are more obvious and can be identified by a peak finding function
following the useful advice of Carl Witthoft. I have written the following

for(i in length(data$Phase)){
newphase=if(abs(data$Phase[i+1]-data$Phase[i])6){
data$Phase[i+1]
}else{data$Phase[i]
}
}

I get the following error which I have not seen before when I paste the code
into R

Error in if (abs(data$Phase[i + 1] - data$Phase[i])  6) { :
  missing value where TRUE/FALSE needed

I don't have much experience with such loops as I have tried to avoid using
them in the past. Can anyone identify the error(s) in the code I have
written or a simpler means of writing such a filter?

Thank you,
Nate


data=
Time Phase
1  0.000 15.18
2  0.017 13.42
3  0.034 11.40
4  0.051 18.31
5  0.068 25.23
6  0.085 33.92
7  0.102 42.86
8  0.119 42.87
9  0.136 42.88
10 0.153 42.88
11 0.170 42.87
12 0.186 42.88
13 0.203 42.88
14 0.220 42.78
15 0.237 33.50
16 0.254 24.81
17 0.271 17.20
18 0.288 10.39
19 0.305 13.97
20 0.322 16.48
21 0.339 14.75
22 0.356 20.80
23 0.373 25.79
24 0.390 31.25
25 0.407 39.89
26 0.423 40.04
27 0.440 40.05
28 0.457 40.05
29 0.474 40.05
30 0.491 40.05
31 0.508 40.06
32 0.525 40.07
33 0.542 32.23
34 0.559 23.90
35 0.576 17.86
36 0.592 11.63
37 0.609 12.78
38 0.626 13.12
39 0.643 10.93
40 0.660 10.63
41 0.677 10.82
42 0.694 11.84
43 0.711 20.44
44 0.728 27.33
45 0.745 34.22
46 0.762 41.55
47 0.779 41.55
48 0.796 41.55
49 0.813 41.53
50 0.830 41.53
51 0.847 41.52
52 0.864 41.52
53 0.880 41.53
54 0.897 41.53
55 0.914 33.07
56 0.931 25.12
57 0.948 19.25
58 0.965 11.30
59 0.982 12.48
60 0.999 13.85
61 1.016 13.62
62 1.033 12.62
63 1.050 19.39
64 1.067 25.48
65 1.084 31.06
66 1.101 39.49
67 1.118 39.48
68 1.135 39.46
69 1.152 39.45
70 1.169 39.43
71 1.185 39.42
72 1.202 39.42
73 1.219 39.41
74 1.236 39.41
75 1.253 37.39
76 1.270 29.03
77 1.287 20.61
78 1.304 14.07
79 1.321  9.12

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in combined for() and if() code

2010-12-28 Thread Duncan Murdoch

On 28/12/2010 1:08 PM, Nathan Miller wrote:

> Hello,
>
> I am trying to filter a data set like below so that the peaks in the Phase
> value are more obvious and can be identified by a peak finding function
> following the useful advice of Carl Witthoft. I have written the following
>
> for(i in length(data$Phase)){
> newphase=if(abs(data$Phase[i+1]-data$Phase[i])6){


When i is at its maximum, i+1 will be beyond the length of data$Phase, 
so you shouldn't use it as an index.
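
A small sketch of how the out-of-range index leads to the reported error. The three values are copied from the posted data; the comparison operator was lost from the posted code, so the `>` used here is an assumption.

```r
phase <- c(15.18, 13.42, 11.40)   # three Phase values from the posted data
i <- length(phase)                # the largest valid index

beyond <- phase[i + 1]            # indexing one past the end gives NA, not an error

# `>` is an assumption; the NA propagates through the arithmetic either way
cond <- abs(phase[i + 1] - phase[i]) > 6

# if() cannot branch on NA, which is what the error message complains about
outcome <- tryCatch(
  if (cond) "jump" else "no jump",
  error = function(e) e
)
```

Here `outcome` holds the caught error condition rather than one of the two strings.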


Duncan Murdoch




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in combined for() and if() code

2010-12-28 Thread Uwe Ligges



On 28.12.2010 19:08, Nathan Miller wrote:

> Hello,
>
> I am trying to filter a data set like below so that the peaks in the Phase
> value are more obvious and can be identified by a peak finding function
> following the useful advice of Carl Witthoft. I have written the following
>
> for(i in length(data$Phase)){


Nonsense: In this case the loop will only run once for i=length(data$Phase)

you probably want

for(i in seq_along(data$Phase)){
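
To make the difference concrete, a short sketch with four of the posted Phase values. The variable names are made up for illustration; the third loop shows one possible way to stop early enough that `i + 1` stays inside the data.

```r
phase <- c(15.18, 13.42, 11.40, 18.31)

# length() is a single number, so the loop body runs exactly once
once <- integer(0)
for (i in length(phase)) once <- c(once, i)

# seq_along() yields every index 1, 2, ..., length(phase)
every <- integer(0)
for (i in seq_along(phase)) every <- c(every, i)

# stopping one short keeps phase[i + 1] within the vector
safe <- integer(0)
for (i in seq_len(length(phase) - 1)) safe <- c(safe, i)
```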



> newphase=if(abs(data$Phase[i+1]-data$Phase[i])6){


Nonsense:
1. if()... won't return any useful result.
2. i+1 is not within your data

Uwe Ligges





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem applying McNemar's - Different values in SPSS and R

2010-12-28 Thread Marc Schwartz

On Dec 28, 2010, at 11:47 AM, Johannes Huesing wrote:

 Marc Schwartz marc_schwa...@me.com [Tue, Dec 28, 2010 at 06:30:59PM CET]:
 
 On Dec 28, 2010, at 11:05 AM, Manoj Aravind wrote:
 
 Hi friends,
 I get different values for McNemar's test in R and SPSS. Which one should i
 rely on when the p values differ.
 
 [...]
 
 
 The SPSS test appears to be an exact test, whereas the default R function 
 does not perform an exact test, so you are not comparing Apples to Apples...
 
 
 Indeed, binom.test(11, 14) renders the same p-value as SPSS, whereas 
 mcnemar.test() uses the approximation (|a_12 - a_21| - 1)²/(a_21 + a_12) 
 with the -1 removed if correct=FALSE.
 
 An old question of mine: Is there any reason not to use binom.test()
 other than historical reasons?


I may be missing the context of your question, but I frequently see exact 
binomial tests being used when one is comparing the presumptively known 
probability of some dichotomous characteristic versus that which is observed in 
an independent sample. For example, in single arm studies where one is 
comparing an observed event rate against a point estimate for a presumptive 
historical control.

I also see the use of exact binomial (Clopper-Pearson) confidence intervals 
being used when one wants to have conservative CI's, given that the nominal 
coverage of these are at least as large as requested. That is, 95% exact CI's 
will be at least that large, but in reality can tend to be well above that, 
depending upon various factors. This is well documented in various papers.

I generally tend to use Wilson CI's for binomial proportions when reporting 
analyses. I have my own code but these are implemented in various R functions, 
including Frank's binconf() in Hmisc.
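
As a rough illustration of the two kinds of interval mentioned above, here is a sketch using base R only. The counts 11 and 14 are borrowed from the binom.test(11, 14) call earlier in this thread purely as example numbers; the hand-written formula is the usual Wilson score interval, not anyone's production code.

```r
x <- 11; n <- 14                 # example counts, borrowed from this thread
p_hat <- x / n
z <- qnorm(0.975)                # two-sided 95% interval

# Wilson score interval written out by hand
centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(centre - half, centre + half)

# prop.test() without continuity correction reports the same interval
wilson_pt <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

# binom.test() reports the exact (Clopper-Pearson) interval
exact <- as.numeric(binom.test(x, n)$conf.int)
```

Comparing `wilson` with `exact` for a few values of x and n gives a feel for how conservative the exact interval is.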

HTH,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in combined for() and if() code

2010-12-28 Thread Duncan Murdoch

On 28/12/2010 1:13 PM, Uwe Ligges wrote:



> On 28.12.2010 19:08, Nathan Miller wrote:
>> Hello,
>>
>> I am trying to filter a data set like below so that the peaks in the Phase
>> value are more obvious and can be identified by a peak finding function
>> following the useful advice of Carl Witthoft. I have written the following
>>
>> for(i in length(data$Phase)){
>
> Nonsense: In this case the loop will only run once for i=length(data$Phase)

Yes, I missed that.

> you probably want
>
> for(i in seq_along(data$Phase)){
>
>> newphase=if(abs(data$Phase[i+1]-data$Phase[i])6){
>
> Nonsense:
> 1. if()... won't return any useful result.


if (cond) v1 else v2

does return a value (either v1 or v2).  So the construction

newphase = if (abs(data$Phase[i+1] 

will set newphase to a new value each time through the loop.  That's 
probably not what was intended...
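
A sketch of the point above, using the first five posted Phase values. The condition is deliberately a plain "is the next value larger" comparison rather than the posted one, whose operator was lost, and the variable names are made up.

```r
phase <- c(15.18, 13.42, 11.40, 18.31, 25.23)   # first five posted Phase values

# if/else is an expression: it evaluates to whichever branch is taken
v <- if (phase[2] > phase[1]) "up" else "down"

# assigning to a single name inside the loop keeps only the last iteration
for (i in seq_len(length(phase) - 1)) {
  last_only <- if (phase[i + 1] > phase[i]) phase[i + 1] else phase[i]
}

# storing into a pre-allocated vector keeps the result of every iteration
kept <- numeric(length(phase) - 1)
for (i in seq_along(kept)) {
  kept[i] <- if (phase[i + 1] > phase[i]) phase[i + 1] else phase[i]
}
```

With this particular condition the loop is equivalent to `pmax(phase[-1], phase[-length(phase)])`, which avoids the loop altogether.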



> 2. i+1 is not within your data


That's the only one I saw.

Duncan Murdoch



> Uwe Ligges





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in combined for() and if() code

2010-12-28 Thread David Winsemius


On Dec 28, 2010, at 1:08 PM, Nathan Miller wrote:


> Hello,
>
> I am trying to filter a data set like below so that the peaks in the Phase
> value are more obvious and can be identified by a peak finding function
> following the useful advice of Carl Witthoft. I have written the following
>
> for(i in length(data$Phase)){
> newphase=if(abs(data$Phase[i+1]-data$Phase[i])6){
> data$Phase[i+1]
> }else{data$Phase[i]
> }
> }
>
> I get the following error which I have not seen before when I paste the code
> into R
>
> Error in if (abs(data$Phase[i + 1] - data$Phase[i])  6) { :
>  missing value where TRUE/FALSE needed
>
> I don't have much experience with such loops as I have tried to avoid using
> them in the past. Can anyone identify the error(s) in the code I have
> written or a simpler means of writing such a filter?


Sometimes it's more informative to look at the data first. Here's a  
plot of the data with the first and second differences underneath


plot(data, ylim=c(-5, max(data$Phase)) )
lines(data$Time[-1], diff(data$Phase) )
lines(data$Time[-(1:2)], diff(diff(data$Phase)), col="red")

Your data had rather flat-topped maxima. These maxima are defined by  
the  interval between the times when the first differences are zero  
(OR go from positive to negative)  AND the second differences are  
negative (OR zero).
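
One way this description could be turned into code, as a rough sketch only: classify each first difference as rising, flat, or falling, then look for a flat run that is entered from a rise and left by a fall. The tolerance `tol` is a made-up value, the vector holds the first 18 posted Phase values, and a sharp peak with no flat run in between would not be caught by this version.

```r
phase <- c(15.18, 13.42, 11.40, 18.31, 25.23, 33.92, 42.86, 42.87, 42.88,
           42.88, 42.87, 42.88, 42.88, 42.78, 33.50, 24.81, 17.20, 10.39)
tol <- 0.5                       # hypothetical: changes smaller than this count as flat

d <- diff(phase)                 # d[j] is phase[j + 1] - phase[j]
trend <- ifelse(d > tol, 1, ifelse(d < -tol, -1, 0))

r      <- rle(trend)             # runs of rising (1), flat (0), falling (-1)
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# flat runs that follow a rising run and precede a falling run
k <- which(r$values == 0 &
           c(NA, head(r$values, -1)) == 1 &
           c(tail(r$values, -1), NA) == -1)

# a run over d[from..to] covers the observations from..(to + 1)
plateaus <- data.frame(from = starts[k], to = ends[k] + 1)
```

For these 18 values the single plateau spans observations 7 to 14, the run of values near 42.8 in the posted data.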


There is a package on CRAN:

http://cran.r-project.org/web/packages/msProcess/index.html

  that purports to do peak finding. I would think the local maxima
in your data might need some filtering and presumably the mass-spec
people have need of that too.









David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Gamma Lognormal Model

2010-12-28 Thread Louisa

Thank you Michael!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Gamma-Lognormal-Model-tp3165408p3166318.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] another superscript problem

2010-12-28 Thread Tyler Dean Rudolph
Part of the reason I was having difficulty is that I'm trying to add a
legend with more than one element:

plot(1,1)
obv = 5
txt = "Pop mean"

# this works
legend("topleft", legend=bquote(.(txt) == .(obv)*degree))

# but this doesn't
legend("topleft", legend=c(bquote(.(txt) == .(obv)*degree), "Von Mises distribution"))

How can I go about using multiple legend elements with mathematical/latin
annotation in both?

Tyler


On Mon, Dec 27, 2010 at 8:22 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

 On 2010-12-27 16:51, David Winsemius wrote:


 On Dec 27, 2010, at 6:40 PM, T.D. Rudolph wrote:


 I've exceeded the maximum time I am willing to accept for solving
 simple
 problems so I thank all in advance for your assistance.

 I am trying to plot text combined with an object value and a
 superscript.

 obv = 5
 text = "Population mean ="
 ss = "^o" # degrees

 Something like this (very naive so you get the idea):
 expression(text, obv, ss)

 paste(text, obv) # works ...but of course I either lose the value of
 obv or
 the superscript in the translation using expression, and bquote
 doesn't seem
 to accept the asterisk before the first element.


 I had trouble figuring out your real intent, since you have only been
 describing what didn't work but see if this his halfway there:

 plot(1,1)
   obv = 5
   text = "Population mean ="  # you should really avoid using function names for variables!
   text(.8,.8, bquote(.(text)~.(obv)^o) )

 The ^o seems a bit of a dodge but it looks ok so if you're happy, go


 Instead of ^o, use the word 'degree' (see ?plotmath)

  text(.8,.8, bquote(.(text)~.(obv)*degree) )

 and, personally, I would let R handle the '=' sign:

  txt <- "Pop mean"
  text(1, 1.1, bquote(.(txt) == .(obv)*degree))

 Peter Ehlers

  with it.


 I am a little bungled by the varying syntax used for bquote and all
 the
 rest; sometimes R seems more complicated than it needs to be for a
 relatively simple problem (and for me this is one of those cases!)...

 Tyler
 --



 David Winsemius, MD
 West Hartford, CT





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in combined for() and if() code

2010-12-28 Thread Nathan Miller
Hi all,

I haven't solved the problem of filtering the data, but I have managed to
find all the peaks in the data despite their relatively flat nature using
peaks() in the IDPmisc package. It works really well for my data and the
ability to set a lower threshold for peaks to report is convenient as well.

Maybe I'll come back to the data filtering problem later.

Thanks for your help and comments,
Nate


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] another superscript problem

2010-12-28 Thread baptiste auguie
Hi,

this seems to work,

plot.new()
legend("topleft", legend=as.expression(c(bquote(.(txt) == .(obv)*degree), "Von Mises distribution")))
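
A self-contained sketch of the same idea, in case it helps to see what the labels look like before they reach legend(). My understanding, which I have not checked against the legend() source, is that c() on a call and a string produces a plain list, and as.expression() turns that list into an expression vector with one element per legend entry. The pdf(NULL) line is only there so the sketch also runs non-interactively.

```r
txt <- "Pop mean"
obv <- 5

# one plotmath call and one plain string, combined into an expression vector
labs <- as.expression(c(bquote(.(txt) == .(obv) * degree),
                        "Von Mises distribution"))

pdf(NULL)                 # off-screen device; omit this line at the console
plot(1, 1)
legend("topleft", legend = labs)
invisible(dev.off())
```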


HTH,

baptiste

On 28 December 2010 20:17, Tyler Dean Rudolph
tylerdeanrudo...@gmail.com wrote:
 legend("topleft", legend=c(bquote(.(txt) == .(obv)*degree), "Von Mises distribution"))

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] batch file output

2010-12-28 Thread Joshua Wiley
On Tue, Dec 28, 2010 at 5:09 AM, Mikkel Grum mi2kelg...@yahoo.com wrote:
 I run a batch file with the following command in Windows XP:

 C:\R\R-2.12.1\bin\Rterm.exe --no-save --no-restore < C:\users\me\file.R > C:\users\me\file.out 2>&1

I'm a bit surprised this worked for you...did you customize your build
so that Rterm.exe is in \bin\ rather than a subfolder for its specific
architecture?

 Is there any way to get only the output of R in file.out, without getting all 
 the code from file.R too?

I did not see anyone else mention this, so I wanted to add that with R
CMD BATCH you can add the --slave argument to avoid needing to add
options(echo = FALSE) to all your scripts.  The --no-timing option
stops proc.time() from running at the end.

For example from the command prompt I can run 'sample.R' using 32 bit R:

C:\R\R-2.12.1\bin\i386\R CMD BATCH --slave --no-timing sample.R sampleout.txt

HTH,

Josh

 Any help greatly appreciated,
 Mikkel


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] levelplot blocks size

2010-12-28 Thread Greg Snow
Here is a basic example: 

tmp.df <- expand.grid( x= 1:100, y=1:100 )
tmp.df$z <- with(tmp.df, x+2*y)

library(lattice)
levelplot( z ~ x + y, data=tmp.df )

tx2 <- with(tmp.df, cut(x, seq(0.5, 100.5, 10) ) )
ty2 <- with(tmp.df, cut(y, seq(0.5, 100.5, 20) ) )

tmp.df2 <- aggregate(tmp.df, list( tx2, ty2 ), mean )

levelplot( z ~ x + y, data=tmp.df2 )
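
To see what the aggregation step does to the data before it is plotted, here is a sketch that repeats it without lattice. The names `xbin`, `ybin` and `blocks` are made up for this illustration.

```r
tmp.df <- expand.grid(x = 1:100, y = 1:100)
tmp.df$z <- with(tmp.df, x + 2 * y)

xbin <- cut(tmp.df$x, seq(0.5, 100.5, 10))   # 10 bins of width 10 along x
ybin <- cut(tmp.df$y, seq(0.5, 100.5, 20))   # 5 bins of width 20 along y

# one row per (xbin, ybin) block, holding the block means of x, y and z
blocks <- aggregate(tmp.df, list(xbin, ybin), mean)

dim(blocks)      # number of blocks and columns
head(blocks, 3)
```

Each row of `blocks` becomes one rectangle in the second levelplot, so the block size is controlled by the break points passed to cut().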


Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of jonathan
 Sent: Monday, December 27, 2010 7:00 PM
 To: r-help@r-project.org
 Subject: Re: [R] levelplot blocks size
 
 
 Thanks for your help.
 
 Might you be able to explain in a little more detail how to use those
 functions to solve this specific problem?
 
 I'm happy to put in the work myself and have looked up those functions
 but
 am new to R and still a little unsure about how I would go about using
 those
 functions to solve my problem.
 
 Thanks,
 
 Jonathan
 --
 View this message in context: http://r.789695.n4.nabble.com/levelplot-
 blocks-size-tp3089972p3165638.html
 Sent from the R help mailing list archive at Nabble.com.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem applying McNemar's - Different values in SPSS and R

2010-12-28 Thread Johannes Huesing
Marc Schwartz marc_schwa...@me.com [Tue, Dec 28, 2010 at 07:14:49PM CET]:
[...]
  An old question of mine: Is there any reason not to use binom.test()
  other than historical reasons?
 

(I meant in lieu of the McNemar approximation, sorry if some
misunderstanding ensued).

  I may be missing the context of your question, but I frequently see
 exact binomial tests being used when one is comparing the
 presumptively known probability of some dichotomous characteristic
 versus that which is observed in an independent sample. For example,
 in single arm studies where one is comparing an observed event rate
 against a point estimate for a presumptive historical control.

In the McNemar context (as used by SPSS) the null hypothesis is p=0.5.
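
A sketch of that comparison in R. The discordant counts 11 and 3 follow from the binom.test(11, 14) call quoted earlier in this thread; the diagonal counts 20 and 30 are invented, since they do not enter either test.

```r
n12 <- 11; n21 <- 3                          # discordant pairs
tab <- matrix(c(20, n21, n12, 30), nrow = 2) # diagonal counts are hypothetical

# exact test: is the split of the 14 discordant pairs compatible with p = 0.5?
exact_p <- binom.test(n12, n12 + n21, p = 0.5)$p.value

# chi-squared approximation, with and without the continuity correction
mc_corr <- mcnemar.test(tab)
mc_raw  <- mcnemar.test(tab, correct = FALSE)

c(exact = exact_p, corrected = mc_corr$p.value, uncorrected = mc_raw$p.value)
```

The corrected statistic is (|11 - 3| - 1)^2 / 14 = 3.5 and the uncorrected one is 8^2 / 14, matching the formula quoted earlier in the thread.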

  I also see the use of exact binomial (Clopper-Pearson) confidence
 intervals being used when one wants to have conservative CI's, given
 that the nominal coverage of these are at least as large as
 requested. That is, 95% exact CI's will be at least that large, but
 in reality can tend to be well above that, depending upon various
 factors. This is well documented in various papers.

Confidence intervals are not that regularly used in the McNemar context, as the
conditional probability a  b given they are unequal is not that much an
interpretable quantity as is the event probability in a single arm study.

 I generally tend to use Wilson CI's for binomial proportions when
  reporting analyses. I have my own code but these are implemented in
  various R functions, including Frank's binconf() in Hmisc.

Thanks for the hint.
-- 
Johannes Hüsing
mailto:johan...@huesing.name
http://derwisch.wikidot.com

There is something fascinating about science.
One gets such wholesale returns of conjecture
from such a trifling investment of fact.
(Mark Twain, Life on the Mississippi)



[R] using lapply and split to plot up subsets of a vector

2010-12-28 Thread karmakiller

Hi,

I would like to be able to plot data from each of the sp.id on individual
plots. At the moment I can plot all the data on one graph with the following
commands but I cannot figure out how to get individual graph for each sp.id.

i <- function(df)plot(lnbm,ln.o2con,data=df)
j <- lapply(split(one,one$sp.id),i)

I have searched on the net and through the threads here but I cannot find
anything that matches what I am trying to do. Any help would be greatly
appreciated.

Thanx
-- 
View this message in context: 
http://r.789695.n4.nabble.com/using-lapply-and-split-to-plot-up-subsets-of-a-vector-tp3166634p3166634.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] using lapply and split to plot up subsets of a vector

2010-12-28 Thread Phil Spector
The data= argument to plot only makes sense if the first
argument is a formula.  So if you change the plot command
in your function to

   plot(ln.o2con~lnbm,data=df)

you might get what you want.  But I would suggest you take a
look at the plot produced by

library(lattice)
xyplot(ln.o2con~lnbm|sp.id,data=one)

which might be more useful.
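
As a rough sketch of the first suggestion, the formula version can be combined
with split() and lapply() to draw one plot per sp.id.  The data frame below is
made up to stand in for the poster's `one`, and the helper name plot_one and
the temporary PDF file are assumptions for illustration only.

```r
# Made-up stand-in for the poster's data frame `one` (assumption)
set.seed(1)
one <- data.frame(
  sp.id    = rep(c("a", "b", "c"), each = 10),
  lnbm     = rnorm(30),
  ln.o2con = rnorm(30)
)

# Hypothetical helper: one scatterplot per subset, titled with its sp.id
plot_one <- function(df) {
  plot(ln.o2con ~ lnbm, data = df, main = paste("sp.id:", df$sp.id[1]))
  nrow(df)  # return the number of points drawn
}

# Send the plots to a multi-page PDF, one page per sp.id
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
n_points <- lapply(split(one, one$sp.id), plot_one)
dev.off()
```

On an interactive device the same lapply() call draws the plots one after the
other, so setting par(ask = TRUE) or par(mfrow = ...) first may help.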

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Tue, 28 Dec 2010, karmakiller wrote:



Hi,

I would like to be able to plot data from each of the sp.id on individual
plots. At the moment I can plot all the data on one graph with the following
commands but I cannot figure out how to get individual graph for each sp.id.

i <- function(df)plot(lnbm,ln.o2con,data=df)
j <- lapply(split(one,one$sp.id),i)

I have searched on the net and through the threads here but I cannot find
anything that matches what I am trying to do. Any help would be greatly
appreciated.

Thanx
--
View this message in context: 
http://r.789695.n4.nabble.com/using-lapply-and-split-to-plot-up-subsets-of-a-vector-tp3166634p3166634.html
Sent from the R help mailing list archive at Nabble.com.



[R] filling up holes

2010-12-28 Thread analys...@hotmail.com
I have a data frame with three columns

client ID | date | value


For each client ID I want to determine Min date and Max date and for
any dates in between that are missing I want to insert a row

Client ID | date| NA

Any help would be appreciated.



[R] linear regression for grouped data

2010-12-28 Thread Entropi ntrp
Hi,
I have been examining large data and need to do simple linear regression
with the data which is grouped based on the values of a particular
attribute. For instance, consider three columns : ID, x, y,  and  I need to
regress x on y for each distinct value of ID. Specifically, for the set of
data corresponding to each of the 4 values of ID (76,111,121,168) in the
below data, I should invoke linear regression 4 times. The challenge is
that, the length of the ID vector is around 2 and therefore linear
regression must be done automatically for each distinct value of ID.

 ID     x    y
 76 36476 15.8
 76 36493 66.9
 76 36579 65.6
111 35465 10.3
111 35756  4.8
121 38183   16
121 38184   15
121 38254  9.6
121 38255    7
168 37727 21.9
168 37739 29.7
168 37746 97.4

I was wondering whether there is an easy way to group data based on the
values of ID in R  so that linear regression can be done easily for each
group determined by each value of ID. Or, is the only way to construct
loops  with 'for' or 'while'  in which a matrix is generated for each
distinct value of ID  that stores corresponding values of x and y by
screening the entire ID vector?

Thanks in advance,

Yasin



Re: [R] linear regression for grouped data

2010-12-28 Thread David Winsemius


On Dec 28, 2010, at 9:23 PM, Entropi ntrp wrote:


> Hi,
> I have been examining large data and need to do simple linear regression
> with the data which is grouped based on the values of a particular
> attribute. For instance, consider three columns : ID, x, y,  and  I need to
> regress x on y for each distinct value of ID. Specifically, for the set of
> data corresponding to each of the 4 values of ID (76,111,121,168) in the
> below data, I should invoke linear regression 4 times. The challenge is
> that, the length of the ID vector is around 2 and therefore linear
> regression must be done automatically for each distinct value of ID.
>
>  ID     x    y
>  76 36476 15.8
>  76 36493 66.9
>  76 36579 65.6
> 111 35465 10.3
> 111 35756  4.8
> 121 38183   16
> 121 38184   15
> 121 38254  9.6
> 121 38255    7
> 168 37727 21.9
> 168 37739 29.7
> 168 37746 97.4


Let's say that is a dataframe named indat. Try:

 lapply(split(indat, as.factor(indat$ID)), function(df) {lm(y ~ x, data=df)} )
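
A small self-contained sketch of that call, using the twelve rows from the
question, might look like the following.  The object names fits and coefs are
assumptions added here to show how the per-ID results could be collected.

```r
# Data copied from the question; the name `indat` follows the reply above
indat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# One lm fit per ID, stored in a list named by ID
fits <- lapply(split(indat, indat$ID), function(df) lm(y ~ x, data = df))

# Collect intercepts and slopes into a matrix with one row per ID
coefs <- t(sapply(fits, coef))
coefs
```

Because split() returns a named list, individual fits stay reachable by ID,
for example fits[["121"]].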


> I was wondering whether there is an easy way to group data based on the
> values of ID in R  so that linear regression can be done easily for each
> group determined by each value of ID. Or, is the only way to construct
> loops  with 'for' or 'while'  in which a matrix is generated for each
> distinct value of ID  that stores corresponding values of x and y by
> screening the entire ID vector?
>
> Thanks in advance,
>
> Yasin


--

David Winsemius, MD
West Hartford, CT



Re: [R] filling up holes

2010-12-28 Thread Bill.Venables
Dear 'analyst41' (it would be a courtesy to know who you are)

Here is a low-level way to do it.  

First create some dummy data

> allDates <- seq(as.Date("2010-01-01"), by = 1, length.out = 50)
> client_ID <- sample(LETTERS[1:5], 50, rep = TRUE)
> value <- 1:50
> date <- sample(allDates)
> clientData <- data.frame(client_ID, date, value)

At this point clientData has 50 rows, with 5 clients, each with a sample of 
dates.  Everything is in random order except value.

Now write a little function to fill out a subset of the data consisting of one 
client's data only:

> fixClient <- function(cData) {
+   dateRange <- range(cData$date)
+   dates <- seq(dateRange[1], dateRange[2], by = 1)
+   fullSet <- data.frame(client_ID = as.character(cData$client_ID[1]),
+                         date = dates, value = NA)
+ 
+   fullSet$value[match(cData$date, dates)] <- cData$value
+   fullSet
+ }

Now split up the data, apply the fixClient function to each section and 
re-combine them again:

> allData <- do.call(rbind,
+    lapply(split(clientData, clientData$client_ID), fixClient))

Check:

> head(allData)
    client_ID       date value
A.1         A 2010-01-04    36
A.2         A 2010-01-05    18
A.3         A 2010-01-06    NA
A.4         A 2010-01-07    NA
A.5         A 2010-01-08    NA
A.6         A 2010-01-09    49


Seems OK.  At this point the data are in sorted order by client and date, but 
that should not matter.
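
For comparison, here is a sketch of the same idea with merge() doing the
matching instead of match().  The four-row data set and the helper name
fillGaps are assumptions made up for this illustration; the column names
follow the dummy data above.

```r
# Tiny made-up data set: client A is missing 2010-01-02 and 2010-01-03
clientData <- data.frame(
  client_ID = c("A", "A", "B", "B"),
  date      = as.Date(c("2010-01-01", "2010-01-04",
                        "2010-01-02", "2010-01-03")),
  value     = c(10, 20, 30, 40)
)

# Hypothetical helper: build every date between a client's first and last
# date, then left-join the observed rows so missing dates get value = NA
fillGaps <- function(cData) {
  fullDates <- data.frame(
    client_ID = cData$client_ID[1],
    date      = seq(min(cData$date), max(cData$date), by = "day")
  )
  merge(fullDates, cData, by = c("client_ID", "date"), all.x = TRUE)
}

filled <- do.call(rbind,
                  lapply(split(clientData, clientData$client_ID), fillGaps))
rownames(filled) <- NULL
filled
```

The all.x = TRUE argument is what keeps the dates that have no observed row.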

Bill Venables.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of analys...@hotmail.com
Sent: Wednesday, 29 December 2010 10:45 AM
To: r-help@r-project.org
Subject: [R] filling up holes

I have a data frame with three columns

client ID | date | value


For each client ID I want to determine Min date and Max date and for
any dates in between that are missing I want to insert a row

Client ID | date| NA

Any help would be appreciated.



Re: [R] linear regression for grouped data

2010-12-28 Thread Bill.Venables
library(nlme)
lmList(y ~ x | factor(ID), myData)

This gives a list of fitted model objects. 
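
A minimal sketch of how that might be used on the rows from the question is
given below.  The name myData follows the call above; converting ID to a
factor beforehand and the object names fm and cf are assumptions for this
example.  nlme is a recommended package and normally ships with R.

```r
library(nlme)

# Data copied from the question
myData <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)
myData$ID <- factor(myData$ID)

# One linear model per level of ID
fm <- lmList(y ~ x | ID, data = myData)

# coef() returns one row of coefficients per group
cf <- coef(fm)
cf
```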

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Entropi ntrp
Sent: Wednesday, 29 December 2010 12:24 PM
To: r-help@r-project.org
Subject: [R] linear regression for grouped data

Hi,
I have been examining large data and need to do simple linear regression
with the data which is grouped based on the values of a particular
attribute. For instance, consider three columns : ID, x, y,  and  I need to
regress x on y for each distinct value of ID. Specifically, for the set of
data corresponding to each of the 4 values of ID (76,111,121,168) in the
below data, I should invoke linear regression 4 times. The challenge is
that, the length of the ID vector is around 2 and therefore linear
regression must be done automatically for each distinct value of ID.

 ID     x    y
 76 36476 15.8
 76 36493 66.9
 76 36579 65.6
111 35465 10.3
111 35756  4.8
121 38183   16
121 38184   15
121 38254  9.6
121 38255    7
168 37727 21.9
168 37739 29.7
168 37746 97.4

I was wondering whether there is an easy way to group data based on the
values of ID in R  so that linear regression can be done easily for each
group determined by each value of ID. Or, is the only way to construct
loops  with 'for' or 'while'  in which a matrix is generated for each
distinct value of ID  that stores corresponding values of x and y by
screening the entire ID vector?

Thanks in advance,

Yasin



Re: [R] linear regression for grouped data

2010-12-28 Thread Dennis Murphy
Hi:

There are some advantages to taking a plyr approach to this type of problem.
The basic idea is to fit a linear model to each subgroup and save the
results in a list, from which you can extract what you want piece by piece.

library(plyr)

# One of those SAS style data sets...
> df <- data.frame(matrix(scan(), ncol = 3, byrow = TRUE))
1: 76 36476 15.8  76 36493 66.9  76 36579 65.6  111 35465 10.3  111 35756 4.8
16: 121 38183 16  121 38184 15  121 38254 9.6  121 38255 7  168 37727 21.9  168
32: 37739 29.7  168 37746 97.4
37:
Read 36 items

# A little cleanup:
names(df) <- c('ID', 'x', 'y')
df$ID <- factor(df$ID)

# Fit a linear model to each sub-data frame identified by ID
# and send the results to a list object

# dlply takes a data frame as input and outputs a list
# the grouping variable is ID
# the argument d in the function is the sub-data frame of a given ID
lr1 <- dlply(df, .(ID), function(d) lm(y ~ x, data = d))

# So you can do things like:

# Grab the model coefficients
# (input is a list, output is a data frame)
> ldply(lr1, function(m) m$coef)
   ID  (Intercept)           x
1  76    -11699.    0.32176123
2 111     680.6007 -0.01890034
3 121    3900.5051 -0.10174534
4 168 -136322.4296  3.61371841

# export the R^2 values
> ldply(lr1, function(m) summary(m)$r.squared)
   ID        V1
1  76 0.3718840
2 111 1.000
3 121 0.9367437
4 168 0.6993811

# Extract the residuals and predicted values to another list
> llply(lr1, function(m) cbind(m$resid, m$fitted))
$`76`
        [,1]     [,2]
1 -20.762884 36.56288
2  24.867175 42.03282
3  -4.104291 69.70429

$`111`
  [,1] [,2]
4    0 10.3
5    0  4.8

$`121`
        [,1]      [,2]
6  0.4371678 15.562832
7 -0.4610869 15.461087
8  1.2610869  8.338913
9 -1.2371678  8.237168

$`168`
        [,1]     [,2]
10   9.57509 12.32491
11 -25.98953 55.68953
12  16.41444 80.98556

# Plot the residuals vs. fitted values for each model (don't blink :)
# the _ means that no object is returned; the plot is a side effect
l_ply(lr1, function(d) plot(resid(d) ~ fitted(d)))

These are just some examples; clearly, there is a lot more one could do with
this type of structure.
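
For readers without plyr, a base R sketch of the R^2 extraction could look
like this.  It swaps dlply/ldply for split(), lapply() and sapply(); the
object names indat, fits and r2 are assumptions for this example.

```r
# Data copied from the question
indat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# Fit one model per ID, as dlply does above
fits <- lapply(split(indat, indat$ID), function(d) lm(y ~ x, data = d))

# Pull R^2 out of each fit; ID 111 has only two points, so its line is exact
r2 <- sapply(fits, function(m) summary(m)$r.squared)
round(r2, 4)
```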

HTH,
Dennis

On Tue, Dec 28, 2010 at 6:23 PM, Entropi ntrp entropy...@gmail.com wrote:

 Hi,
 I have been examining large data and need to do simple linear regression
 with the data which is grouped based on the values of a particular
 attribute. For instance, consider three columns : ID, x, y,  and  I need to
 regress x on y for each distinct value of ID. Specifically, for the set of
 data corresponding to each of the 4 values of ID (76,111,121,168) in the
 below data, I should invoke linear regression 4 times. The challenge is
 that, the length of the ID vector is around 2 and therefore linear
 regression must be done automatically for each distinct value of ID.

  ID     x    y
  76 36476 15.8
  76 36493 66.9
  76 36579 65.6
 111 35465 10.3
 111 35756  4.8
 121 38183   16
 121 38184   15
 121 38254  9.6
 121 38255    7
 168 37727 21.9
 168 37739 29.7
 168 37746 97.4

 I was wondering whether there is an easy way to group data based on the
 values of ID in R  so that linear regression can be done easily for each
 group determined by each value of ID. Or, is the only way to construct
 loops  with 'for' or 'while'  in which a matrix is generated for each
 distinct value of ID  that stores corresponding values of x and y by
 screening the entire ID vector?

 Thanks in advance,

 Yasin
