Re: [R] Finding distance matrix for categorical data

2010-06-14 Thread kapil mahant
Thanks guys,
I am able to generate the distance matrix for mixed column types (categorical
and ordinal) using the daisy() function.

But can anyone tell me how to generate clusters from it? The catch is that I
don't know the number of clusters beforehand.

Let me give an overview of the problem I am trying to solve.

Given a dataset something like the one below:


             var1    var2   var3      Size
element1-1   yes     x      present   100
element1-2   no      y      absent    294
element1-3   maybe   x      absent     45


The first three variables are categorical and the last one is ordinal.

I need to do the following:
1) Generate clusters from the data (call these the training clusters).
   I can compute the distance matrix (using daisy), but I am not sure how to
   create an unknown number of clusters; dbscan works on a distance matrix.
2) Once that is done, spread some new data points in the same space (call
   these test points).
3) Find out which test points lie within the boundary of any of the training
   clusters discovered above.

If anyone knows how to get this done, please let me know.
This is for an academic project and I am unable to make progress.
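For what it's worth, here is one way steps 1 and 3 might be sketched (column names and types assumed from the table above; pam() with silhouette widths stands in for a density method, since pam() also accepts a daisy() dissimilarity and does not need k fixed in advance if you scan over candidates):

```r
library(cluster)

# toy training data shaped like the example table (values assumed)
train <- data.frame(var1 = factor(c("yes", "no", "maybe")),
                    var2 = factor(c("x", "y", "x")),
                    var3 = factor(c("present", "absent", "absent")),
                    size = c(100, 294, 45))
d <- daisy(train, metric = "gower")     # mixed-type dissimilarities

# step 1: try k = 2 .. n-1 and keep the k with the best average silhouette
ks  <- 2:(nrow(train) - 1)
sil <- sapply(ks, function(k) pam(d, k)$silinfo$avg.width)
fit <- pam(d, ks[which.max(sil)])
fit$clustering                          # training cluster labels

# step 3 sketch: assign a new (test) point to its nearest medoid
test1 <- data.frame(var1 = factor("no", levels(train$var1)),
                    var2 = factor("y", levels(train$var2)),
                    var3 = factor("absent", levels(train$var3)),
                    size = 250)
d.all <- as.matrix(daisy(rbind(train, test1), metric = "gower"))
which.min(d.all[nrow(train) + 1, fit$id.med])   # cluster of the test point
```

With a real dataset the scan over k would cover many more values; a distance cutoff on the medoid dissimilarity could decide whether a test point lies "within" a cluster at all.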

Thanks and Regards
K




From: Ingmar Visser i.vis...@uva.nl

Sent: Fri, 11 June, 2010 2:19:33 PM
Subject: Re: [R] Finding distance matrix for categorical data

latent class analysis may be more appropriate depending on your hypotheses,
best, Ingmar



kapil mahant wrote:

All,

How can we find a distance matrix for categorical data?

i.e., given a CSV like the one below:

             var1    var2   var3   var4
element1-1   yes     x      a      k
element1-2   no      y      b      l
element1-3   maybe   y      c      m

how can I compute the distance matrix between all the elements?

I actually need it to create clusters on top of it.
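A minimal sketch on data shaped like the CSV above (values assumed), using daisy() from the recommended cluster package, which computes Gower dissimilarities for factor columns:

```r
library(cluster)

# factors stand in for the categorical columns of the CSV
df <- data.frame(var1 = c("yes", "no", "maybe"),
                 var2 = c("x", "y", "y"),
                 var3 = c("a", "b", "c"),
                 var4 = c("k", "l", "m"),
                 stringsAsFactors = TRUE)
row.names(df) <- c("element1-1", "element1-2", "element1-3")

daisy(df, metric = "gower")  # pairwise dissimilarities between all elements
```

The resulting "dissimilarity" object can be fed directly to clustering functions such as pam(), agnes(), or hclust().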

Thanks & Regards
Kapil



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






Re: [R] Cforest and Random Forest memory use

2010-06-14 Thread Matthew OKane
Answers added below.

Thanks again,
Matt

On 11 June 2010 14:28, Max Kuhn mxk...@gmail.com wrote:

 Also, you have not said:

  - your OS: Windows Server 2003 64-bit
  - your version of R: 2.11.1 64-bit
  - your version of party: 0.9-9995



  - your code (reconstructed; the function name cforest was mangled in transit):

 test.cf <- cforest(formula = badflag ~ ., data = example,
                    control = cforest_control(teststat = "max",
                        testtype = "Teststatistic", replace = FALSE,
                        ntree = 500, savesplitstats = FALSE, mtry = 10))


 - what "large data set" means: 1 million observations, 40+ variables,
 around 200 MB
  - what "very large model objects" means: anything which breaks

 So... how is anyone supposed to help you?

 Max




Re: [R] lmer() with no intercept

2010-06-14 Thread Meyners, Michael
John, 
Why would you want to fit the model without an intercept if you seemingly need
it? Anyway, I assume that the intercept from your first model just moves into
the random effects -- you have intercepts there for worker and day, so either
of these (or both) will absorb it. No surprise that the estimates for the
covariates differ only slightly; it should be that way.
What you plot (your second call to panel.lines) is not the correct model, as
you omit the intercepts for worker and day (which are 0, or at least pretty
close to it, if you include the overall intercept in your model). That's why
your red line sits at the top edge (note that the intercept is negative).
I'm therefore not sure that the model without an intercept makes a lot of
sense, but consider posting related questions to the mixed-models SIG, where
you might get more erudite comments than mine.
HTH, Michael
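A small simulated check of the absorption claim (a sketch only; lme4 is assumed and the data are invented, since the original test.txt is not available):

```r
library(lme4)

set.seed(1)
dat <- data.frame(worker = factor(rep(1:3, each = 50)),
                  day    = factor(rep(1:5, times = 30)),
                  x      = runif(150, 0, 100))
dat$x2 <- dat$x^2
# true model has an intercept of -1.3 plus worker and day effects
dat$y <- -1.3 + 0.11 * dat$x - 6e-4 * dat$x2 +
         rnorm(3, 0, 0.5)[dat$worker] + rnorm(5, 0, 0.5)[dat$day] +
         rnorm(150, 0, 0.3)

m.with    <- lmer(y ~ x + x2 + (1 | worker) + (1 | day), data = dat)
m.without <- lmer(y ~ x + x2 + (1 | worker) + (1 | day) - 1, data = dat)

fixef(m.with)["(Intercept)"]
# in the no-intercept fit, a similar shift typically shows up in the
# random-intercept predictions instead:
mean(ranef(m.without)$worker[, 1]) + mean(ranef(m.without)$day[, 1])
```

Because the random intercepts can soak up a constant shift, the fixed slopes for x and x2 barely change between the two fits, which is exactly what the poster observed.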


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of array chip
Sent: Saturday, June 12, 2010 1:07
To: r-help@r-project.org
Subject: [R] lmer() with no intercept

Hi, I asked this before but haven't received any response, so I would like to
try again. Thanks for your help. I also tried twice to join the mixed-models
mailing list so that I can ask my question there, but I haven't been granted
permission to join yet.



===


Hi, I am wondering how I can specify no intercept in a mixed model using 
lmer(). 

Here is an example dataset attached (test.txt). There are 3 workers, in 5 
days, measured a response variable y on independent variable x. I want to 
use a quadratic term (x2 in the dataset) to model the relationship between y 
and x.

test <- read.table("test.txt", sep = "\t", header = TRUE)

If I just simply use lm() and ignore worker and day, so that I can try both a 
linear regression with and without an intercept, here is what I get:

lm(y~x+x2, data=test)
Coefficients:
(Intercept)           x          x2  
 -1.7749104   0.1099160  -0.0006152

lm(y~x+x2-1, data=test)
Coefficients:
 x  x2  
0.0490097  -0.0001962

Now, I want to try mixed model considering worker and day as random effect. 

With an intercept:

lmer(y~x+x2+(1|worker)+(1|day), data=test)
Fixed effects:
  Estimate Std. Error t value
(Intercept) -1.324e+00  4.490e-01  -2.948
x1.117e-01  8.563e-03  13.041
x2  -6.357e-04  7.822e-05  -8.127

Without an intercept:

lmer(y~x+x2+(1|worker)+(1|day)-1, data=test)
Fixed effects:
 Estimate Std. Error t value
x   1.107e-01  8.528e-03  12.981
x2 -6.304e-04  7.805e-05  -8.077

It seems to work fine. But if you look at the fixed-effect coefficients of the
two mixed models, the coefficients for x and x2 are hardly different regardless
of whether an intercept is included. This is not the case for the simple
linear regressions using lm() above.

If I plot all 4 models in the following plot:

xyplot(y ~ x, groups = worker, data = test, col.line = "grey", lwd = 2,
       panel = function(x, y) {
           panel.xyplot(x, y, type = 'p')
           x <- sort(x)
           panel.lines(x, -1.324 + 0.1117 * x - 0.0006357 * x * x)
           panel.lines(x, 0.1107 * x - 0.0006304 * x * x, col = 'red')
           panel.lines(x, 0.04901 * x - 0.0001962 * x * x, col = 'blue')
           panel.lines(x, -1.7749 + 0.10992 * x - 0.0006152 * x * x, col = 'green')
       })

As you can see, the mixed model without an intercept (red line) does not fit
the data well (it sits at the top edge of the data instead of running through
the middle), so I guess I did something wrong here.

Can anyone make any suggestions?

Thanks

John


  



Re: [R] sharing experience - installing R Spatial Views

2010-06-14 Thread Rainer M Krug
On Sat, Jun 12, 2010 at 2:37 AM, Hendro Wibowo hendrohwib...@gmail.comwrote:

 Hi Guys,

 I would like to share my experience installing the Spatial task view
 packages for R. I could not install 32 packages which are part of the
 Spatial view, and I spent about two days searching the web to solve ALL
 those problems. I hope somebody may benefit from my experience.

 I admit that I do not have excellent programming skills, so perhaps some of
 the steps I took were not necessary to solve the problem at hand. But it
 works and I am happy.

 I was lucky when the problem I had was already solved by someone else, or
 when it was just a matter of installing another R package that the failing
 one depends on. Sometimes the issue was not an R package but a library that
 had to be installed in my operating system (in my case LinuxMint 8, based on
 Ubuntu Karmic). In those cases I guessed the nature of the problem and tried
 to solve it via an already-solved similar problem.

 I found many problems solved, directly or indirectly, via the Nabble and
 OSGeo FOSS4G websites. Of course many problems were also solved via other
 websites, but I have forgotten their addresses, as I searched again and
 again. I do appreciate everyone who posts their problems, solved or not,
 because it let me solve my own, either directly or indirectly (by using the
 idea and guessing from an already-solved problem).

 I found that we need to change from gcc version 4 to 3 in order for some
 packages to be installed (I forget which ones). The script I used to achieve
 that, taken from the internet, is placed at the bottom of this message.

 And finally, below are the lists of problems and solutions for installing
 the Spatial view packages.
 

.
.
.
SNIP

The main complication here seems to be that packages (obviously) have
dependencies, and that Task View packages in particular have MANY dependencies
which cannot be resolved by R but only at the operating-system level (Linux
and Mac - Windows is a different story).

To make the installation easier, would it be possible to list these
dependencies on the web pages of the task views, and possibly even provide
commands for the most widely used Linux distros on how to install them? This
could save quite some time when installing those packages.

Cheers,

Rainer

-- 
NEW GERMAN FAX NUMBER!!!

Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Cell:   +27 - (0)83 9479 042
Fax:+27 - (0)86 516 2782
Fax:+49 - (0)321 2125 2244
email:  rai...@krugs.de

Skype:  RMkrug
Google: r.m.k...@gmail.com



[R] random colour

2010-06-14 Thread suman dhara
Sir,
I want to plot 5 curves on a single graph and give them random colours.
How can I do this?

Regards,
Suman Dhara



Re: [R] using latticeExtra plotting confidence intervals

2010-06-14 Thread Joe P King
So if the category were race (A and B) and I had a male and a female nested in
each group, that would give me 4 different data points, but I want two data
points within each panel.

Joe King
206-913-2912
j...@joepking.com
Never throughout history has a man who lived a life of ease left a name worth 
remembering. --Theodore Roosevelt



-Original Message-
From: Deepayan Sarkar [mailto:deepayan.sar...@gmail.com] 
Sent: Sunday, June 13, 2010 10:15 PM
To: Joe P King
Cc: r-help@r-project.org
Subject: Re: [R] using latticeExtra plotting confidence intervals

On Sun, Jun 13, 2010 at 10:10 AM, Joe P King j...@joepking.com wrote:
 I am wanting to plot a 95% confidence band using segplot, but with groups.
 For example, if I have males and females within different races, I want the
 racial groups in different panels. I have this minor code, completely made
 up, but it gets at what I am after: 4 random samples and 4 confidence
 intervals. I know how to get A & B into one panel and C & D into another,
 but how do I get the x axis to label them properly, categorized as two?
 I am not sure what to put on the left side of the formula. This is the
 example code:

(1) Your code results in

> length(categories)
[1] 2
> length(mu)
[1] 4

which makes the formula mu~ci.upper+ci.lower|categories meaningless.

(2) You are effectively plotting xyplot(mu ~ mu | categories), with
additional confidence intervals in one direction. I'm sure that's not
what you want, but it's not clear what it is that you do actually
want.

-Deepayan
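As an illustration of what a workable layout might look like (a sketch only; the male/female values are invented, and segplot's arguments are used as in the thread): put the grouping factor on the left-hand side of the formula, with one row per interval, so each panel gets two labelled intervals.

```r
library(latticeExtra)

# hypothetical reshaped data: one row per group mean
d <- data.frame(sex  = rep(c("male", "female"), times = 2),
                race = rep(c("A", "B"), each = 2),
                mu   = c(10, 50, 20, 40))
d$ci.lower <- d$mu - 2
d$ci.upper <- d$mu + 2

# two intervals per panel, labelled "male"/"female" on the axis
segplot(factor(sex) ~ ci.lower + ci.upper | race, data = d,
        centers = mu, horizontal = FALSE)
```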






 library(lattice)
 library(latticeExtra)

 sample1 <- rnorm(100, 10, 2)
 sample2 <- rnorm(100, 50, 3)
 sample3 <- rnorm(100, 20, 2)
 sample4 <- rnorm(100, 40, 1)

 mu1 <- mean(sample1); ci.upper1 <- mu1 + 2 * 2; ci.lower1 <- mu1 - 2 * 2
 mu2 <- mean(sample2); ci.upper2 <- mu2 + 2 * 3; ci.lower2 <- mu2 - 2 * 3
 mu3 <- mean(sample3); ci.upper3 <- mu3 + 2 * 2; ci.lower3 <- mu3 - 2 * 2
 mu4 <- mean(sample4); ci.upper4 <- mu4 + 2 * 1; ci.lower4 <- mu4 - 2 * 1

 categories <- c("A", "B")

 mu <- cbind(mu1, mu2, mu3, mu4)
 ci.upper <- cbind(ci.upper1, ci.upper2, ci.upper3, ci.upper4)
 ci.lower <- cbind(ci.lower1, ci.lower2, ci.lower3, ci.lower4)

 segplot(mu ~ ci.upper + ci.lower | categories, centers = mu, horizontal = FALSE)



 I also tried this



 seq1 <- seq(1, 4, 1)

 segplot(seq1 ~ ci.upper + ci.lower | categories, centers = mu, horizontal = FALSE)



 but it also gives a poor x axis. I know this is probably an elementary
 problem, but any help would be greatly appreciated.



 Heres my data structure, sorry for bombarding with all the code.



 structure(c(9.85647167881417, 50.1856561919426, 19.8477661576365,
 39.8575819498655, 13.8564716788142, 56.1856561919426, 23.8477661576365,
 41.8575819498655, 5.85647167881417, 44.1856561919426, 15.8477661576365,
 37.8575819498655), .Dim = c(1L, 12L), .Dimnames = list(NULL,
 c("mu1", "mu2", "mu3", "mu4", "ci.upper1", "ci.upper2", "ci.upper3",
 "ci.upper4", "ci.lower1", "ci.lower2", "ci.lower3", "ci.lower4")))

 ---

 Joe King, M.A.

 Ph.D. Student

 University of Washington - Seattle

 Office: 404A

 Miller Hall

 206-913-2912

  mailto:j...@joepking.com j...@joepking.com

 ---

 Never throughout history has a man who lived a life of ease left a name
 worth remembering. --Theodore Roosevelt






[R] Odp: random colour

2010-06-14 Thread Petr PIKAL
Hi

Would you like to know which colour belongs to which curve? If yes, use
col = 1:5 in your plotting call.
If not, you can try col = sample(colours(), 5) instead.
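For example (a self-contained sketch; the five curves are invented):

```r
set.seed(42)
cols <- sample(colours(), 5)   # five randomly chosen named colours

# five sine curves of increasing frequency, one column each
curves <- sapply(1:5, function(k) sin(k * seq(0, 2 * pi, length.out = 100)))

matplot(curves, type = "l", lty = 1, col = cols,
        xlab = "index", ylab = "y")   # one random colour per curve
```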

Regards
Petr


r-help-boun...@r-project.org napsal dne 14.06.2010 07:05:43:

 Sir,
 I want to plot 5 curves on a single graph and give them random colours.
 How can I do this?
 
 Regards,
 Suman Dhara
 


Re: [R] meta analysis with repeated measure-designs?

2010-06-14 Thread Gerrit Hirschfeld
Hi, 

thanks for the references. I will try the sensitivity analysis in R, and try
WinBUGS if that does not work (I'm a little afraid of switching programs).

I also had an idea for a reasonable estimate of the correlations. Some studies
report both the results of paired t-tests and the means and SDs, and thus
allow two estimates of d to be calculated: one based on M and SD alone, the
other on t. The difference between the two estimates should be systematically
related to the correlation of the measures.
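Concretely, since a paired t statistic implies the SD of the difference scores, the correlation can be backed out from t, the means, the SDs, and n. A sketch of that algebra (the function name and variable names are mine):

```r
# Paired design:
#   t = (m1 - m2) / (sd.d / sqrt(n))   and   sd.d^2 = s1^2 + s2^2 - 2*r*s1*s2
# Solving for r gives:
recover.r <- function(t, m1, m2, s1, s2, n) {
  sd.d2 <- n * (m1 - m2)^2 / t^2          # implied variance of the differences
  (s1^2 + s2^2 - sd.d2) / (2 * s1 * s2)   # implied correlation r
}

# e.g. for reported reaction times (invented numbers):
# recover.r(t = 4.2, m1 = 520, m2 = 480, s1 = 90, s2 = 85, n = 20)
```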

I will keep you posted when I have a solution or hit a wall.

Efcharisto and dank je wel (thank you)!

Gerrit


On 12.06.2010, at 15:59, Viechtbauer Wolfgang (STAT) wrote:

 Dear Gerrit,
 
 the most appropriate approach for data of this type would be a proper 
 multivariate meta-analytic model (along the lines of Kalaian & Raudenbush, 
 1996). Since you do not know the correlations of the reaction-time 
 measurements across conditions for the within-subject designs, a simple 
 solution is to guesstimate those correlations and then conduct sensitivity 
 analyses to make sure your conclusions do not depend on those guesstimates.
 
 Best,
 
 --
 Wolfgang Viechtbauerhttp://www.wvbauer.com/
 Department of Methodology and StatisticsTel: +31 (0)43 388-2277
 School for Public Health and Primary Care   Office Location:
 Maastricht University, P.O. Box 616 Room B2.01 (second floor)
 6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck)
 
 
 Original Message
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Gerrit Hirschfeld Sent: Saturday, June 12, 2010 12:45
 To: r-help@r-project.org
 Subject: [R] meta analysis with repeated measure-designs?
 
 Dear all,
 
 I am trying to run a meta analysis of psycholinguistic reaction-time
 experiments with the meta package. The problem is that most of the
 studies have a within-subject designs and use repeated measures ANOVAs to
 analyze their data. So at present it seems that there are three
 non-optimal ways to run the analysis.
 
 1. Using metacont() to estimate effect sizes and standard errors. But as
 the different scores are dependent, this would result in biased estimators
 (Dunlap, 1996). Supposing I had the correlations of the measures (which I
 do not), would there be an option to use them in metacont()?
 
 2. Use metagen() with an effect size based on the reported F for the
 contrasts, which has other disadvantages (Bakeman, 2005). The problem I am
 having with this is that I could not find a formula to compute the
 standard error of partial eta squared. Any ideas?
 
 3. Use metagen() with r computed from p-values (Rosenthal, 1994) as
 effect size with the problem that sample-size affects p as much as effect
 size.
 
 Is there a fourth way, or data showing that the correlations can be
 neglected as long as they are assumed to be similar across studies?
 Any ideas are much appreciated.
 
 best regards
 Gerrit
 

__
Gerrit Hirschfeld, Dipl.-Psych.

Psychologisches Institut II
Westfälische Wilhelms-Universität
Fliednerstr. 21
48149 Münster
Germany

psycholinguistics.uni-muenster.de
GerritHirschfeld.de
Fon.: +49 (0) 251 83-31378
Fon.: +49 (0) 234 7960728
Fax.: +49 (0) 251 83-34104



[R] How to convert data frame row to vector?

2010-06-14 Thread Jörg Meyer

   Hi,
   I have a data frame that looks like this:

   ID    Var   Var2   Var3
   xxx   100   909    920
   yyy   110   720    710
   zzz   140   680    690

   I can load the file and produce a plot from values from one column.
data <- read.csv("test.csv")
barplot(data$Var2)
   So far it's fine.
   The data frame, however, has more columns with integer values in reality,
   and I want to plot consecutive values of the SAME ROW. Something like:
barplot(data[1, 2:3])
   or, more generically:
barplot(data[row.id, from.field:to.field])
   Unfortunately this is no longer a vector! I understand that it could contain
   non-numeric values, which in my data frame is not the case.
   How can I convert a one-row data frame into a vector?
   Any help appreciated,
   best regards,
   Jörg


Re: [R] random colour

2010-06-14 Thread Jim Lemon

On 06/14/2010 03:05 PM, suman dhara wrote:

Sir,
I want to plot 5 curves on a single graph and give them random colours. How
can I do this?


Hi Suman,

col <- rgb(runif(5), runif(5), runif(5))

Jim



[R] discrete (binary) choice panel data

2010-06-14 Thread Bunny, lautloscrew.com
Dear all,

I haven't used R for panel data analysis so far, but now I am looking for a
package, or some starting point, for binary-choice panel data analysis in R.
For starters, most of my effects are individual-specific and my dependent
variable is binary. Thanks for any suggestions in advance!

best

matt


[R] Odp: How to convert data frame row to vector?

2010-06-14 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org napsal dne 14.06.2010 11:05:39:

 
Hi,
I have a data frame that looks like this:
 
   ID    Var   Var2   Var3
   xxx   100   909    920
   yyy   110   720    710
   zzz   140   680    690
 
I can load the file and produce a plot from values from one column.
 data <- read.csv("test.csv")
 barplot(data$Var2)
So far it's fine.
The data frame however has more columns with integer values in 
reality and I
want to plot from consecutive values of the SAME ROW.
Something like:
 barplot(data[1, 2:3])
or more generic:
 barplot(data[row.id, from.field:to.field])

E.g. unlist
barplot(unlist(data[row.id, from.field:to.field]))

Regards
Petr


Unfortunately this no longer is a vector! I understand that it could 
contain
non-numeric values which in my data frame is not the case.
How can I convert a horizontal one-line-data-frame into a vector?
Any help appreciated,
best regards,
Jörg


Re: [R] Clustering algorithms don't find obvious clusters

2010-06-14 Thread Henrik Aldberg
Thank you Etienne, this seems to work like a charm. Also thanks to the rest
of you for your help.


Henrik

On 11 June 2010 13:51, Cuvelier Etienne ecuscim...@gmail.com wrote:



 On 11/06/2010 12:45, Henrik Aldberg wrote:

  I have a directed graph which is represented as a matrix of the form


 0 4 0 1

 6 0 0 0

 0 1 0 5

 0 0 4 0


 Each row corresponds to an author (A, B, C, D), and the values say how many
 times that author has cited the other authors. Hence the first row says
 that author A cited author B four times and author D once. Thus the matrix
 represents two groups of authors, (A, B) and (C, D), who cite each other.
 But there is also a weak link between the groups. In reality this matrix is
 much bigger and very sparse, but it still consists of distinct groups of
 authors.


 My problem is that when I cluster the matrix using pam, clara or agnes, the
 algorithms do not find the obvious clusters. I have tried turning it into a
 dissimilarity matrix before clustering, but that did not help either.


 The layout of the clustering is not that important to me; my primary
 interest is to get the right nodes into the right clusters.





 Hello Henrik,
 You can use a graph clustering using the igraph package.
 Example:

 library(igraph)
 simM <- NULL
 simM <- rbind(simM, c(0, 4, 0, 1))
 simM <- rbind(simM, c(6, 0, 0, 0))
 simM <- rbind(simM, c(0, 1, 0, 5))
 simM <- rbind(simM, c(0, 0, 4, 0))
 G <- graph.adjacency(simM, weighted = TRUE, mode = "directed")
 plot(G, layout = layout.kamada.kawai)

 ### walktrap.community
 wt <- walktrap.community(G, modularity = TRUE)
 wmemb <- community.to.membership(G, wt$merges,
                                  steps = which.max(wt$modularity) - 1)

 V(G)$color <- rainbow(3)[wmemb$membership + 1]
 plot(G)

 I hope  it helps

 Etienne

  Sincerely


 Henrik








[R] Install Rmpi

2010-06-14 Thread Paco Pastor

Hi everyone

As I couldn't succeed with a manual installation of Rmpi, I decided to start
again from the beginning. I removed R and MPICH from my Ubuntu Hardy
installation. Then, to avoid any dependency problems, I installed MPICH and R
from Synaptic, not from source. But now I can't install Rmpi.


The error message that appears when trying to install Rmpi can be found at
http://ubuntuone.com/p/71x/


Is it time to upgrade to the latest Ubuntu version and build a new system?


Any help would be greatly appreciated.


--
---
Francisco Pastor
Meteorology department
Fundación CEAM
p...@ceam.es
http://www.ceam.es/ceamet - http://www.ceam.es
Parque Tecnologico, C/ Charles R. Darwin, 14
46980 PATERNA (Valencia), Spain
Tlf. 96 131 82 27 - Fax. 96 131 81 90
---
Usuario Linux registrado: 363952



Re: [R] Install Rmpi

2010-06-14 Thread Dirk Eddelbuettel
On Mon, Jun 14, 2010 at 01:12:05PM +0200, Paco Pastor wrote:
 Hi everyone

 As I couldn't succeed with manual installation of Rmpi I decided to  
 start again from the beginning. I removed R and MPICH in my Ubuntu Hardy  
 installation. Then, to avoid any dependencies problems I have installed  
 MPICH and R from synaptic, not from sources. But now I can't install 
 Rmpi.

All you need is

sudo apt-get install r-cran-rmpi

but that will rely on Open MPI. Any reason you need MPICH?

Dirk


 An error message appears when trying to install Rmpi, you can find in  
 http://ubuntuone.com/p/71x/

 Is it the time to upgrade to the latest Ubuntu version and build a new  
 system?

 Any help would be greatly appreciated.


 -- 
 ---
 Francisco Pastor
 Meteorology department
 Fundación CEAM
 p...@ceam.es
 http://www.ceam.es/ceamet - http://www.ceam.es
 Parque Tecnologico, C/ Charles R. Darwin, 14
 46980 PATERNA (Valencia), Spain
 Tlf. 96 131 82 27 - Fax. 96 131 81 90
 ---
 Usuario Linux registrado: 363952


-- 
Three out of two people have difficulties with fractions.



Re: [R] Finding an order for an hclust (dendrogram) object without intersections

2010-06-14 Thread Tal Galili
OK,

I found a working solution for this. If anyone faces this problem in the
future, here is something that works (although it is probably not the best
solution out there):

Best,
Tal


#--
problematic.tree <-
structure(list(merge = structure(c(-3, -24, 1, -25, 4, -27, 5,
-22, 8, 3, 9, -5, 10, 11, 12, 6, 14, 7, 18, 19, 20, 21, 22, 23,
24, 25, -13, -23, -14, -21, -20, -26, -19, -1, 2, -15, -7, -4,
-9, -18, -6, -17, -12, 17, -2, -8, -10, -16, 13, 15, 16, -11), .Dim = c(26L,
2L)), height = c(0.0833, 0.0867, 0.117,
0.136507936507937, 0.220634920634921, 0.622, 0.674603174603175,
0.823, 1.06349206349206, 1.27698412698413, 1.37,
2.00952380952381, 2.2975, 2.39, 2.686667,
2.9, 3.14736842105263, 3.55634920634921, 3.7921768707483, 3.84183673469388,
3.93817373103087, 4.54464285714286, 4.81438464274599, 5.10895156778615,
5.36142237562854, 6.19122779197967), order = c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 11L), labels = 1:27), .Names =
c("merge", "height", "order", "labels"), class = "hclust")
plot(problematic.tree)

order.a.tree <- function(tree)
{
    num.of.leafs <- length(tree$order)
    matrix.to.order <- NULL
    for (i in 1:num.of.leafs)
    {
        matrix.to.order <- cbind(matrix.to.order, cutree(tree, k = i))
    }
    id <- seq_len(dim(matrix.to.order)[1])
    matrix.to.order <- data.frame(id, matrix.to.order)
    print(matrix.to.order)
    require(doBy) # the ordering engine of this function is orderBy from doBy
    ordered.matrix <- eval(parse(text = paste("orderBy(~",
        paste(colnames(matrix.to.order)[-1], collapse = " + "),
        ", data = matrix.to.order)")))
    print(ordered.matrix)
    tree$order <- ordered.matrix$id
    return(tree)
}
plot(order.a.tree(problematic.tree))
#--



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Sun, Jun 13, 2010 at 8:42 PM, Tal Galili tal.gal...@gmail.com wrote:

 OK, I found an example where my algorithm can't fix the tree order,
 but I don't know how to resolve it.

 Here is the code to reproduce the problem:



 # 

 order.a.tree <- function(tree)
 {
     num.of.leafs <- length(tree$order)

     for (i in 1:num.of.leafs)
     {
         tree$order <- order(cutree(tree, k = i))
     }
     return(tree)
 }

 problematic.tree <-
 structure(list(merge = structure(c(-3, -24, 1, -25, 4, -27, 5,
 -22, 8, 3, 9, -5, 10, 11, 12, 6, 14, 7, 18, 19, 20, 21, 22, 23,
 24, 25, -13, -23, -14, -21, -20, -26, -19, -1, 2, -15, -7, -4,
 -9, -18, -6, -17, -12, 17, -2, -8, -10, -16, 13, 15, 16, -11), .Dim = c(26L,
 2L)), height = c(0.0833, 0.0867, 0.117,
 0.136507936507937, 0.220634920634921, 0.622, 0.674603174603175,
 0.823, 1.06349206349206, 1.27698412698413, 1.37,
 2.00952380952381, 2.2975, 2.39, 2.686667,
 2.9, 3.14736842105263, 3.55634920634921, 3.7921768707483,
 3.84183673469388,
 3.93817373103087, 4.54464285714286, 4.81438464274599, 5.10895156778615,
 5.36142237562854, 6.19122779197967), order = c(1L, 2L, 3L, 4L,
 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 11L), labels = 1:27), .Names =
 c("merge", "height", "order", "labels"), class = "hclust")
 plot(order.a.tree(problematic.tree))

 # 


 Thanks,
 Tal







 On Sun, Jun 13, 2010 at 7:39 PM, Tal Galili tal.gal...@gmail.com wrote:

 Thanks Charles.

 In the meantime, I found out the following code does the trick.
 But I am wondering if:
 1) I might have made a mistake in it somewhere
 2) If there are other (smarter) ways of going about this.

 Here is the solution I wrote:



 # -

 order.a.tree <- function(tree)
 {
   num.of.leafs <- length(tree$order)
   for (i in 2:(num.of.leafs - 1))
   {
     tree$order <- order(cutree(tree, k = i))
   }
   return(tree)
 }
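
 To see why this works, here is a small illustrative sketch (toy data, not
 from the thread): cutree() labels leaves by cluster membership at a given
 k, which is the grouping that order.a.tree() exploits at each level.

```r
# Two well-separated groups: cutree() at k = 2 recovers them.
x  <- c(1, 2, 10, 11)
hc <- hclust(dist(x))        # default "complete" linkage
k2 <- cutree(hc, k = 2)      # membership vector at k = 2
order(k2)                    # a permutation placing cluster members together
```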


  # Example:
  a <- list()  # initialize empty object
  # define merging pattern:
  #   negative numbers are leaves,
  #   positive are merged clusters (defined by row number in $merge)
  a$merge <- matrix(c(-1, -2,
                      -3, -4,

[R] Html help

2010-06-14 Thread Murray Jorgensen

I have just installed R 2.11.1 on my XP laptop.

I like html help for browsing but text help for on-the-fly look-ups. I 
was a bit surprised when I was asked to choose between them during the 
installation. I chose text, thinking I could fix the html help later, 
which is what I am trying to do now.


Now when I ask for html help my browser goes to

'http://-ip-number-/doc/html/index.html'

instead of where I want on my computer:

C:\apps\R\R-2.11.1\doc\html\index.html

Now I can go where I want manually but then the package list on


C:\apps\R\R-2.11.1\doc\html\packages.html

does not include all the packages that I have installed and linked. I 
don't want to read my html help from the web because sometimes I am 
off-line or on a slow connection.


How do I go about getting a local set of html help files?

Cheers,   Murray Jorgensen

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nz, majorgen...@ihug.co.nz   Fax 7 838 4155
Phone +64 7 838 4773 (wk)   Home +64 7 825 0441   Mobile 021 0200 8350

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Large Data

2010-06-14 Thread Meenakshi

HI,

I want to import a 1.5 GB CSV file into R,
but the following error comes up:

'vector allocation 12.4 size'

How can I read a large CSV file into R?

Can anyone help me?

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Large-Data-tp2254130p2254130.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Design of experiments for Choice-Based Conjoint Analysis (CBC)

2010-06-14 Thread A.D.

Hello,

I would like to know if there is any function in R that allows one to build
experimental designs for Choice-Based Conjoint studies.

I have already checked the topic "design of experiments with R" and
looked at the different libraries.
I tried to build my design with the optFedorov function, but I haven't found
how to obtain a balanced design (with the same number of cases for
each level of the factors)...

So, if someone has already dealt with this situation, any help would be
appreciated ^^

Thanks.

A.D.
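
[Editor's note: a base-R sketch with hypothetical factors, not the poster's
actual design: a full-factorial candidate set built with expand.grid is
balanced by construction, so every level of every factor occurs equally
often. Fractional designs (e.g. via optFedorov) then select rows from such
a candidate set.]

```r
# Hypothetical attributes for illustration only
cand <- expand.grid(brand = c("A", "B", "C"),
                    price = c("low", "mid", "high"))
nrow(cand)           # 9 profiles in the full factorial
table(cand$brand)    # each brand level appears 3 times
table(cand$price)    # each price level appears 3 times
```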
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Design-of-experiments-for-Choice-Based-Conjoint-Analysis-CBC-tp2254077p2254077.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] how to change default help settings from factory default html

2010-06-14 Thread Katya Mauff
Hi all
 
Apologies if this is a trivial question - I have searched the lists and the 
online help files etc. but have not managed to find anything. I recently 
downloaded the latest version of R, which has the help type set to htmlhelp by 
default (according to http://127.0.0.1:18380/library/utils/html/help.html).
 
I would very much like to be able to access the help files when I am offline by 
typing ?topic etc., as I used to with the previous R version. Any suggestions?
 
Thanks
Katya


 

###
UNIVERSITY OF CAPE TOWN 

This e-mail is subject to the UCT ICT policies and e-mail disclaimer published 
on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or 
obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) 
to whom it is addressed. If the e-mail has reached you in error, please notify 
the author. If you are not the intended recipient of the e-mail you may not 
use, disclose, copy, redirect or print the content. If this e-mail is not 
related to the business of UCT it is sent by the sender in the sender's 
individual capacity.

###
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] list matching

2010-06-14 Thread Yuan Jian
Hello,
 
I could not find a clear solution for the following question; please allow me 
to ask. Thanks.
 
mynames = cbind(c('a','b'), c(11,22))
lst = list(a = c(1,2), b = 5)

Now I try to combine mynames and lst into:
a  1  11
a  2  11
b  5  22
 
thanks
jian
 


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Large Data

2010-06-14 Thread Joris Meys
And this one is only from last week. Please read the posting guide carefully.

 Cheers
Joris


-- Forwarded message --
From: Joris Meys jorism...@gmail.com
Date: Sat, Jun 5, 2010 at 11:04 PM
Subject: Re: [R] What is the largest in memory data object you've
worked with in R?
To: Nathan Stephens nwsteph...@gmail.com
Cc: r-help r-help@r-project.org


You have to take some things into account:
- the maximum memory set for R might not be the maximum memory available
- R needs the memory not only for the dataset; matrix manipulations
frequently require double the amount of memory taken by the dataset
- memory allocation is important when dealing with large datasets;
there is plenty of information about that
- R has some packages to get around memory problems with big datasets.

Read this discussion, for example:
http://tolstoy.newcastle.edu.au/R/help/05/05/4507.html

and this page of Matthew Keller is a good summary too :
http://www.matthewckeller.com/html/memory.html
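
[Editor's note: a common first step, sketched here on a small temporary file
(the file name and columns are made up for illustration): declaring the
column types up front with colClasses spares read.csv a costly guessing
pass, and the same call scales to large files. Packages such as ff or biglm
help when the data exceed RAM.]

```r
# Write a tiny CSV to a temp file, then read it back with explicit types
tf <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, val = c(2.5, 3.5, 4.5)), tf, row.names = FALSE)
dat <- read.csv(tf, colClasses = c("integer", "numeric"),
                nrows = 3, comment.char = "")
str(dat)   # 3 obs. of 2 variables, types as declared
```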

Cheers
Joris

On Sat, Jun 5, 2010 at 12:32 AM, Nathan Stephens nwsteph...@gmail.com wrote:
 For me, I've found that I can easily work with 1 GB datasets.  This includes
 linear models and aggregations.  Working with 5 GB becomes cumbersome.
 Anything over that, and R croaks.  I'm using a dual quad core Dell with 48
 GB of RAM.

 I'm wondering if there is anyone out there running jobs in the 100 GB
 range.  If so, what does your hardware look like?

 --Nathan

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




--
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



On Mon, Jun 14, 2010 at 12:07 PM, Meenakshi
meenakshichidamba...@gmail.com wrote:

 HI,

 I want to import 1.5G CSV file in R.
 But the following error comes:

 'vector allocation 12.4 size'

 How to read the large CSV file in R .

 Any one can help me?

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Large-Data-tp2254130p2254130.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Large Data

2010-06-14 Thread Joris Meys
http://www.google.com/#hl=ensource=hpq=R+big+data+setsaq=faqi=g1aql=oq=gs_rfai=fp=686584f57664

Cheers
Joris

On Mon, Jun 14, 2010 at 12:07 PM, Meenakshi
meenakshichidamba...@gmail.com wrote:

 HI,

 I want to import 1.5G CSV file in R.
 But the following error comes:

 'vector allocation 12.4 size'

 How to read the large CSV file in R .

 Any one can help me?

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Large-Data-tp2254130p2254130.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] list matching

2010-06-14 Thread Henrique Dallazuanna
Try this:

cbind(mynames[rep(seq(nrow(mynames)), sapply(lst, length)),], unlist(lst))
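
[Editor's note: the suggestion above, run on the poster's data as a sketch
for clarity. Note that cbind() coerces everything to character here, because
mynames is a character matrix.]

```r
mynames <- cbind(c("a", "b"), c(11, 22))
lst <- list(a = c(1, 2), b = 5)
# Repeat each mynames row once per list element, then bind the values
res <- cbind(mynames[rep(seq(nrow(mynames)), sapply(lst, length)), ],
             unlist(lst))
res[, 1]   # "a" "a" "b"  -- one row per list element
res[, 3]   # the list values, coerced to character: "1" "2" "5"
```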

On Mon, Jun 14, 2010 at 9:06 AM, Yuan Jian jayuan2...@yahoo.com wrote:

 Hello,

 I could not find a clear solution for the follow question. please allow me
 to ask. thanks

 mynames=cbind(c('a','b'),c(11,22))
 lst=list(a=c(1,2), b=5)

 now I try to combine mynames and lst:
 a  1  11
 a  2  11
 b  5  22

 thanks
 jian




[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?

2010-06-14 Thread Andrea Bernasconi DG
Hi R help,

Which is the easiest (most elegant) way to force aov to treat numerical 
variables as categorical ?

Sincerely, Andrea Bernasconi DG

PROBLEM EXAMPLE

I consider the latin squares example described at page 157 of the book:
Statistics for Experimenters: Design, Innovation, and Discovery by George E. P. 
Box, J. Stuart Hunter, William G. Hunter.

This example use the data-file /BHH2-Data/tab0408.dat from ftp://ftp.wiley.com/ 
in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.

The file tab0408.dat contains following DATA:
> DATA
   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now
> summary( aov(MODEL, data=DATA) )
            Df Sum Sq Mean Sq F value Pr(>F)
cars         1   12.8  12.800  0.8889 0.3680
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634
Residuals   10  144.0  14.400
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These results differ from the book's results on p. 159, since cars and driver 
are treated as numerical variables by aov.

BRUTE FORCE SOLUTION

Manually transforming "cars" and "driver" into categorical variables, I obtain 
the correct result:
> DATA_AB
   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16
> summary( aov(MODEL, data=DATA_AB) )
            Df Sum Sq Mean Sq F value   Pr(>F)
cars         3     24   8.000     1.5 0.307174
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490
Residuals    6     32   5.333
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

QUESTION

Which is the easiest (most elegant) way to force "driver" and "cars" from DATA 
to be treated as categorical variables by aov?
More generally, which is the easiest way to force aov to treat numerical 
variables as categorical?

Sincerely, Andrea Bernasconi DG


Re: [R] Lattice: How to color the data points in splom() according to the panel they are plotted?

2010-06-14 Thread Martin Maechler
Dear Deepayan,

this is in reply to a message almost 6 months ago :

 Deepayan Sarkar deepayan.sar...@gmail.com
 on Sun, 17 Jan 2010 01:39:21 -0800 writes:

 On Sat, Jan 16, 2010 at 11:56 PM, Peter Ehlers ehl...@ucalgary.ca wrote:
 Marius Hofert wrote:
 
 Dear ExpeRts,
 
 I have the scatter plot matrix as given below. I would like the 
different
 sub-plots in the scatter plot matrix to be colored differently. How 
do I
 get all points shown in the upper-left plot (on position (1,1) in the
 scatter plot matrix) to be plotted in blue, and the points shown in the 
plot
 to the right (on position (1,2) in the scatter plot matrix) to be 
plotted in
 red? More generally, how can I provide a matrix of colors to be used by
 splom() such that all data points in the corresponding sub-plot of the
 scatter plot matrix are shown in the specified color?
 
 Cheers,
 
 Marius
 
 Here is the code:
 
 library(lattice)
 
 entrymat = matrix(0, nrow = 3, ncol = 3)
 entrymat[1,2] = "black"
 entrymat[1,3] = "blue"
 entrymat[2,3] = "red"
 entrymat = t(entrymat)
 
 splom(~iris[,1:3], superpanel = function(z, ...){
         mymat.df = data.frame(rows = as.vector(row(entrymat)),
                               cols = as.vector(col(entrymat)),
                               entries = as.vector(entrymat))
         mymat.df = subset(mymat.df, cols > rows)
         with(mymat.df, {
                 panel.text(x = rows, y = cols, labels = entries)
         })
         panel.pairs(z, upper.panel = panel.splom,
                     lower.panel = function(...){}, ...)
 }, varnames = c("1", "2", "3")
 )
 
 I think that you will have to modify panel.pairs to get what
 you want. But I must admit that I can't see why you would
 want such a plot. What's achieved by having different
 colours in different subpanels? And you would lose the
 ability to colour groups differently (or things would become
 really complicated and messy).

 Thanks, I was going to say the same thing, except that it would be
(1)
 conceptually simpler just to add the 'i' and 'j' values as arguments
 to the panel function (the 'pargs' variable). The colors could then
 be passed through as part of the ... arguments, and the relevant entry
 extracted in the panel function.
(2)
 The other option is to keep a global counter and increment it inside
 the panel function, and choosing colors based on that counter and
 knowledge of the order in which panels are drawn. Not very elegant,
 but the least intrusive solution I can think of.

 -Deepayan

Against the R-forge version of lattice,
the following very small patch to  panel.pairs
would allow users to use '(1)' i.e., provide  panel functions
with (i,j) arguments directly.
I'm pretty sure that the change could not easily break existing
splom() usage.

--- R/splom.R   (revision 619)
+++ R/splom.R   (working copy)
@@ -291,7 +291,8 @@
y = z[subscripts, i]),
 ##panel.number = panel.number,
 ##packet.number = packet.number),
-  list(...))
+  list(...),
+  list(i = i, j = j))
 else
 c(list(x = z[subscripts, j],
y = z[subscripts, i],
@@ -299,7 +300,8 @@
subscripts = subscripts),
 ##panel.number = panel.number,
 ##packet.number = packet.number),
-  list(...))
+  list(...),
+  list(i = i, j = j))
 
 if (!("..." %in% names(formals(panel))))
     pargs <- pargs[intersect(names(pargs),
                              names(formals(panel)))]


With the above change, 
a user could use a panel function with (i,j) arguments, 
and e.g. say

Cmat <- outer(1:6, 1:6,
              function(i, j) rainbow(11, start = .12, end = .5)[i + j - 1])

splom(~diag(6), ## for testing: superpanel=mypanel.pairs,
      panel = function(x, y, i, j, ...){
          panel.fill(Cmat[i, j]); panel.splom(x, y, ...)
          panel.text(.5, .5, paste("(", i, ",", j, ")", sep = ""))
      })


I think that would allow quite a bit more flexibility without
the need to explicitly hack  panel.pairs
(and having to maintain such a hack against the ever-enhancing
lattice).

Martin Maechler, ETH Zurich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Marc Schwartz
On Jun 13, 2010, at 10:20 PM, array chip wrote:

 Hi, this is not an R technical question per se. I know there are many excellent 
 statisticians on this list, so here are my questions: I have a dataset with ~1800 
 observations and 50 independent variables, so there are about 35 samples per 
 variable. Is it wise to build a stable multiple logistic model with 50 
 independent variables? Any problem with this approach? Thanks
 
 John


The general rule of thumb is to have 10-20 'events' per covariate degree of 
freedom. Frank has suggested that in some cases that number should be as high 
as 25. 

The number of events is the smaller of the two possible outcomes for your 
binary dependent variable.

Covariate degrees of freedom refers to the number of columns in the model 
matrix. Continuous variables are 1, binary factors are 1, K-level factors are K 
- 1.

So if out of your 1800 records, you have at least 500 to 1000 events, depending 
upon how many of your 50 variables are K-level factors and whether or not you 
need to consider interactions, you may be OK. Better if towards the high end of 
that range, especially if the model is for prediction versus explanation.
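
[Editor's note: a back-of-envelope check of the rule of thumb, with
hypothetical numbers chosen for illustration.]

```r
events <- 500                 # smaller of the two outcome counts
df_cov <- 50                  # covariate degrees of freedom (model columns)
epv    <- events / df_cov     # events per covariate degree of freedom
epv                           # 10: at the low end of the 10-20 guideline
```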

Two excellent references would be Frank's book:

  
http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/

and Steyerberg's book:

  
http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/

to assist in providing guidance for model building/validation techniques.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?

2010-06-14 Thread Ivan Calandra
Hi,

See ?factor
e.g.: DATA$driver <- factor(DATA$driver)
See also the levels= argument if you want to change the order of your levels.

HTH,
Ivan
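
[Editor's note: a minimal illustration with toy data, not the BHH2 file:
converting a numeric column to a factor is what makes aov() count the
correct degrees of freedom for it.]

```r
# A numeric column with four distinct values...
DATA <- data.frame(driver = rep(1:4, each = 4))
is.factor(DATA$driver)    # FALSE: aov() would treat it as 1 df, continuous
# ...becomes a 4-level factor (3 df in the model matrix)
DATA$driver <- factor(DATA$driver)
is.factor(DATA$driver)    # TRUE
nlevels(DATA$driver)      # 4
```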

Le 6/14/2010 14:52, Andrea Bernasconi DG a écrit :
 Hi R help,

 Hi R help,

 Which is the easiest (most elegant) way to force aov to treat numerical 
 variables as categorical ?

 Sincerely, Andrea Bernasconi DG

 PROBLEM EXAMPLE

 I consider the latin squares example described at page 157 of the book:
 Statistics for Experimenters: Design, Innovation, and Discovery by George E. 
 P. Box, J. Stuart Hunter, William G. Hunter.

 This example use the data-file /BHH2-Data/tab0408.dat from 
 ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.

 The file tab0408.dat contains following DATA:

 DATA
  
 driver cars additive  y
 1   11A 19
 2   21D 23
 3   31B 15
 4   41C 19
 5   12B 24
 6   22C 24
 7   32D 14
 8   42A 18
 9   13D 23
 10  23A 19
 11  33C 15
 12  43B 19
 13  14C 26
 14  24B 30
 15  34A 16
 16  44D 16

 Now

 summary( aov(MODEL, data=DATA) )
  
  Df Sum Sq Mean Sq F value Pr(F)
 cars 1   12.8  12.800  0.8889 0.3680
 driver   1  115.2 115.200  8. 0.0179 *
 additive 3   40.0  13.333  0.9259 0.4634
 Residuals   10  144.0  14.400
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 This results differ from book result at p 159, since cars and driver are 
 treated as numerical variables by aov.

 BRUTE FORCE SOLUTION

 Manually transforming cars and driver into categorical variables, I 
 obtain the correct result:

 DATA_AB
  
 driver cars additive  y
 1  D1   C1A 19
 2  D2   C1D 23
 3  D3   C1B 15
 4  D4   C1C 19
 5  D1   C2B 24
 6  D2   C2C 24
 7  D3   C2D 14
 8  D4   C2A 18
 9  D1   C3D 23
 10 D2   C3A 19
 11 D3   C3C 15
 12 D4   C3B 19
 13 D1   C4C 26
 14 D2   C4B 30
 15 D3   C4A 16
 16 D4   C4D 16

 summary( aov(MODEL, data=DATA_AB) )
  
  Df Sum Sq Mean Sq F value   Pr(F)
 cars 3 24   8.000 1.5 0.307174
 driver   3216  72.00013.5 0.004466 **
 additive 3 40  13.333 2.5 0.156490
 Residuals6 32   5.333
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 QUESTION

 Which is the easiest (most elegant) way to force driver and cars from 
 DATA to be treated as categorical variables by aov?
 More generally, which is the easiest way to force aov  to treat numerical 
 variables as categorical ?

 Sincerely, Andrea Bernasconi DG


Re: [R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?

2010-06-14 Thread Andrea Bernasconi DG
I think I found the solution !

 cc <- factor(DATA$cars)
 dd <- factor(DATA$driver)
 MODEL <- y ~ cc + dd + additive
 summary(aov(MODEL, data = DATA))

On 14 Jun, 2010, at 2:52 PM, Andrea Bernasconi DG wrote:

 Hi R help,
 
 Hi R help,
 
 Which is the easiest (most elegant) way to force aov to treat numerical 
 variables as categorical ?
 
 Sincerely, Andrea Bernasconi DG
 
 PROBLEM EXAMPLE
 
 I consider the latin squares example described at page 157 of the book:
 Statistics for Experimenters: Design, Innovation, and Discovery by George E. 
 P. Box, J. Stuart Hunter, William G. Hunter.
 
 This example use the data-file /BHH2-Data/tab0408.dat from 
 ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.
 
 The file tab0408.dat contains following DATA:
  DATA
driver cars additive  y
 1   11A 19
 2   21D 23
 3   31B 15
 4   41C 19
 5   12B 24
 6   22C 24
 7   32D 14
 8   42A 18
 9   13D 23
 10  23A 19
 11  33C 15
 12  43B 19
 13  14C 26
 14  24B 30
 15  34A 16
 16  44D 16
 
 Now
  summary( aov(MODEL, data=DATA) )
 Df Sum Sq Mean Sq F value Pr(F)  
 cars 1   12.8  12.800  0.8889 0.3680  
 driver   1  115.2 115.200  8. 0.0179 *
 additive 3   40.0  13.333  0.9259 0.4634  
 Residuals   10  144.0  14.400 
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 This results differ from book result at p 159, since cars and driver are 
 treated as numerical variables by aov.
 
 BRUTE FORCE SOLUTION
 
 Manually transforming cars and driver into categorical variables, I 
 obtain the correct result:
  DATA_AB
driver cars additive  y
 1  D1   C1A 19
 2  D2   C1D 23
 3  D3   C1B 15
 4  D4   C1C 19
 5  D1   C2B 24
 6  D2   C2C 24
 7  D3   C2D 14
 8  D4   C2A 18
 9  D1   C3D 23
 10 D2   C3A 19
 11 D3   C3C 15
 12 D4   C3B 19
 13 D1   C4C 26
 14 D2   C4B 30
 15 D3   C4A 16
 16 D4   C4D 16
  summary( aov(MODEL, data=DATA_AB) )
 Df Sum Sq Mean Sq F value   Pr(F)   
 cars 3 24   8.000 1.5 0.307174   
 driver   3216  72.00013.5 0.004466 **
 additive 3 40  13.333 2.5 0.156490   
 Residuals6 32   5.333
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 
 QUESTION
 
 Which is the easiest (most elegant) way to force driver and cars from 
 DATA to be treated as categorical variables by aov?
 More generally, which is the easiest way to force aov  to treat numerical 
 variables as categorical ?
 
 Sincerely, Andrea Bernasconi DG
 

[R] merging data frames

2010-06-14 Thread Assa Yeroslaviz
Hi,

is it possible to merge two data frames while preserving the row names of
the bigger data frame?

I have two data frames which I would like to combine. While doing so, I
always lose the row names. When I try to append them, I get the error
message that I have non-unique names, although I used the unique command
on the data frame where the duplicate entries supposedly are.

thanks for the help

Assa
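
[Editor's note: one possibility, sketched with made-up frames: merge on
"row.names", which turns the row names into a Row.names column that can be
restored afterwards.]

```r
df1 <- data.frame(x = 1:3, row.names = c("r1", "r2", "r3"))
df2 <- data.frame(y = 4:6, row.names = c("r1", "r2", "r3"))
m <- merge(df1, df2, by = "row.names")  # row names become column "Row.names"
rownames(m) <- m$Row.names              # restore them as row names
m$Row.names <- NULL
m   # columns x and y, with row names r1..r3 preserved
```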

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change default help settings from factory default html

2010-06-14 Thread Uwe Ligges



On 14.06.2010 14:11, Katya Mauff wrote:

Hi all

Apologies if this is a trivial question- I have searched the lists and the 
online help files etc but have not managed to find anything. I recently 
downloaded the latest version of R, which has the help type set to htmlhelp as 
default (according to http://127.0.0.1:18380/library/utils/html/help.html)

I would very much like to be able to access the help files when I am offline by 
typing ?topic etc as I used to with the previous R version. Any suggestions?



Yes, just type ?topic and your web browser opens and displays the help 
file. Note that the address beginning with "http://127.0.0.1" is on your 
local machine, hence you can stay offline.
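
[Editor's note: the help type can also be switched explicitly via a
standard base-R option, without reinstalling.]

```r
options(help_type = "html")    # ?topic now opens the local dynamic help server
getOption("help_type")         # "html"
options(help_type = "text")    # back to plain-text help in the console
```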


Uwe Ligges



Thanks
Katya






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] list matching

2010-06-14 Thread jim holtman
One thing you might do is to transform the data into a format that is
easier to combine; I like using 'merge':

> mynames=cbind(c('a','b'),c(11,22))
> lst=list(a=c(1,2), b=5)
> mynames
     [,1] [,2]
[1,] "a"  "11"
[2,] "b"  "22"
> lst
$a
[1] 1 2

$b
[1] 5

> mynames.df <- as.data.frame(mynames)
> mynames.df
  V1 V2
1  a 11
2  b 22
> lst.s <- stack(lst)
> lst.s
  values ind
1      1   a
2      2   a
3      5   b
> merge(mynames.df, lst.s, by.x = "V1", by.y = "ind")
  V1 V2 values
1  a 11      1
2  a 11      2
3  b 22      5



On Mon, Jun 14, 2010 at 8:06 AM, Yuan Jian jayuan2...@yahoo.com wrote:
 Hello,

 I could not find a clear solution for the follow question. please allow me 
 to ask. thanks

 mynames=cbind(c('a','b'),c(11,22))
 lst=list(a=c(1,2), b=5)

 now I try to combine mynames and lst:
 a  1  11
 a  2  11
 b  5  22

 thanks
 jian




        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merging data frames

2010-06-14 Thread jim holtman
Put the rownames as another column in your dataframe so that they
remain with the data.  After merging, you can then use them as the
rownames again.
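
That suggestion can be sketched as follows (the data frames here are
made up for illustration):

```r
df1 <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
df2 <- data.frame(b = 4:6, row.names = c("r2", "r3", "r1"))

# carry the row names along as an ordinary column
df1$id <- rownames(df1)
df2$id <- rownames(df2)

m <- merge(df1, df2, by = "id")   # row names survive as the 'id' column
rownames(m) <- m$id               # restore them afterwards
m$id <- NULL
```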

On Mon, Jun 14, 2010 at 9:25 AM, Assa Yeroslaviz fry...@gmail.com wrote:
 Hi,

 is it possible to merge two data frames while preserving the row names of
 the bigger data frame?

 I have two data frames which I would like to combine. While doing so I
 always lose the row names. When I try to append them, I get the error
 message that I have non-unique names, although I used the unique command
 on the data frame where the duplicate entries supposedly are.

 thanks for the help

 Assa

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change default help settings from factory default html

2010-06-14 Thread Prof Brian Ripley

On Mon, 14 Jun 2010, Katya Mauff wrote:


Hi all

Apologies if this is a trivial question- I have searched the lists 
and the online help files etc but have not managed to find anything. 
I recently downloaded the latest version of R, which has the help 
type set to htmlhelp as default (according to 
http://127.0.0.1:18380/library/utils/html/help.html)


Not so for the latest R (2.11.1) compiled with the default settings. 
The help(help) page in 2.11.1 actually says


 The ‘factory-fresh’ default is text help except from the Mac OS
 GUI, which uses HTML help displayed in its own browser window.

However, the Windows installer allows you to set the default type of 
help, and its default is HTML help in R >= 2.10.0.


I would very much like to be able to access the help files when I am 
offline by typing ?topic etc as I used to with the previous R 
version. Any suggestions?


You should be able to use HTML help offline: it is running locally on 
your machine (127.0.0.1 is the loopback interface).  (It is 
conceivable that your unstated OS disables the loopback interface when 
'offline', but we have not come across an instance of this.)


But ?help tells you how to select other forms of help by default via

  help_type = getOption("help_type")

and ?'?' says

 This is a shortcut to ‘help’ and uses its default type of help.




Thanks
Katya






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 variables

2010-06-14 Thread Joris Meys
Hi,

Marc's explanation is valid to a certain extent, but I don't agree with
his conclusion. I'd like to point out the curse of
dimensionality (Hughes effect), which starts to play a role rather quickly.

The curse of dimensionality is easily demonstrated looking at the
proximity between your datapoints. Say we scale the interval in one
dimension to be 1 unit. If you have 20 evenly-spaced observations, the
distance between the observations is 0.05 units. To have a proximity
like that in a 2-dimensional space, you need 20^2 = 400 observations. In
a 10-dimensional space this becomes 20^10 ~ 10^13 datapoints. The
distance between your observations is important, as a sparse dataset
will definitely make your model misbehave.

Even with about 35 samples per variable, using 50 independent
variables will render a highly unstable model, as your dataspace is
about as sparse as it can get. On top of that, interpreting a model
with 50 variables is close to impossible, and then I didn't even start
on interactions. No point in trying I'd say. If you really need all
that information, you might want to take a look at some dimension
reduction methods first.
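
The arithmetic behind that count is easy to check in R itself:

```r
# observations needed to keep ~0.05 spacing as dimensionality grows
d <- c(1, 2, 10)
n <- 20^d
n
# 20 in 1-D, 400 in 2-D, and about 1e13 in 10-D
```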

Cheers
Joris

On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote:
 On Jun 13, 2010, at 10:20 PM, array chip wrote:

 Hi, this is not R technical question per se. I know there are many excellent 
 statisticians in this list, so here my questions: I have dataset with ~1800 
 observations and 50 independent variables, so there are about 35 samples per 
 variable. Is it wise to build a stable multiple logistic model with 50 
 independent variables? Any problem with this approach? Thanks

 John


 The general rule of thumb is to have 10-20 'events' per covariate degree of 
 freedom. Frank has suggested that in some cases that number should be as high 
 as 25.

 The number of events is the smaller of the two possible outcomes for your 
 binary dependent variable.

 Covariate degrees of freedom refers to the number of columns in the model 
 matrix. Continuous variables are 1, binary factors are 1, K-level factors are 
 K - 1.

 So if out of your 1800 records, you have at least 500 to 1000 events, 
 depending upon how many of your 50 variables are K-level factors and whether 
 or not you need to consider interactions, you may be OK. Better if towards 
 the high end of that range, especially if the model is for prediction versus 
 explanation.

 Two excellent references would be Frank's book:

  http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/

 and Steyerberg's book:

  http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/

 to assist in providing guidance for model building/validation techniques.

 HTH,

 Marc Schwartz

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change default help settings from factory default html

2010-06-14 Thread Katya Mauff
Hi - I have tried that offline; my browser opens and says: "Offline mode:
Firefox is currently in offline mode and can't browse the Web.
Uncheck Work Offline in the File menu, then try again."

I can access ?help that way, or pages I've visited before going offline, 
but nothing new.

 Uwe Ligges lig...@statistik.tu-dortmund.de 2010/06/14 03:29 PM 


On 14.06.2010 14:11, Katya Mauff wrote:
 Hi all

 Apologies if this is a trivial question- I have searched the lists and the 
 online help files etc but have not managed to find anything. I recently 
 downloaded the latest version of R, which has the help type set to htmlhelp 
 as default (according to http://127.0.0.1:18380/library/utils/html/help.html)

 I would very much like to be able to access the help files when I am offline 
 by typing ?topic etc as I used to with the previous R version. Any 
 suggestions?


Yes, just type ?topic and your web browser opens and displays the help 
file. Note that the address beginning "http://127.0.0.1" is on your 
local machine, hence you can stay offline.

Uwe Ligges


 Thanks
 Katya






 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 

 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Html help

2010-06-14 Thread Joris Meys
If the IP number is something like 127.0.0.1:<port> then you are on
your local computer.
Cheers
Joris

On Mon, Jun 14, 2010 at 1:33 PM, Murray Jorgensen m...@waikato.ac.nz wrote:
 I have just installed R 2.11.1 on my XP laptop.

 I like html help for browsing but text help for on-the-fly look-ups. I was a
 bit surprised when I was asked to choose between them during the
 installation. I chose text, thinking I could fix the html help later, which
 is what I am trying to do now.

 Now when I ask for html help my browser goes to

 'http://-ip-number-/doc/html/index.html'

 instead of where I want on my computer:

 C:\apps\R\R-2.11.1\doc\html\index.html

 Now I can go where I want manually but then the package list on


 C:\apps\R\R-2.11.1\doc\html\packages.html

 does not include all the packages that I have installed and linked. I don't
 want to read my html help from the web because sometimes I am off-line or on
 a slow connection.

 How do I go about getting a local set of html help files?

 Cheers,   Murray Jorgensen

 --
 Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
 Department of Statistics, University of Waikato, Hamilton, New Zealand
 Email: m...@waikato.ac.nz  majorgen...@ihug.co.nz        Fax 7 838 4155
 Phone  +64 7 838 4773 wk    Home +64 7 825 0441   Mobile 021 0200 8350

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] remove last char of a text string

2010-06-14 Thread glaporta

Dear R experts,
is there a simple way to remove the last char of a text string?
substr() function use as parameter start end only... but my strings are of
different length...
01asap05a -> 01asap05
02ee04b -> 02ee04
Thank you all,
Gianandrea
-- 
View this message in context: 
http://r.789695.n4.nabble.com/remove-last-char-of-a-text-string-tp2254377p2254377.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove last char of a text string

2010-06-14 Thread Gustaf Rydevik
On Mon, Jun 14, 2010 at 3:47 PM, glaporta glapo...@freeweb.org wrote:

 Dear R experts,
 is there a simple way to remove the last char of a text string?
 substr() function use as parameter start end only... but my strings are of
 different length...
 01asap05a -> 01asap05
 02ee04b -> 02ee04
 Thank you all,
 Gianandrea
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/remove-last-char-of-a-text-string-tp2254377p2254377.html
 Sent from the R help mailing list archive at Nabble.com.


It's not terribly elegant, but this works:

orig.text <- c("01asap05a", "02ee04b")
substr(orig.text, 1, nchar(orig.text) - 1)

Regards,
Gustaf


-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove last char of a text string

2010-06-14 Thread Sarah Goslee
Sure. You can use nchar() to find out how long the string is.

> teststring <- "01asap05a"
> substr(teststring, 1, nchar(teststring) - 1)
[1] "01asap05"

On Mon, Jun 14, 2010 at 9:47 AM, glaporta glapo...@freeweb.org wrote:

 Dear R experts,
 is there a simple way to remove the last char of a text string?
 substr() function use as parameter start end only... but my strings are of
 different length...
 01asap05a -> 01asap05
 02ee04b -> 02ee04
 Thank you all,
 Gianandrea


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change default help settings from factory default html

2010-06-14 Thread Uwe Ligges



On 14.06.2010 15:39, Katya Mauff wrote:

Hi-I have tried that offline, -my browser opens and says: Offline mode
Firefox is currently in offline mode and can't browse the Web.
Uncheck Work Offline in the File menu, then try again



Well, go online with Firefox (uncheck "Work Offline"), which means Firefox 
can access the R help pages afterwards. You do not need to actually go 
online with your machine.


Uwe



I can access ?help that way, or pages I've been to before having gone offline, 
but nothing new.


Uwe Liggeslig...@statistik.tu-dortmund.de  2010/06/14 03:29 PM



On 14.06.2010 14:11, Katya Mauff wrote:

Hi all

Apologies if this is a trivial question- I have searched the lists and the 
online help files etc but have not managed to find anything. I recently 
downloaded the latest version of R, which has the help type set to htmlhelp as 
default (according to http://127.0.0.1:18380/library/utils/html/help.html)

I would very much like to be able to access the help files when I am offline by 
typing ?topic etc as I used to with the previous R version. Any suggestions?



Yes, just type ?topic and your web browser opens and displays the help
file. Note that the address beginning "http://127.0.0.1" is on your
local machine, hence you can stay offline.

Uwe Ligges



Thanks
Katya






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.









__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove last char of a text string

2010-06-14 Thread Henrique Dallazuanna
Try:

 gsub(".$", "", c('01asap05a', '02ee04b'))

On Mon, Jun 14, 2010 at 10:47 AM, glaporta glapo...@freeweb.org wrote:


 Dear R experts,
 is there a simple way to remove the last char of a text string?
 substr() function use as parameter start end only... but my strings are of
 different length...
 01asap05a -> 01asap05
 02ee04b -> 02ee04
 Thank you all,
 Gianandrea
 --
 View this message in context:
 http://r.789695.n4.nabble.com/remove-last-char-of-a-text-string-tp2254377p2254377.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Prime Numbers Pkgs - Schoolmath is broken

2010-06-14 Thread Red Roo
Looking for a recommended package that handles prime number computations.

Tried the following unsuccessfully:
primeFactors() in the R.basic package failed to install.

primes() and primlist are broken in Schoolmath pkg on CRAN.
My analysis can be found here http://j.mp/9BNI9q
Not sure what the procedure is for getting things fixed, so I've cross-posted 
to r-dev as well.
--njg

TAKING THE PITH OUT OF PERFORMANCE http://perfdynamics.blogspot.com/
Follow me on Twitter http://twitter.com/DrQz
PERFORMANCE DYNAMICS COMPANY http://www.perfdynamics.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cforest and Random Forest memory use

2010-06-14 Thread Max Kuhn
The first thing that I would recommend is to avoid the formula
interface to models. The internals that R uses to create model matrices
from a formula + data set are not efficient. If you had a large number
of variables, I would have automatically pointed to that as a source
of issues. cforest and ctree only have formula interfaces though, so
you are stuck on that one. The randomForest package has both
interfaces, so that might be better.

Probably the issue is the depth of the trees. With that many
observations, you are likely to get extremely deep trees. You might
try limiting the depth of the tree and see if that has an effect on
performance.
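
As a sketch of both suggestions (the matrix interface plus a cap on tree
size), using randomForest on simulated stand-in data; `maxnodes` and
`nodesize` are the usual knobs for limiting tree depth and size:

```r
library(randomForest)

set.seed(1)
x <- matrix(rnorm(5000 * 40), ncol = 40)  # stand-in for the large predictor set
y <- factor(rbinom(5000, 1, 0.3))

# matrix interface avoids the formula machinery; maxnodes caps the number
# of terminal nodes, nodesize sets the minimum size of a leaf
fit <- randomForest(x, y, ntree = 100, maxnodes = 32, nodesize = 50)
```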

We run into these issues with large compound libraries; in those cases
we do whatever we can to avoid ensembles of trees or kernel methods.
If you want those, you might need to write your own code that is
hyper-efficient and tuned to your particular data structure (as we
did).

On another note... are this many observations really needed? You have
40ish variables; I suspect that 1M points are pretty densely packed
into 40-dimensional space. Do you lose much by sampling the data set
or allocating a large portion to a test set? If you have thousands of
predictors, I could see the need for so many observations, but I'm
wondering if many of the samples are redundant.

Max

On Mon, Jun 14, 2010 at 3:45 AM, Matthew OKane mlok...@gmail.com wrote:
 Answers added below.
 Thanks again,
 Matt

 On 11 June 2010 14:28, Max Kuhn mxk...@gmail.com wrote:

 Also, you have not said:

  - your OS: Windows Server 2003 64-bit
  - your version of R: 2.11.1 64-bit
  - your version of party: 0.9-9995



  - your code:
     test.cf <- cforest(formula = badflag ~ ., data = example,
                        control = cforest_control(teststat = 'max',
                            testtype = 'Teststatistic', replace = FALSE,
                            ntree = 500, savesplitstats = FALSE, mtry = 10))

  - what Large data set means:  1 million observations, 40+ variables,
 around 200MB
  - what very large model objects means - anything which breaks

 So... how is anyone suppose to help you?

 Max





-- 

Max

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice: How to color the data points in splom() according to the panel they are plotted?

2010-06-14 Thread Deepayan Sarkar
On Mon, Jun 14, 2010 at 6:24 PM, Martin Maechler
maech...@stat.math.ethz.ch wrote:
 Dear Deepayan,

 this is in reply to a message almost 6 months ago :

 Deepayan Sarkar deepayan.sar...@gmail.com

[...]

     Thanks, I was going to say the same thing, except that it would be
 (1)
     conceptually simpler just to add the 'i' and 'j' values as arguments
     to the panel function (the 'pargs' variable). The colors could then
     be passed through as part of the ... arguments, and the relevant entry
     extracted in the panel function.
 (2)
     The other option is to keep a global counter and increment it inside
     the panel function, and choosing colors based on that counter and
     knowledge of the order in which panels are drawn. Not very elegant,
     but the least intrusive solution I can think of.

     -Deepayan

 Against the R-forge version of lattice,
 the following very small patch to  panel.pairs
 would allow users to use '(1)' i.e., provide  panel functions
 with (i,j) arguments directly.
 I'm pretty sure that the change could not easily break existing
 splom() usage.

 --- R/splom.R   (revision 619)
 +++ R/splom.R   (working copy)
 @@ -291,7 +291,8 @@
                                    y = z[subscripts, i]),
  ##                                    panel.number = panel.number,
  ##                                    packet.number = packet.number),
 -                              list(...))
 +                              list(...),
 +                              list(i = i, j = j))
                         else
                             c(list(x = z[subscripts, j],
                                    y = z[subscripts, i],
 @@ -299,7 +300,8 @@
                                    subscripts = subscripts),
  ##                                    panel.number = panel.number,
  ##                                    packet.number = packet.number),
 -                              list(...))
 +                              list(...),
 +                              list(i = i, j = j))

                     if (!("..." %in% names(formals(panel))))
                         pargs <- pargs[intersect(names(pargs), names(formals(panel)))]

Done in r-forge svn.

-Deepayan



 With the above change,
 a user could use a panel function with (i,j) arguments,
 and e.g. say

 Cmat <- outer(1:6, 1:6,
              function(i,j) rainbow(11, start=.12, end=.5)[i+j-1])

 splom(~diag(6), ## for testing: superpanel=mypanel.pairs,
      panel=function(x,y,i,j,...){
               panel.fill(Cmat[i,j]); panel.splom(x,y,...)
               panel.text(.5,.5, paste("(",i,",",j,")",sep=""))
      })


 I think that would allow quite a bit more flexibility without
 the need to explicitly hack  panel.pairs
 (and having to maintain such a hack against the ever-enhancing
 lattice).

 Martin Maechler, ETH Zurich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 variables

2010-06-14 Thread Claudia Beleites

Dear all,

(this first part of the email I sent to John earlier today, but forgot to put it 
to the list as well)

Dear John,

 Hi, this is not R technical question per se. I know there are many excellent
 statisticians in this list, so here my questions: I have dataset with ~1800
 observations and 50 independent variables, so there are about 35 samples per
 variable. Is it wise to build a stable multiple logistic model with 50
 independent variables? Any problem with this approach? Thanks

First: I'm not a statistician, but a spectroscopist.
But I do build logistic regression models with far fewer than 1800 samples and 
far more variates (e.g. 75 patients / 256 spectral measurement channels), though 
I have many measurements per sample: typically several hundred spectra per sample.


Question: are the 1800 real, independent samples?

Model stability is something you can measure.
Do a honest validation of your model with really _independent_ test data and 
measure the stability according to what your stability needs are (e.g. stable 
parameters or stable predictions?).




(From here on reply to Joris)

 Marc's explanation is valid to a certain extent, but I don't agree with
 his conclusion. I'd like to point out the curse of
 dimensionality (Hughes effect) which starts to play rather quickly.
No doubt.

 The curse of dimensionality is easily demonstrated looking at the
 proximity between your datapoints. Say we scale the interval in one
 dimension to be 1 unit. If you have 20 evenly-spaced observations, the
 distance between the observations is 0.05 units. To have a proximity
 like that in a 2-dimensional space, you need 20^2=400 observations. in
 a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
 distance between your observations is important, as a sparse dataset
 will definitely make your model misbehave.

But won't the distance between groups also grow?
No doubt, that high-dimensional spaces are _very_ unintuitive.

However, the required sample size may grow substantially slower, if the model 
has appropriate restrictions. I remember the recommendation of at least 5 
samples per class and variate for linear classification models. I.e. not to get 
a good model, but to have a reasonable chance of getting a stable model.


 Even with about 35 samples per variable, using 50 independent
 variables will render a highly unstable model,
Am I wrong thinking that there may be a substantial difference between stability 
of predictions and stability of model parameters?


BTW: if the models are unstable, there's also aggregation.

At least for my spectra I can give toy examples with physical-chemical 
explanation that yield the same prediction with different parameters (of course 
because of correlation).


 as your dataspace is
 about as sparse as it can get. On top of that, interpreting a model
 with 50 variables is close to impossible,
No, not necessary. IMHO it depends very much on the meaning of the variables. 
E.g. for the spectra a set of model parameters may be interpreted like spectra 
or difference spectra. Of course this has to do with the fact, that a parallel 
coordinate plot is the more natural view of spectra compared to a point in so 
many dimensions.


 and then I didn't even start
 on interactions. No point in trying I'd say. If you really need all
 that information, you might want to take a look at some dimension
 reduction methods first.

Which puts to my mind a question I've had since long:
I assume that all variables that I know beforehand to be without information are 
already discarded.
The dimensionality is then further reduced in a data-driven way (e.g. by PCA or 
PLS). The model is built in the reduced space.


How many fewer samples are actually needed, considering the fact that the 
dimension reduction is a model estimated on the data?
...which of course also means that the honest validation embraces the 
data-driven dimensionality reduction as well...


Are there recommendations about that?


The other curious question I have is:
I assume that it is impossible for him to obtain the 10^xy samples required for 
comfortable model building.

So what is he to do?


Cheers,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to ignore rows missing arguments of a function when creating a function?

2010-06-14 Thread Joris Meys
Ah, I overlooked that possibility.

You can do the following:

not <- attr(fm$model, "na.action")
if (!is.null(not)) {   # only drop the NA values if any were left out of the model
   cluster <- cluster[-not]
   dat <- dat[-not,]
}
with(dat, {


On Mon, Jun 14, 2010 at 4:30 PM, edmund jones edmund.j.jo...@gmail.com wrote:
 Thanks a lot!

 So, what you propose works perfectly (the cluster variable is indeed a
 vector), except in the case where I have no missing values in my regression.
 With the following function:

 cl2 <- function(dat, fm, cluster){
 attach(dat, warn.conflicts = F)
 require(sandwich)
 require(lmtest)
 not <- attr(fm$model, "na.action")
 cluster <- cluster[-not]
 with( dat[-not,] ,{
 M <- length(unique(cluster))
 N <- length(cluster)
 K <- fm$rank
 dfc <- (M/(M-1))*((N-1)/(N-K))
 uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum));
 vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
 coeftest(fm, vcovCL)
 }
 )
 }

 If I have no missing values in the arguments of fm, I get the message:
 Error in -not : invalid argument to unary operator

 In your example:


 x <- rnorm(100)
 y <- c(rnorm(90), NA, rnorm(9))
 test <- lm(x ~ y)
 str(test)
 List of 13
  ... (tons of information)
  $ model        :'data.frame':  99 obs. of  2 variables:
  ... (more tons of information)
  ..- attr(*, "na.action")=Class 'omit'  Named int 91
  .. .. ..- attr(*, "names")= chr "91"
  - attr(*, "class")= chr "lm"

 Now we know that we can do:
 not <- attr(test$model, "na.action")
 y[-not]

 If I have, for example, y <- rnorm(100), I also get the error:
 Error in -not : invalid argument to unary operator


 In my database:

  female    income transport lunch dist reg_f
 4900  0 18.405990 0 0   75   750
 4901  0    NA 0    NA   75   753
 4902  1    NA 0 1   75   752
 4903  1    NA 0 1   75   751
 4904  1 69.678340 1 0   74   740
 4905  0 57.953230 1 0   73   730
 4906  1 85.835130 0 1   68   680
 4907  0 81.952980 0 0   75   750
 4908  1 46.837490 1 0   74   740
 4909  0    NA 1 0    5    52
 4910  1 65.041360 0 1   75   750
 4911  0 77.451870 1 0   75   750
 4912  0 96.148590 1 0   75   750
 4913  0 64.510410 0 0   74   740
 4914  0 69.391230 0 0   75   750
 4915  0  4.804243 0 1   65   650
 4916  0    NA 0 0   75   751
 4917  1    NA 0 0   75   751
 4918  1    NA 0 1   40   401
 4919  1 49.920750 0 1   76   760
 4920  0    NA 0 1   76   763
 4921  0 10.187910 0 0   77   770
 4922  0 14.839710 0 1   77   770
 4923  1 32.041000 0 0   77   770
 4924  0 85.639440 0 0   77   770
 4925  1 86.308410 0 0   68   680
 4926  0 79.223910 0 0    7    70
 4927  0 81.825800 0 0   78   780
 4928  0 31.931000 0 1   37   370
 4929  0 53.282310 0 1   41   410
 4930  1 31.312910 1 1   25   250
 4931  1 50.478870 0 1   25   250
 4932  0    NA 0 0   66   662
 4933  1 58.156940 0 1   31   310
 4934  0    NA 0 1    1    13
 4935  1    NA 0 1    1    12
 4936  1 59.149180 0 0    3    30
 4937  1  5.400807 0 1    5    50
 4938  1 76.828630 0 0    6    60
 4939  1 73.488300 0 1   63   630
 4940  0  6.529074 0 1    6    60
 4941  0    NA 0 0    6    61
 4942  1 70.128530 0 0    3    30
 4943  0    NA 0 1   53   531
 4944  1 75.715350 1 0    6    60
 4945  0  8.623850 0 1   24   240
 4946  1 79.062470 0 1   62   620
 4947  0 83.863370 1 1   11   110
 4948  0 58.904450 0 1   62   620
 4949  0 88.500290 0 0    9    90
 4950  0    NA 1 1    9    90
 4951 NA    NA 0 1   15   151

 It works perfectly if I do
 cl2(testdata, lm(income ~ transport + reg_f ), female)

 but not for cl2(testdata, lm(dist ~ transport + reg_f ), female)
 Or any other case about when the arguments of the lm function have no
 missing values.

 How can I tell R to do this only if there are missing values?

 Thanks a lot for your help! Working on your previous replies has been very
 helpful in understanding how R works.

 Cheers,
 Edmund.


 2010/6/13 Joris Meys jorism...@gmail.com

  Next to that, of course you have to use the right indices for
 cluster, but as I have no clue what the dist is, I just put there
 something. So if it is a matrix, naturally you'll get errors. If I
 make a vector cluster with the same length as the 

[R] using R to draw over a distribution.

2010-06-14 Thread SHANE MILLER, BLOOMBERG/ 731 LEXIN
Hi,

Suppose I analyze a log to create a histogram:

event E1 occurred N1 times
event E2 occurred N2 times
  ...
  ... for m total events
  ...
event Em occurred Nm times

The total number of occurrences is: T = Sum_{j=1..m} N_j

I want to give this histogram to R and ask it to produce T random events such 
that approximately N1 instances of E1 are drawn, N2 instances of E2 drawn, and 
so forth.

Regards,
Shane
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using R to draw over a distribution.

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 10:42 AM, SHANE MILLER, BLOOMBERG/ 731 LEXIN wrote:


Hi,

Suppose I analyze a log to create a histogram:

event E1 occurred N1 times
event E2 occurred N2 times
 ...
 ... for m total events
 ...
event Em occurred Nm times

The total number of occurrences is: T = Sum_{j=1..m} N_j

I want to give this histogram to R and ask it to produce T random  
events such that approximately N1 instances of E1 are drawn, N2  
instances of E2 drawn, and so forth.


?table  # or perhaps use the fact that hist() will return a table of a  
particular type.

?sample  # from the events with prob = frequencies/T

(If you insist on constraining the total count to be the observed sum  
it will no longer be random with m degrees of freedom. And there are  
many postings on how to do this in the archives.)
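David's sample() suggestion can be sketched as follows; the event names and counts below are made up for illustration:

```r
# Draw T events with replacement, weighted by the observed frequencies
counts <- c(E1 = 120, E2 = 45, E3 = 300)   # hypothetical Nj from the log
Tot <- sum(counts)                         # T = total number of occurrences
draws <- sample(names(counts), size = Tot, replace = TRUE,
                prob = counts / Tot)
table(draws)   # approximately reproduces the original histogram
```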


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 variables

2010-06-14 Thread Robert A LaBudde
I think the real issue is why the fit is being done. If it is solely to
interpolate and condense the dataset, the number of variables is not an
important issue.

If the issue is developing a model that will capture causality, it is hard to
believe that can be accomplished with 50+ variables. With this many, some kind
of hunt would have to be done, and the resulting model would not be really
stable. It would be better perhaps to first reduce the variable set by, say,
principal components analysis, so that a reasonably sized set results.

If a stable and meaningful model is the goal, each term in the final model
should be plausibly causal.


At 10:36 AM 6/14/2010, Claudia Beleites wrote:

Dear all,

(this first part of the email I sent to John 
earlier today, but forgot to put it to the list as well)

Dear John,

 Hi, this is not an R technical question per se. I know there are many
 excellent statisticians on this list, so here are my questions: I have a
 dataset with ~1800 observations and 50 independent variables, so there are
 about 35 samples per variable. Is it wise to build a stable multiple logistic
 model with 50 independent variables? Any problem with this approach? Thanks

First: I'm not a statistician, but a spectroscopist. But I do build logistic
regression models with far fewer than 1800 samples and far more variates (e.g.
75 patients / 256 spectral measurement channels), though I have many
measurements per sample: typically several hundred spectra per sample.

Question: are the 1800 real, independent samples?

Model stability is something you can measure. Do an honest validation of your
model with really _independent_ test data and measure the stability according
to what your stability needs are (e.g. stable parameters or stable
predictions?).




(From here on reply to Joris)

 Marc's explanation is valid to a certain extent, but I don't agree with
 his conclusion. I'd like to point out the curse of
 dimensionality (Hughes effect), which starts to play rather quickly.
No doubt.

 The curse of dimensionality is easily demonstrated looking at the
 proximity between your datapoints. Say we scale the interval in one
 dimension to be 1 unit. If you have 20 evenly-spaced observations, the
 distance between the observations is 0.05 units. To have a proximity
 like that in a 2-dimensional space, you need 20^2 = 400 observations. In
 a 10-dimensional space this becomes 20^10 ~ 10^13 datapoints. The
 distance between your observations is important, as a sparse dataset
 will definitely make your model misbehave.

But won't also the distance between groups grow?
No doubt that high-dimensional spaces are _very_ unintuitive.

However, the required sample size may grow substantially more slowly if the
model has appropriate restrictions. I remember a recommendation of at least 5
samples per class and variate for linear classification models; i.e. not to
get a good model, but to have a reasonable chance of getting a stable model.

 Even with about 35 samples per variable, using 50 independent
 variables will render a highly unstable model,
Am I wrong thinking that there may be a substantial difference between
stability of predictions and stability of model parameters?

BTW: if the models are unstable, there's also aggregation.

At least for my spectra I can give toy examples with physical-chemical
explanation that yield the same prediction with different parameters (of
course because of correlation).

 as your dataspace is
 about as sparse as it can get. On top of that, interpreting a model
 with 50 variables is close to impossible,
No, not necessarily. IMHO it depends very much on the meaning of the
variables. E.g. for the spectra, a set of model parameters may be interpreted
like spectra or difference spectra. Of course this has to do with the fact
that a parallel coordinate plot is the more natural view of spectra compared
to a point in so many dimensions.

 and then I didn't even start
 on interactions. No point in trying I'd say. If you really need all
 that information, you might want to take a look at some dimension
 reduction methods first.

Which brings to mind a question I've had for a long time:
I assume that all variables that I know beforehand to be without information
are already discarded.
The dimensionality is then further reduced in a data-driven way (e.g. by PCA
or PLS). The model is built in the reduced space.

How many fewer samples are actually needed, considering that the dimension
reduction is itself a model estimated on the data?
...which of course also means that the honest validation embraces the
data-driven dimensionality reduction as well...

Are there recommendations about that?

The other curious question I have is:
I assume that it is impossible for him to obtain the 10^xy samples required
for comfortable model building.

So what is he to do?

Cheers,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università 

Re: [R] Prime Numbers Pkgs - Schoolmath is broken

2010-06-14 Thread Gavin Simpson
On Mon, 2010-06-14 at 07:16 -0700, Red Roo wrote:
 Looking for a recommended package that handles prime number computations.
 
 Tried the following unsuccessfully:
 primeFactors() in the R.basic package failed to install.
 
 primes() and primlist are broken in Schoolmath pkg on CRAN.
 My analysis can be found here http://j.mp/9BNI9q
 Not sure what the procedure is for getting things fixed, so I've
 cross-posted to r-dev as well.

Neither place is correct. This has *nothing* to do with R. Address bug
reports directly to the package maintainer, details of which can be
found here:

http://cran.r-project.org/web/packages/schoolmath/index.html

which is what is requested in the posting guide...

HTH

G

 --njg
 
 TAKING THE PITH OUT OF PERFORMANCE http://perfdynamics.blogspot.com/
 Follow me on Twitter http://twitter.com/DrQz
 PERFORMANCE DYNAMICS COMPANY http://www.perfdynamics.com/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 variables

2010-06-14 Thread Marc Schwartz
Joris,

There are two separate issues here:

1. Can you consider an LR model with 50 covariates?

2. Should you have 50 covariates in your LR model?


The answer to 1 is certainly yes, given what I noted below as a general working 
framework. I have personally been involved with the development and validation 
of LR models with ~35 covariates, albeit with notably larger datasets than 
discussed below, because the models are used for prediction. In fact, the 
current incarnations of those same models, now 15 years later, appear to have 
40 covariates and are quite stable. The interpretation of the models by both 
statisticians and clinicians is relatively straightforward.

The answer to 2 gets into the subject matter that you raise, which is to 
consider other factors beyond the initial rules of thumb for minimum sample 
size. These get into reasonable data reduction methods, the consideration of 
collinearity, subject matter expertise, sparse data, etc.

The issues raised in number 2 are discussed in the two references that I noted.

Two additional references that might be helpful here on the first point are:

P. Peduzzi, J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein. A 
simulation study of the number of events per variable in logistic regression 
analysis. J Clin Epi, 49:1373–1379, 1996. 

E. Vittinghoff and C. E. McCulloch. Relaxing the rule of ten events per 
variable in logistic and Cox regression. Am J Epi, 165:710–718, 2006.


Regards,

Marc

On Jun 14, 2010, at 8:38 AM, Joris Meys wrote:

 Hi,
 
 Marc's explanation is valid to a certain extent, but I don't agree with
 his conclusion. I'd like to point out the curse of
 dimensionality (Hughes effect), which starts to play rather quickly.
 
 The curse of dimensionality is easily demonstrated by looking at the
 proximity between your datapoints. Say we scale the interval in one
 dimension to be 1 unit. If you have 20 evenly-spaced observations, the
 distance between the observations is 0.05 units. To have a proximity
 like that in a 2-dimensional space, you need 20^2 = 400 observations. In
 a 10-dimensional space this becomes 20^10 ~ 10^13 datapoints. The
 distance between your observations is important, as a sparse dataset
 will definitely make your model misbehave.
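Joris's spacing argument is easy to check numerically; the sketch below (all settings are arbitrary) computes the median nearest-neighbour distance for a fixed number of uniform points as the dimension grows:

```r
# Median nearest-neighbour distance for 20 uniform points per run,
# as dimension grows (illustrates the sparsity argument)
set.seed(42)
nn_dist <- function(d, n = 20) {
  x <- matrix(runif(n * d), nrow = n)
  dm <- as.matrix(dist(x))
  diag(dm) <- Inf                 # ignore self-distances
  median(apply(dm, 1, min))
}
sapply(c(1, 2, 5, 10, 50), nn_dist)  # increases steadily with dimension
```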
 
 Even with about 35 samples per variable, using 50 independent
 variables will render a highly unstable model, as your dataspace is
 about as sparse as it can get. On top of that, interpreting a model
 with 50 variables is close to impossible, and then I didn't even start
 on interactions. No point in trying I'd say. If you really need all
 that information, you might want to take a look at some dimension
 reduction methods first.
 
 Cheers
 Joris
 
 On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote:
 On Jun 13, 2010, at 10:20 PM, array chip wrote:
 
 Hi, this is not an R technical question per se. I know there are many 
 excellent statisticians on this list, so here are my questions: I have a dataset 
 with ~1800 observations and 50 independent variables, so there are about 35 
 samples per variable. Is it wise to build a stable multiple logistic model 
 with 50 independent variables? Any problem with this approach? Thanks
 
 John
 
 
 The general rule of thumb is to have 10-20 'events' per covariate degree of 
 freedom. Frank has suggested that in some cases that number should be as 
 high as 25.
 
 The number of events is the smaller of the two possible outcomes for your 
 binary dependent variable.
 
 Covariate degrees of freedom refers to the number of columns in the model 
 matrix. Continuous variables are 1, binary factors are 1, K-level factors 
 are K - 1.
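Marc's bookkeeping above can be sketched as a quick check (the outcome and covariate mix below are hypothetical):

```r
# Rule-of-thumb check: events per covariate degree of freedom
set.seed(1)
y <- rbinom(1800, 1, 0.3)          # hypothetical binary outcome, n = 1800
n_events <- min(table(y))          # 'events' = the smaller outcome count
covariate_df <- 1 + 1 + (4 - 1)    # continuous + binary + 4-level factor
n_events / covariate_df            # compare against the 10-20 guideline
```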
 
 So if out of your 1800 records, you have at least 500 to 1000 events, 
 depending upon how many of your 50 variables are K-level factors and whether 
 or not you need to consider interactions, you may be OK. Better if towards 
 the high end of that range, especially if the model is for prediction versus 
 explanation.
 
 Two excellent references would be Frank's book:
 
  
 http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/
 
 and Steyerberg's book:
 
  
 http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/
 
 to assist in providing guidance for model building/validation techniques.
 
 HTH,
 
 Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] do faster ANOVAS

2010-06-14 Thread melissa
Thank you, with the matrix for the responses (here my 101 timepoints), it takes 
less than 30 minutes for 1000 permutations, whereas before it took 2h30!

Best regards,

Mélissa
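The matrix-response trick Douglas Bates suggested, which produced Mélissa's speedup, can be sketched as follows (the balanced design, factor names, and t1..t101 columns are assumptions based on the thread):

```r
# One aov() call fits all 101 timepoints at once via a matrix response
set.seed(1)
dat <- data.frame(Subject = gl(3, 20), Product = gl(4, 5, 60))
Y <- matrix(rnorm(60 * 101), nrow = 60,
            dimnames = list(NULL, paste0("t", 1:101)))
fit <- aov(Y ~ Subject * Product, data = dat)
# summary() returns one ANOVA table per response; extract the Sum Sq column
ss <- sapply(summary(fit), function(s) s[[1]][, "Sum Sq"])
dim(ss)   # 4 effect rows (Subject, Product, interaction, residuals) x 101
```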





 Message of 10/06/10 18:52
 From: Douglas Bates 
 To: melissa 
 Cc: r-help@r-project.org
 Subject: Re: [R] do faster ANOVAS
 
 
 The lm and aov functions can take a matrix response allowing you to
 fit all of the responses for a single attribute simultaneously.
 
 
 On Thu, Jun 10, 2010 at 8:47 AM, melissa wrote:
  Dear all R users,
  I want to realize 800 000 ANOVAS and to store Sum of Squares of the 
  effects. Here is an extract of my table data
  Product attribute subject rep t1 t2 t3 … t101
  P1 A1 S1 R1 1 0 0 … 1
  I want to realize 1 ANOVA per timepoint and per attribute, there are 101 
  timepoints and 8 attributes so I want to realize 808 ANOVAS. This will be 
  an ANOVA with two factors :
  Here is one example:
  aov(t1~Subject*Product, data[data$attribute=="A1",])
  I want to store for each ANOVA SSprod,SSsujet,SSerreur,SSinter and SStotal.
  In fact I want the result in several matrices:
  Ssprod matrice:
  T1 t2 t3 t4 … t101
  A1 ssprod(A1,T1)
  A2
  A3
  …
  A8
  So I would like a matrice like that for ssprod, ssujet,sserreur,ssinter and 
  sstotal.
  And this is for one permutation, and I want to do 1000 permutations
  Here is my code:
  SSmatrixglobal <- function(k){

  daten.temp <- data
  daten.temp$product <- permutations[[k]]
  listmat <- apply(daten.temp[,5:105], 2, function(x,y){
  tab2 <- as.data.frame(cbind(x,y))
  tab.class <- by(tab2[,1:3], tab2[,4], function(x){
  f <- formula(paste(names(x)[1], "~", names(x)[2], "*", names(x)[3], sep=""))
  anovas <- aov(f, data=x)
  anovas$call$formula <- f
  s1 <- summary(anovas)
  qa <- s1[[1]][,2]
  return(qa)
  })
  return(tab.class)
  }, y=daten.temp[,1:3]
  )
  ar <- array(unlist(listmat),
  dim=c(length(listmat[[1]][[1]]), length(listmat[[1]]), length(listmat)))
  l <- lapply(1:4, function(i) ar[i,,])
  sssujet <- l[[1]]
  ssprod <- l[[2]]
  ssinter <- l[[3]]
  sserreur <- l[[4]]
  ss <- rbind(sssujet, ssprod, ssinter, sserreur, sstotal)
  ss <- as.data.frame(ss)
  sqlSave(channel, ss, "SS1000", append=T)
  rm(ss, numperm, daten.temp)
  }
 
  system.time(por - lapply(c(1:1000), SSmatrixglobal))
 
  But it takes about 90 seconds per permutation, times 1000; how can I make
  the ANOVAs faster?
 
  Many thanks
  Best regards
  Mélissa
 
  PS: I think that I can gain a lot of time in the aov function but I don't
  know how to do it.
  [[alternative HTML version deleted]]
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] xtable with Sweave

2010-06-14 Thread Silvano

Hi,

I'm using Sweave to prepare a descriptive report.
Are at least 20 tables built with xtable command of kind:

<<echo=F, results=hide>>=
q5 = factor(Q5, label=c("Não", "Sim"))
(q5.tab = cbind(table(q5)))
@

<<echo=F, results=tex>>=
xtable(q5.tab, align="l|c", caption.placement = "top", 
table.placement='H')

@

I'm getting the following message:

Too many unprocessed floats

in Latex file.

How to avoid these messages appearing?

--
Silvano Cesar da Costa
Departamento de Estatística
Universidade Estadual de Londrina
Fone: 3371-4346

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lme command

2010-06-14 Thread Enrico Colosimo
Hi,

I am doing a longitudinal data set fit using lme.
I used two forms of the lme command and I am
getting two different outputs.

FIRST
out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status,
random=(list(ident=~Idade+Idade2)))

SECOND
out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status,
random= ~Idade+Idade2|ident, data=dados)

I got weird results from the first one and could not understand the reason
for it. All the results are exactly the same except the intercept and the two
main terms sexo (gender) and status (treatment). Those differences made a lot
of difference in the final results.

Anybody can tell me what is the differences  between them?
Thanks.

Enrico.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R in Linux: problem with special characters

2010-06-14 Thread daniel fernandes

Hi,

 

First of all, thank you for your reply. It was very helpful.

I have another problem: I have changed the locale to pt_pt.iso885...@euro. Now 
the problem that I reported earlier doesn't occur.

 

print("dúvida")
[1] "dúvida"

 

My system information now is the following:

 

Sys.getlocale()
[1] 
lc_ctype=pt_pt.iso885...@euro;LC_NUMERIC=C;lc_time=pt_pt.iso885...@euro;lc_collate=pt_pt.iso885...@euro;
LC_MONETARY=C;lc_messages=pt_pt.iso885...@euro;lc_paper=pt_pt.iso885...@euro;LC_NAME=C;LC_ADDRESS=C;
LC_TELEPHONE=C;lc_measurement=pt_pt.iso885...@euro;LC_IDENTIFICATION=C
 

 

[daniel.fernan...@pt-lnx13 ~]$ locale


 

lang=pt_pt.iso885...@euro
lc_ctype=pt_pt.iso885...@euro
lc_numeric=pt_pt.iso885...@euro
lc_time=pt_pt.iso885...@euro
lc_collate=pt_pt.iso885...@euro
lc_monetary=pt_pt.iso885...@euro
lc_messages=pt_pt.iso885...@euro
lc_paper=pt_pt.iso885...@euro
lc_name=pt_pt.iso885...@euro
lc_address=pt_pt.iso885...@euro
lc_telephone=pt_pt.iso885...@euro
lc_measurement=pt_pt.iso885...@euro
lc_identification=pt_pt.iso885...@euro
LC_ALL=
 
However, if I utilise the tcltk package and try to build a frame with the 
label "dúvida", the string appears incorrectly. My guess is that this problem 
is also related to the locales. Does Tcl/Tk utilise a different locale?

 

 

Thanks in advance,

 

Daniel

 


 
 Date: Fri, 11 Jun 2010 17:54:34 -0400
 From: murdoch.dun...@gmail.com
 To: danielpas...@hotmail.com
 CC: r-help@r-project.org
 Subject: Re: [R] R in Linux: problem with special characters
 
 daniel fernandes wrote:
 
 
 
  Hi,
  
  I’m working with the 64 bit version of R 2.11.0 for Linux. My session info 
  is:
  
  R version 2.11.0 (2010-04-22)
  x86_64-redhat-linux-gnu
  
  locale:
  [1] C
  
  attached base packages:
  [1] stats graphics grDevices utils datasets methods base 
  
  
  When I try to print words with special characters the result is that the 
  expression printed has some kind of code substituting the special 
  character. For example, if I run print(“dúvida”) the result is:
  
    print("dúvida")
   [1] d\372vida
  
  This as problem has something to do with the locale settings? If I run the 
  locale command in the Linux server, I get:
  
 
 Yes, it's your locale settings. The C locale doesn't support the ú 
 character in your string, and displays it in octal.
 
 Duncan Murdoch
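A hedged sketch of the fix Duncan describes: switch LC_CTYPE to a locale that supports the character (the locale names below are assumptions; availability varies by system, so check `locale -a` first):

```r
# Switch the character-type locale so non-ASCII characters print directly
Sys.setlocale("LC_CTYPE", "pt_PT.UTF-8")   # or e.g. "pt_PT.ISO8859-15@euro"
print("dúvida")                            # should show ú rather than \372
```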
  
  [daniel.fernan...@pt-lnx13 ~]$ locale
  LANG=pt_PT.UTF-8
  LC_CTYPE=C
  LC_NUMERIC=C
  LC_TIME=C
  LC_COLLATE=C
  LC_MONETARY=C
  LC_MESSAGES=C
  LC_PAPER=C
  LC_NAME=C
  LC_ADDRESS=C
  LC_TELEPHONE=C
  LC_MEASUREMENT=C
  LC_IDENTIFICATION=C
  LC_ALL=C
  
  Thanks in advance for your help,
  
  Daniel
  
 
 
  [[alternative HTML version deleted]]
 
  
  
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  
 
  
_
ACESSE O MESSENGER DO SEU CELULAR AGORA MESMO. CLIQUE E VEJA AQUI UM PASSO A 
PASSO.

rce=Live_Hotmailutm_medium=Taglineutm_content=ACESSEOMES83utm_campaign=MobileServices
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MICE Package and LMER

2010-06-14 Thread KDT

Hi R users,

I am estimating a multilevel model using lmer.  My dataset has missing
values and I am using the MICE package to make multiple imputations.
Everything works well until I reach the POOLING stage using the pool()
function. I am able to get a summary of the pooled fixed effects but not the
random effects. No errors or warnings are given to me.
I checked the help file in R and the developers of MICE noted that "The
function pools also estimates obtained with lme() and lmer(), BUT only the
fixed part of the model."

Does anyone have any ideas on how I can get a summary of pooled random
effects? 

Below is my code
imp <- mice(mydata, m=3, imputationMethod = c("", "", "", "logreg"), maxit=2, pri=F)
model <- with(data=imp, lmer(miss ~ sex + age + (1|id) + (1|sch),
family=binomial(link="logit")))
result <- pool(model)
summary(result)

Thanks
Trevor
-- 
View this message in context: 
http://r.789695.n4.nabble.com/MICE-Package-and-LMER-tp2254504p2254504.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] recursive function

2010-06-14 Thread n.via...@libero.it

Dear list,
I have the following problem: what I'm trying to do is to build a function
which does the following calculation in a recursive way.


I have a data frame more or less like this:

variable    year    DELTA

EC01    2006    /
EC01    2007    10
EC01    2008    5
EC01    2009    9


And then I have at time 2009 a variable called R_EC01(2009) = 5.
What I have to do is construct the R_EC01 time series by starting from the
2009 value:
R_EC01(2008)=R_EC01(2009)-DELTA(2009)
R_EC01(2007)=R_EC01(2008)-DELTA(2008)
R_EC01(2006)=R_EC01(2007)-DELTA(2007)


In terms of numbers, the results that I should get are:
R_EC01(2008) = 5 - 9 = -4
R_EC01(2007) = -4 - 5 = -9
R_EC01(2006) = -9 - 10 = -19

So my data frame should look like this:

SERIES    YEAR    value
R_EC01    2006    -19
R_EC01    2007    -9
R_EC01    2008    -4
R_EC01    2009    5
Does anyone know how to do it?
My data frame is not set up as a time series...
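A minimal sketch of the backward recursion described above (the data frame and column names are taken from the example; no time-series class is needed):

```r
df <- data.frame(variable = "EC01", year = 2006:2009,
                 DELTA = c(NA, 10, 5, 9))       # "/" read in as NA
n <- nrow(df)
value <- numeric(n)
value[n] <- 5                                   # R_EC01(2009) = 5
for (i in (n - 1):1)                            # value(y) = value(y+1) - DELTA(y+1)
  value[i] <- value[i + 1] - df$DELTA[i + 1]
data.frame(SERIES = "R_EC01", YEAR = df$year, value = value)
# value column: -19, -9, -4, 5
```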


Thanks a lot!!!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme command

2010-06-14 Thread Joshua Wiley
Hello Enrico,

One thing I notice between your two calls is that in the second you
specify data=dados, but you do not in the first.  When I try to do
something similar to your formulae using one of my longitudinal
datasets, I get the same results whether or not I put the formula for
random in a list.  Perhaps you could provide some sample data that
shows what is happening?

One other comment, rather than specifically adding each term and
interaction, you can use this shorthand:

Altura~(Idade+Idade2)*(sexo+status)
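As a quick check that the shorthand covers the same terms (a formula-only sketch; no data needed):

```r
f_short <- Altura ~ (Idade + Idade2) * (sexo + status)
f_long  <- Altura ~ Idade + Idade2 + sexo + status +
  Idade:sexo + Idade:status + Idade2:sexo + Idade2:status
# Same set of eight model terms, regardless of ordering
setequal(attr(terms(f_short), "term.labels"),
         attr(terms(f_long), "term.labels"))   # TRUE
```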

Best regards,

Josh



On Mon, Jun 14, 2010 at 6:37 AM, Enrico Colosimo enrico...@gmail.com wrote:
 Hi,

 I am doing a longitudinal data set fit using lme.
 I used two forms of the lme command and I am
 getting two different outputs.

 FIRST
 out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status,
 random=(list(ident=~Idade+Idade2)))

 SECOND
 out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status,
 random= ~Idade+Idade2|ident, data=dados)

 I got weird results from the first one and could not understand the reason
 for it. All the results are exactly the same except the intercept and the
 two main terms sexo (gender) and status (treatment). Those differences made
 a lot of difference in the final results.

 Anybody can tell me what is the differences  between them?
 Thanks.

 Enrico.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student
Health Psychology
University of California, Los Angeles

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Prime Numbers Pkgs

2010-06-14 Thread Karl-Dieter Crisman
 Looking for a recommended package that handles prime number computations.

I'm not sure whether this would be helpful to you, but Sage
(http://www.sagemath.org) has excellent number theory support and
several ways to interface with R (which is included in the
distribution of Sage).  I use it myself to access several R packages I
need to do analysis on data I produce in Sage/Python, and then send
the data back to Sage for further processing.

Of course, this is not an R package, so depending on how much you need
such things from within R itself it may or may not help you out.
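
For light-duty work inside R itself, a dependency-free trial-division check is easy to write. This is just a sketch; for serious number theory the CRAN package gmp (with isprime() and factorize()) or an external system like Sage is the better tool:

```r
# Minimal trial-division primality test in base R.
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n %% 2 == 0) return(n == 2)   # 2 is the only even prime
  d <- 3
  while (d * d <= n) {              # only test odd divisors up to sqrt(n)
    if (n %% d == 0) return(FALSE)
    d <- d + 2
  }
  TRUE
}

which(vapply(1:30, is_prime, logical(1)))
# 2  3  5  7 11 13 17 19 23 29
```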

HTH,
Karl-Dieter Crisman

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique problem

2010-06-14 Thread jim holtman
Your process does remove all the duplicate entries based on the
content of the two columns.  After you do this, there are still
duplicate entries in the first column that you are trying to use as
rownames and therefore the error.  Why do you want to use non-unique
entries as rownames?  Do you really need the row names, or should you
only be keeping unique values for the first column?
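
If unique values in the first column are indeed the goal, duplicated() on that column alone keeps the first row per identifier. A sketch with toy stand-in data (the real object is the poster's 27k-row Workbook5):

```r
# Two columns with IDs repeated in column 1, as in the poster's data.
Workbook5 <- data.frame(ID   = c("A_51_P102339", "A_51_P102339", "A_51_P102518"),
                        Gene = c("g1", "g2", "g3"),
                        stringsAsFactors = FALSE)

dedup <- Workbook5[!duplicated(Workbook5$ID), ]  # first row per ID survives
rownames(dedup) <- dedup$ID                      # now legal: IDs are unique
dedup
```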

On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote:
 Hello everybody,

 I have a a matrix of 2 columns and over 27k rows.
 some of the rows are double , so I tried to remove them with the command
 unique():

 Workbook5 <- read.delim(file = "Workbook5.txt")
 dim(Workbook5)
 [1] 27748     2
 Workbook5 <- unique(Workbook5)
 dim(Workbook5)
 [1] 20101     2

 it removed a lot of lines, but unfortunately not all of them. I wanted to add
 the row names to the matrix and got this error message:
 rownames(Workbook5) <- Workbook5[,1]
 Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L,  :
  duplicate 'row.names' are not allowed
 In addition: Warning message:
 non-unique values when setting 'row.names': ‘A_51_P102339’,
 ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
 ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
 ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
 ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
 ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
 ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
 ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
 ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
 ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
 ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
 ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
 ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
 ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
 ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
 ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
 ‘A_51_P125679’, ‘A_51_P125779’ [... truncated]

 Is there a better way to discard the duplications in the text file (the
 file originated in Excel)?

 R.version
               _
 platform       x86_64-apple-darwin9.8.0
 arch           x86_64
 os             darwin9.8.0
 system         x86_64, darwin9.8.0
 status         Patched
 major          2
 minor          11.1
 year           2010
 month          06
 day            03
 svn rev        52201
 language       R
 version.string R version 2.11.1 Patched (2010-06-03 r52201)

 THX

 Assa

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 variables

2010-06-14 Thread Charles C. Berry

On Mon, 14 Jun 2010, Joris Meys wrote:


Hi,

Marc's explanation is valid to a certain extent, but I don't agree with
his conclusion. I'd like to point out the curse of
dimensionality (Hughes effect), which starts to play a role rather quickly.



Ahem!

... minimal, self-contained, reproducible code ...

The OP's situation may not be an impossible one:


set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50), nc=50))
dat$y - rbinom(1800,1,plogis(rowSums(dat)/7))
fit <- glm(y~., dat, family=binomial)
1/7 # the true coef

[1] 0.1428571

sd(coef(fit)) # roughly, the common standard error

[1] 0.05605597

colMeans(coef(summary(fit))) # what glm() got

  Estimate Std. Error    z value   Pr(>|z|)
0.14590836 0.05380666 2.71067328 0.06354820

# a trickier case:
set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50), nc=50))
dat$y <- rbinom(1800,1,plogis(rowSums(dat))) # try again with coef==1
fit <- glm(y~., dat, family=binomial)
colMeans(coef(summary(fit)))

   Estimate  Std. Error     z value    Pr(>|z|)
0.982944012 0.119063631 8.222138491 0.008458002




Finer examination of the latter fit will show some values that differ too 
far from 1.0 to agree with the asymptotic std err.



sd(coef(fit)) # rather bigger than 0.119

[1] 0.1827462

range(coef(fit))

[1] -0.08128586  1.25797057

And near-separability may be at play here:


cbind(
  table(
    cut(
      plogis(abs(predict(fit))),
      c(0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1))))

                 [,1]
(0,0.9]           453
(0.9,0.99]        427
(0.99,0.999]      313
(0.999,0.9999]    251
(0.9999,0.99999]  173
(0.99999,1]       183




Recall that the observed information contains a factor of

plogis( predict(fit) )* plogis( -predict(fit))


hist(plogis( predict(fit) )* plogis( -predict(fit)))


So the effective sample size here was much reduced.

But to the OP's question, whether what you get is reasonable depends on 
what the setup is.


I wouldn't call the first of the above cases 'highly unstable'.

Which is not to say that one cannot generate difficult cases (esp. with 
correlated covariates and/or one or more highly influential covariates) 
and that the OP's case is not one of them.



HTH,

Chuck


The curse of dimensionality is easily demonstrated looking at the
proximity between your datapoints. Say we scale the interval in one
dimension to be 1 unit. If you have 20 evenly spaced observations, the
distance between the observations is 0.05 units. To have a proximity
like that in a 2-dimensional space, you need 20^2 = 400 observations; in
a 10-dimensional space this becomes 20^10 ~ 10^13 data points. The
distance between your observations is important, as a sparse dataset
will definitely make your model misbehave.
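
The grid arithmetic above is easy to verify directly:

```r
# Points needed to keep 0.05 grid spacing per axis in d dimensions: 20^d
20^c(1, 2, 10)   # 20, 400, 1.024e13 (about 10^13)
log10(20^10)     # ~13.01, confirming the order of magnitude
```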

Even with about 35 samples per variable, using 50 independent
variables will render a highly unstable model, as your dataspace is
about as sparse as it can get. On top of that, interpreting a model
with 50 variables is close to impossible, and that is before even
considering interactions. No point in trying, I'd say. If you really need all
that information, you might want to take a look at some dimension
reduction methods first.

Cheers
Joris

On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote:

On Jun 13, 2010, at 10:20 PM, array chip wrote:


Hi, this is not an R technical question per se. I know there are many excellent 
statisticians on this list, so here are my questions: I have a dataset with ~1800 
observations and 50 independent variables, so there are about 35 samples per 
variable. Is it wise to build a stable multiple logistic model with 50 
independent variables? Any problem with this approach? Thanks

John



The general rule of thumb is to have 10-20 'events' per covariate degree of 
freedom. Frank has suggested that in some cases that number should be as high 
as 25.

The number of events is the smaller of the two outcome counts for your 
binary dependent variable.

Covariate degrees of freedom refers to the number of columns in the model 
matrix. Continuous variables are 1, binary factors are 1, K-level factors are K 
- 1.

So if out of your 1800 records, you have at least 500 to 1000 events, depending 
upon how many of your 50 variables are K-level factors and whether or not you 
need to consider interactions, you may be OK. Better if towards the high end of 
that range, especially if the model is for prediction versus explanation.
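
As a hedged sketch of that screening rule (the variable names and toy data below are made up, not the poster's), the events-per-degree-of-freedom ratio can be computed straight from the model matrix:

```r
# Toy binary-outcome data: one continuous, one 2-level, one 4-level covariate.
set.seed(1)
X <- data.frame(age  = rnorm(1800),
                sex  = factor(sample(c("F", "M"), 1800, replace = TRUE)),
                site = factor(sample(LETTERS[1:4], 1800, replace = TRUE)))
y <- rbinom(1800, 1, 0.3)

events <- min(table(y))                   # events = smaller outcome count
df_cov <- ncol(model.matrix(~ ., X)) - 1  # continuous: 1 df; K-level factor: K - 1
events / df_cov                           # rule of thumb: want roughly >= 10-20
```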

Two excellent references would be Frank's book:

http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/

and Steyerberg's book:

http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/

to assist in providing guidance for model building/validation techniques.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] points marking

2010-06-14 Thread Greg Snow
I don't think that I would use a barplot as the base, but rather just set up 
the graph and add the lines where I wanted them.  I still don't understand what 
you want your graph to look like, or what question you are trying to answer 
with it (part may be a language barrier).  If you can give us a better example 
of what you are trying to accomplish, or a better description of what your data 
is like and what you are trying to get from the graph, we will have a better 
chance of being able to help you.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

From: khush  [mailto:bioinfo.kh...@gmail.com]
Sent: Saturday, June 12, 2010 5:38 AM
To: Greg Snow
Cc: r-help@r-project.org; Petr PIKAL
Subject: Re: [R] points marking

Hi,

Well, thanks for letting me know that pch is of no use with segments, Petr. I am 
using lend, as Gregory suggested, since it suits me better, but it is still not 
quite what I want, so I will try to fix it with some other method as well, as I 
have to deal more with the symbols in this case. But I want to know one thing from 
you guys: is the way I am using the code good enough to start with (I am not very 
familiar with this stuff), or is it a dirty way to handle such a task? Please let 
me know.

Thanks, Gregory and Petr.

Thank you
Jeet


On Fri, Jun 11, 2010 at 9:07 PM, Greg Snow greg.s...@imail.org wrote:
Those graphs look like chromosome maps, if so, you may want to look into the 
bioconductor project, they may have some prewritten functions to do this.  If 
not, the lend argument (see ?par) may be something to look at.  If you really 
want points and segments you will need to plot the points with the points 
function and the segments separately.  Segments can take vectors, so you don't 
need to separate things into multiple calls.
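
A minimal sketch of that combination (the coordinates here are invented): one vectorized segments() call with flat line ends via lend, plus a separate points() call for the symbols:

```r
pdf(tempfile(fileext = ".pdf"))  # draw off-screen so this runs anywhere

x0 <- c(45, 160, 277); x1 <- c(138, 255, 378); y <- c(1, 1, 1)
plot(NULL, xlim = c(0, 650), ylim = c(0, 3), xlab = "", ylab = "")
segments(x0, y, x1, y, col = "green", lwd = 20, lend = 1)  # one call for all bars
points((x0 + x1) / 2, y + 0.5, pch = 21, bg = "blue")      # symbols drawn separately

dev.off()
```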

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

From: khush [mailto:bioinfo.kh...@gmail.com]
Sent: Friday, June 11, 2010 12:00 AM
To: Greg Snow
Cc: r-help@r-project.orgmailto:r-help@r-project.org
Subject: Re: [R] points marking

Dear Gregory,

Thanks for your reply and help. Let me explain my problem again; below
is my script.

Dom <- c(195, 568, 559)

fkbp <- barplot(Dom, col = "black", xlab = "", border = NA, space = 7,
xlim = c(0, 650), ylim = c(0, 87), las = 2, horiz = TRUE)

axis(1, at = seq(0, 600, 10), las = 2)

1. ==Segments 1=

segments(164,7.8,192,7.8, col = "green", pch=23, cex=9, lty="solid", lwd=20)
segments(45,15.8,138,15.8, col = "green", pch=23, cex=9, lty="solid", lwd=20)
segments(160,15.8,255,15.8, col = "green", pch=23, cex=9, lty="solid", lwd=20)
segments(277,15.8,378,15.8, col = "green", pch=23, cex=9, lty="solid", lwd=20)
segments(51,23.8,145,23.8, col = "green", pch=23, cex=9, lty="solid", lwd=20)
segments(167,23.8,262,23.8, col = "green", pch=23, cex=9, lty="solid", lwd=20)
segments(284,23.8,381,23.8, col = "green", pch=23, cex=9, lty="solid", lwd=20)

2. ==Segments 2 ==
segments(399,15.8,432,15.8, col = "blue", pch=21, cex=9, lty="solid", lwd=20)
segments(448,15.8,475,15.8, col = "blue", pch=21, cex=9, lty="solid", lwd=20)
segments(486,15.8,515,15.8, col = "blue", pch=21, cex=9, lty="solid", lwd=20)
segments(401,23.8,434,23.8, col = "blue", pch=21, cex=9, lty="solid", lwd=20)
segments(450,23.8,475,23.8, col = "blue", pch=21, cex=9, lty="solid", lwd=20)
segments(486,23.8,517,23.8, col = "blue", pch=21, cex=9, lty="solid", lwd=20)

I solved one part of my query, i.e. marking points from one position to another, 
and found that it works fine. But I have another issue now: I am using two 
segment blocks, 1 and 2, and I want to draw different shapes for segments 2, so 
I am giving pch=21, but it seems to give a solid line for both. Drawing a 
different shape for every chunk of segments is the whole point.

I want to make a script that can generate such figures; below is a link to one 
such tool.
http://www.expasy.ch/tools/mydomains/

Thank you

Jeet
On Thu, Jun 10, 2010 at 11:10 PM, Greg Snow greg.s...@imail.org wrote:
Your question is not really clear, do either of these examples do what you want?

 with(anscombe, plot(x1, y2, ylim=range(y2,y3)) )
 with(anscombe, points(x1, y3, col='blue', pch=2) )
 with(anscombe, segments(x1, y2, x1, y3, col=ifelse( y2 > y3, 'green','red') ) )


 with(anscombe, plot(x1, y2, ylim=range(y2,y3), type='n') )
 with(anscombe[order(anscombe$x1),], polygon( c( x1,rev(x1) ), c(y2, rev(y3)), 
col='grey' ) )



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-
 

[R] Revolutions Blog: May Roundup

2010-06-14 Thread David Smith
I write about R every weekday at the Revolutions blog:
 http://blog.revolutionanalytics.com
and every month I post a summary of articles from the previous month
of particular interest to readers of r-help.

http://bit.ly/dn7DgR linked to 13 videos for learning R, from the
basics (What is R?) to more advanced topics.

http://bit.ly/cJUhiY noted the release of R 2.11.1.

http://bit.ly/d53tvn announced that Revolution Analytics makes its
software available free of charge to the academic community.

http://bit.ly/9xCQ83 noted how statistical inference was used to
accurately estimate German tank forces in WWII, and linked to an R
simulation to verify the calculation.

http://bit.ly/b5puKD announced a new website for the R community,
www.inside-R.org, sponsored by Revolution Analytics.

http://bit.ly/9r85bd linked to a video of economist JD Long explaining
why he uses R.

http://bit.ly/cySMgE linked to Jeromy Anglim's explanation of the
abbreviated names of 150 R functions.

http://bit.ly/cEceU8 announced a webinar I gave, Introduction to
Revolution R. The live event has passed now, but you can download
slides and watch a replay at this link.

http://bit.ly/brs2s2 recapped some recent news articles mentioning R
and Revolution, in Forbes, The Register, PC World and elsewhere.

http://bit.ly/9UDgOL linked to an analysis in R on predicting the
outcome of a series of baseball games.

http://bit.ly/bJYW9v linked to some materials from the CloudAsia
conference on parallel computing in R for life sciences.

http://bit.ly/dfb4PA provided a tip on keeping the console window
active in the R Productivity Environment GUI.

http://bit.ly/aeHO7B announced a webinar on integrating R-based graphs
and computations with business intelligence dashboards. (The live
event has passed, but you can download slides and a replay at this
link.)

http://bit.ly/dqr1hc linked to another method of mapping your Twitter
social network with R.

http://bit.ly/bVI6e9 linked to a video by JD Long on using the Amazon
Elastic Map-Reduce system to run large-scale map-reduce calculations
in the cloud with R.

http://bit.ly/auZ7N8 announced Revolution Analytics' development
roadmap for 2010.

There are new R user groups in Boston (http://bit.ly/cHedm0) and
Atlanta (http://bit.ly/aVo1cI). Also, http://bit.ly/a82GAf noted the
list of local R User Groups worldwide at MeetUp.com, and how you can
request a new group in your area.

Other non-R-related stories in the past month included Google's new
BigQuery and Prediction API tools (http://bit.ly/bfEeLm), the effects
of volcanic ash on a modern city (http://bit.ly/9qWfQf) and (on a
lighter note) the dating equation (http://bit.ly/9LR28N) and a neat,
practical optical illusion (http://bit.ly/dsmyov).

The R Community Calendar has also been updated at:
http://blog.revolutionanalytics.com/calendar.html

If you're looking for more articles about R, you can find summaries
from previous months at http://blog.revolutionanalytics.com/roundups/.
Join the REvolution mailing list at
http://revolutionanalytics.com/newsletter to be alerted to new
articles on a monthly basis.

As always, thanks for the comments and please keep sending suggestions
to me at da...@revolutionanalytics.com . Don't forget you can also
follow the blog using an RSS reader like Google Reader, or by
following me on Twitter (I'm @revodavid).

Cheers,
# David

--
David M Smith da...@revolutionanalytics.com
VP of Marketing, Revolution Analytics  http://blog.revolutionanalytics.com
Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Html help

2010-06-14 Thread Duncan Murdoch

Murray Jorgensen wrote:

I have just installed R 2.11.1 on my XP laptop.

I like html help for browsing but text help for on-the-fly look-ups. I 
was a bit surprised when I was asked to choose between them during the 
installation. I chose text, thinking I could fix the html help later, 
which is what I am trying to do now.


Now when I ask for html help my browser goes to

'http://-ip-number-/doc/html/index.html'

instead of where I want on my computer:

C:\apps\R\R-2.11.1\doc\html\index.html

Now I can go where I want manually but then the package list on


C:\apps\R\R-2.11.1\doc\html\packages.html

does not include all the packages that I have installed and linked. I 
don't want to read my html help from the web because sometimes I am 
off-line or on a slow connection.


How do I go about getting a local set of html help files?

  


Since 2.10.0, HTML help is generated on demand.  It doesn't leave your 
local computer; it works locally.  This saves a bit of space (the HTML 
is generated from the same source as the text help), but 
the main point is that it allows help pages to contain dynamic content.  
For example, Romain Francois posted some demo code a while ago to allow 
the display of graphics generated by R within help pages.  
(Unfortunately it depended on a particular browser feature not supported 
by Internet Explorer, so I'm going to need to put together something 
less elegant, but that's life.)
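
For Murray's immediate problem, making ? pages open as locally generated HTML after choosing text help at install time should just be the standard options() switch (a sketch; see ?help for details):

```r
options(help_type = "html")  # future help() calls render HTML, served locally
getOption("help_type")
# help.start()               # opens the local help index in your browser
```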


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xtable with Sweave

2010-06-14 Thread Marc Schwartz
On Jun 14, 2010, at 8:09 AM, Silvano wrote:

 Hi,
 
 I'm using Sweave to prepare a descriptive report.
 Are at least 20 tables built with xtable command of kind:
 
 <<echo=F, results=hide>>=
 q5 = factor(Q5, label = c("Não", "Sim"))
 (q5.tab = cbind(table(q5)))
 @
 
 <<echo=F, results=tex>>=
 xtable(q5.tab, align = "l|c", caption.placement = "top", table.placement = 'H')
 @
 
 I'm getting the following message:
 
 Too many unprocessed floats
 
 in Latex file.
 
 How to avoid these messages appearing?


Hi,

That is an error message from 'latex' indicating that you may have too many 
float tables without sufficient separation (eg. new pages, text in between, 
etc.) and/or conflicts in table placement.

You might have a look at the relevant TeX FAQ here:

  http://www.tex.ac.uk/cgi-bin/texfaq2html?label=tmupfl

Also, while I don't use xtable(), I do believe that you have to call it using 
the print method directly to specify non-default arguments:

  print(xtable(q5.tab, align = "l|c"), caption.placement = "top", 
table.placement = 'H')

See the help pages for ?xtable and ?print.xtable, including the last examples in 
the former.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] script development for Unconditional Density and Probability estimation

2010-06-14 Thread Steve_Friedman

Hello,

I'd like to automate this script a bit more and cycle several
parameters(both the species and the metric).  For example where AnnualDepth
occurs, I need to process about 12 metrics so instead of writing this
entire script 12 times once for each metric  I'd like to be able to
automatically get another metric.

Any suggestion will be greatly appreciated.

Currently running Windows XP, R 2.11.1

###

Marsh <- cbind(SoilVegHydro, vegcode)
AnnualDepth <- Marsh[,'meanAnnualDepthAve']

cattail_0 <- Marsh[,'cattail'] == '0'  # no;  needs to run for 8 species, automate if possible
cattail_1 <- Marsh[,'cattail'] == '1'  # yes; needs to run for 8 species

spbase.rate.d1 <- sum(cattail_1)/(sum(cattail_1) + sum(cattail_0))
annualDepth.density <- density(AnnualDepth)  # this line needs to be either
# interactively defined or automatically cycled through a number of metrics

cattail.d0.density <- density(AnnualDepth[cattail_0])
cattail.d1.density <- density(AnnualDepth[cattail_1])

cattail.d0.f <- approxfun(cattail.d0.density$x, cattail.d0.density$y)
cattail.d1.f <- approxfun(cattail.d1.density$x, cattail.d1.density$y)

p.d.given.AnnualDepth <- function(AnnualDepth, spbase.rate.d1)
 {
    p1 <- cattail.d1.f(AnnualDepth) * spbase.rate.d1
    p0 <- cattail.d0.f(AnnualDepth) * (1 - spbase.rate.d1)
    p1/(p1 + p0)
 }

x <- 1:1292
y <- p.d.given.AnnualDepth(x, spbase.rate.d1)
plot(x, y, type='l', col='red', xlab='Mean Annual Depth',
  main="Cattail", ylab='estimated\nProbability(cattail|AnnualDepth)')

plot(cattail.d0.density, col='red', lty=1, main="")
  lines(cattail.d1.density, col='blue', lty=1)
  lines(annualDepth.density, col='black', lty=1)
  legend(2000, 0.0023, c("No Cattail", "Cattail", "Mean Annual Depth"),
      col=c("red", "blue", "black"), lty=c(1))




#

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Count of unique factors within another factor

2010-06-14 Thread Birdnerd

Hi Dennis,

Thanks for this suggestion (which I got to run!), as this code makes
intuitive sense, whereas not all the other suggestions were that
straightforward. I'm relatively new to programming in R and am very
appreciative that you and others take time to help out where you can.

Sincerely,
Sarah



On Sun, Jun 13, 2010 at 8:47 PM, Dennis Murphy [via R]
ml-node+2253845-1393472685-291...@n4.nabble.com wrote:

 Hi:

 Another possibility:

 as.data.frame(with(data[!duplicated(data), ], table(unit)))
   unit Freq
 1  123    3
 2  345    4

 HTH,
 Dennis

 On Sun, Jun 13, 2010 at 9:07 AM, Birdnerd [hidden email] wrote:

 
  I have a data frame with two factors (sampling 'unit', 'species'). I want

  to
  calculate the number of unique 'species' per 'unit.' I can calculate the
  number of unique values for each variable separately, but can't get a
 count
  for each ‘unit’.
 
  data = read.csv("C:/Desktop/sr_sort_practice.csv")
   attach(data)
 
   data[1:10,]
unit species
  1   123ACMA
  2   123LIDE
  3   123LIDE
  4   123SESE
  5   123SESE
  6   123SESE
  7   345HEAR
  8   345LOHI
  9   345QUAG
  10  345TODI…..
 
  sr.unique <- lapply(data, unique)
  $unit
  [1] 123 345 216
  $species
   [1] ACMA  LIDE  SESE  HEAR  LOHI  QUAG  TODI  UMCA  ARSP  LIDE
 
  sapply(sr.unique, length)
 unit species
   3  10
 
  Then, I get stuck here because this unique species count is not given for

  each ‘unit’.
  What I'd like to get is:
 
  unit species
  123       3
  345       4
  216      --
 
  Thanks--
 
  --
  View this message in context:
 
  http://r.789695.n4.nabble.com/Count-of-unique-factors-within-another-factor-tp2253545p2253545.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
   [hidden email] mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 [[alternative HTML version deleted]]


 __
  [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.







-- 
Sarah E. Haas
haaszool...@gmail.com

Center for Applied Geographic Information Science (CAGIS)
Department of Geography and Earth Sciences
University of North Carolina at Charlotte
9201 University City Blvd.
Charlotte, NC 28223, USA
http://www.gis.uncc.edu/

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Count-of-unique-factors-within-another-factor-tp2253545p2254591.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xtable with Sweave

2010-06-14 Thread James W. MacDonald

Hi Silvano,

Silvano wrote:

Hi,

I'm using Sweave to prepare a descriptive report.
Are at least 20 tables built with xtable command of kind:

<<echo=F, results=hide>>=
q5 = factor(Q5, label = c("Não", "Sim"))
(q5.tab = cbind(table(q5)))
@

<<echo=F, results=tex>>=
xtable(q5.tab, align = "l|c", caption.placement = "top", table.placement = 'H')
@

I'm getting the following message:

Too many unprocessed floats

in Latex file.

How to avoid these messages appearing?


Not really an R-help question, and easily answered using our friend Google.

Anyway, you need to cause the floats to be processed before you have 
'too many'. This is done by adding a \clearpage to the LaTeX portion of 
your document every so often.
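
In a Sweave file that might look like the sketch below (the chunk contents are the original poster's; how often to insert \clearpage is a judgment call):

```latex
<<echo=F, results=tex>>=
xtable(q5.tab)
@

% after every few tables, force LaTeX to place all pending floats
% before continuing:
\clearpage
```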


Best,

Jim




--
Silvano Cesar da Costa
Departamento de Estatística
Universidade Estadual de Londrina
Fone: 3371-4346

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] script development for Unconditional Density and Probability estimation

2010-06-14 Thread jim holtman
First start by putting it in a function so you can specify the
parameters you want to change.
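
A hedged sketch of that refactoring (simulated stand-in data below; the column names are the poster's): one function taking the species and metric column names, which can then be looped over all 8 x 12 combinations instead of copying the script:

```r
# Stand-in for the poster's Marsh data frame.
set.seed(42)
Marsh <- data.frame(meanAnnualDepthAve = rnorm(200, mean = 100, sd = 30),
                    cattail            = sample(0:1, 200, replace = TRUE))

density_by_species <- function(dat, species, metric) {
  present <- dat[[species]] == 1
  vals    <- dat[[metric]]
  list(base.rate = mean(present),     # species base rate
       d0 = density(vals[!present]),  # metric density where species absent
       d1 = density(vals[present]))   # metric density where species present
}

res <- density_by_species(Marsh, "cattail", "meanAnnualDepthAve")
res$base.rate

# loop over every species/metric pair instead of copying the script:
# for (sp in species_names) for (m in metric_names)
#   results[[paste(sp, m)]] <- density_by_species(Marsh, sp, m)
```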

On Mon, Jun 14, 2010 at 11:54 AM,  steve_fried...@nps.gov wrote:

 Hello,

 I'd like to automate this script a bit more and cycle several
 parameters(both the species and the metric).  For example where AnnualDepth
 occurs, I need to process about 12 metrics so instead of writing this
 entire script 12 times once for each metric  I'd like to be able to
 automatically get another metric.

 Any suggestion will be greatly appreciated.

 Currently running Windows XP, R 2.11.1

 ###

 Marsh <- cbind(SoilVegHydro, vegcode)
 AnnualDepth <- Marsh[,'meanAnnualDepthAve']

 cattail_0 <- Marsh[,'cattail'] == '0'  # no;  needs to run for 8 species, automate if possible
 cattail_1 <- Marsh[,'cattail'] == '1'  # yes; needs to run for 8 species

 spbase.rate.d1 <- sum(cattail_1)/(sum(cattail_1) + sum(cattail_0))
 annualDepth.density <- density(AnnualDepth)  # this line needs to be either
 # interactively defined or automatically cycled through a number of metrics

 cattail.d0.density <- density(AnnualDepth[cattail_0])
 cattail.d1.density <- density(AnnualDepth[cattail_1])

 cattail.d0.f <- approxfun(cattail.d0.density$x, cattail.d0.density$y)
 cattail.d1.f <- approxfun(cattail.d1.density$x, cattail.d1.density$y)

 p.d.given.AnnualDepth <- function(AnnualDepth, spbase.rate.d1)
     {
        p1 <- cattail.d1.f(AnnualDepth) * spbase.rate.d1
        p0 <- cattail.d0.f(AnnualDepth) * (1 - spbase.rate.d1)
        p1/(p1 + p0)
     }

 x <- 1:1292
 y <- p.d.given.AnnualDepth(x, spbase.rate.d1)
 plot(x, y, type='l', col='red', xlab='Mean Annual Depth',
   main="Cattail", ylab='estimated\nProbability(cattail|AnnualDepth)')

 plot(cattail.d0.density, col='red', lty=1, main="")
  lines(cattail.d1.density, col='blue', lty=1)
  lines(annualDepth.density, col='black', lty=1)
  legend(2000, 0.0023, c("No Cattail", "Cattail", "Mean Annual Depth"),
      col=c("red", "blue", "black"), lty=c(1))




 #

 Steve Friedman Ph. D.
 Spatial Statistical Analyst
 Everglades and Dry Tortugas National Park
 950 N Krome Ave (3rd Floor)
 Homestead, Florida 33034

 steve_fried...@nps.gov
 Office (305) 224 - 4282
 Fax     (305) 224 - 4147

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique problem

2010-06-14 Thread Assa Yeroslaviz
I thought unique() deleted the whole row.
I don't really need the row names; I just thought of them as a way of getting
the unique items.

Is there a way of deleting whole rows according to their
identifiers?

What I really need are unique values in the first column.

Assa

On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com wrote:

 Your process does remove all the duplicate entries based on the
 content of the two columns.  After you do this, there are still
 duplicate entries in the first column that you are trying to use as
  rownames and therefore the error.  Why do you want to use non-unique
 entries as rownames?  Do you really need the row names, or should you
 only be keeping unique values for the first column?

 On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote:
  Hello everybody,
 
  I have a a matrix of 2 columns and over 27k rows.
  some of the rows are double , so I tried to remove them with the command
  unique():
 
  Workbook5 <- read.delim(file = "Workbook5.txt")
  dim(Workbook5)
  [1] 27748     2
  Workbook5 <- unique(Workbook5)
  dim(Workbook5)
  [1] 20101     2
 
  it removed a lot of lines, but unfortunately not all of them. I wanted to
 add
  the row names to the matrix and got this error message:
  rownames(Workbook5) - Workbook5[,1]
  Error in `row.names-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L,
  :
   duplicate 'row.names' are not allowed
  In addition: Warning message:
  non-unique values when setting 'row.names': ‘A_51_P102339’,
  ‘A_51_P102518’, ‘A_51_P103435’, 
  ‘A_51_P103465’,
  ‘A_51_P103594’, ‘A_51_P104409’, 
  ‘A_51_P104718’,
  ‘A_51_P105869’, ‘A_51_P106428’, 
  ‘A_51_P106799’,
  ‘A_51_P107176’, ‘A_51_P107959’, 
  ‘A_51_P108767’,
  ‘A_51_P109258’, ‘A_51_P109708’, 
  ‘A_51_P110341’,
  ‘A_51_P111757’, ‘A_51_P112427’, 
  ‘A_51_P112662’,
  ‘A_51_P113672’, ‘A_51_P115018’, 
  ‘A_51_P116496’,
  ‘A_51_P116636’, ‘A_51_P117666’, 
  ‘A_51_P118132’,
  ‘A_51_P118168’, ‘A_51_P118400’, 
  ‘A_51_P118506’,
  ‘A_51_P119315’, ‘A_51_P120093’, 
  ‘A_51_P120305’,
  ‘A_51_P120738’, ‘A_51_P120785’, 
  ‘A_51_P121134’,
  ‘A_51_P121359’, ‘A_51_P121412’, 
  ‘A_51_P121652’,
  ‘A_51_P121724’, ‘A_51_P121829’, 
  ‘A_51_P122141’,
  ‘A_51_P122964’, ‘A_51_P123422’, 
  ‘A_51_P123895’,
  ‘A_51_P124008’, ‘A_51_P124719’, 
  ‘A_51_P125648’,
  ‘A_51_P125679’, ‘A_51_P125779’ [... truncated]
 
  Is there a better way to discard the duplications in the text file
 (Excel
  file is the origin)?
 
  R.version
_
  platform   x86_64-apple-darwin9.8.0
  arch   x86_64
  os darwin9.8.0
  system x86_64, darwin9.8.0
  status Patched
  major  2
  minor  11.1
  year   2010
  month  06
  day03
  svn rev52201
  language   R
  version.string R version 2.11.1 Patched (2010-06-03 r52201)
 
  THX
 
  Assa
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?





Re: [R] merging data frames

2010-06-14 Thread jim holtman
If you want to keep only the rows that are unique in the first column
then do the following:

workComb1 <- subset(workComb, !duplicated(ProbeID))
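
A self-contained sketch of what this does, with made-up ProbeID values (hypothetical, not the poster's data): duplicated() marks every repeat after the first, so subset() keeps one row per ProbeID.

```r
workComb <- data.frame(ProbeID    = c("P1", "P1", "P2"),
                       Transcript = c("T1", "T2", "T3"))
workComb1 <- subset(workComb, !duplicated(ProbeID))
# the second "P1" row is dropped; the first occurrence is kept
nrow(workComb1)  # 2
```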

On Mon, Jun 14, 2010 at 11:20 AM, Assa Yeroslaviz fry...@gmail.com wrote:
 well, the problem is basically elsewhere. I have a data frame with
 expression data and doubled IDs in the first column (see example)
 when I want to put them into row names I get the message, that there are
 non-unique items in the data.
 So I tried with unique to delete such rows. The problem is unique doesn't
 delete all of them.

 I compare two data frames with their Probe IDs.
 I would like to delete all duplicated lines with a certain probe ID, independent
 of the rest of the line; that is, I would like a data frame with single,
 unique identifiers in the Probe ID column.
 merge doesn't give me that. It doesn't delete all similar lines; if the lines
 are not identical in the other columns, it leaves them in the table.

 Is there a way of deleting the whole line for duplicated Probe IDs?

 workbook <- read.delim(file = "workbook1.txt", quote = "", sep = "\t")
 GeneID <- read.delim(file = "testTable.txt", quote = "", sep = "\t")
 workComb <- merge(workbook, GeneID, by.x = "ProbeID", by.y = "Probe.Id")
 workComb1 <- unique(workComb)
 write.table(workComb, file = "workComb.txt", sep = "\t", quote = FALSE,
 row.names = FALSE)
 write.table(workComb1, file = "workComb1.txt", sep = "\t", quote = FALSE,
 row.names = FALSE)

 look at lines 49 and 50 in the file workComb1.txt after using unique on the
 file. The lines are identical with the exception of the Transcript ID. I
 would like to take one of them out of the table.

 THX,

 Assa

 On Mon, Jun 14, 2010 at 15:33, jim holtman jholt...@gmail.com wrote:

 Put the rownames as another column in your dataframe so that it
 remains with the data.  After merging, you can then use it as the
 rownames

 On Mon, Jun 14, 2010 at 9:25 AM, Assa Yeroslaviz fry...@gmail.com wrote:
  Hi,
 
  is it possible to merge two data frames while preserving the row names
  of
  the bigger data frame?
 
  I have two data frames which I would like to combine. While doing so I
  always lose the row names. When I try to append this, I get the error
  message that I have non-unique names, although I used the unique
  command
  on the data frame where the double inputs supposedly are.
 
  thanks for the help
 
  Assa
 
         [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Count of unique factors within another factor

2010-06-14 Thread Henrique Dallazuanna
Another possibility:

 rowSums(table(x) > 0)
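
With a two-column data frame like the poster's, table() builds a unit-by-species contingency table, and counting the nonzero cells in each row gives the number of distinct species per unit. A small sketch with made-up values:

```r
x <- data.frame(unit    = c(123, 123, 123, 345, 345),
                species = c("ACMA", "LIDE", "LIDE", "HEAR", "HEAR"))
res <- rowSums(table(x) > 0)  # distinct species per unit
res
# 123 345
#   2   1
```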


On Sun, Jun 13, 2010 at 3:08 PM, Erik Iverson er...@ccbr.umn.edu wrote:

 I think ?tapply will help here.  But *please* read the posting guide and
 provide minimal, reproducible examples!


 Birdnerd wrote:

 I have a data frame with two factors (sampling 'unit', 'species'). I want
 to
 calculate the number of unique 'species' per 'unit.' I can calculate the
 number of unique values for each variable separately, but can't get a
 count
 for each ‘unit’.

  data=read.csv("C:/Desktop/sr_sort_practice.csv")
 attach(data)


  data[1:10,]

   unit species
 1   123ACMA
 2   123LIDE
 3   123LIDE
 4   123SESE
 5   123SESE
 6   123SESE
 7   345HEAR
 8   345LOHI
 9   345QUAG
 10  345TODI…..

  sr.unique <- lapply(data, unique)

 $unit
 [1] 123 345 216
 $species
  [1] ACMA  LIDE  SESE  HEAR  LOHI  QUAG  TODI  UMCA  ARSP  LIDE

  sapply(sr.unique, length)

    unit species
       3      10

 Then, I get stuck here because this unique species count is not given for
 each ‘unit’.
 What I'd like to get is:

  unit species
  123  3
  345  4
  216  --

 Thanks--






-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O




[R] How we can open the results are saved

2010-06-14 Thread khazaei
Hi all
I saved the result of my code as a file, like
 save(namefunction, file="adresse/filename.R").
 I want to open the file. Could you please tell me how I can open the
file and see the result?

best
Khazaei



Re: [R] How we can open the results are saved

2010-06-14 Thread jim holtman
load('adresse/filename.R')


On Mon, Jun 14, 2010 at 12:41 PM,  khaz...@ceremade.dauphine.fr wrote:
 Hi all
 I saved the result of my code as a file, like
 save(namefunction, file="adresse/filename.R").
  I want to open the filename. Could you please help me how I can open the
 filename and see the result.

 best
 Khazaei





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] How we can open the results are saved

2010-06-14 Thread Bert Gunter
No.

Binary workspace data are saved by default with the .Rdata extension and
are opened (actually have their contents added to the current workspace)
by load().

Files with a '.R' suffix are text and would need to be sourced:

source('adresse/filename.R')
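
A minimal round trip showing the save()/load() side of the distinction (the path here is a temporary file, purely illustrative):

```r
f <- file.path(tempdir(), "results.RData")
x <- 1:5
save(x, file = f)   # writes a binary workspace image
rm(x)               # 'x' is gone from the workspace
load(f)             # ...and restored by load()
x
# [1] 1 2 3 4 5
```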

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of jim holtman
Sent: Monday, June 14, 2010 9:50 AM
To: khaz...@ceremade.dauphine.fr
Cc: r-help@r-project.org
Subject: Re: [R] How we can open the results are saved

load('adresse/filename.R')


On Mon, Jun 14, 2010 at 12:41 PM,  khaz...@ceremade.dauphine.fr wrote:
 Hi all
 I saved the result of my code as a file, like
 save(namefunction, file="adresse/filename.R").
  I want to open the filename. Could you please help me how I can open the
 filename and see the result.

 best
 Khazaei





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] unqiue problem

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote:


I thought unique deleted whole lines.
I don't really need the row names, but I thought of them as a way of
getting the unique items.

Is there a way of deleting whole lines completely according to their
identifiers?

What I really need are unique values on the first column.

Assa

On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com wrote:


Your process does remove all the duplicate entries based on the
content of the two columns.  After you do this, there are still
duplicate entries in the first column that you are trying to use as
rownames and therefore the error.  Why do you want to use non-unique
entries as rownames?  Do you really need the row names, or should you
only be keeping unique values for the first column?

On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com  
wrote:

Hello everybody,

I have a matrix of 2 columns and over 27k rows.
some of the rows are duplicated, so I tried to remove them with the
command

unique():


Workbook5 <- read.delim(file = "Workbook5.txt")
dim(Workbook5)

[1] 27748 2

Workbook5 <- unique(Workbook5)


Jim already showed you one way in another thread and it is probably  
more intuitive than this way, but just so you know...


 Workbook5 <- Workbook5[ unique(Workbook5[ ,1] ) , ]

... should have worked. Logical indexing on first column with return  
of both columns of qualifying rows.


--
David.

dim(Workbook5)

[1] 20101 2

it removed a lot of line, but unfortunately not all of them. I  
wanted to

add

the row names to the matrix and got this error message:

rownames(Workbook5) <- Workbook5[,1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L,
4L, 5L,

:

duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘A_51_P102339’,
‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
‘A_51_P125679’, ‘A_51_P125779’ [... truncated]

Is there a better way to discard the duplications in the text file

(Excel

file is the origin).


R.version

 _
platform   x86_64-apple-darwin9.8.0
arch   x86_64
os darwin9.8.0
system x86_64, darwin9.8.0
status Patched
major  2
minor  11.1
year   2010
month  06
day03
svn rev52201
language   R
version.string R version 2.11.1 Patched (2010-06-03 r52201)

THX

Assa







--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?






David Winsemius, MD
West Hartford, CT



Re: [R] unqiue problem

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 1:10 PM, David Winsemius wrote:



On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote:


I thought unique deleted whole lines.
I don't really need the row names, but I thought of them as a way of
getting the unique items.

Is there a way of deleting whole lines completely according to their
identifiers?

What I really need are unique values on the first column.

Assa

On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com  
wrote:



Your process does remove all the duplicate entries based on the
content of the two columns.  After you do this, there are still
duplicate entries in the first column that you are trying to use as
rownames and therefore the error.  Why do you want to use non-unique
entries as rownames?  Do you really need the row names, or should  
you

only be keeping unique values for the first column?

On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz  
fry...@gmail.com wrote:

Hello everybody,

I have a matrix of 2 columns and over 27k rows.
some of the rows are duplicated, so I tried to remove them with the
command

unique():


Workbook5 <- read.delim(file = "Workbook5.txt")
dim(Workbook5)

[1] 27748 2

Workbook5 <- unique(Workbook5)


Jim already showed you one way in another thread and it is probably  
more intuitive than this way, but just so you know...


Workbook5 <- Workbook5[ unique(Workbook5[ ,1] ) , ]

... should have worked. Logical indexing on first column with return  
of both columns of qualifying rows.


Actually I was thinking a bit askew although that would have  
succeeded. That was not logical indexing, which would have been done  
with duplicated() ... or rather its negation through the use of the  
! unary operator:


 str(unique(Workbook5[ ,1] ) )
 Factor w/ 17209 levels "A_51_P100034",..: 1 2 3 4 5 6 7 8 9 10 ...
 str(!duplicated(Workbook5[ ,1] ) )
 logi [1:20101] TRUE TRUE TRUE TRUE TRUE TRUE ...

So this would have been the way to do it with logical indexing:

Workbook5 <- Workbook5[ !duplicated(Workbook5[ ,1] ) , ]




--
David.

dim(Workbook5)

[1] 20101 2

it removed a lot of line, but unfortunately not all of them. I  
wanted to

add

the row names to the matrix and got this error message:

rownames(Workbook5) <- Workbook5[,1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L,
4L, 5L,

:

duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘A_51_P102339’,
‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
‘A_51_P125679’, ‘A_51_P125779’ [... truncated]

Is there a better way to discard the duplications in the text file

(Excel

file is the origin).


R.version

_
platform   x86_64-apple-darwin9.8.0
arch   x86_64
os darwin9.8.0
system x86_64, darwin9.8.0
status Patched
major  2
minor  11.1
year   2010
month  06
day03
svn rev52201
language   R
version.string R version 2.11.1 Patched (2010-06-03 r52201)

THX

Assa



David Winsemius, MD
West Hartford, CT



Re: [R] Overlay of barchart and xyplot

2010-06-14 Thread Chen, Huapeng FOR:EX
Thanks a lot!

Huapeng

-Original Message-
From: foolish.andr...@gmail.com [mailto:foolish.andr...@gmail.com] On Behalf Of 
Felix Andrews
Sent: Friday, June 11, 2010 8:23 PM
To: Chen, Huapeng FOR:EX
Cc: r-help@r-project.org
Subject: Re: [R] Overlay of barchart and xyplot

Hi,

I have an example below of adding a key to the merged plot.

You cannot have the key on the right-hand side because that viewport
is used by the second ylab (ylab2 from doubleYScale). Well, if you
really wanted to, you could do it with the grid package, using
frameGrob or somesuch.


NTLST_Dispersal_VAR_00_08$Year <-
factor(NTLST_Dispersal_VAR_00_08$Year, levels =
c("1999","2000","2001","2002","2003","2004","2005","2006","2007"),
ordered = TRUE)

dispersal <-
barchart(LDP_PER*100 + SPP_PER*100 + SPG_PER*100 ~
Year | District,
data=NTLST_Dispersal_VAR_00_08,
stack=TRUE,
layout=c(5,5),
scales = list(x = list(rot = 90)),
xlab="Year", ylab="%",
strip = strip.custom(bg="light gray"),
par.settings = simpleTheme(col = c("dark gray",
"light gray", "white")),
auto.key = list(points = FALSE, rectangles = TRUE)
)


vars <-
xyplot(sqrt(Infestation_NUM) + AI ~ Year | District,
data=NTLST_Dispersal_VAR_00_08,
layout=c(5,5),
type="b",
ylab="Square roots of number of infested cells/Landscape
aggregation index",
auto.key = list(lines = TRUE)
)

dblplot <-
doubleYScale(dispersal, vars, use.style=FALSE, add.ylab2 = TRUE
)

dblplot <-
update(dblplot,
par.settings = simpleTheme(fill = c("white", "dark gray",
"black"), border="black", col.line="black",
   col.points="black", pch=c(16,17), lty=c(1,1,1,2,1))
)

## include second key at the bottom
update(dblplot, legend = list(bottom = vars$legend$top))

## Otherwise you could just include a key argument in the first plot
which includes all the items explicitly.

## Or merge the two 'auto.key's at the top:

mergeLegends <- function(a, b, ...) {
g <- frameGrob()
if (!inherits(a, "grob")) {
a <- eval(as.call(c(as.symbol(a$fun), a$args)), getNamespace("lattice"))
}
if (!inherits(b, "grob")) {
b <- eval(as.call(c(as.symbol(b$fun), b$args)), getNamespace("lattice"))
}
g <- packGrob(g, a, side = "left")
packGrob(g, b, side = "right")
}

update(dblplot, legend = list(top = list(fun = mergeLegends,
args = list(a = dispersal$legend$top, b = vars$legend$top))))





On 5 June 2010 04:49, Chen, Huapeng FOR:EX huapeng.c...@gov.bc.ca wrote:
 Hi Felix,

  Thanks for your help and advice. The following code is close to what I want,
  but I still have problems customizing the lines and adding a key in any way.
  par.settings on the final plot seems not to work somehow, except for pch and lty,
  but those overwrite the par.settings given to barchart. I also attached the data
  I used (via dput). I appreciate your further help.

 Thanks,

 Huapeng


 # code #
  NTLST_Dispersal_VAR_00_08$Year <- factor(NTLST_Dispersal_VAR_00_08$Year,
  levels = c("1999","2000","2001","2002","2003","2004","2005","2006","2007"),
  ordered = TRUE)

  dispersal <- barchart(NTLST_Dispersal_VAR_00_08$LDP_PER*100 +
                      NTLST_Dispersal_VAR_00_08$SPP_PER*100 +
                      NTLST_Dispersal_VAR_00_08$SPG_PER*100 ~
                      NTLST_Dispersal_VAR_00_08$Year |
  NTLST_Dispersal_VAR_00_08$District,
                      data=NTLST_Dispersal_VAR_00_08,
                      horizontal=FALSE,
                      stack=TRUE,
                      layout=c(5,5),
                      xlab="Year",
                      ylab="%",
                      strip = strip.custom(bg="light gray"),
                      par.settings = simpleTheme(col = c("dark gray", "light
  gray", "white")),
                      #key=list(space="right", size=10,
                      # rectangles=list(size=1.7, border="black", col =
  c("white", "dark gray", "black")),
                      #lines=list(pch=c(16,17), lty=c(1,2), col="black", type="b"),
                      # text=list(text=c("SPG","SPP","LDP")))

                      #auto.key=TRUE
                      )



  xyplot(sqrt(NTLST_Dispersal_VAR_00_08$Infestation_NUM) +
              NTLST_Dispersal_VAR_00_08$AI ~ NTLST_Dispersal_VAR_00_08$Year |
  NTLST_Dispersal_VAR_00_08$District,
              data=NTLST_Dispersal_VAR_00_08,
              layout=c(5,5),
              type="b",
              ylab="Square roots of number of infested cells/Landscape
  aggregation index",
              #par.settings = simpleTheme(col = c("black", "black"),
  pch=c(16,17)),
              #key=list(space="right", size=10,
                       #rectangles=list(size=1.7, border="black", col =
  c("white", "dark gray", "black")),
              #lines=list(pch=c(16,17), lty=c(1,2), col="black", type="b"),
              #

Re: [R] Prime Numbers Pkgs - Schoolmath is broken

2010-06-14 Thread David Smith
On Mon, Jun 14, 2010 at 7:16 AM, Red Roo redr...@yahoo.com wrote:

 Looking for a recommended package that handles prime number computations.


The gmp package (http://crantastic.org/packages/gmp) has some good tools for
prime numbers. I've used the is.prime function before; it's stochastic (in
the sense that it gives the right answer with a high probability), but very
fast for large numbers. I'm pretty sure there are exact functions for
testing and generating primes too.
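
gmp aside, primality for small integers is easy to sketch in base R with trial division (purely illustrative; use gmp for anything large):

```r
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)        # 2 and 3 are prime
  if (n %% 2 == 0) return(FALSE)
  d <- 3
  while (d * d <= n) {           # only need divisors up to sqrt(n)
    if (n %% d == 0) return(FALSE)
    d <- d + 2
  }
  TRUE
}
which(vapply(1:30, is_prime, logical(1)))
# [1]  2  3  5  7 11 13 17 19 23 29
```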

# David Smith

-- 
David M Smith da...@revolutionanalytics.com
VP of Marketing, Revolution Analytics  http://blog.revolutionanalytics.com
Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)




Re: [R] recursive function

2010-06-14 Thread K. Elo
Hi!

Do you mean something like this (df is your original data frame):

--- cut here ---

df1 <- df
df1[[1]] <- paste("R", df[[1]], sep="_")
colnames(df1) <- c("SERIES", "YEAR", "value")
df1$value[ df1$YEAR==2009 ] <- 5
for (i in c(2009:2007)) { df1$value[ df1$YEAR==(i-1) ] <- ( df1$value[
df1$YEAR==i ] - df$DELTA[ df$year==i ] ) }

--- cut here ---

Now the output:

 df1
  SERIES YEAR value
1 R_EC01 2006   -19
2 R_EC01 2007-9
3 R_EC01 2008-4
4 R_EC01 2009 5

Please let me know if you were looking for a more general approach
suitable for larger data frames with e.g. several variable classes
(EC01, EC02 etc.)

Kind regards,
Kimmo


--
University of Turku, Finland
Dep. of Political Science and Contemporary history



Re: [R] How we can open the results are saved

2010-06-14 Thread Patrick Burns

On 14/06/2010 17:50, jim holtman wrote:

load('adresse/filename.R')


Or:

attach('adresse/filename.R')

The difference between 'load' and 'attach'
is that 'load' puts the contents of the file
into your workspace (global environment, first
location on the search list), while 'attach'
creates a new location on the search list.

But calling this 'filename.R' is likely to lead
to trouble.  The '.R' suffix is usually used
for files containing R commands that can be
used by the 'source' function.

The usual convention for files created by 'save'
is to use a '.rda' suffix.

Pat
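
A sketch of the load/attach difference, assuming a writable temp directory:

```r
f <- file.path(tempdir(), "results.rda")
y_saved <- "hello"
save(y_saved, file = f)
rm(y_saved)                    # gone from the workspace

attach(f)                      # new entry on the search list, NOT the workspace
found     <- exists("y_saved")                                      # TRUE
in_global <- exists("y_saved", envir = globalenv(), inherits = FALSE)  # FALSE
detach(pos = 2)                # drop the attached file again
```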




On Mon, Jun 14, 2010 at 12:41 PM,khaz...@ceremade.dauphine.fr  wrote:

Hi all
I saved the result of my code as a file, like

save(namefunction, file="adresse/filename.R").

  I want to open the filename. Could you please help me how I can open the
filename and see the result.

best
Khazaei








--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')



Re: [R] recursive function

2010-06-14 Thread Henrique Dallazuanna
Try this:

transform(x, DELTA = NULL, value = rev(c(5, 5 - cumsum(rev(DELTA[-1])))))
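
Reconstructing the example data from the original post shows how this works: drop the leading NA delta, reverse, run the cumulative sum backwards from the 2009 value of 5, and reverse again.

```r
x <- data.frame(variable = "EC01",
                year     = 2006:2009,
                DELTA    = c(NA, 10, 5, 9))
res <- transform(x, DELTA = NULL,
                 value = rev(c(5, 5 - cumsum(rev(DELTA[-1])))))
res$value
# [1] -19  -9  -4   5
```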


On Mon, Jun 14, 2010 at 12:29 PM, n.via...@libero.it n.via...@libero.itwrote:


 Dear list,
 I have the following problem, what i'm trying to do is to built a function
 which does the following calculationg in a recursive way:


 I have a data frame more or less like this:

 variableyear DELTA

  EC01 2006   /
  EC01 2007   10
  EC01 2008   5
  EC01 2009   9


 And then I have at time 2009  a variable called R_EC01(2009)=5
 What I have to do is to construct the R_EC01 time series by starting from
 the 2009 value:
 R_EC01(2008)=R_EC01(2009)-DELTA(2009)
 R_EC01(2007)=R_EC01(2008)-DELTA(2008)
 R_EC01(2006)=R_EC01(2007)-DELTA(2007)


  In terms of numbers, the results that I should get are:
 R_EC01(2008)=5-9=-4

 R_EC01(2007)=-4-5=-9
 R_EC01(2006)=-9-10=-19
  so my data frame should look like this:

  SERIES   YEAR   value
  R_EC01   2006   -19
  R_EC01   2007   -9
  R_EC01   2008   -4
  R_EC01   2009   5
  Does anyone know how to do it?
 My dataframe is not set as a time series...


 Thanks a lot!!!






-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O




Re: [R] executable script

2010-06-14 Thread Cliff Clive

In Python, it is literally this easy:

import rpy2.robjects as robjects
robjects.r('source("C:/YOUR R FILE GOES HERE")')


Type the name of your R source file into this script and save it as a Python
script (add the suffix .py), and then you can run it by double-clicking.  If
you want to see the results of your program, you'll have to have R print
them to a file.

Now, to get Python to run on your computer, you'll have to install the
following:

http://www.python.org/download/releases/2.6.5/ Python 2.6.5 
http://sourceforge.net/projects/numpy/files/ NumPy 1.4.1 
http://sourceforge.net/projects/rpy/files/rpy2/ Rpy2 2.0.8 

(Note that these are the versions I currently have on my computer, there may
be more recent ones out there.)

You will also have to add Python to your computer's Path.  To do this in
Windows:
1. Right-click on My Computer, select Properties
2. On the Advanced tab, click Environmental Variables
3. In the System Variables list, select Path and click Edit
4. In the Variable Value line, enter C:\Python26\; at the beginning of the
line (or C:\Python30\ if you choose to install Python 3.0)

Now you can run Python scripts simply by double clicking them.  This makes
it very easy to turn R codes into an executable script; the only caveat is
that you need to have Python installed on any computers that you want to run
your script on.
-- 



Re: [R] How we can open the results are saved

2010-06-14 Thread Prof Brian Ripley

On Mon, 14 Jun 2010, Patrick Burns wrote:


On 14/06/2010 17:50, jim holtman wrote:

load('adresse/filename.R')


Or:

attach('adresse/filename.R')

The difference between 'load' and 'attach'
is that 'load' puts the contents of the file
into your workspace (global environment, first


Not necessarily: see the envir argument.


location on the search list), while 'attach'
creates a new location on the search list.


location = environment here.


But calling this 'filename.R' is likely to lead
to trouble.  The '.R' suffix is usually used
for files containing R commands that can be
used by the 'source' function.

The usual convention for files created by 'save'
is to use a '.rda' suffix.


Or .RData or .Rdata for those not DOS-centric: see the examples in the 
help file.




Pat




On Mon, Jun 14, 2010 at 12:41 PM,khaz...@ceremade.dauphine.fr  wrote:

Hi all
I saved the result of my code as a file, like

save(namefunction, file="adresse/filename.R").

  I want to open the filename. Could you please help me how I can open the
filename and see the result.

best
Khazaei



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread array chip
Thanks Charles for the reproducible code. I started this question because I 
was asked to take a look at such a dataset, but I doubt whether it's meaningful 
to do a LR with 50 variables. I haven't got the dataset yet, thus have not 
tried any code. But thanks again for sharing some simulation code.

I have one question about your code. When you calculate common standard errors for 
coefficients using sd(coef(fit)), should you exclude the intercept by doing 
sd(coef(fit)[-1])? Actually, after removing the intercept, the standard error 
calculated this way is very similar to the one reported by 
colMeans(coef(summary(fit))), for both scenarios in your example (coef = 0.14 
or 1.0).

Another question: the 50 simulated variables have very low correlations 
(ranging from -0.1 to 0.08), which may contribute to the stable model. If 
some (not all) of the 50 variables had considerable correlation, like 0.7 or 
0.8, how would the LR model behave?

Thanks

John
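
A small variation on the simulation quoted below can probe the correlated
case. This is only a sketch, assuming MASS::mvrnorm and an illustrative
pairwise correlation of 0.8:

```r
library(MASS)                           # for mvrnorm()
set.seed(54321)
p <- 50
Sigma <- matrix(0.8, p, p)
diag(Sigma) <- 1                        # pairwise correlation 0.8
dat <- as.data.frame(mvrnorm(1800, mu = rep(0, p), Sigma = Sigma))
dat$y <- rbinom(1800, 1, plogis(rowSums(dat)/7))
# expect warnings about fitted probabilities of 0 or 1 here: the
# correlated predictors make the linear predictor very spread out
fit <- glm(y ~ ., dat, family = binomial)
sd(coef(fit)[-1])   # compare with the ~0.056 of the independent case
```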



- Original Message 
From: Charles C. Berry cbe...@tajo.ucsd.edu
To: Joris Meys jorism...@gmail.com
Cc: r-help@r-project.org; Marc Schwartz marc_schwa...@me.com
Sent: Mon, June 14, 2010 8:32:02 AM
Subject: Re: [R] logistic regression with 50 varaibales

On Mon, 14 Jun 2010, Joris Meys wrote:

 Hi,

 Marc's explanation is valid to a certain extent, but I don't agree with
 his conclusion. I'd like to point out the curse of
 dimensionality (Hughes effect), which starts to play a role rather quickly.


Ahem!

... minimal, self-contained, reproducible code ...

The OP's situation may not be an impossible one:

> set.seed(54321)
> dat <- as.data.frame(matrix(rnorm(1800*50), nc=50))
> dat$y <- rbinom(1800, 1, plogis(rowSums(dat)/7))
> fit <- glm(y ~ ., dat, family=binomial)
> 1/7 # the true coef
[1] 0.1428571
> sd(coef(fit)) # roughly, the common standard error
[1] 0.05605597
> colMeans(coef(summary(fit))) # what glm() got
  Estimate Std. Error    z value   Pr(>|z|)
0.14590836 0.05380666 2.71067328 0.06354820
> # a trickier case:
> set.seed(54321)
> dat <- as.data.frame(matrix(rnorm(1800*50), nc=50))
> dat$y <- rbinom(1800, 1, plogis(rowSums(dat))) # try again with coef==1
> fit <- glm(y ~ ., dat, family=binomial)
> colMeans(coef(summary(fit)))
   Estimate  Std. Error     z value    Pr(>|z|)
0.982944012 0.119063631 8.222138491 0.008458002


Finer examination of the latter fit will show some values that differ too 
far from 1.0 to agree with the asymptotic std err.

> sd(coef(fit)) # rather bigger than 0.119
[1] 0.1827462
> range(coef(fit))
[1] -0.08128586  1.25797057

And near-separability may be playing a role here:

> cbind(
+ table(
+ cut(
+ plogis(abs(predict(fit))),
+ c(0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1))))
                 [,1]
(0,0.9]           453
(0.9,0.99]        427
(0.99,0.999]      313
(0.999,0.9999]    251
(0.9999,0.99999]  173
(0.99999,1]       183


Recall that the observed information contains a factor of

plogis(predict(fit)) * plogis(-predict(fit))

> hist(plogis(predict(fit)) * plogis(-predict(fit)))

So the effective sample size here was much reduced.

But to the OP's question, whether what you get is reasonable depends on 
what the setup is.

I wouldn't call the first of the above cases 'highly unstable'.

Which is not to say that one cannot generate difficult cases (esp. with 
correlated covariates and/or one or more highly influential covariates), 
and that the OP's case is not one of them.


HTH,

Chuck

 The curse of dimensionality is easily demonstrated looking at the
 proximity between your datapoints. Say we scale the interval in one
 dimension to be 1 unit. If you have 20 evenly-spaced observations, the
 distance between the observations is 0.05 units. To have a proximity
 like that in a 2-dimensional space, you need 20^2=400 observations. in
 a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
 distance between your observations is important, as a sparse dataset
 will definitely make your model misbehave.

 Even with about 35 samples per variable, using 50 independent
 variables will render a highly unstable model, as your dataspace is
 about as sparse as it can get. On top of that, interpreting a model
 with 50 variables is close to impossible, and then I didn't even start
 on interactions. No point in trying I'd say. If you really need all
 that information, you might want to take a look at some dimension
 reduction methods first.

 Cheers
 Joris

 On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote:
 On Jun 13, 2010, at 10:20 PM, array chip wrote:

 Hi, this is not an R technical question per se. I know there are many 
 excellent statisticians on this list, so here are my questions: I have a dataset 
 with ~1800 observations and 50 independent variables, so there are about 35 
 samples per variable. Is it wise to build a stable multiple logistic model 
 with 50 independent variables? Any problem with this approach? Thanks

 John


 The general rule of thumb is to have 10-20 'events' per covariate degree of 
 freedom. Frank has suggested that 

[R] installing RExcel package

2010-06-14 Thread Cable, Samuel B Civ USAF AFMC AFRL/RVBXI
Was wondering if anyone has any experience installing the RExcel package
by hand.  I think I have all the files needed, but our firewall here
prevents RExcelInstaller from going through the internet to get them
like it wants to do, and it just gives up.  Any ideas?  Thanks.

--Sam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] installing RExcel package

2010-06-14 Thread Richard M. Heiberger
You can normally get through the firewall by using
the internet2 option.
Use ??internet for the exact function name.  I am not at my computer now so I 
can't check for you.

Rich
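
If memory serves, the function in question is setInternet2() (Windows-only;
check ?setInternet2 in your version of R, since it was removed from later
releases):

```r
# Windows-only: route downloads through Internet Explorer's proxy
# settings, which often gets through corporate firewalls
setInternet2(TRUE)
# alternatively, start R with the --internet2 command-line flag
```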
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Zero counts lost with table() in functions

2010-06-14 Thread Ottar Kvindesland
Hi,

I am collecting replies from a survey and counting the replies with the
table() function. The function below takes two data frames and counts the
observations of the findings in the first parameter vector given the value
of the second, as shown in the code below.

My trouble is that the vector kp_vec at the end of the inner loop seems to
ignore the value when table() counts zero occurrences of one of the
outcomes. I will have

y n
34   0

This is not picked up in the matrix row after the loops with something like

 y   n y
Funding   Survival  12 5   34

where the last n value is missing. This causes my returned data frame to
fail, which is, all in all, rather miserable for the plot.

I see the point of this in a way, so I believe it is not a bug. I'd love to
get my zero back. Is it a subtle point in R I have missed?


kpi_test <- function(paramvec, kpivec) {
    kp_vec <- c()
    res_kpi_y <- c()
    res_kpi_n <- c()

    tmp_param <- c()
    tmp_kpi <- c()

    for (param_no in seq(from = 1, to = length(paramvec), by = 1)) {
        tmp_param <- paramvec[param_no]
        for (kpi_no in seq(from = 1, to = length(kpivec), by = 1)) {
            tmp_kpi <- kpivec[kpi_no]
            res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
            res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
            kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
        }
    }
    matrix_vector <- matrix(kp_vec, ncol = 6, byrow = TRUE)
    fres <- data.frame(matrix_vector)
    return(fres)
}


ottar

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to make a barplot similar to Excel’s “clustered column chart”.

2010-06-14 Thread Josef . Kardos
I have a matrix with 12 rows (one for each month), 2 columns (baseflow, 
runoff). I would like to make a barplot similar to Excel’s “clustered 
column chart”. 
 Here is my matrix ‘x’
 
 8.258754   13.300710
10.180953   10.760465
11.012184   13.954887
10.910870   13.839839
 9.023519   11.511129
 7.189241   12.519830
 5.925576   17.101491
 5.211613   13.585175
 5.039592   13.506304
 4.462325    9.963006
 5.586521   11.306202
 7.739242   14.669374

If I use barplot(x, beside=T), column 1 appears on the left side of the 
plot, and then column 2 appears on the right side of the plot.  I would 
rather that they alternate, so that for month 1, I see 2 bars – column 1, 
then column 2.  And so forth for months 2-12. Is there a simple way to do 
this?  (In excel this is the chart option for “clustered column”)



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Subtracting POSIXct data/times

2010-06-14 Thread James Rome
I have two dataframe columns of POXIXct data/times that include seconds.
I got them into this format using for example
zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")

My problem is that when I subtract the two columns, sometimes the
difference is given in seconds, and sometimes it is given in minutes. I
don't care which it is, but I need to know which one I will get.

DateTimeETA 
2010-05-16 02:19:56 2010-05-16 03:46:35 
...
Browse[1] mins = zsort$ETA - zsort$DateTime
Browse[1] mins
Time differences in hours
 [1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944
3.360278 3.360278 3.582778 4.57 5.506111 5.857778 6.150278 6.150278
6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
attr(,"tzone")

But sometimes the answer is in seconds.
# make a column with the minutes before landing
zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
 zsort$MinBeforeLand
Time differences in secs
 [1]   -50   136   221   878  1192  2263  3296  3959  4968  5846  8709
11537 12198 12442 12642 15952 18273 19952 20538

How do I specify the resultant units?

Thanks,
Jim Rome

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Zero counts lost with table() in functions

2010-06-14 Thread Peter Dalgaard
Ottar Kvindesland wrote:
 Hi,
 
 I am collecting replies from a survey and counts replies by the table()
 function. The function below carries two
 data frames and counts the observations of the findings in the first
 parameter vector given the value of the second as shown in the code below.
 
 My trouble is that the vector kp_vec at the end of the inner loop seems to
 ignore the value when the
 table() counts zero occurences of one of the outcomes. I will have
 
 y n
 34   0
 
 This is not picked up in the matrix row after the loops with something like
 
  y   n y
 Funding   Survival  12 5   34
 
 where the last n value is missing. This causes my returned data frame to
 fail and in all, rather miserable for the plot.
 
 I see the point of this in a way, so I believe it is not a bug. I'd love to
 get my zero back. Is it a subtle point in R I have missed?

I don't know about subtle, but the general idea is that you can only
expect to get a zero count if the set of possible values is known in
advance to include that value. Otherwise there's just no end to the
number of zeros to include! This can be obtained by making the object
you tabulate into a factor with the relevant levels set:

> table(y)

y
1
> table(factor(y, levels = c("y", "n")))

y n
1 0


Incidentally, this is the main reason R doesn't by default drop unused
levels when subsetting.
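
Applied to a yes/no survey column, that looks like (values illustrative):

```r
ans <- factor(c("y", "y", "y"), levels = c("y", "n"))
table(ans)         # the zero 'n' count is kept:
# ans
# y n
# 3 0
```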

I can't make heads or tails of your code, but I hope the above helps
solve your issue.

-pd

 
 
 kpi_test <- function(paramvec, kpivec) {
     kp_vec <- c()
     res_kpi_y <- c()
     res_kpi_n <- c()
 
     tmp_param <- c()
     tmp_kpi <- c()
 
     for (param_no in seq(from = 1, to = length(paramvec), by = 1)) {
         tmp_param <- paramvec[param_no]
         for (kpi_no in seq(from = 1, to = length(kpivec), by = 1)) {
             tmp_kpi <- kpivec[kpi_no]
             res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
             res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
             kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
         }
     }
     matrix_vector <- matrix(kp_vec, ncol = 6, byrow = TRUE)
     fres <- data.frame(matrix_vector)
     return(fres)
 }
 
 
 ottar
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Zero counts lost with table() in functions

2010-06-14 Thread Erik Iverson



Ottar Kvindesland wrote:

Hi,

I am collecting replies from a survey and counts replies by the table()
function. The function below carries two
data frames and counts the observations of the findings in the first
parameter vector given the value of the second as shown in the code below.

My trouble is that the vector kp_vec at the end of the inner loop seems to
ignore the value when the
table() counts zero occurences of one of the outcomes. I will have

y n
34   0

This is not picked up in the matrix row after the loops with something like

 y   n y
Funding   Survival  12 5   34

where the last n value is missing. This causes my returned data frame to
fail and in all, rather miserable for the plot.

I see the point of this in a way, so I believe it is not a bug. I'd love to
get my zero back. Is it a subtle point in R I have missed?


kpi_test <- function(paramvec, kpivec) {
    kp_vec <- c()
    res_kpi_y <- c()
    res_kpi_n <- c()

    tmp_param <- c()
    tmp_kpi <- c()

    for (param_no in seq(from = 1, to = length(paramvec), by = 1)) {
        tmp_param <- paramvec[param_no]
        for (kpi_no in seq(from = 1, to = length(kpivec), by = 1)) {
            tmp_kpi <- kpivec[kpi_no]
            res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
            res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
            kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
        }
    }
    matrix_vector <- matrix(kp_vec, ncol = 6, byrow = TRUE)
    fres <- data.frame(matrix_vector)
    return(fres)
}


Is it possible to provide test data that shows how your function is not 
behaving as you'd like?  It's probable that your function can be reduced 
to a line or two of R code, while solving your problem, and become much 
more readable if we know what it's supposed to do.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to make a barplot similar to Excel’s “clustered column chart”.

2010-06-14 Thread Peter Dalgaard
josef.kar...@phila.gov wrote:
 I have a matrix with 12 rows (one for each month), 2 columns (baseflow, 
 runoff). I would like to make a barplot similar to Excel’s “clustered 
 column chart”. 
  Here is my matrix ‘x’
  
  8.258754   13.300710
 10.180953   10.760465
 11.012184   13.954887
 10.910870   13.839839
  9.023519   11.511129
  7.189241   12.519830
  5.925576   17.101491
  5.211613   13.585175
  5.039592   13.506304
  4.462325    9.963006
  5.586521   11.306202
  7.739242   14.669374
 
 If I use barplot(x, beside=T), column 1 appears on the left side of the 
 plot, and then column 2 appears on the right side of the plot.  I would 
 rather that they alternate, so that for month 1, I see 2 bars – column 1, 
 then column 2.  And so forth for months 2-12. Is there a simple way to do 
 this?  (In Excel this is the chart option for “clustered column”)

Can't you just transpose x? barplot(t(x), beside=T)
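
For instance, with month labels and a legend (the data here are random
placeholders):

```r
x <- matrix(runif(24, min = 4, max = 18), nrow = 12, ncol = 2,
            dimnames = list(month.abb, c("baseflow", "runoff")))
# one adjacent pair of bars per month, as in Excel's clustered column chart
barplot(t(x), beside = TRUE, legend.text = TRUE,
        args.legend = list(x = "topleft"))
```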

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Times series data file?

2010-06-14 Thread Douglas M. Hultstrand

Hello,

I am currently splitting a file into individual files (one time series per 
file). The file I read in skips the first four lines and extracts the data 
columns I need.  I was wondering if there is a way for R to automatically 
scan and separate the files based on the header information?


The header delineates each observation with a space-separated line: 254, 
time, day, month, year.

254  0 26  NOV1995

I would like to then use the program I have written to do the analysis.  
I have attached a small subset text file of data (two observations).  
Any thoughts would be very helpful.


# Read in data file
data <- read.table("file.txt", skip = 4, header = FALSE)
temp <- data$V4
elev <- data$V3
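
One way to have R find the observation boundaries itself is to split on the
header lines (a sketch; it assumes every observation starts with a line
beginning "254"):

```r
lines <- readLines("file.txt")
hdr <- grep("^254", lines)                     # header line positions
obs <- split(lines, findInterval(seq_along(lines), hdr))
# 'obs' is a list with one character vector per observation; each can be
# parsed as before, e.g. read.table(textConnection(obs[[1]][-1]))
```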

Thank you,
Doug

--
-
Douglas M. Hultstrand, MS
Senior Hydrometeorologist
Metstat, Inc. Windsor, Colorado
voice: 720.771.5840
email: dmhul...@metstat.com
web: http://www.metstat.com
-

254  0 26  NOV1995
  1  24232  72694  44.92N123.02W61   2302
  2100   1960   2680149  9  4
  3   SLE9 kt
  9  10071 61106 94180 10
  5  10037 89110 82183 12
  4  1120106 84186 12
  6   9780304  9  9205 19
  6   9500544 74 59221 19
254 12 26  NOV1995
  1  24232  72694  44.92N123.02W61   2302
  2100   1960   2680149  9  4
  3   SLE9 kt
  9  10071 61106 94180 10
  5  10037 89110 82183 12
  4  1120106 84186 12
  6   9780304  9  9205 19
  6   9500544 74 59221 19
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to make a barplot similar to Excel’s “clustered column chart”.

2010-06-14 Thread Hrishi Mittal

Josef,

I think all you need to do is use the transpose of your data matrix. So if
your matrix is called x:

barplot(t(as.matrix(x)),beside=T)

-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-make-a-barplot-similar-to-Excel-s-clustered-column-chart-tp2254979p2255008.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 3D ncdf file

2010-06-14 Thread ecvetano

Hello,

I have an ncdf file with different variables for dimensions and dates,  
where the dimensions are as follows, with X=i and Y=j creating an 85  
by 188 set of cells. For each cell there are 12 readings of DO per day,  
taken at 2-hour intervals and recorded, according to the Julian calendar,  
under the ordinal dates variable.


DO: [2280x188x85]
DX: [188x85]
DY: [188x85]
X: [85x1]
Y: [188x1]
Ordinal_Dates: [2280x1]
T: [2280x1]


So far, I have set up each variable as follows, but I don't know how  
to work with the 3D DO variable, which contains 12 values per day.


setwd("C:/Documents and Settings/Desktop/R")
library(ncdf)
profile1 <- open.ncdf("2005r1_v3_Ct=500_lerie_out_bot.nc", header=FALSE)

name1 <- get.var.ncdf(profile1, "DO")
name2 <- get.var.ncdf(profile1, "DX")
name3 <- get.var.ncdf(profile1, "DY")
name4 <- get.var.ncdf(profile1, "X")
name5 <- get.var.ncdf(profile1, "Y")
name6 <- get.var.ncdf(profile1, "Ordinal_Dates")
name7 <- get.var.ncdf(profile1, "T")

I want to set a threshold and calculate how many days have an average  
DO greater than my threshold, and also what the daily averages of  
DO are.
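
A sketch of the daily averaging (assuming name1 holds DO as
[2280 x 188 x 85] with 12 readings per day, so 190 days; the threshold
value is illustrative):

```r
nday <- dim(name1)[1] / 12                  # 2280 / 12 = 190 days
day  <- rep(seq_len(nday), each = 12)
# daily mean DO per cell: result is [190 x 188 x 85]
daily_mean <- apply(name1, c(2, 3), function(ts) tapply(ts, day, mean))
threshold <- 5                              # illustrative value
# number of days whose average DO exceeds the threshold, in each cell
days_above <- apply(daily_mean, c(2, 3), function(d) sum(d > threshold))
```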


Any help is appreciated!
Emilija

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NPMC

2010-06-14 Thread Moimt

HI,
I am a new user of R and want to analyse some data using npmc. My data have
several levels of factor (Site, Year and Season) and several variables
(percentages). I have tried to use npmc but I always get an error message.
My data are in a table following this example:

Site   Year   Season   Var1   Var2
A      2009   Dry      10     56
B      ...

Here is the error message I get: "Erreur dans as.vector(x, mode) : argument
'mode' incorrect" (i.e., "invalid 'mode' argument").

How should I use the npmc formula?

Thank you for your help
-- 
View this message in context: 
http://r.789695.n4.nabble.com/NPMC-tp2254913p2254913.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subtracting POSIXct data/times

2010-06-14 Thread Don MacQueen
See the help page for the difftime() function, which will tell you 
how to specify the units of the differences.

(when you don't specify, it chooses the units according to some rules)
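
For example, with the OP's columns:

```r
# request a fixed unit explicitly, independent of the magnitudes
mins <- difftime(zsort$ETA, zsort$DateTime, units = "mins")
# or convert an already-computed difference to a plain numeric
secs <- as.numeric(zsort$ETA - zsort$DateTime, units = "secs")
```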

-Don

At 4:24 PM -0400 6/14/10, James Rome wrote:

I have two dataframe columns of POXIXct data/times that include seconds.
I got them into this format using for example
zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")

My problem is that when I subtract the two columns, sometimes the
difference is given in seconds, and sometimes it is given in minutes. I
don't care which it is, but I need to know which one I will get.

DateTimeETA
2010-05-16 02:19:56 2010-05-16 03:46:35
...
Browse[1] mins = zsort$ETA - zsort$DateTime
Browse[1] mins
Time differences in hours
 [1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944
3.360278 3.360278 3.582778 4.57 5.506111 5.857778 6.150278 6.150278
6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
attr(,tzone)

But sometimes the answer is in seconds.
# make a column with the minutes before landing

zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
 zsort$MinBeforeLand

Time differences in secs
 [1]   -50   136   221   878  1192  2263  3296  3959  4968  5846  8709
11537 12198 12442 12642 15952 18273 19952 20538

How do I specify the resultant units?

Thanks,
Jim Rome

__
R-help@r-project.org mailing list
https://*stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://*www.*R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

