Re: [R] Finding distance matrix for categorical data
Thanks, guys. I am able to generate the distance matrix for mixed column values (categorical and ordinal) using the daisy function. But can anyone tell me how to generate clusters out of it? The point being, I don't know the number of clusters beforehand.

Let me give an overview of the problem I am trying to solve. Given a dataset, something like:

             var1   var2  var3     Size
  element1-1  yes    x    present   100
  element1-2  no     y    absent    294
  element1-3  maybe  x    absent     45

with the first three variables categorical and the last one ordinal, I need to do the following:

1) Generate clusters out of it (let's say these are training clusters). I am able to compute the distance matrix (using daisy), but I am not sure how to create an unknown number of clusters; does dbscan work on a distance matrix?
2) Once that is done, I want to place some new data points in the above space (let's say these are test points).
3) Find out which test points lie within the boundary of any of the training clusters discovered above.

If anyone knows how to get this done, please let me know. It's for an academic project and I am unable to make any progress.

Thanks and regards,
K

From: Ingmar Visser i.vis...@uva.nl
Sent: Fri, 11 June, 2010 2:19:33 PM
Subject: Re: [R] Finding distance matrix for categorical data

Latent class analysis may be more appropriate, depending on your hypotheses.
Best, Ingmar

All, how can we find a distance matrix for categorical data? I.e., given a CSV like:

             var1   var2  var3  var4
  element1-1  yes    x     a     k
  element1-2  no     y     b     l
  element1-3  maybe  y     c     m

how can I compute the distance matrix between all the elements? Actually I need it to create clusters on top of it.
Thanks, regards, Kapil

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
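One possible route, as a sketch with made-up data shaped like the example above (the cluster package ships with R): compute a Gower dissimilarity with daisy(), then pick the number of clusters by maximising the average silhouette width over pam() fits, so k does not have to be fixed in advance.

```r
library(cluster)
set.seed(42)

# invented data mirroring the poster's columns
dat <- data.frame(
  var1 = factor(sample(c("yes", "no", "maybe"), 30, replace = TRUE)),
  var2 = factor(sample(c("x", "y"), 30, replace = TRUE)),
  var3 = factor(sample(c("present", "absent"), 30, replace = TRUE)),
  Size = c(rnorm(15, 50, 5), rnorm(15, 250, 20))
)

# Gower dissimilarity handles mixed (categorical + numeric) columns
d <- daisy(dat, metric = "gower")

# choose k by maximising the average silhouette width over pam() fits
avg.sil <- sapply(2:8, function(k) pam(d, k)$silinfo$avg.width)
best.k  <- (2:8)[which.max(avg.sil)]
cl      <- pam(d, best.k)$clustering
```

pam() also returns medoids, so a new (test) point could be assigned to the cluster of its nearest medoid, which speaks to steps 2 and 3. If a density-based method is preferred, fpc::dbscan() is said to accept a distance matrix via its method = "dist" argument, though that is worth checking against the fpc documentation.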
Re: [R] Cforest and Random Forest memory use
Answers added below. Thanks again, Matt

On 11 June 2010 14:28, Max Kuhn mxk...@gmail.com wrote:

Also, you have not said:
- your OS: Windows Server 2003 64-bit
- your version of R: 2.11.1 64-bit
- your version of party: 0.9-9995
- your code:
    test.cf <- cforest(formula = badflag ~ ., data = example,
                       control = cforest_control(teststat = 'max',
                           testtype = 'Teststatistic', replace = FALSE,
                           ntree = 500, savesplitstats = FALSE, mtry = 10))
- what "large data set" means: 1 million observations, 40+ variables, around 200 MB
- what "very large model objects" means
- anything which breaks

So... how is anyone supposed to help you?

Max
Re: [R] lmer() with no intercept
John,

Why would you want to fit the model without an intercept if you seemingly need it? Anyway, I assume that the intercept from your first model just moves into the random effects -- you have intercepts there for worker and day, so either of these (or both) will absorb it. It is no surprise that the estimates for the covariates differ only slightly; it should be that way.

What you plot (your second call to panel.lines) is not the correct model, as you omit the intercepts for worker and day (which are 0, or at least pretty close to it, only if you include the overall intercept in your model). That's why your red line is at the top edge (note that the intercept is negative). I'm therefore not sure that the model without an intercept makes a lot of sense, but you might consider posting related questions to the mixed-models SIG, where you could get more erudite comments than from me.

HTH, Michael

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of array chip
Sent: Saturday, June 12, 2010 1:07
To: r-help@r-project.org
Subject: [R] lmer() with no intercept

Hi, I asked this before but haven't got any response, so I would like to have another try. Thanks for the help. I also tried twice to join the model mailing list so that I can ask questions there, but still haven't got permission to join that list yet.

===

Hi, I am wondering how I can specify no intercept in a mixed model using lmer(). An example dataset is attached (test.txt). Three workers, on 5 days, measured a response variable y on an independent variable x. I want to use a quadratic term (x2 in the dataset) to model the relationship between y and x.
test <- read.table('test.txt', sep = '\t', header = TRUE)

If I simply use lm() and ignore worker and day, so that I can try a linear regression both with and without an intercept, here is what I get:

lm(y ~ x + x2, data = test)
Coefficients:
(Intercept)            x           x2
 -1.7749104    0.1099160   -0.0006152

lm(y ~ x + x2 - 1, data = test)
Coefficients:
         x           x2
 0.0490097   -0.0001962

Now I want to try a mixed model with worker and day as random effects. With an intercept:

lmer(y ~ x + x2 + (1|worker) + (1|day), data = test)
Fixed effects:
              Estimate Std. Error t value
(Intercept) -1.324e+00  4.490e-01  -2.948
x            1.117e-01  8.563e-03  13.041
x2          -6.357e-04  7.822e-05  -8.127

Without an intercept:

lmer(y ~ x + x2 + (1|worker) + (1|day) - 1, data = test)
Fixed effects:
     Estimate Std. Error t value
x   1.107e-01  8.528e-03  12.981
x2 -6.304e-04  7.805e-05  -8.077

It seems to work fine. But if you look at the fixed-effect coefficients of both mixed models, the coefficients for x and x2 are not much different regardless of whether an intercept is included. This is not the case for the simple linear regressions using lm() above. If I plot all 4 models:

xyplot(y ~ x, groups = worker, data = test, col.line = 'grey', lwd = 2,
       panel = function(x, y) {
         panel.xyplot(x, y, type = 'p')
         x <- sort(x)
         panel.lines(x, -1.324 + 0.1117*x - 0.0006357*x*x)
         panel.lines(x, 0.1107*x - 0.0006304*x*x, col = 'red')
         panel.lines(x, 0.04901*x - 0.0001962*x*x, col = 'blue')
         panel.lines(x, -1.7749 + 0.10992*x - 0.0006152*x*x, col = 'green')
       })

As you can see, the mixed model without an intercept (red line) does not fit the data very well (it is at the top edge of the data instead of in the middle), so I guess I did something wrong here. Can anyone make any suggestions?

Thanks, John
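To see the intercept-absorption point concretely, here is a small simulation (invented data, not the poster's test.txt, which is unavailable here) showing that when "-1" is used, the worker/day random intercepts soak up the grand mean and the x and x2 estimates barely move; lme4 is assumed to be installed.

```r
set.seed(1)
worker <- factor(rep(1:3, each = 50))
day    <- factor(rep(1:5, length.out = 150))
x  <- runif(150, 0, 100)
x2 <- x^2
# true curve has intercept -1.3; worker/day add small random shifts
y  <- -1.3 + 0.11 * x - 6e-4 * x2 +
      rnorm(3, 0, 0.5)[worker] + rnorm(5, 0, 0.5)[day] + rnorm(150, 0, 0.5)
sim <- data.frame(y, x, x2, worker, day)

if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)
  m1 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day), data = sim)
  m0 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day) - 1, data = sim)
  print(fixef(m1))   # has an (Intercept) term
  print(fixef(m0))   # x, x2 hardly change ...
  print(ranef(m0))   # ... because the random intercepts absorb the mean
}
```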
Re: [R] sharing experience - installing R Spatial Views
On Sat, Jun 12, 2010 at 2:37 AM, Hendro Wibowo hendrohwib...@gmail.com wrote:

Hi guys, I would like to share my experience installing the Spatial task view packages for R. I could not install 32 packages which are part of the Spatial view, and I used Google search after search to solve all those problems over about two days. I hope somebody may benefit from my experience. I admit that I do not have excellent programming skills at all, so perhaps some of the steps I took were not necessary to solve the problem at hand. But it works, and I am happy.

I was lucky when a problem had already been solved by someone else, or when the problem was just a matter of installing another R package that depends on other packages. Sometimes it had nothing to do with an R package but with a package that has to be installed in the operating system (in my case Linux Mint 8, based on Ubuntu Karmic), so I just guessed the nature of the problem and tried to solve it via an already-solved similar problem. I found many problems solved, directly or indirectly, on the Nabble - OSGeo FOSS4G websites. Of course many problems were also solved via other websites, but I forgot their addresses as I searched again and again. I do appreciate everyone who posts their problems, solved or not, because that let me solve my own, either directly or indirectly (by using the ideas from already-solved problems).

I found that we need to change from gcc version 4 to 3 in order for some packages to install (I forgot which one). The script I used to achieve that, taken from the internet, is placed at the bottom of this message. And finally, below are lists of problems and solutions from installing the Spatial view. . . .
SNIP

The main complication here seems to be that packages (obviously) have dependencies, and that Task View packages in particular have MANY dependencies which cannot be resolved by R but only at the operating-system level (on Linux and Mac -- Windows is a different story). To make installation easier, would it be possible to list these dependencies on the website of the task views, and possibly even provide commands for the most widely used Linux distros for installing them? This could save quite some time when installing those packages.

Cheers, Rainer

--
NEW GERMAN FAX NUMBER!!!
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology, Natural Sciences Building, Office Suite 2039, Stellenbosch University Main Campus, Merriman Avenue, Stellenbosch, South Africa
Cell: +27 - (0)83 9479 042 | Fax: +27 - (0)86 516 2782 | Fax: +49 - (0)321 2125 2244
email: rai...@krugs.de | Skype: RMkrug | Google: r.m.k...@gmail.com
[R] random colour
Sir, I want to plot 5 curves on a single graph, and I want to give them random colours. How can I do this?

Regards, Suman Dhara
Re: [R] using latticeExtra plotting confidence intervals
So if the categories were race (A and B) and I had a male and a female nested in each group, that would give me 4 different data points, but I want two data points within each panel.

Joe King, 206-913-2912, j...@joepking.com
"Never throughout history has a man who lived a life of ease left a name worth remembering." --Theodore Roosevelt

-----Original Message-----
From: Deepayan Sarkar [mailto:deepayan.sar...@gmail.com]
Sent: Sunday, June 13, 2010 10:15 PM
To: Joe P King
Cc: r-help@r-project.org
Subject: Re: [R] using latticeExtra plotting confidence intervals

On Sun, Jun 13, 2010 at 10:10 AM, Joe P King j...@joepking.com wrote:

I want to plot a 95% confidence band using segplot, but with groups. For example, if I have males and females within different races, I want the racial groups in different panels. I have this minor code, completely made up, that gets at what I want: 4 random samples and 4 confidence intervals. I know how to get A and B into one panel and C and D into another, but how do I get the x axis to label them properly and have them categorized as two? I am not sure what to put on the left side of the formula. This is the example code:

(1) Your code results in

length(categories)
[1] 2
length(mu)
[1] 4

which makes the formula mu ~ ci.upper + ci.lower | categories meaningless.

(2) You are effectively plotting xyplot(mu ~ mu | categories), with additional confidence intervals in one direction. I'm sure that's not what you want, but it's not clear what it is that you do actually want.
-Deepayan

library(lattice)
library(latticeExtra)
sample1 <- rnorm(100, 10, 2)
sample2 <- rnorm(100, 50, 3)
sample3 <- rnorm(100, 20, 2)
sample4 <- rnorm(100, 40, 1)
mu1 <- mean(sample1); ci.upper1 <- mu1 + 2*2; ci.lower1 <- mu1 - 2*2
mu2 <- mean(sample2); ci.upper2 <- mu2 + 2*3; ci.lower2 <- mu2 - 2*3
mu3 <- mean(sample3); ci.upper3 <- mu3 + 2*2; ci.lower3 <- mu3 - 2*2
mu4 <- mean(sample4); ci.upper4 <- mu4 + 2*1; ci.lower4 <- mu4 - 2*1
categories <- c("A", "B")
mu <- cbind(mu1, mu2, mu3, mu4)
ci.upper <- cbind(ci.upper1, ci.upper2, ci.upper3, ci.upper4)
ci.lower <- cbind(ci.lower1, ci.lower2, ci.lower3, ci.lower4)
segplot(mu ~ ci.upper + ci.lower | categories, centers = mu, horizontal = FALSE)

I also tried this:

seq1 <- seq(1, 4, 1)
segplot(seq1 ~ ci.upper + ci.lower | categories, centers = mu, horizontal = FALSE)

but it also gives a poor x axis. I know this is probably an elementary problem, but any help would be greatly appreciated. Here's my data structure, sorry for bombarding with all the code:

structure(c(9.85647167881417, 50.1856561919426, 19.8477661576365, 39.8575819498655, 13.8564716788142, 56.1856561919426, 23.8477661576365, 41.8575819498655, 5.85647167881417, 44.1856561919426, 15.8477661576365, 37.8575819498655), .Dim = c(1L, 12L), .Dimnames = list(NULL, c("mu1", "mu2", "mu3", "mu4", "ci.upper1", "ci.upper2", "ci.upper3", "ci.upper4", "ci.lower1", "ci.lower2", "ci.lower3", "ci.lower4")))

---
Joe King, M.A., Ph.D. Student, University of Washington - Seattle
Office: 404A Miller Hall, 206-913-2912, j...@joepking.com
"Never throughout history has a man who lived a life of ease left a name worth remembering." --Theodore Roosevelt
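One way to get the kind of plot described is to keep one row per group in a data frame and put a factor on the left of the formula, so each panel gets labelled categories on its x axis. A sketch with made-up numbers (latticeExtra assumed installed):

```r
# one row per (race, sex) cell; all numbers are invented
est <- data.frame(
  sex  = factor(rep(c("male", "female"), times = 2)),
  race = factor(rep(c("A", "B"), each = 2)),
  mu   = c(10, 50, 20, 40),
  se   = c(2, 3, 2, 1)
)
est$lower <- est$mu - 2 * est$se
est$upper <- est$mu + 2 * est$se

if (requireNamespace("latticeExtra", quietly = TRUE)) {
  library(latticeExtra)
  # one panel per race; male/female labelled on the x axis of each panel
  print(segplot(sex ~ lower + upper | race, data = est,
                centers = est$mu, horizontal = FALSE))
}
```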
[R] Odp: random colour
Hi,

Would you like to know which colour is for which curve? If yes, use the parameter col = 1:5 in your plotting call. If not, you can try

col = sample(colours(), 5)

instead.

Regards, Petr

r-help-boun...@r-project.org napsal dne 14.06.2010 07:05:43:

Sir, I want to plot 5 curves on a single graph with random colours. How can I do this? Regards, Suman Dhara
Re: [R] meta analysis with repeated measure-designs?
Hi,

thanks for the references. I will try the sensitivity analysis in R and try out WinBUGS if that does not work (a little afraid of switching programmes). I also had an idea for a reasonable estimate of the correlations: some studies report both results from paired t-tests and means and SDs, and thus allow two estimates of d to be calculated, one based on M and SD alone, the other on t. The difference between the two estimates should be systematically related to the correlation of the measures. I will keep you posted if I have a solution or hit a wall. Efcharisto and dank je wel!

Gerrit

On 12.06.2010, at 15:59, Viechtbauer Wolfgang (STAT) wrote:

Dear Gerrit,

the most appropriate approach for data of this type would be a proper multivariate meta-analytic model (along the lines of Kalaian & Raudenbush, 1996). Since you do not know the correlations of the reaction-time measurements across conditions for the within-subject designs, a simple solution is to guesstimate those correlations and then conduct sensitivity analyses to make sure your conclusions do not depend on those guesstimates.

Best,
--
Wolfgang Viechtbauer, http://www.wvbauer.com/
Department of Methodology and Statistics, School for Public Health and Primary Care, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands
Tel: +31 (0)43 388-2277 | Office: Room B2.01 (second floor), Debyeplein 1 (Randwyck)

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gerrit Hirschfeld
Sent: Saturday, June 12, 2010 12:45
To: r-help@r-project.org
Subject: [R] meta analysis with repeated measure-designs?

Dear all,

I am trying to run a meta-analysis of psycholinguistic reaction-time experiments with the meta package. The problem is that most of the studies have within-subject designs and use repeated-measures ANOVAs to analyze their data. So at present it seems there are three non-optimal ways to run the analysis:

1.
Using metacont() to estimate effect sizes and standard errors. But as the different scores are dependent, this would result in biased estimators (Dunlap, 1996). Suppose I had the correlations of the measures (which I do not) -- would there be an option to use them in metacont()?

2. Use metagen() with an effect size that is based on the reported F for the contrasts, but that has other disadvantages (Bakeman, 2005). The problem I am having with this is that I could not find a formula to compute the standard error of partial eta squared. Any ideas?

3. Use metagen() with r computed from p-values (Rosenthal, 1994) as the effect size, with the problem that sample size affects p as much as effect size does.

Is there a fourth way, or data showing that the correlations can be neglected as long as they are assumed to be similar across studies? Any ideas are much appreciated.

Best regards, Gerrit

__
Gerrit Hirschfeld, Dipl.-Psych.
Psychologisches Institut II, Westfälische Wilhelms-Universität, Fliednerstr. 21, 48149 Münster, Germany
psycholinguistics.uni-muenster.de | GerritHirschfeld.de
Fon: +49 (0) 251 83-31378 | Fon: +49 (0) 234 7960728 | Fax: +49 (0) 251 83-34104
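To make the guesstimate-plus-sensitivity route concrete, here is a sketch using the metafor package (a different package from the meta package the poster mentions, and all numbers are invented): escalc(measure = "SMCC") computes the standardized mean change given an assumed pre-post correlation ri, which can then be varied over plausible values.

```r
# invented summary data from three within-subject studies (RT in ms)
m1 <- c(520, 480, 550); sd1 <- c(60, 55, 70)
m2 <- c(500, 470, 530); sd2 <- c(58, 50, 65)
n  <- c(24, 30, 18)

if (requireNamespace("metafor", quietly = TRUE)) {
  library(metafor)
  # sensitivity analysis over guesstimated correlations
  for (r.guess in c(0.3, 0.5, 0.7)) {
    es  <- escalc(measure = "SMCC", m1i = m1, m2i = m2,
                  sd1i = sd1, sd2i = sd2, ni = n,
                  ri = rep(r.guess, 3))
    fit <- rma(yi, vi, data = es)
    cat(sprintf("assumed r = %.1f: estimate %.3f (SE %.3f)\n",
                r.guess, coef(fit), fit$se))
  }
}
```

If the conclusions are stable across the assumed correlations, the missing-correlation problem is less of a worry.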
[R] How to convert data frame row to vector?
Hi,

I have a data frame that looks like this:

   ID  Var Var2 Var3
  xxx  100  909  920
  yyy  110  720  710
  zzz  140  680  690

I can load the file and produce a plot from the values of one column:

data <- read.csv("test.csv")
barplot(data$Var2)

So far it's fine. In reality, however, the data frame has more columns with integer values, and I want to plot consecutive values of the SAME ROW. Something like:

barplot(data[1, 2:3])

or, more generically:

barplot(data[row.id, from.field:to.field])

Unfortunately this is no longer a vector! I understand that it could contain non-numeric values, which in my data frame is not the case. How can I convert a horizontal one-line data frame into a vector?

Any help appreciated, best regards, Jörg
Re: [R] random colour
On 06/14/2010 03:05 PM, suman dhara wrote:

Sir, I want to plot 5 curves on a single graph with random colours. How can I do this?

Hi Suman,

col <- rgb(runif(5), runif(5), runif(5))

Jim
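Jim's line in a complete example, plotting five made-up curves with matplot():

```r
set.seed(7)
x <- seq(0, 10, length.out = 100)
curves <- sapply(1:5, function(i) sin(x + i) + 0.2 * i)  # five curves
cols <- rgb(runif(5), runif(5), runif(5))                # five random colours
matplot(x, curves, type = "l", lty = 1, lwd = 2, col = cols,
        ylab = "y", main = "Five curves in random colours")
```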
[R] discrete (binary) choice panel data
Dear all,

I haven't used R for panel data analysis so far, but now I am looking for a package, or at least some starting point, for binary-choice panel data analysis in R. For starters, most of my effects are individual and my dependent variable is just binary. Thanks for any suggestions in advance!

best, matt
[R] Odp: How to convert data frame row to vector?
Hi,

r-help-boun...@r-project.org napsal dne 14.06.2010 11:05:39:

Hi, I have a data frame ... and I want to plot consecutive values of the SAME ROW. Something like barplot(data[1, 2:3]) or, more generically, barplot(data[row.id, from.field:to.field]). Unfortunately this is no longer a vector!

E.g. unlist:

barplot(unlist(data[row.id, from.field:to.field]))

Regards, Petr
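A self-contained version of the answer, using the data from the question; unlist() keeps the column names (which barplot uses as bar labels), while as.numeric() drops them:

```r
df <- data.frame(ID   = c("xxx", "yyy", "zzz"),
                 Var  = c(100, 110, 140),
                 Var2 = c(909, 720, 680),
                 Var3 = c(920, 710, 690))

v1 <- unlist(df[1, 2:4])      # named numeric vector: Var Var2 Var3
v2 <- as.numeric(df[1, 2:4])  # plain numeric vector, names dropped

barplot(v1)                   # works now, one bar per column of row 1
```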
Re: [R] Clustering algorithms don't find obvious clusters
Thank you Etienne, this seems to work like a charm. Also thanks to the rest of you for your help. Henrik

On 11 June 2010 13:51, Cuvelier Etienne ecuscim...@gmail.com wrote:

On 11/06/2010 12:45, Henrik Aldberg wrote:

I have a directed graph which is represented as a matrix of the form

0 4 0 1
6 0 0 0
0 1 0 5
0 0 4 0

Each row corresponds to an author (A, B, C, D) and the values say how many times this author has cited the other authors. Hence the first row says that author A has cited author B four times and author D one time. Thus the matrix represents two groups of authors, (A, B) and (C, D), who cite each other, with a weak link between the groups. In reality this matrix is much bigger and very sparse, but it still consists of distinct groups of authors. My problem is that when I cluster the matrix using pam, clara or agnes, the algorithms do not find the obvious clusters. I have tried to turn it into a dissimilarity matrix before clustering, but that did not help either. The layout of the clustering is not that important to me; my primary interest is to get the right nodes into the right clusters.

Hello Henrik,

You can do graph clustering using the igraph package. Example:

library(igraph)
simM <- NULL
simM <- rbind(simM, c(0, 4, 0, 1))
simM <- rbind(simM, c(6, 0, 0, 0))
simM <- rbind(simM, c(0, 1, 0, 5))
simM <- rbind(simM, c(0, 0, 4, 0))
G <- graph.adjacency(simM, weighted = TRUE, mode = "directed")
plot(G, layout = layout.kamada.kawai)

### walktrap.community
wt <- walktrap.community(G, modularity = TRUE)
wmemb <- community.to.membership(G, wt$merges,
                                 steps = which.max(wt$modularity) - 1)
V(G)$color <- rainbow(3)[wmemb$membership + 1]
plot(G)

I hope it helps,
Etienne

Sincerely, Henrik
[R] Install Rmpi
Hi everyone,

As I couldn't succeed with a manual installation of Rmpi, I decided to start again from the beginning. I removed R and MPICH from my Ubuntu Hardy installation. Then, to avoid any dependency problems, I installed MPICH and R from Synaptic, not from sources. But now I can't install Rmpi. An error message appears when trying to install Rmpi; you can find it at http://ubuntuone.com/p/71x/

Is it time to upgrade to the latest Ubuntu version and build a new system? Any help would be greatly appreciated.

--
Francisco Pastor, Meteorology department, Fundación CEAM
p...@ceam.es | http://www.ceam.es/ceamet | http://www.ceam.es
Parque Tecnologico, C/ Charles R. Darwin, 14, 46980 PATERNA (Valencia), Spain
Tlf. 96 131 82 27 - Fax. 96 131 81 90
Usuario Linux registrado: 363952
Re: [R] Install Rmpi
On Mon, Jun 14, 2010 at 01:12:05PM +0200, Paco Pastor wrote:

As I couldn't succeed with a manual installation of Rmpi I decided to start again from the beginning. ... But now I can't install Rmpi.

All you need is

sudo apt-get install r-cran-rmpi

but that will rely on Open MPI. Any reason you need MPICH?

Dirk

--
Three out of two people have difficulties with fractions.
Re: [R] Finding an order for an hclust (dendrogram) object without intersections
OK, I found a working solution for this. If anyone faces this problem in the future, here is something that works (although it is probably not the best solution out there).

Best, Tal

#--
problematic.tree <- structure(list(merge = structure(c(-3, -24, 1, -25, 4, -27, 5, -22, 8, 3, 9, -5, 10, 11, 12, 6, 14, 7, 18, 19, 20, 21, 22, 23, 24, 25, -13, -23, -14, -21, -20, -26, -19, -1, 2, -15, -7, -4, -9, -18, -6, -17, -12, 17, -2, -8, -10, -16, 13, 15, 16, -11), .Dim = c(26L, 2L)), height = c(0.0833, 0.0867, 0.117, 0.136507936507937, 0.220634920634921, 0.622, 0.674603174603175, 0.823, 1.06349206349206, 1.27698412698413, 1.37, 2.00952380952381, 2.2975, 2.39, 2.686667, 2.9, 3.14736842105263, 3.55634920634921, 3.7921768707483, 3.84183673469388, 3.93817373103087, 4.54464285714286, 4.81438464274599, 5.10895156778615, 5.36142237562854, 6.19122779197967), order = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 11L), labels = 1:27), .Names = c("merge", "height", "order", "labels"), class = "hclust")

plot(problematic.tree)

order.a.tree <- function(tree) {
  num.of.leafs <- length(tree$order)
  matrix.to.order <- NULL
  for (i in 1:num.of.leafs) {
    matrix.to.order <- cbind(matrix.to.order, cutree(tree, k = i))
  }
  id <- seq_len(dim(matrix.to.order)[1])
  matrix.to.order <- data.frame(id, matrix.to.order)
  print(matrix.to.order)
  # the ordering engine of this function is based on orderBy from the doBy package
  require(doBy)
  ordered.matrix <- eval(parse(text = paste("orderBy(~",
      paste(colnames(matrix.to.order)[-1], collapse = "+"),
      ", data = matrix.to.order)")))
  print(ordered.matrix)
  tree$order <- ordered.matrix$id
  return(tree)
}

plot(order.a.tree(problematic.tree))
#--

Contact details: tal.gal...@gmail.com | 972-52-7275845
www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Sun, Jun 13, 2010 at 8:42 PM, Tal Galili tal.gal...@gmail.com wrote: OK, I found an example where my
algorithm can't fix the tree order, but I don't know how to resolve it. Here is the code to reproduce the problem:

#
order.a.tree <- function(tree) {
  num.of.leafs <- length(tree$order)
  for (i in 1:num.of.leafs) {
    tree$order <- order(cutree(tree, k = i))
  }
  return(tree)
}

problematic.tree <- structure(list(merge = structure(c(-3, -24, 1, -25, 4, -27, 5, -22, 8, 3, 9, -5, 10, 11, 12, 6, 14, 7, 18, 19, 20, 21, 22, 23, 24, 25, -13, -23, -14, -21, -20, -26, -19, -1, 2, -15, -7, -4, -9, -18, -6, -17, -12, 17, -2, -8, -10, -16, 13, 15, 16, -11), .Dim = c(26L, 2L)), height = c(0.0833, 0.0867, 0.117, 0.136507936507937, 0.220634920634921, 0.622, 0.674603174603175, 0.823, 1.06349206349206, 1.27698412698413, 1.37, 2.00952380952381, 2.2975, 2.39, 2.686667, 2.9, 3.14736842105263, 3.55634920634921, 3.7921768707483, 3.84183673469388, 3.93817373103087, 4.54464285714286, 4.81438464274599, 5.10895156778615, 5.36142237562854, 6.19122779197967), order = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 11L), labels = 1:27), .Names = c("merge", "height", "order", "labels"), class = "hclust")

plot(order.a.tree(problematic.tree))
#

Thanks, Tal

Contact details: tal.gal...@gmail.com | 972-52-7275845
www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Sun, Jun 13, 2010 at 7:39 PM, Tal Galili tal.gal...@gmail.com wrote: Thanks Charles. In the meantime, I found that the following code does the trick. But I am wondering: 1) whether I might have made a mistake in it somewhere, and 2) whether there are other (smarter) ways of going about this.
Here is the solution I wrote:

# -
order.a.tree <- function(tree) {
  num.of.leafs <- length(tree$order)
  for (i in 2:(num.of.leafs - 1)) {
    tree$order <- order(cutree(tree, k = i))
  }
  return(tree)
}

# Example:
a <- list()  # initialize empty object
# define merging pattern:
# negative numbers are leaves,
# positive are merged clusters (defined by row number in $merge)
a$merge <- matrix(c(-1, -2, -3, -4,
[R] Html help
I have just installed R 2.11.1 on my XP laptop. I like HTML help for browsing but text help for on-the-fly look-ups, so I was a bit surprised to be asked to choose between them during the installation. I chose text, thinking I could fix the HTML help later, which is what I am trying to do now.

Now when I ask for HTML help, my browser goes to 'http://-ip-number-/doc/html/index.html' instead of where I want on my computer: C:\apps\R\R-2.11.1\doc\html\index.html. I can go where I want manually, but then the package list in C:\apps\R\R-2.11.1\doc\html\packages.html does not include all the packages that I have installed and linked. I don't want to read my HTML help from the web, because sometimes I am off-line or on a slow connection. How do I go about getting a local set of HTML help files?

Cheers, Murray Jorgensen

--
Dr Murray Jorgensen, http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nz, majorgen...@ihug.co.nz
Fax 7 838 4155 | Phone +64 7 838 4773 | Home +64 7 825 0441 | Mobile 021 0200 8350
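For what it's worth: since R 2.10, the HTML pages are generated on the fly by R's own HTTP server on the local machine, so the http:// address with an IP number is normally served locally (from 127.0.0.1), not fetched from the web, and it covers all installed packages. A way to make HTML help the default (the options() call can go in Rprofile.site or ~/.Rprofile):

```r
# make HTML help the default for help() look-ups
options(help_type = "html")

# help.start() opens the index page; the http://127.0.0.1:<port>/ URL it
# uses is R's internal local help server, built from the installed
# packages -- no network connection is needed
if (interactive()) help.start()
```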
[R] Large Data
Hi, I want to import a 1.5 GB CSV file into R, but the following error comes up: 'vector allocation 12.4 size' How can I read the large CSV file in R? Can anyone help me? -- View this message in context: http://r.789695.n4.nabble.com/Large-Data-tp2254130p2254130.html Sent from the R help mailing list archive at Nabble.com.
[R] Design of experiments for Choice-Based Conjoint Analysis (CBC)
Hello, I would like to know if there is any function in R which makes it possible to construct designs of experiments for Choice-Based Conjoint studies. I have already checked the design-of-experiments topic for R and looked at the different libraries. I tried to build my design with the optFederov function, but I haven't found how to obtain a balanced design (with the same number of cases for each level of the factors)... So, if someone has already dealt with that case, any help would be appreciated ^^ Thanks. A.D. -- View this message in context: http://r.789695.n4.nabble.com/Design-of-experiments-for-Choice-Based-Conjoint-Analysis-CBC-tp2254077p2254077.html Sent from the R help mailing list archive at Nabble.com.
[R] how to change default help settings from factory default html
Hi all, Apologies if this is a trivial question - I have searched the lists and the online help files etc. but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to htmlhelp as default (according to http://127.0.0.1:18380/library/utils/html/help.html). I would very much like to be able to access the help files when I am offline by typing ?topic etc., as I used to with the previous R version. Any suggestions? Thanks Katya ### UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) to whom it is addressed. If the e-mail has reached you in error, please notify the author. If you are not the intended recipient of the e-mail you may not use, disclose, copy, redirect or print the content. If this e-mail is not related to the business of UCT it is sent by the sender in the sender's individual capacity. ###
[R] list matching
Hello, I could not find a clear solution for the following question, so please allow me to ask. mynames=cbind(c('a','b'),c(11,22)) lst=list(a=c(1,2), b=5) Now I try to combine mynames and lst into: a 1 11 a 2 11 b 5 22 Thanks, Jian
Re: [R] Large Data
And this one is only from last week. Please read the posting guides carefully. Cheers Joris -- Forwarded message -- From: Joris Meys jorism...@gmail.com Date: Sat, Jun 5, 2010 at 11:04 PM Subject: Re: [R] What is the largest in memory data object you've worked with in R? To: Nathan Stephens nwsteph...@gmail.com Cc: r-help r-help@r-project.org You have to take some things into account: - the maximum memory set for R might not be the maximum memory available - R needs the memory not only for the dataset; matrix manipulations frequently require double the amount of memory taken by the dataset - memory allocation is important when dealing with large datasets; there is plenty of information about that - R has some packages to get around memory problems with big datasets. Read this discussion for example: http://tolstoy.newcastle.edu.au/R/help/05/05/4507.html and this page of Matthew Keller is a good summary too: http://www.matthewckeller.com/html/memory.html Cheers Joris On Sat, Jun 5, 2010 at 12:32 AM, Nathan Stephens nwsteph...@gmail.com wrote: For me, I've found that I can easily work with 1 GB datasets. This includes linear models and aggregations. Working with 5 GB becomes cumbersome. Anything over that, and R croaks. I'm using a dual quad core Dell with 48 GB of RAM. I'm wondering if there is anyone out there running jobs in the 100 GB range. If so, what does your hardware look like? --Nathan
-- Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php On Mon, Jun 14, 2010 at 12:07 PM, Meenakshi meenakshichidamba...@gmail.com wrote: Hi, I want to import a 1.5 GB CSV file into R, but the following error comes up: 'vector allocation 12.4 size' [...] -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
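To make the memory advice in this thread concrete, here is a minimal sketch of two read.csv() arguments that typically reduce the memory and time cost of importing a large CSV. The file name and the column classes below are hypothetical, not from the original post:

```r
# Sketch only: "big.csv" and the column classes are assumed for illustration.
dat <- read.csv("big.csv",
                colClasses = c("integer", "numeric", "character"),  # skip type-guessing
                nrows = 2e6,           # a known upper bound on rows helps R allocate once
                comment.char = "")     # disable comment scanning for speed
```

For files that genuinely exceed available RAM, the chunked-reading and file-backed approaches discussed in the threads linked above are the usual escape hatch.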
Re: [R] Large Data
http://www.google.com/#hl=ensource=hpq=R+big+data+setsaq=faqi=g1aql=oq=gs_rfai=fp=686584f57664 Cheers Joris On Mon, Jun 14, 2010 at 12:07 PM, Meenakshi meenakshichidamba...@gmail.com wrote: Hi, I want to import a 1.5 GB CSV file into R. [...] -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
Re: [R] list matching
Try this: cbind(mynames[rep(seq(nrow(mynames)), sapply(lst, length)),], unlist(lst)) On Mon, Jun 14, 2010 at 9:06 AM, Yuan Jian jayuan2...@yahoo.com wrote: Hello, I could not find a clear solution for the following question. [...] -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
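A step-by-step sketch of what the one-liner above does, using the data from the original post (the intermediate variable idx is added here purely for illustration):

```r
mynames <- cbind(c('a', 'b'), c(11, 22))   # 2 x 2 character matrix
lst     <- list(a = c(1, 2), b = 5)

# Repeat row i of 'mynames' once for each element of lst[[i]]:
idx <- rep(seq(nrow(mynames)), sapply(lst, length))  # c(1, 1, 2)

# Then bind the flattened list values alongside the expanded rows:
cbind(mynames[idx, ], unlist(lst))
```

Note that cbind() on a character matrix coerces the numeric values to character; convert columns afterwards if numeric types matter.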
[R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?
Hi R-help, Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical? Sincerely, Andrea Bernasconi DG

PROBLEM EXAMPLE I consider the latin squares example described at page 157 of the book Statistics for Experimenters: Design, Innovation, and Discovery by George E. P. Box, J. Stuart Hunter, William G. Hunter. This example uses the data file /BHH2-Data/tab0408.dat from ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip. The file tab0408.dat contains the following DATA:

   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now summary( aov(MODEL, data=DATA) ) gives:

            Df Sum Sq Mean Sq F value Pr(>F)
cars         1   12.8  12.800  0.8889 0.3680
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634
Residuals   10  144.0  14.400
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results differ from the book's result at p. 159, since cars and driver are treated as numerical variables by aov.

BRUTE FORCE SOLUTION Manually transforming cars and driver into categorical variables, I obtain the correct result with DATA_AB:

   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16

summary( aov(MODEL, data=DATA_AB) ) gives:

            Df Sum Sq Mean Sq F value   Pr(>F)
cars         3     24   8.000     1.5 0.307174
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490
Residuals    6     32   5.333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

QUESTION Which is the easiest (most elegant) way to force driver and cars from DATA to be treated as categorical variables by aov? More generally, which is the easiest way to force aov to treat numerical variables as categorical?
Re: [R] Lattice: How to color the data points in splom() according to the panel they are plotted?
Dear Deepayan, this is in reply to a message almost 6 months ago: Deepayan Sarkar deepayan.sar...@gmail.com on Sun, 17 Jan 2010 01:39:21 -0800 writes: On Sat, Jan 16, 2010 at 11:56 PM, Peter Ehlers ehl...@ucalgary.ca wrote: Marius Hofert wrote: Dear ExpeRts, I have the scatter plot matrix as given below. I would like the different sub-plots in the scatter plot matrix to be colored differently. How do I get all points shown in the upper-left plot (on position (1,1) in the scatter plot matrix) to be plotted in blue, and the points shown in the plot to the right (on position (1,2) in the scatter plot matrix) to be plotted in red? More generally, how can I provide a matrix of colors to be used by splom() such that all data points in the corresponding sub-plot of the scatter plot matrix are shown in the specified color? Cheers, Marius Here is the code: library(lattice) entrymat=matrix(0,nrow=3,ncol=3) entrymat[1,2]="black" entrymat[1,3]="blue" entrymat[2,3]="red" entrymat=t(entrymat) splom(~iris[,1:3],superpanel=function(z,...){ mymat.df=data.frame(rows=as.vector(row(entrymat)),cols=as.vector(col(entrymat)),entries=as.vector(entrymat)) mymat.df=subset(mymat.df,cols<rows) with(mymat.df,{ panel.text(x=rows,y=cols,labels=entries) }) panel.pairs(z,upper.panel=panel.splom,lower.panel=function(...){},...) },varnames=c(1,2,3) ) I think that you will have to modify panel.pairs to get what you want. But I must admit that I can't see why you would want such a plot. What's achieved by having different colours in different subpanels? And you would lose the ability to colour groups differently (or things would become really complicated and messy). Thanks, I was going to say the same thing, except that it would be (1) conceptually simpler just to add the 'i' and 'j' values as arguments to the panel function (the 'pargs' variable). The colors could then be passed through as part of the ... arguments, and the relevant entry extracted in the panel function.
(2) The other option is to keep a global counter and increment it inside the panel function, choosing colors based on that counter and knowledge of the order in which panels are drawn. Not very elegant, but the least intrusive solution I can think of. -Deepayan Against the R-forge version of lattice, the following very small patch to panel.pairs would allow users to use '(1)', i.e., provide panel functions with (i,j) arguments directly. I'm pretty sure that the change could not easily break existing splom() usage. --- R/splom.R (revision 619) +++ R/splom.R (working copy) @@ -291,7 +291,8 @@ y = z[subscripts, i]), ##panel.number = panel.number, ##packet.number = packet.number), - list(...)) + list(...), + list(i = i, j = j)) else c(list(x = z[subscripts, j], y = z[subscripts, i], @@ -299,7 +300,8 @@ subscripts = subscripts), ##panel.number = panel.number, ##packet.number = packet.number), - list(...)) + list(...), + list(i = i, j = j)) if (!("..." %in% names(formals(panel)))) pargs <- pargs[intersect(names(pargs), names(formals(panel)))] With the above change, a user could use a panel function with (i,j) arguments, and e.g. say Cmat <- outer(1:6, 1:6, function(i,j) rainbow(11, start=.12, end=.5)[i+j-1]) splom(~diag(6), ## for testing: superpanel=mypanel.pairs, panel=function(x,y,i,j,...){ panel.fill(Cmat[i,j]); panel.splom(x,y,...) panel.text(.5,.5, paste("(",i,",",j,")",sep="")) }) I think that would allow quite a bit more flexibility without the need to explicitly hack panel.pairs (and having to maintain such a hack against the ever-enhancing lattice). Martin Maechler, ETH Zurich
Re: [R] logistic regression with 50 varaibales
On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that in some cases that number should be as high as 25. The number of events is the smaller of the two possible outcomes for your binary dependent variable. Covariate degrees of freedom refers to the number of columns in the model matrix. Continuous variables are 1, binary factors are 1, K-level factors are K - 1. So if out of your 1800 records, you have at least 500 to 1000 events, depending upon how many of your 50 variables are K-level factors and whether or not you need to consider interactions, you may be OK. Better if towards the high end of that range, especially if the model is for prediction versus explanation. Two excellent references would be Frank's book: http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/ and Steyerberg's book: http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/ to assist in providing guidance for model building/validation techniques. HTH, Marc Schwartz
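As a rough back-of-the-envelope check of the events-per-variable rule of thumb above (the numbers are illustrative, not from the original post):

```r
# Events needed for a logistic model with p covariate degrees of freedom,
# under the 10-20 events-per-variable rule of thumb:
p   <- 50
epv <- c(10, 15, 20)
p * epv   # 500, 750, 1000 events
```

This matches the "at least 500 to 1000 events" range quoted in the reply; with ~1800 observations, the binding constraint is how many of those observations are events, not the total sample size.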
Re: [R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?
Hi, See ?factor, e.g.: DATA$driver <- factor(DATA$driver) See also the levels= argument if you want to change the order of your levels. HTH, Ivan On 6/14/2010 14:52, Andrea Bernasconi DG wrote: Hi R help, Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical? [...]
Re: [R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?
I think I found the solution! cc <- factor(cars) dd <- factor(driver) MODEL <- y ~ cc + dd + additive summary(aov(MODEL, data=DATA)) On 14 Jun, 2010, at 2:52 PM, Andrea Bernasconi DG wrote: Hi R help, Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical? [...]
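An equivalent variant (a sketch; DATA and the variable names as defined in the original post) applies factor() inside the model formula, which leaves the data frame itself untouched:

```r
# Coerce driver and cars to categorical on the fly, inside the formula:
summary(aov(y ~ factor(driver) + factor(cars) + additive, data = DATA))
```

This avoids creating the helper variables cc and dd in the workspace, and keeps the coercion visible in the model specification.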
[R] merging data frames
Hi, is it possible to merge two data frames while preserving the row names of the bigger data frame? I have two data frames which I would like to combine, but in doing so I always lose the row names. When I try to append them instead, I get an error message saying that I have non-unique names, even though I used unique() on the data frame where the duplicate entries supposedly are. Thanks for the help, Assa
Re: [R] how to change default help settings from factory default html
On 14.06.2010 14:11, Katya Mauff wrote: Hi all Apologies if this is a trivial question- I have searched the lists and the online help files etc but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to htmlhelp as default (according to http://127.0.0.1:18380/library/utils/html/help.html) I would very much like to be able to access the help files when I am offline by typing ?topic etc as I used to with the previous R version. Any suggestions? Yes, just type ?topic and your web browser opens and displays the help file. Note that the address beginning 'http://127.0.0.1' is on your local machine, hence you can stay offline. Uwe Ligges
Re: [R] list matching
One thing you might do is to transform the data into a format that is easier to combine; I like using 'merge': mynames=cbind(c('a','b'),c(11,22)) lst=list(a=c(1,2), b=5) mynames [,1] [,2] [1,] "a" "11" [2,] "b" "22" lst $a [1] 1 2 $b [1] 5 mynames.df <- as.data.frame(mynames) mynames.df V1 V2 1 a 11 2 b 22 lst.s <- stack(lst) lst.s values ind 1 1 a 2 2 a 3 5 b merge(mynames.df, lst.s, by.x="V1", by.y="ind") V1 V2 values 1 a 11 1 2 a 11 2 3 b 22 5 On Mon, Jun 14, 2010 at 8:06 AM, Yuan Jian jayuan2...@yahoo.com wrote: Hello, I could not find a clear solution for the following question. [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] merging data frames
Put the rownames as another column in your dataframe so that it remains with the data. After merging, you can then use it as the rownames. On Mon, Jun 14, 2010 at 9:25 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hi, is it possible to merge two data frames while preserving the row names of the bigger data frame? [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
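A minimal sketch of that suggestion; the data frame names ('big', 'small') and the shared key column "id" are hypothetical, not from the original post:

```r
# Stash the row names in a real column so merge() cannot drop them:
big$.rn <- rownames(big)
merged  <- merge(big, small, by = "id")  # note: merge() may also reorder rows
rownames(merged) <- merged$.rn           # restore the row names afterwards
merged$.rn <- NULL                       # drop the helper column
```

This works because merge() keeps ordinary columns intact while discarding row names; the restore step assumes the merge result still has one row per original row name (i.e. no duplicated keys).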
Re: [R] how to change default help settings from factory default html
On Mon, 14 Jun 2010, Katya Mauff wrote: Hi all Apologies if this is a trivial question- I have searched the lists and the online help files etc but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to htmlhelp as default (according to http://127.0.0.1:18380/library/utils/html/help.html) Not so for the latest R (2.11.1) compiled with the default settings. The help(help) page in 2.11.1 actually says The ‘factory-fresh’ default is text help except from the Mac OS GUI, which uses HTML help displayed in its own browser window. However, the Windows installer allows you to set the default type of help, and its default is HTML help in R = 2.10.0. I would very much like to be able to access the help files when I am offline by typing ?topic etc as I used to with the previous R version. Any suggestions? You should be able to use HTML help offline: it is running locally on your machine (127.0.0.1 is the loopback interface). (It is conceivable that your unstated OS disables the loopback interface when 'offline', but we have not come across an instance of this.) But ?help tells you how select other forms of help by default via help_type = getOption(help_type)) and ?'?' says This is a shortcut to ‘help’ and uses its default type of help. Thanks Katya ### UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) to whom it is addressed. If the e-mail has reached you in error, please notify the author. If you are not the intended recipient of the e-mail you may not use, disclose, copy, redirect or print the content. If this e-mail is not related to the business of UCT it is sent by the sender in the sender's individual capacity. 
### -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
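The per-session mechanism Prof. Ripley refers to can be sketched directly; setting the option in ~/.Rprofile would make it the default across sessions:

```r
# Switch the current session to plain-text help;
# ?topic and help(topic) will then use text help by default
options(help_type = "text")
getOption("help_type")
```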
Re: [R] logistic regression with 50 varaibales
Hi, Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play a role rather quickly. The curse of dimensionality is easily demonstrated by looking at the proximity between your data points. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 data points. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, as your data space is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, and then I didn't even start on interactions. No point in trying, I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Cheers Joris On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that in some cases that number should be as high as 25. The number of events is the smaller of the two possible outcomes for your binary dependent variable. Covariate degrees of freedom refers to the number of columns in the model matrix. 
Continuous variables are 1, binary factors are 1, K-level factors are K - 1. So if out of your 1800 records you have at least 500 to 1000 events, then depending upon how many of your 50 variables are K-level factors and whether or not you need to consider interactions, you may be OK. Better if towards the high end of that range, especially if the model is for prediction versus explanation. Two excellent references would be Frank's book: http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/ and Steyerberg's book: http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/ to assist in providing guidance for model building/validation techniques. HTH, Marc Schwartz -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
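Marc's rule of thumb is easy to check with back-of-the-envelope arithmetic; the event count below is hypothetical, since the original poster only gave the total of ~1800 observations:

```r
# Hypothetical: suppose 600 of the 1800 records fall in the
# smaller outcome class (the "events")
n_events     <- 600
covariate_df <- 50   # assuming all 50 variables are continuous/binary (1 df each)

events_per_df <- n_events / covariate_df
events_per_df  # 12: within the 10-20 guideline, but well below Frank's 25
```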
Re: [R] how to change default help settings from factory default html
Hi - I have tried that offline; my browser opens and says: Offline mode. Firefox is currently in offline mode and can't browse the Web. Uncheck Work Offline in the File menu, then try again. I can access ?help that way, or pages I've been to before having gone offline, but nothing new. Uwe Ligges lig...@statistik.tu-dortmund.de 2010/06/14 03:29 PM On 14.06.2010 14:11, Katya Mauff wrote: Hi all Apologies if this is a trivial question - I have searched the lists and the online help files etc. but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to HTML help as default (according to http://127.0.0.1:18380/library/utils/html/help.html) I would very much like to be able to access the help files when I am offline by typing ?topic etc. as I used to with the previous R version. Any suggestions? Yes, just type ?topic and your web browser opens and displays the help file. Note that the address beginning http://127.0.0.1 is on your local machine, hence you can stay offline. Uwe Ligges Thanks Katya 
Re: [R] Html help
If the IP number is something like 127.0.0.1:x, then you are on your local computer. Cheers Joris On Mon, Jun 14, 2010 at 1:33 PM, Murray Jorgensen m...@waikato.ac.nz wrote: I have just installed R 2.11.1 on my XP laptop. I like html help for browsing but text help for on-the-fly look-ups. I was a bit surprised when I was asked to choose between them during the installation. I chose text, thinking I could fix the html help later, which is what I am trying to do now. Now when I ask for html help my browser goes to 'http://-ip-number-/doc/html/index.html' instead of where I want on my computer: C:\apps\R\R-2.11.1\doc\html\index.html Now I can go where I want manually, but then the package list in C:\apps\R\R-2.11.1\doc\html\packages.html does not include all the packages that I have installed and linked. I don't want to read my html help from the web because sometimes I am off-line or on a slow connection. How do I go about getting a local set of html help files? Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: m...@waikato.ac.nz majorgen...@ihug.co.nz Fax +64 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 0200 8350 -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
[R] remove last char of a text string
Dear R experts, is there a simple way to remove the last character of a text string? The substr() function takes only start and end positions as parameters... but my strings are of different lengths... 01asap05a -> 01asap05 02ee04b -> 02ee04 Thank you all, Gianandrea -- View this message in context: http://r.789695.n4.nabble.com/remove-last-char-of-a-text-string-tp2254377p2254377.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] remove last char of a text string
On Mon, Jun 14, 2010 at 3:47 PM, glaporta glapo...@freeweb.org wrote: Dear R experts, is there a simple way to remove the last character of a text string? The substr() function takes only start and end positions as parameters... but my strings are of different lengths... 01asap05a -> 01asap05 02ee04b -> 02ee04 Thank you all, Gianandrea It's not terribly elegant, but this works: orig.text <- c("01asap05a", "02ee04b") substr(orig.text, 1, nchar(orig.text) - 1) Regards, Gustaf -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address: Essingetorget 40, 112 66 Stockholm, SE skype: gustaf_rydevik
Re: [R] remove last char of a text string
Sure. You can use nchar() to find out how long the string is. teststring <- "01asap05a" substr(teststring, 1, nchar(teststring) - 1) [1] "01asap05" On Mon, Jun 14, 2010 at 9:47 AM, glaporta glapo...@freeweb.org wrote: Dear R experts, is there a simple way to remove the last character of a text string? The substr() function takes only start and end positions as parameters... but my strings are of different lengths... 01asap05a -> 01asap05 02ee04b -> 02ee04 Thank you all, Gianandrea -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] how to change default help settings from factory default html
On 14.06.2010 15:39, Katya Mauff wrote: Hi - I have tried that offline; my browser opens and says: Offline mode. Firefox is currently in offline mode and can't browse the Web. Uncheck Work Offline in the File menu, then try again. Well, go online with Firefox, which means Firefox can access the R help pages afterwards. You do not need to really go online with your machine. Uwe I can access ?help that way, or pages I've been to before having gone offline, but nothing new. Uwe Ligges lig...@statistik.tu-dortmund.de 2010/06/14 03:29 PM On 14.06.2010 14:11, Katya Mauff wrote: Hi all Apologies if this is a trivial question - I have searched the lists and the online help files etc. but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to HTML help as default (according to http://127.0.0.1:18380/library/utils/html/help.html) I would very much like to be able to access the help files when I am offline by typing ?topic etc. as I used to with the previous R version. Any suggestions? Yes, just type ?topic and your web browser opens and displays the help file. Note that the address beginning http://127.0.0.1 is on your local machine, hence you can stay offline. Uwe Ligges Thanks Katya 
Re: [R] remove last char of a text string
Try: gsub(".$", "", c('01asap05a', '02ee04b')) On Mon, Jun 14, 2010 at 10:47 AM, glaporta glapo...@freeweb.org wrote: Dear R experts, is there a simple way to remove the last character of a text string? The substr() function takes only start and end positions as parameters... but my strings are of different lengths... 01asap05a -> 01asap05 02ee04b -> 02ee04 Thank you all, Gianandrea -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
[R] Prime Numbers Pkgs - Schoolmath is broken
Looking for a recommended package that handles prime number computations. Tried the following unsuccessfully: primeFactors() in the R.basic package failed to install. primes() and primlist are broken in the Schoolmath pkg on CRAN. My analysis can be found here: http://j.mp/9BNI9q Not sure what the procedure is for getting things fixed, so I've cross-posted to r-devel as well. --njg TAKING THE PITH OUT OF PERFORMANCE http://perfdynamics.blogspot.com/ Follow me on Twitter http://twitter.com/DrQz PERFORMANCE DYNAMICS COMPANY http://www.perfdynamics.com/
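Until the package situation is sorted out, a plain base-R Sieve of Eratosthenes covers simple needs; this is a sketch (function name `primes_up_to` is made up), not a replacement for a tested package:

```r
# Sieve of Eratosthenes: all primes up to n, base R only
primes_up_to <- function(n) {
  if (n < 2) return(integer(0))
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  if (n >= 4) {
    for (i in 2:floor(sqrt(n))) {
      # Mark every multiple of i from i^2 upward as composite
      if (is_prime[i]) is_prime[seq(i * i, n, by = i)] <- FALSE
    }
  }
  which(is_prime)
}

primes_up_to(20)  # 2 3 5 7 11 13 17 19
```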
Re: [R] Cforest and Random Forest memory use
The first thing that I would recommend is to avoid the formula interface to models. The internals that R uses to create matrices from a formula + data set are not efficient. If you had a large number of variables, I would have automatically pointed to that as a source of issues. cforest and ctree only have formula interfaces though, so you are stuck on that one. The randomForest package has both interfaces, so that might be better. Probably the issue is the depth of the trees. With that many observations, you are likely to get extremely deep trees. You might try limiting the depth of the tree and see if that has an effect on performance. We run into these issues with large compound libraries; in those cases we do whatever we can to avoid ensembles of trees or kernel methods. If you want those, you might need to write your own code that is hyper-efficient and tuned to your particular data structure (as we did). On another note... are this many observations really needed? You have 40-ish variables; I suspect that 1M points are pretty densely packed into 40-dimensional space. Do you lose much by sampling the data set or allocating a large portion to a test set? If you have thousands of predictors, I could see the need for so many observations, but I'm wondering if many of the samples are redundant. Max On Mon, Jun 14, 2010 at 3:45 AM, Matthew OKane mlok...@gmail.com wrote: Answers added below. Thanks again, Matt On 11 June 2010 14:28, Max Kuhn mxk...@gmail.com wrote: Also, you have not said: - your OS: Windows Server 2003 64-bit - your version of R: 2.11.1 64-bit - your version of party: 0.9-9995 - your code: test.cf <- cforest(formula = badflag ~ ., data = example, control = cforest_control(teststat = 'max', testtype = 'Teststatistic', replace = FALSE, ntree = 500, savesplitstats = FALSE, mtry = 10)) - what Large data set means: 1 million observations, 40+ variables, around 200MB - what very large model objects means - anything which breaks So... how is anyone supposed to help you? 
Max -- Max
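Max's subsampling suggestion can be sketched with simulated data and the recommended rpart package (which ships with R); rpart stands in here for the original cforest/randomForest calls, and the sizes are scaled down from the poster's 1M rows:

```r
library(rpart)  # recommended package, ships with R

set.seed(42)
n   <- 100000                  # stand-in for the much larger real data set
big <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
big$badflag <- factor(big$x1 + rnorm(n) > 0)

# Subsample rows before fitting, and cap tree depth to limit
# memory use and fitting time
idx   <- sample(nrow(big), 10000)
train <- big[idx, ]
fit   <- rpart(badflag ~ ., data = train,
               control = rpart.control(maxdepth = 5))
```

The held-out rows (`big[-idx, ]`) can then serve as the test set Max mentions.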
Re: [R] Lattice: How to color the data points in splom() according to the panel they are plotted?
On Mon, Jun 14, 2010 at 6:24 PM, Martin Maechler maech...@stat.math.ethz.ch wrote: Dear Deepayan, this is in reply to a message almost 6 months ago: Deepayan Sarkar deepayan.sar...@gmail.com [...] Thanks, I was going to say the same thing, except that it would be (1) conceptually simpler just to add the 'i' and 'j' values as arguments to the panel function (the 'pargs' variable). The colors could then be passed through as part of the ... arguments, and the relevant entry extracted in the panel function. (2) The other option is to keep a global counter and increment it inside the panel function, choosing colors based on that counter and knowledge of the order in which panels are drawn. Not very elegant, but the least intrusive solution I can think of. -Deepayan Against the R-forge version of lattice, the following very small patch to panel.pairs would allow users to use '(1)', i.e., provide panel functions with (i,j) arguments directly. I'm pretty sure that the change could not easily break existing splom() usage. --- R/splom.R (revision 619) +++ R/splom.R (working copy) @@ -291,7 +291,8 @@ y = z[subscripts, i]), ## panel.number = panel.number, ## packet.number = packet.number), - list(...)) + list(...), + list(i = i, j = j)) else c(list(x = z[subscripts, j], y = z[subscripts, i], @@ -299,7 +300,8 @@ subscripts = subscripts), ## panel.number = panel.number, ## packet.number = packet.number), - list(...)) + list(...), + list(i = i, j = j)) if (!("..." %in% names(formals(panel)))) pargs <- pargs[intersect(names(pargs), names(formals(panel)))] Done in r-forge svn. -Deepayan With the above change, a user could use a panel function with (i,j) arguments, and e.g. say Cmat <- outer(1:6, 1:6, function(i,j) rainbow(11, start=.12, end=.5)[i+j-1]) splom(~diag(6), ## for testing: superpanel=mypanel.pairs, panel=function(x,y,i,j,...){ panel.fill(Cmat[i,j]); panel.splom(x,y,...) 
panel.text(.5, .5, paste("(", i, ",", j, ")", sep="")) }) I think that would allow quite a bit more flexibility without the need to explicitly hack panel.pairs (and having to maintain such a hack against the ever-enhancing lattice). Martin Maechler, ETH Zurich
Re: [R] logistic regression with 50 varaibales
Dear all, (this first part of the email I sent to John earlier today, but forgot to put it to the list as well) Dear John, Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks First: I'm not a statistician, but a spectroscopist. But I do build logistic regression models with far less than 1800 samples and far more variates (e.g. 75 patients / 256 spectral measurement channels). Though I have many measurements per sample: typically several hundred spectra per sample. Question: are the 1800 real, independent samples? Model stability is something you can measure. Do an honest validation of your model with really _independent_ test data and measure the stability according to what your stability needs are (e.g. stable parameters or stable predictions?). (From here on, reply to Joris) Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play a role rather quickly. No doubt. The curse of dimensionality is easily demonstrated by looking at the proximity between your data points. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 data points. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. But won't also the distance between groups grow? No doubt, high-dimensional spaces are _very_ unintuitive. 
However, the required sample size may grow substantially slower, if the model has appropriate restrictions. I remember the recommendation of at least 5 samples per class and variate for linear classification models. I.e. not to get a good model, but to have a reasonable chance of getting a stable model. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, Am I wrong thinking that there may be a substantial difference between stability of predictions and stability of model parameters? BTW: if the models are unstable, there's also aggregation. At least for my spectra I can give toy examples with physical-chemical explanation that yield the same prediction with different parameters (of course because of correlation). as your dataspace is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, No, not necessary. IMHO it depends very much on the meaning of the variables. E.g. for the spectra a set of model parameters may be interpreted like spectra or difference spectra. Of course this has to do with the fact, that a parallel coordinate plot is the more natural view of spectra compared to a point in so many dimensions. and then I didn't even start on interactions. No point in trying I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Which puts to my mind a question I've had since long: I assume that all variables that I know beforehand to be without information are already discarded. The dimensionality is then further reduced in a data-driven way (e.g. by PCA or PLS). The model is built in the reduced space. How much less samples are actually needed, considering the fact that the dimension reduction is a model estimated on the data? ...which of course also means that the honest validation embraces the data-driven dimensionality reduction as well... Are there recommendations about that? 
The other curious question I have is: I assume that it is impossible for him to obtain the 10^xy samples required for comfortable model building. So what is he to do? Cheers, Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
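The data-driven dimension reduction Claudia and Joris describe can be sketched with prcomp() followed by a logistic fit on the leading scores; the data is simulated with the poster's dimensions, and the choice of 5 components is arbitrary:

```r
set.seed(1)
n <- 1800; p <- 50
X <- matrix(rnorm(n * p), ncol = p)
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))  # outcome driven by 2 variables

# Reduce the 50 predictors to a few principal components,
# then fit the logistic model in the reduced space
pc     <- prcomp(X, scale. = TRUE)
scores <- pc$x[, 1:5]
fit    <- glm(y ~ scores, family = binomial)
length(coef(fit))  # intercept + 5 component scores
```

As Claudia notes, for an honest validation the PCA step itself must be re-estimated inside each resampling fold, not computed once on the full data.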
Re: [R] how to ignore rows missing arguments of a function when creating a function?
Ah, I overlooked that possibility. You can do the following: not <- attr(fm$model, "na.action") if (!is.null(not)) { # only drop the NA values if there are any left out of the model cluster <- cluster[-not] dat <- dat[-not, ] } with(dat, { On Mon, Jun 14, 2010 at 4:30 PM, edmund jones edmund.j.jo...@gmail.com wrote: Thanks a lot! So, what you propose works perfectly (the cluster variable is indeed a vector), except in the case where I have no missing values in my regression. With the following function: cl2 <- function(dat, fm, cluster){ attach(dat, warn.conflicts = F) require(sandwich) require(lmtest) not <- attr(fm$model, "na.action") cluster <- cluster[-not] with(dat[-not, ], { M <- length(unique(cluster)) N <- length(cluster) K <- fm$rank dfc <- (M/(M-1))*((N-1)/(N-K)) uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum)) vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N) coeftest(fm, vcovCL) }) } If I have no missing values in the arguments of fm, I get the message: Error in -not : invalid argument to unary operator In your example: x <- rnorm(100) y <- c(rnorm(90), NA, rnorm(9)) test <- lm(x~y) str(test) List of 13 ... (tons of information) $ model :'data.frame': 99 obs. of 2 variables: ... (more tons of information) ..- attr(*, "na.action")=Class 'omit' Named int 91 .. .. 
..- attr(*, "names")= chr "91" - attr(*, "class")= chr "lm" Now we know that we can do: not <- attr(test$model, "na.action") y[-not] If I have, for example, y <- rnorm(100), I also get the error: Error in -not : invalid argument to unary operator In my database:
    female income transport lunch dist reg_f
4900 0 18.405990 0 0 75 750
4901 0 NA 0 NA 75 753
4902 1 NA 0 1 75 752
4903 1 NA 0 1 75 751
4904 1 69.678340 1 0 74 740
4905 0 57.953230 1 0 73 730
4906 1 85.835130 0 1 68 680
4907 0 81.952980 0 0 75 750
4908 1 46.837490 1 0 74 740
4909 0 NA 1 0 5 52
4910 1 65.041360 0 1 75 750
4911 0 77.451870 1 0 75 750
4912 0 96.148590 1 0 75 750
4913 0 64.510410 0 0 74 740
4914 0 69.391230 0 0 75 750
4915 0 4.804243 0 1 65 650
4916 0 NA 0 0 75 751
4917 1 NA 0 0 75 751
4918 1 NA 0 1 40 401
4919 1 49.920750 0 1 76 760
4920 0 NA 0 1 76 763
4921 0 10.187910 0 0 77 770
4922 0 14.839710 0 1 77 770
4923 1 32.041000 0 0 77 770
4924 0 85.639440 0 0 77 770
4925 1 86.308410 0 0 68 680
4926 0 79.223910 0 0 7 70
4927 0 81.825800 0 0 78 780
4928 0 31.931000 0 1 37 370
4929 0 53.282310 0 1 41 410
4930 1 31.312910 1 1 25 250
4931 1 50.478870 0 1 25 250
4932 0 NA 0 0 66 662
4933 1 58.156940 0 1 31 310
4934 0 NA 0 1 1 13
4935 1 NA 0 1 1 12
4936 1 59.149180 0 0 3 30
4937 1 5.400807 0 1 5 50
4938 1 76.828630 0 0 6 60
4939 1 73.488300 0 1 63 630
4940 0 6.529074 0 1 6 60
4941 0 NA 0 0 6 61
4942 1 70.128530 0 0 3 30
4943 0 NA 0 1 53 531
4944 1 75.715350 1 0 6 60
4945 0 8.623850 0 1 24 240
4946 1 79.062470 0 1 62 620
4947 0 83.863370 1 1 11 110
4948 0 58.904450 0 1 62 620
4949 0 88.500290 0 0 9 90
4950 0 NA 1 1 9 90
4951 NA NA 0 1 15 151
It works perfectly if I do cl2(testdata, lm(income ~ transport + reg_f), female) but not for cl2(testdata, lm(dist ~ transport + reg_f), female), or any other case where the arguments of the lm function have no missing values. How can I tell R to do this only if there is a missing-value problem? Thanks a lot for your help! 
Working on your previous replies has been very helpful in understanding how R works. Cheers, Edmund. 2010/6/13 Joris Meys jorism...@gmail.com Next to that, of course you have to use the right indices for cluster, but as I have no clue what dist is, I just put there something. So if it is a matrix, naturally you'll get errors. If I make a vector cluster with the same length as the
[R] using R to draw over a distribution.
Hi, Suppose I analyze a log to create a histogram: event E1 occurred N1 times, event E2 occurred N2 times, ... and so on for m total events ... event Em occurred Nm times. The total number of occurrences is T = sum(N_j), j = 1..m. I want to give this histogram to R and ask it to produce T random events such that approximately N1 instances of E1 are drawn, N2 instances of E2 are drawn, and so forth. Regards, Shane
Re: [R] using R to draw over a distribution.
On Jun 14, 2010, at 10:42 AM, SHANE MILLER, BLOOMBERG/ 731 LEXIN wrote: Hi, Suppose I analyze a log to create a histogram: event E1 occurred N1 times, event E2 occurred N2 times, ... and so on for m total events ... event Em occurred Nm times. The total number of occurrences is T = sum(N_j), j = 1..m. I want to give this histogram to R and ask it to produce T random events such that approximately N1 instances of E1 are drawn, N2 instances of E2 are drawn, and so forth. ?table # or perhaps use the fact that hist() will return a table of a particular type. ?sample # from the events with prob = frequencies/T (If you insist on constraining the total count to be the observed sum it will no longer be random with m degrees of freedom. And there are many postings on how to do this in the archives.) -- David Winsemius, MD West Hartford, CT
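David's sample() suggestion can be sketched directly; the event histogram below is made up, standing in for Shane's log counts:

```r
# Hypothetical histogram: event -> observed count N_j
counts <- c(E1 = 5, E2 = 3, E3 = 2)
T      <- sum(counts)   # total number of occurrences

# Draw T events so that event j appears with probability N_j / T;
# each event's expected frequency then matches its observed count
set.seed(7)
draws <- sample(names(counts), size = T, replace = TRUE,
                prob = counts / T)
table(draws)
```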
Re: [R] logistic regression with 50 varaibales
I think the real issue is why the fit is being done. If it is solely to interpolate and condense the dataset, the number of variables is not an important issue. If the issue is developing a model that will capture causality, it is hard to believe that can be accomplished with 50+ variables. With this many, some kind of hunt would have to be done, and the resulting model would not be very stable. It would be better perhaps to first reduce the variable set by, say, principal components analysis, so that a reasonably sized set results. If a stable and meaningful model is the goal, each term in the final model should be plausibly causal. At 10:36 AM 6/14/2010, Claudia Beleites wrote: Dear all, (this first part of the email I sent to John earlier today, but forgot to put it to the list as well) Dear John, Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks First: I'm not a statistician, but a spectroscopist. But I do build logistic regression models with far less than 1800 samples and far more variates (e.g. 75 patients / 256 spectral measurement channels). Though I have many measurements per sample: typically several hundred spectra per sample. Question: are the 1800 real, independent samples? Model stability is something you can measure. Do an honest validation of your model with really _independent_ test data and measure the stability according to what your stability needs are (e.g. stable parameters or stable predictions?). (From here on, reply to Joris) Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play a role rather quickly. No doubt. 
The curse of dimensionality is easily demonstrated by looking at the proximity between your datapoints. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 datapoints. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. But won't the distance between groups also grow? No doubt, high-dimensional spaces are _very_ unintuitive. However, the required sample size may grow substantially more slowly if the model has appropriate restrictions. I remember the recommendation of at least 5 samples per class and variate for linear classification models, i.e. not to get a good model, but to have a reasonable chance of getting a stable model. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, Am I wrong in thinking that there may be a substantial difference between stability of predictions and stability of model parameters? BTW: if the models are unstable, there's also aggregation. At least for my spectra I can give toy examples with a physical-chemical explanation that yield the same prediction with different parameters (of course because of correlation). as your dataspace is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, No, not necessarily. IMHO it depends very much on the meaning of the variables. E.g. for the spectra, a set of model parameters may be interpreted like spectra or difference spectra. Of course this has to do with the fact that a parallel coordinate plot is the more natural view of spectra compared to a point in so many dimensions. and then I didn't even start on interactions. No point in trying, I'd say.
If you really need all that information, you might want to take a look at some dimension reduction methods first. Which brings to mind a question I've had for a long time: I assume that all variables that I know beforehand to be without information are already discarded. The dimensionality is then further reduced in a data-driven way (e.g. by PCA or PLS). The model is built in the reduced space. How many fewer samples are actually needed, considering the fact that the dimension reduction is a model estimated on the data? ...which of course also means that the honest validation embraces the data-driven dimensionality reduction as well... Are there recommendations about that? The other curious question I have is: I assume that it is impossible for him to obtain the 10^xy samples required for comfortable model building. So what is he to do? Cheers, Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università
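Joris's back-of-the-envelope calculation above is easy to reproduce in a couple of lines of R (a sketch; the 0.05-unit spacing target is taken from his example):

```r
## observations needed to keep ~0.05-unit spacing on a unit hypercube
spacing <- 0.05
dims    <- c(1, 2, 5, 10)
needed  <- (1 / spacing)^dims          # i.e. 20^d points
data.frame(dimension = dims, observations = needed)
# dimension 2 needs 400; dimension 10 needs 20^10 ~ 1.02e13
```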
Re: [R] Prime Numbers Pkgs - Schoolmath is broken
On Mon, 2010-06-14 at 07:16 -0700, Red Roo wrote: Looking for a recommended package that handles prime number computations. Tried the following unsuccessfully: primeFactors() in the R.basic package failed to install. primes() and primlist are broken in Schoolmath pkg on CRAN. My analysis can be found here http://j.mp/9BNI9q Not sure what the procedure is for getting things fixed, so I've cross-posted to r-dev as well. Neither place is correct. This has *nothing* to do with R. Address bug reports directly to the package maintainer, details of which can be found here: http://cran.r-project.org/web/packages/schoolmath/index.html which is what is requested in the posting guide... HTH G --njg TAKING THE PITH OUT OF PERFORMANCE http://perfdynamics.blogspot.com/ Follow me on Twitter http://twitter.com/DrQz PERFORMANCE DYNAMICS COMPANY http://www.perfdynamics.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] logistic regression with 50 variables
Joris, There are two separate issues here: 1. Can you consider an LR model with 50 covariates? 2. Should you have 50 covariates in your LR model? The answer to 1 is certainly yes, given what I noted below as a general working framework. I have personally been involved with the development and validation of LR models with ~35 covariates, albeit with notably larger datasets than discussed below, because the models are used for prediction. In fact, the current incarnations of those same models, now 15 years later, appear to have 40 covariates and are quite stable. The interpretation of the models by both statisticians and clinicians is relatively straightforward. The answer to 2 gets into the subject matter that you raise, which is to consider other factors beyond the initial rules of thumb for minimum sample size. These get into reasonable data reduction methods, the consideration of collinearity, subject matter expertise, sparse data, etc. The issues raised in number 2 are discussed in the two references that I noted. Two additional references that might be helpful here on the first point are: P. Peduzzi, J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epi, 49:1373–1379, 1996. E. Vittinghoff and C. E. McCulloch. Relaxing the rule of ten events per variable in logistic and Cox regression. Am J Epi, 165:710–718, 2006. Regards, Marc On Jun 14, 2010, at 8:38 AM, Joris Meys wrote: Hi, Marcs explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality(Hughes effect) which starts to play rather quickly. The curse of dimensionality is easily demonstrated looking at the proximity between your datapoints. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. 
To have a proximity like that in a 2-dimensional space, you need 20^2=400 observations. in a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, as your dataspace is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, and then I didn't even start on interactions. No point in trying I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Cheers Joris On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not R technical question per se. I know there are many excellent statisticians in this list, so here my questions: I have dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that in some cases that number should be as high as 25. The number of events is the smaller of the two possible outcomes for your binary dependent variable. Covariate degrees of freedom refers to the number of columns in the model matrix. Continuous variables are 1, binary factors are 1, K-level factors are K - 1. So if out of your 1800 records, you have at least 500 to 1000 events, depending upon how many of your 50 variables are K-level factors and whether or not you need to consider interactions, you may be OK. Better if towards the high end of that range, especially if the model is for prediction versus explanation. 
Two excellent references would be Frank's book: http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/ and Steyerberg's book: http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/ to assist in providing guidance for model building/validation techniques. HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
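Marc's rule of thumb above turns into a quick feasibility check in R (a sketch with hypothetical counts; the 10-20 range, up to 25, is the one he cites):

```r
## quick events-per-covariate-df check (made-up numbers for illustration)
n_events      <- 500    # the smaller of the two outcome counts
covariate_df  <- 50     # columns in the model matrix (K-level factor = K - 1)
events_per_df <- n_events / covariate_df
events_per_df           # aim for 10-20 events per df, up to 25 to be safe
# 10
```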
Re: [R] do faster ANOVAS
Thank you; with the matrix for the responses (here my 101 timepoints), it takes less than 30 minutes for 1000 permutations, whereas before it took 2h30! Best regards, Mélissa Message du 10/06/10 18:52 De : Douglas Bates A : melissa Copie à : r-help@r-project.org Objet : Re: [R] do faster ANOVAS The lm and aov functions can take a matrix response, allowing you to fit all of the responses for a single attribute simultaneously. On Thu, Jun 10, 2010 at 8:47 AM, melissa wrote: Dear all R users, I want to run 800,000 ANOVAs and to store the sums of squares of the effects. Here is an extract of my table data: Product attribute subject rep t1 t2 t3 ... t101 P1 A1 S1 R1 1 0 0 ... 1 I want to run 1 ANOVA per timepoint and per attribute; there are 101 timepoints and 8 attributes, so I want to run 808 ANOVAs. This will be an ANOVA with two factors. Here is one example: aov(t1 ~ Subject*Product, data[data$attribute == "A1",]) I want to store for each ANOVA SSprod, SSsujet, SSerreur, SSinter and SStotal. In fact I want the result in several matrices: SSprod matrix: t1 t2 t3 t4 ... t101 A1 ssprod(A1,t1) A2 A3 ... A8 So I would like a matrix like that for ssprod, sssujet, sserreur, ssinter and sstotal.
And this is for one permutation, and I want to do 1000 permutations. Here is my code: SSmatrixglobal <- function(k){ daten.temp <- data; daten.temp$product <- permutations[[k]]; listmat <- apply(daten.temp[,5:105], 2, function(x, y){ tab2 <- as.data.frame(cbind(x, y)); tab.class <- by(tab2[,1:3], tab2[,4], function(x){ f <- formula(paste(names(x)[1], "~", names(x)[2], "*", names(x)[3], sep="")); anovas <- aov(f, data=x); anovas$call$formula <- f; s1 <- summary(anovas); qa <- s1[[1]][,2]; return(qa) }); return(tab.class) }, y=daten.temp[,1:3]); ar <- array(unlist(listmat), dim=c(length(listmat[[1]][[1]]), length(listmat[[1]]), length(listmat))); l <- lapply(1:4, function(i) ar[i,,]); sssujet <- l[[1]]; ssprod <- l[[2]]; ssinter <- l[[3]]; sserreur <- l[[4]]; ss <- rbind(sssujet, ssprod, ssinter, sserreur, sstotal); ss <- as.data.frame(ss); sqlSave(channel, ss, "SS1000", append=TRUE); rm(ss, numperm, daten.temp) } system.time(por <- lapply(1:1000, SSmatrixglobal)) But it takes about 90 seconds per permutation, times 1000, so how can I do faster ANOVAs? Many thanks. Best regards, Mélissa PS: I think that I can gain a lot of time in the aov function but I don't know how. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
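Douglas Bates's suggestion, fitting all timepoints at once via a matrix response, can be sketched as follows (hypothetical small data; the real dataset has 101 timepoint columns):

```r
## fit all timepoints in one call by giving aov() a matrix response
set.seed(1)
dat <- data.frame(Subject = gl(4, 6), Product = gl(3, 2, 24))
Y   <- matrix(rnorm(24 * 5), nrow = 24,
              dimnames = list(NULL, paste0("t", 1:5)))
fit <- aov(Y ~ Subject * Product, data = dat)
## summary() returns one ANOVA table per response column; collect the SS by term
ss  <- sapply(summary(fit), function(s) s[[1]][, "Sum Sq"])
dim(ss)   # one row per effect (incl. residuals), one column per timepoint
```

Replacing 101 single-response aov() calls per attribute with one matrix-response fit is where the reported speedup comes from.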
[R] xtable with Sweave
Hi, I'm using Sweave to prepare a descriptive report. There are at least 20 tables built with the xtable command, of this kind: <<echo=F, results=hide>>= q5 = factor(Q5, label=c("Não", "Sim")) (q5.tab = cbind(table(q5))) @ <<echo=F, results=tex>>= xtable(q5.tab, align="l|c", caption.placement="top", table.placement="H") @ I'm getting the following message: Too many unprocessed floats in the LaTeX file. How can I avoid these messages? -- Silvano Cesar da Costa Departamento de Estatística Universidade Estadual de Londrina Fone: 3371-4346 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
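LaTeX's "Too many unprocessed floats" usually means a long run of floating tables never gets flushed; inserting \clearpage periodically, or forcing exact placement with the float package's H specifier, are the usual remedies. Note also that caption.placement and table.placement are arguments of print.xtable(), not xtable(). A sketch (assumes the xtable package is installed; the data are a toy stand-in):

```r
## placement options belong to print.xtable(), not xtable() (a common slip)
library(xtable)
q5.tab <- cbind(table(factor(c("no", "yes", "yes"))))  # toy stand-in data
print(xtable(q5.tab, align = "l|c", caption = "Example"),
      caption.placement = "top",
      table.placement = "H")   # "H" needs \usepackage{float} in the preamble
```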
[R] lme command
Hi, I am doing a longitudinal data set fit using lme. I used two forms of the lme command and I am getting two different outputs. FIRST out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status, random=list(ident=~Idade+Idade2)) SECOND out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status, random= ~Idade+Idade2|ident, data=dados) I got weird results from the first one and could not understand the reason for it. All the results are exactly the same except the intercept and the two main terms sexo (gender) and status (treatment). Those differences made a lot of difference in the final results. Can anybody tell me what the difference between them is? Thanks. Enrico. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in Linux: problem with special characters
Hi, First of all, thank you for your reply. It was very helpful. I have another problem: I have changed the locale to pt_pt.iso885...@euro. Now the problem that I reported earlier doesn't occur: print("dúvida") [1] "dúvida" My system information is now the following: Sys.getlocale() [1] lc_ctype=pt_pt.iso885...@euro;LC_NUMERIC=C;lc_time=pt_pt.iso885...@euro;lc_collate=pt_pt.iso885...@euro; LC_MONETARY=C;lc_messages=pt_pt.iso885...@euro;lc_paper=pt_pt.iso885...@euro;LC_NAME=C;LC_ADDRESS=C; LC_TELEPHONE=C;lc_measurement=pt_pt.iso885...@euro;LC_IDENTIFICATION=C [daniel.fernan...@pt-lnx13 ~]$ locale lang=pt_pt.iso885...@euro lc_ctype=pt_pt.iso885...@euro lc_numeric=pt_pt.iso885...@euro lc_time=pt_pt.iso885...@euro lc_collate=pt_pt.iso885...@euro lc_monetary=pt_pt.iso885...@euro lc_messages=pt_pt.iso885...@euro lc_paper=pt_pt.iso885...@euro lc_name=pt_pt.iso885...@euro lc_address=pt_pt.iso885...@euro lc_telephone=pt_pt.iso885...@euro lc_measurement=pt_pt.iso885...@euro lc_identification=pt_pt.iso885...@euro LC_ALL= However, if I use the tcltk package and try to build a frame with the label "dúvida", the string appears incorrectly. My guess is that this problem is also related to the locales. Does Tcl/Tk use a different locale? Thanks in advance, Daniel Date: Fri, 11 Jun 2010 17:54:34 -0400 From: murdoch.dun...@gmail.com To: danielpas...@hotmail.com CC: r-help@r-project.org Subject: Re: [R] R in Linux: problem with special characters daniel fernandes wrote: Hi, I'm working with the 64-bit version of R 2.11.0 for Linux. My session info is: R version 2.11.0 (2010-04-22) x86_64-redhat-linux-gnu locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base When I try to print words with special characters, the result is that the printed expression has some kind of code substituting the special character. For example, if I run print("dúvida") the result is: print("dúvida") [1] "d\372vida" Does this problem have something to do with the locale settings?
If I run the locale command on the Linux server, I get: Yes, it's your locale settings. The C locale doesn't support the ú character in your string, and displays it in octal. Duncan Murdoch [daniel.fernan...@pt-lnx13 ~]$ locale LANG=pt_PT.UTF-8 LC_CTYPE=C LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C LC_ALL=C Thanks in advance for your help, Daniel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
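Duncan's diagnosis can be checked and remedied from within R by switching LC_CTYPE away from C (a sketch; the exact locale name is system-dependent and is an assumption here):

```r
## the character-type locale controls how non-ASCII characters print
Sys.getlocale("LC_CTYPE")
## locale name is an assumption; list what is installed with `locale -a`
Sys.setlocale("LC_CTYPE", "pt_PT.UTF-8")
```

If the named locale is not installed, Sys.setlocale() returns "" with a warning and the setting is unchanged.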
[R] MICE Package and LMER
Hi R users, I am estimating a multilevel model using lmer. My dataset has missing values and I am using the MICE package to make multiple imputations. Everything works well until I reach the pooling stage using the pool() function. I am able to get a summary of the pooled fixed effects but not the random effects. No errors or warnings are given to me. I checked the help file in R and the developers of MICE noted that "The function pools also estimates obtained with lme() and lmer(), BUT only the fixed part of the model." Does anyone have any ideas on how I can get a summary of pooled random effects? Below is my code: imp <- mice(mydata, m=3, imputationMethod=c("", "", "", "logreg"), maxit=2, pri=F) model <- with(data=imp, lmer(miss ~ sex + age + (1|id) + (1|sch), family=binomial(link="logit"))) result <- pool(model) summary(result) Thanks Trevor -- View this message in context: http://r.789695.n4.nabble.com/MICE-Package-and-LMER-tp2254504p2254504.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
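Since pool() only handles the fixed part, one workaround is to extract the variance components from each imputed-data fit yourself and average them. A sketch, assuming `model` is the mira object returned by with() as in Trevor's code; note that a plain average is only a rough point estimate, not full Rubin's-rules pooling of variance components:

```r
## model$analyses holds one lmer fit per imputed dataset (mice's mira object)
pool_ranef_var <- function(model) {
  fits <- model$analyses
  ## collect the random-effect variances from each fit and average them
  vc <- sapply(fits, function(f) as.data.frame(lme4::VarCorr(f))$vcov)
  rowMeans(vc)   # crude pooled variance components, one entry per RE term
}
```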
[R] recursive function
Dear list, I have the following problem. What I'm trying to do is to build a function which does the following calculation in a recursive way. I have a data frame more or less like this:

variable year DELTA
EC01 2006 /
EC01 2007 10
EC01 2008 5
EC01 2009 9

And then I have at time 2009 a variable called R_EC01(2009)=5. What I have to do is to construct the R_EC01 time series by starting from the 2009 value: R_EC01(2008)=R_EC01(2009)-DELTA(2009) R_EC01(2007)=R_EC01(2008)-DELTA(2008) R_EC01(2006)=R_EC01(2007)-DELTA(2007) In terms of numbers, the results that I should get are: R_EC01(2008)=5-9=-4 R_EC01(2007)=-4-5=-9 R_EC01(2006)=-9-10=-19 so my data frame should look like this:

SERIES YEAR value
R_EC01 2006 -19
R_EC01 2007 -9
R_EC01 2008 -4
R_EC01 2009 5

Does anyone know how to do it? My dataframe is not set as a time series... Thanks a lot!!! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
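No recursion is actually needed: the backward relation R[y-1] = R[y] - DELTA[y] is just a reversed cumulative sum anchored at the 2009 value. A sketch with the numbers from the post:

```r
df <- data.frame(variable = "EC01", year = 2006:2009,
                 DELTA = c(NA, 10, 5, 9))   # no delta for the first year
r_2009 <- 5
## subtract the deltas walking backwards from 2009
df$value <- rev(r_2009 - cumsum(c(0, rev(df$DELTA[-1]))))
df$value
# -19 -9 -4 5
```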
Re: [R] lme command
Hello Enrico, One thing I notice between your two calls is that in the second you specify data=dados, but you do not in the first. When I try to do something similar to your formulae using one of my longitudinal datasets, I get the same results whether or not I put the formula for random in a list. Perhaps you could provide some sample data that shows what is happening? One other comment: rather than specifically adding each term and interaction, you can use this shorthand: Altura ~ (Idade+Idade2)*(sexo+status) Best regards, Josh On Mon, Jun 14, 2010 at 6:37 AM, Enrico Colosimo enrico...@gmail.com wrote: Hi, I am doing a longitudinal data set fit using lme. I used two forms of the lme command and I am getting two different outputs. FIRST out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status, random=list(ident=~Idade+Idade2)) SECOND out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status, random= ~Idade+Idade2|ident, data=dados) I got weird results from the first one and could not understand the reason for it. All the results are exactly the same except the intercept and the two main terms sexo (gender) and status (treatment). Those differences made a lot of difference in the final results. Can anybody tell me what the difference between them is? Thanks. Enrico. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student Health Psychology University of California, Los Angeles __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
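Josh's observation, that the list form and the | form of random= specify the same structure when data= is supplied to both, can be checked on a built-in nlme dataset (a sketch using the Orthodont data that ships with the recommended nlme package):

```r
library(nlme)
## the two random= specifications describe the same random structure
f1 <- lme(distance ~ age, data = Orthodont, random = list(Subject = ~ age))
f2 <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)
all.equal(fixef(f1), fixef(f2))   # TRUE: identical fixed-effect estimates
```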
Re: [R] Prime Numbers Pkgs
Looking for a recommended package that handles prime number computations. I'm not sure whether this would be helpful to you, but Sage (http://www.sagemath.org) has excellent number theory support and several ways to interface with R (which is included in the distribution of Sage). I use it myself to access several R packages I need to do analysis on data I produce in Sage/Python, and then send the data back to Sage for further processing. Of course, this is not an R package, so depending on how much you need such things from within R itself it may or may not help you out. HTH, Karl-Dieter Crisman __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
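If all that is needed is small-scale prime testing, base R suffices without any package. A minimal, deliberately naive trial-division sketch (fine for small n only):

```r
## naive trial division: correct but O(n) per test, good enough for small n
is_prime <- function(n) n >= 2 && (n == 2 || all(n %% 2:(n - 1) != 0))
(2:30)[sapply(2:30, is_prime)]
# 2 3 5 7 11 13 17 19 23 29
```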
Re: [R] unique problem
Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, and therefore the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column? On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello everybody, I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique(): Workbook5 <- read.delim(file = "Workbook5.txt") dim(Workbook5) [1] 27748 2 Workbook5 <- unique(Workbook5) dim(Workbook5) [1] 20101 2 It removed a lot of lines, but unfortunately not all of them. I wanted to add the row names to the matrix and got this error message: rownames(Workbook5) <- Workbook5[,1] Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': 'A_51_P102339', 'A_51_P102518', 'A_51_P103435', 'A_51_P103465', 'A_51_P103594', 'A_51_P104409', 'A_51_P104718', 'A_51_P105869', 'A_51_P106428', 'A_51_P106799', 'A_51_P107176', 'A_51_P107959', 'A_51_P108767', 'A_51_P109258', 'A_51_P109708', 'A_51_P110341', 'A_51_P111757', 'A_51_P112427', 'A_51_P112662', 'A_51_P113672', 'A_51_P115018', 'A_51_P116496', 'A_51_P116636', 'A_51_P117666', 'A_51_P118132', 'A_51_P118168', 'A_51_P118400', 'A_51_P118506', 'A_51_P119315', 'A_51_P120093', 'A_51_P120305', 'A_51_P120738', 'A_51_P120785', 'A_51_P121134', 'A_51_P121359', 'A_51_P121412', 'A_51_P121652', 'A_51_P121724', 'A_51_P121829', 'A_51_P122141', 'A_51_P122964', 'A_51_P123422', 'A_51_P123895', 'A_51_P124008', 'A_51_P124719', 'A_51_P125648', 'A_51_P125679', 'A_51_P125779' [... truncated] Is there a better way to discard the duplications in the text file (the origin is an Excel file)? R.version _ platform x86_64-apple-darwin9.8.0 arch x86_64 os darwin9.8.0 system x86_64, darwin9.8.0 status Patched major 2 minor 11.1 year 2010 month 06 day 03 svn rev 52201 language R version.string R version 2.11.1 Patched (2010-06-03 r52201) THX Assa __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
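Jim's point, that unique() on the whole data frame can still leave repeats in the first column alone, is easy to demonstrate with toy data:

```r
df <- data.frame(id = c("a", "a", "b"), val = c(1, 2, 2))
u  <- unique(df)                 # all 3 rows kept: the (id, val) pairs differ
keep <- u[!duplicated(u$id), ]   # keep the first row per id instead
rownames(keep) <- keep$id        # now valid: the ids are unique
```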
Re: [R] logistic regression with 50 variables
On Mon, 14 Jun 2010, Joris Meys wrote: Hi, Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect) which starts to play rather quickly. Ahem! ... minimal, self-contained, reproducible code ... The OP's situation may not be an impossible one: set.seed(54321) dat <- as.data.frame(matrix(rnorm(1800*50), nc=50)) dat$y <- rbinom(1800, 1, plogis(rowSums(dat)/7)) fit <- glm(y~., dat, family=binomial) 1/7 # the true coef [1] 0.1428571 sd(coef(fit)) # roughly, the common standard error [1] 0.05605597 colMeans(coef(summary(fit))) # what glm() got Estimate Std. Error z value Pr(>|z|) 0.14590836 0.05380666 2.71067328 0.06354820 # a trickier case: set.seed(54321) dat <- as.data.frame(matrix(rnorm(1800*50), nc=50)) dat$y <- rbinom(1800, 1, plogis(rowSums(dat))) # try again with coef==1 fit <- glm(y~., dat, family=binomial) colMeans(coef(summary(fit))) Estimate Std. Error z value Pr(>|z|) 0.982944012 0.119063631 8.222138491 0.008458002 Finer examination of the latter fit will show some values that differ too far from 1.0 to agree with the asymptotic std err. sd(coef(fit)) # rather bigger than 0.119 [1] 0.1827462 range(coef(fit)) [1] -0.08128586 1.25797057 And near-separability may be playing here: cbind( + table( + cut( + plogis(abs(predict(fit))), + c( 0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1 ) ) ) ) [,1] (0,0.9] 453 (0.9,0.99] 427 (0.99,0.999] 313 (0.999,0.9999] 251 (0.9999,0.99999] 173 (0.99999,1] 183 Recall that the observed information contains a factor of plogis( predict(fit) )*plogis( -predict(fit) ) hist(plogis( predict(fit) )* plogis( -predict(fit))) So the effective sample size here was much reduced. But to the OP's question, whether what you get is reasonable depends on what the setup is. I wouldn't call the first of the above cases 'highly unstable'. Which is not to say that one cannot generate difficult cases (esp.
with correlated covariates and/or one or more highly influential covariates) and that the OPs case is not one of them. HTH, Chuck The curse of dimensionality is easily demonstrated looking at the proximity between your datapoints. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2=400 observations. in a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, as your dataspace is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, and then I didn't even start on interactions. No point in trying I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Cheers Joris On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not R technical question per se. I know there are many excellent statisticians in this list, so here my questions: I have dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that in some cases that number should be as high as 25. The number of events is the smaller of the two possible outcomes for your binary dependent variable. Covariate degrees of freedom refers to the number of columns in the model matrix. 
Continuous variables are 1, binary factors are 1, K-level factors are K - 1. So if out of your 1800 records, you have at least 500 to 1000 events, depending upon how many of your 50 variables are K-level factors and whether or not you need to consider interactions, you may be OK. Better if towards the high end of that range, especially if the model is for prediction versus explanation. Two excellent references would be Frank's book: http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/ and Steyerberg's book: http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/ to assist in providing guidance for model building/validation techniques. HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained,
Re: [R] points marking
I don't think that I would use a barplot as the base, but rather just set up the graph and add the lines where I wanted them. I still don't understand what you want your graph to look like, or what question you are trying to answer with it (part may be a language barrier). If you can give us a better example of what you are trying to accomplish, or a better description of what your data is like and what you are trying to get from the graph, we will have a better chance of being able to help you. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: khush [mailto:bioinfo.kh...@gmail.com] Sent: Saturday, June 12, 2010 5:38 AM To: Greg Snow Cc: r-help@r-project.org; Petr PIKAL Subject: Re: [R] points marking Hi, Well, thanks for letting me know that pch is of no use with segments, Petr. I am using lend as it suits me more, as Gregory suggested, but I am not getting imite??? I think I'll try to fix it with some other method also, as I have to deal more with the symbols in this case. But I want to know one thing from you guys: is the way I am using the code good enough to start? I am not much familiar with this stuff, or is it a dirty way to handle such a task? Please let me know. Thanks, Gregory and Petr. Thank you Jeet On Fri, Jun 11, 2010 at 9:07 PM, Greg Snow greg.s...@imail.org wrote: Those graphs look like chromosome maps; if so, you may want to look into the Bioconductor project, they may have some prewritten functions to do this. If not, the lend argument (see ?par) may be something to look at. If you really want points and segments, you will need to plot the points with the points function and the segments separately. Segments can take vectors, so you don't need to separate things into multiple calls. -- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: khush [mailto:bioinfo.kh...@gmail.com] Sent: Friday, June 11, 2010 12:00 AM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] points marking Dear Gregory, Thanks for your reply and help. I am explaining my problem again; below is my script for the same.

Dom <- c(195, 568, 559)
fkbp <- barplot(Dom, col = "black", xlab = "", border = NA, space = 7,
                xlim = c(0, 650), ylim = c(0, 87), las = 2, horiz = TRUE)
axis(1, at = seq(0, 600, 10), las = 2)

1. ==== Segments 1 ====
segments(164, 7.8, 192, 7.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(45, 15.8, 138, 15.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(160, 15.8, 255, 15.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(277, 15.8, 378, 15.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(51, 23.8, 145, 23.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(167, 23.8, 262, 23.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(284, 23.8, 381, 23.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)

2. ==== Segments 2 ====
segments(399, 15.8, 432, 15.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(448, 15.8, 475, 15.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(486, 15.8, 515, 15.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(401, 23.8, 434, 23.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(450, 23.8, 475, 23.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(486, 23.8, 517, 23.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)

I solved one part of my query: marking points from one position to another is OK, and I found that it works fine. But I have another issue now: as I am using two segment sets, 1 and 2, I want to draw different shapes for segments 2, so I am giving pch = 21, but it seems to give a solid line for both.
I want to draw different shapes for every chunk of segments; that is the whole point. I want to make a script which can generate such figures; below is a link to one such tool. http://www.expasy.ch/tools/mydomains/ Thank you Jeet On Thu, Jun 10, 2010 at 11:10 PM, Greg Snow greg.s...@imail.org wrote: Your question is not really clear; do either of these examples do what you want?

with(anscombe, plot(x1, y2, ylim = range(y2, y3)))
with(anscombe, points(x1, y3, col = 'blue', pch = 2))
with(anscombe, segments(x1, y2, x1, y3, col = ifelse(y2 > y3, 'green', 'red')))

with(anscombe, plot(x1, y2, ylim = range(y2, y3), type = 'n'))
with(anscombe[order(anscombe$x1), ], polygon(c(x1, rev(x1)), c(y2, rev(y3)), col = 'grey'))

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
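Greg's two suggestions, segments() with the lend setting for the thick bars plus a separate points() call for the shapes, can be combined in one small sketch (all coordinates here are made up, not from the original script):

```r
# A minimal sketch: segments() ignores pch, so draw thick bars with
# segments() (square ends via par(lend = "butt"), see ?par) and add the
# distinct shapes separately with points().
plot(NULL, xlim = c(0, 650), ylim = c(0, 30),
     xlab = "Position", ylab = "", yaxt = "n")
old <- par(lend = "butt")                 # square-ended thick segments
segments(45, 16, 138, 16, col = "green", lwd = 20)   # domain type 1
segments(399, 16, 432, 16, col = "blue", lwd = 20)   # domain type 2
points(c(45, 138, 399, 432), rep(16, 4),
       pch = c(23, 23, 21, 21), bg = "white")        # shapes at the ends
par(old)                                  # restore previous line-end style
```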
[R] Revolutions Blog: May Roundup
I write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help. http://bit.ly/dn7DgR linked to 13 videos for learning R, from the basics (What is R?) to more advanced topics. http://bit.ly/cJUhiY noted the release of R 2.11.1. http://bit.ly/d53tvn announced that Revolution Analytics makes its software available free of charge to the academic community. http://bit.ly/9xCQ83 noted how statistical inference was used to accurately estimate German tank forces in WWII, and linked to an R simulation to verify the calculation. http://bit.ly/b5puKD announced a new website for the R community, www.inside-R.org, sponsored by Revolution Analytics. http://bit.ly/9r85bd linked to a video of economist JD Long explaining why he uses R. http://bit.ly/cySMgE linked to Jeromy Anglim's explanation of the abbreviated names of 150 R functions. http://bit.ly/cEceU8 announced a webinar I gave, Introduction to Revolution R. The live event has passed now, but you can download slides and watch a replay at this link. http://bit.ly/brs2s2 recapped some recent news articles mentioning R and Revolution, in Forbes, The Register, PC World and elsewhere. http://bit.ly/9UDgOL linked to an analysis in R on predicting the outcome of a series of baseball games. http://bit.ly/bJYW9v linked to some materials from the CloudAsia conference on parallel computing in R for life sciences. http://bit.ly/dfb4PA provided a tip on keeping the console window active in the R Productivity Environment GUI. http://bit.ly/aeHO7B announced a webinar on integrating R-based graphs and computations with business intelligence dashboards. (The live event has passed, but you can download slides and a replay at this link.) http://bit.ly/dqr1hc linked to another method of mapping your Twitter social network with R. 
http://bit.ly/bVI6e9 linked to a video by JD Long on using the Amazon Elastic Map-Reduce system to run large-scale map-reduce calculations in the cloud with R. http://bit.ly/auZ7N8 announced Revolution Analytics' development roadmap for 2010. There are new R user groups in Boston (http://bit.ly/cHedm0) and Atlanta (http://bit.ly/aVo1cI). Also, http://bit.ly/a82GAf noted the list of local R User Groups worldwide at MeetUp.com, and how you can request a new group in your area. Other non-R-related stories in the past month included Google's new BigQuery and Prediction API tools (http://bit.ly/bfEeLm), the effects of volcanic ash on a modern city (http://bit.ly/9qWfQf) and (on a lighter note) the dating equation (http://bit.ly/9LR28N) and a neat, practical optical illusion (http://bit.ly/dsmyov). The R Community Calendar has also been updated at: http://blog.revolutionanalytics.com/calendar.html If you're looking for more articles about R, you can find summaries from previous months at http://blog.revolutionanalytics.com/roundups/. Join the REvolution mailing list at http://revolutionanalytics.com/newsletter to be alerted to new articles on a monthly basis. As always, thanks for the comments and please keep sending suggestions to me at da...@revolutionanalytics.com. Don't forget you can also follow the blog using an RSS reader like Google Reader, or by following me on Twitter (I'm @revodavid). Cheers, David -- David M Smith da...@revolutionanalytics.com VP of Marketing, Revolution Analytics http://blog.revolutionanalytics.com Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)
Re: [R] Html help
Murray Jorgensen wrote: I have just installed R 2.11.1 on my XP laptop. I like HTML help for browsing but text help for on-the-fly look-ups, so I was a bit surprised when I was asked to choose between them during the installation. I chose text, thinking I could fix the HTML help later, which is what I am trying to do now. Now when I ask for HTML help my browser goes to 'http://-ip-number-/doc/html/index.html' instead of where I want on my computer: C:\apps\R\R-2.11.1\doc\html\index.html I can go where I want manually, but then the package list in C:\apps\R\R-2.11.1\doc\html\packages.html does not include all the packages that I have installed and linked. I don't want to read my HTML help from the web because sometimes I am off-line or on a slow connection. How do I go about getting a local set of HTML help files? Since 2.10.0, HTML help is generated on demand. It doesn't go off your local computer; it works locally. This saves a bit of space (the HTML is generated from the same source as the text help), but the main point is that it allows help pages to contain dynamic content. For example, Romain Francois posted some demo code a while ago to allow the display of graphics generated by R within help pages. (Unfortunately it depended on a particular browser feature not supported by Internet Explorer, so I'm going to need to put together something less elegant, but that's life.) Duncan Murdoch
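For the original question of switching to HTML help without reinstalling, one option documented in ?options can be set at run time (a sketch; put it in .Rprofile to make it permanent):

```r
# A small sketch: "help_type" is a documented option (see ?options) that
# selects the help format; no reinstall is needed.
options(help_type = "html")    # use "text" to go back to plain-text help
getOption("help_type")
# help("mean") would now start R's internal help server and open the browser
```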
Re: [R] xtable with Sweave
On Jun 14, 2010, at 8:09 AM, Silvano wrote: Hi, I'm using Sweave to prepare a descriptive report. There are at least 20 tables built with the xtable command, of this kind:

<<echo=F, results=hide>>=
q5 = factor(Q5, label = c("Não", "Sim"))
(q5.tab = cbind(table(q5)))
@

<<echo=F, results=tex>>=
xtable(q5.tab, align = "l|c", caption.placement = "top", table.placement = 'H')
@

I'm getting the following message: Too many unprocessed floats in LaTeX file. How can I avoid these messages? Hi, That is an error message from 'latex' indicating that you may have too many float tables without sufficient separation (e.g. new pages, text in between, etc.) and/or conflicts in table placement. You might have a look at the relevant TeX FAQ here: http://www.tex.ac.uk/cgi-bin/texfaq2html?label=tmupfl Also, while I don't use xtable(), I do believe that you have to call it using the print method directly to specify non-default arguments:

print(xtable(q5.tab, align = "l|c"), caption.placement = "top", table.placement = 'H')

See the help pages for ?xtable and ?print.xtable, including the last examples in the former. HTH, Marc Schwartz
[R] script development for Unconditional Density and Probability estimation
Hello, I'd like to automate this script a bit more and cycle over several parameters (both the species and the metric). For example, where AnnualDepth occurs I need to process about 12 metrics, so instead of writing this entire script 12 times, once for each metric, I'd like to be able to automatically get another metric. Any suggestion will be greatly appreciated. Currently running Windows XP, R 2.11.1 ###

Marsh <- cbind(SoilVegHydro, vegcode)
AnnualDepth <- Marsh[, 'meanAnnualDepthAve']
cattail_0 <- Marsh[, 'cattail'] == '0'  # no; need to run for 8 species, automate if possible
cattail_1 <- Marsh[, 'cattail'] == '1'  # yes; need to run for 8 species
spbase.rate.d1 <- sum(cattail_1) / (sum(cattail_1) + sum(cattail_0))
annualDepth.density <- density(AnnualDepth)  # this line needs to be either interactively defined or to cycle automatically through a number of metrics
cattail.d0.density <- density(AnnualDepth[cattail_0])
cattail.d1.density <- density(AnnualDepth[cattail_1])
cattail.d0.f <- approxfun(cattail.d0.density$x, cattail.d0.density$y)
cattail.d1.f <- approxfun(cattail.d1.density$x, cattail.d1.density$y)
p.d.given.AnnualDepth <- function(AnnualDepth, spbase.rate.d1) {
  p1 <- cattail.d1.f(AnnualDepth) * spbase.rate.d1
  p0 <- cattail.d0.f(AnnualDepth) * (1 - spbase.rate.d1)
  p1 / (p1 + p0)
}
x <- 1:1292
y <- p.d.given.AnnualDepth(x, spbase.rate.d1)
plot(x, y, type = 'l', col = 'red', xlab = 'Mean Annual Depth', main = "Cattail",
     ylab = 'estimated\nProbability(cattail|AnnualDepth)')
plot(cattail.d0.density, col = 'red', lty = 1, main = "")
lines(cattail.d1.density, col = 'blue', lty = 1)
lines(annualDepth.density, col = 'black', lty = 1)
legend(2000, 0.0023, c("No Cattail", "Cattail", "Mean Annual Depth"),
       col = c("red", "blue", "black"), lty = 1)

# Steve Friedman Ph.D.
Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 steve_fried...@nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147
Re: [R] Count of unique factors within another factor
Hi Dennis, Thanks for this suggestion (which I got to run!), as this code makes intuitive sense, whereas not all the other suggestions were that straightforward. I'm relatively new to programming in R and am very appreciative that you and others take time to help out where you can. Sincerely, Sarah On Sun, Jun 13, 2010 at 8:47 PM, Dennis Murphy [via R] ml-node+2253845-1393472685-291...@n4.nabble.com wrote: Hi: Another possibility:

as.data.frame(with(data[!duplicated(data), ], table(unit)))
  unit Freq
1  123    3
2  345    4

HTH, Dennis On Sun, Jun 13, 2010 at 9:07 AM, Birdnerd wrote: I have a data frame with two factors (sampling 'unit', 'species'). I want to calculate the number of unique 'species' per 'unit'. I can calculate the number of unique values for each variable separately, but can't get a count for each 'unit'.

data = read.csv("C:/Desktop/sr_sort_practice.csv")
attach(data)
data[1:10, ]
   unit species
1   123    ACMA
2   123    LIDE
3   123    LIDE
4   123    SESE
5   123    SESE
6   123    SESE
7   345    HEAR
8   345    LOHI
9   345    QUAG
10  345    TODI
...
sr.unique <- lapply(data, unique)
$unit
[1] 123 345 216
$species
[1] ACMA LIDE SESE HEAR LOHI QUAG TODI UMCA ARSP LIDE
sapply(sr.unique, length)
   unit species
      3      10

Then I get stuck here, because this unique species count is not given for each 'unit'. What I'd like to get is:

unit species
 123      3
 345      4
 216     --

Thanks
-- Sarah E. Haas haaszool...@gmail.com Center for Applied Geographic Information Science (CAGIS) Department of Geography and Earth Sciences University of North Carolina at Charlotte 9201 University City Blvd. Charlotte, NC 28223, USA http://www.gis.uncc.edu/
Re: [R] xtable with Sweave
Hi Silvano, Silvano wrote: Hi, I'm using Sweave to prepare a descriptive report. There are at least 20 tables built with the xtable command, of this kind:

<<echo=F, results=hide>>=
q5 = factor(Q5, label = c("Não", "Sim"))
(q5.tab = cbind(table(q5)))
@

<<echo=F, results=tex>>=
xtable(q5.tab, align = "l|c", caption.placement = "top", table.placement = 'H')
@

I'm getting the following message: Too many unprocessed floats in LaTeX file. How can I avoid these messages? Not really an R-help question, and easily answered using our friend Google. Anyway, you need to cause the floats to be processed before you have 'too many'. This is done by adding a \clearpage to the LaTeX portion of your document every so often. Best, Jim -- Silvano Cesar da Costa Departamento de Estatística Universidade Estadual de Londrina Fone: 3371-4346 -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
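Since the tables are emitted from results=tex chunks, the \clearpage can be emitted from R as well. A sketch (the helper name is illustrative, and xtable is assumed to be installed):

```r
# A sketch of Jim's fix: after each table, cat() a \clearpage so LaTeX
# places pending floats before "too many" accumulate.
emit_table <- function(tab) {
  if (requireNamespace("xtable", quietly = TRUE)) {
    print(xtable::xtable(as.data.frame(tab)), table.placement = "H")
  }
  cat("\\clearpage\n")   # flush LaTeX's pending floats between tables
}
emit_table(table(mtcars$cyl))
```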
Re: [R] script development for Unconditional Density and Probability estimation
First start by putting it in a function so you can specify the parameters you want to change. On Mon, Jun 14, 2010 at 11:54 AM, steve_fried...@nps.gov wrote: Hello, I'd like to automate this script a bit more and cycle over several parameters (both the species and the metric). For example, where AnnualDepth occurs I need to process about 12 metrics, so instead of writing this entire script 12 times, once for each metric, I'd like to be able to automatically get another metric. Any suggestion will be greatly appreciated. Currently running Windows XP, R 2.11.1 ###

Marsh <- cbind(SoilVegHydro, vegcode)
AnnualDepth <- Marsh[, 'meanAnnualDepthAve']
cattail_0 <- Marsh[, 'cattail'] == '0'  # no; need to run for 8 species, automate if possible
cattail_1 <- Marsh[, 'cattail'] == '1'  # yes; need to run for 8 species
spbase.rate.d1 <- sum(cattail_1) / (sum(cattail_1) + sum(cattail_0))
annualDepth.density <- density(AnnualDepth)  # needs to be interactively defined or cycle automatically through a number of metrics
cattail.d0.density <- density(AnnualDepth[cattail_0])
cattail.d1.density <- density(AnnualDepth[cattail_1])
cattail.d0.f <- approxfun(cattail.d0.density$x, cattail.d0.density$y)
cattail.d1.f <- approxfun(cattail.d1.density$x, cattail.d1.density$y)
p.d.given.AnnualDepth <- function(AnnualDepth, spbase.rate.d1) {
  p1 <- cattail.d1.f(AnnualDepth) * spbase.rate.d1
  p0 <- cattail.d0.f(AnnualDepth) * (1 - spbase.rate.d1)
  p1 / (p1 + p0)
}
x <- 1:1292
y <- p.d.given.AnnualDepth(x, spbase.rate.d1)
plot(x, y, type = 'l', col = 'red', xlab = 'Mean Annual Depth', main = "Cattail",
     ylab = 'estimated\nProbability(cattail|AnnualDepth)')
plot(cattail.d0.density, col = 'red', lty = 1, main = "")
lines(cattail.d1.density, col = 'blue', lty = 1)
lines(annualDepth.density, col = 'black', lty = 1)
legend(2000, 0.0023, c("No Cattail", "Cattail", "Mean Annual Depth"),
       col = c("red", "blue", "black"), lty = 1)

# Steve Friedman Ph.D.
Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 steve_fried...@nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147 -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
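Jim's suggestion, wrapping the per-species/per-metric logic in a parameterized function, might look like this sketch (all names and the toy data are illustrative, not from the original script):

```r
# A sketch: one function handles any (species, metric) pair; loop over both.
species_metric_prob <- function(data, species, metric) {
  x   <- data[[metric]]
  yes <- data[[species]] == "1"
  base_rate <- mean(yes)
  # yleft/yright = 0 so values outside a density's range get probability 0
  f1 <- approxfun(density(x[yes]),  yleft = 0, yright = 0)  # given presence
  f0 <- approxfun(density(x[!yes]), yleft = 0, yright = 0)  # given absence
  function(v) {                     # estimated P(presence | metric = v)
    p1 <- f1(v) * base_rate
    p0 <- f0(v) * (1 - base_rate)
    p1 / (p1 + p0)
  }
}

set.seed(1)
toy <- data.frame(depth = c(rnorm(50, 0), rnorm(50, 5)),
                  cattail = rep(c("0", "1"), each = 50))
p <- species_metric_prob(toy, "cattail", "depth")
p(5)   # high: depths near 5 are typical of presence records in the toy data
# for (sp in species_names) for (m in metric_names)
#   f <- species_metric_prob(Marsh, sp, m)   # then plot/save as needed
```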
Re: [R] unqiue problem
I thought unique deletes the whole line. I don't really need the row names, but I thought of it as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column. Assa On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com wrote: Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, hence the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column? On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello everybody, I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique():

Workbook5 <- read.delim(file = "Workbook5.txt")
dim(Workbook5)
[1] 27748 2
Workbook5 <- unique(Workbook5)
dim(Workbook5)
[1] 20101 2

It removed a lot of lines, but unfortunately not all of them.
I wanted to add the row names to the matrix and got this error message:

rownames(Workbook5) <- Workbook5[, 1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names': 'A_51_P102339', 'A_51_P102518', 'A_51_P103435', 'A_51_P103465', 'A_51_P103594', 'A_51_P104409', 'A_51_P104718', 'A_51_P105869', 'A_51_P106428', 'A_51_P106799', 'A_51_P107176', 'A_51_P107959', 'A_51_P108767', 'A_51_P109258', 'A_51_P109708', 'A_51_P110341', 'A_51_P111757', 'A_51_P112427', 'A_51_P112662', 'A_51_P113672', 'A_51_P115018', 'A_51_P116496', 'A_51_P116636', 'A_51_P117666', 'A_51_P118132', 'A_51_P118168', 'A_51_P118400', 'A_51_P118506', 'A_51_P119315', 'A_51_P120093', 'A_51_P120305', 'A_51_P120738', 'A_51_P120785', 'A_51_P121134', 'A_51_P121359', 'A_51_P121412', 'A_51_P121652', 'A_51_P121724', 'A_51_P121829', 'A_51_P122141', 'A_51_P122964', 'A_51_P123422', 'A_51_P123895', 'A_51_P124008', 'A_51_P124719', 'A_51_P125648', 'A_51_P125679', 'A_51_P125779' [... truncated]

Is there a better way to discard the duplications in the text file (an Excel file is the origin)?

R.version
platform x86_64-apple-darwin9.8.0
arch x86_64
os darwin9.8.0
system x86_64, darwin9.8.0
status Patched
major 2
minor 11.1
year 2010
month 06
day 03
svn rev 52201
language R
version.string R version 2.11.1 Patched (2010-06-03 r52201)

THX Assa -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
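The approach the replies converge on, keep only the first row for each value in column 1, then use those now-unique IDs as row names, can be shown on toy data (the real IDs are the A_51_P... probe names):

```r
# A self-contained sketch: deduplicate by the first column only, whatever
# the other columns contain, then set row names.
df <- data.frame(id  = c("A_1", "A_1", "A_2", "A_3", "A_3"),
                 val = c(10, 11, 20, 30, 30))
dedup <- df[!duplicated(df$id), ]   # keeps the first row per id
rownames(dedup) <- dedup$id         # now unique, so this succeeds
dedup
```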
Re: [R] merging data frames
If you want to keep only the rows that are unique in the first column then do the following: workComb1 <- subset(workComb, !duplicated(ProbeID)) On Mon, Jun 14, 2010 at 11:20 AM, Assa Yeroslaviz fry...@gmail.com wrote: Well, the problem is basically elsewhere. I have a data frame with expression data and doubled IDs in the first column (see example). When I want to put them into row names, I get the message that there are non-unique items in the data. So I tried with unique to delete such rows; the problem is unique doesn't delete all of them. I compare two data frames by their Probe IDs. I would like to delete all doubled lines with a certain probe ID independent of the rest of the line; that is to say, I would like a data frame with single unique identifiers in the Probe ID column. merge doesn't give me that: it doesn't delete all similar lines, and if the lines are not identical in the other columns it leaves them in the table. Is there a way of deleting the whole line with doubled Probe IDs?

workbook <- read.delim(file = "workbook1.txt", quote = "", sep = "\t")
GeneID <- read.delim(file = "testTable.txt", quote = "", sep = "\t")
workComb <- merge(workbook, GeneID, by.x = "ProbeID", by.y = "Probe.Id")
workComb1 <- unique(workComb)
write.table(workComb, file = "workComb.txt", sep = "\t", quote = FALSE, row.names = FALSE)
write.table(workComb1, file = "workComb1.txt", sep = "\t", quote = FALSE, row.names = FALSE)

Look at lines 49 and 50 in the file workComb1.txt after using unique on the file. The lines are identical with the exception of the Transcript ID; I would like to take one of them out of the table. THX, Assa On Mon, Jun 14, 2010 at 15:33, jim holtman jholt...@gmail.com wrote: Put the rownames as another column in your dataframe so that it remains with the data. After merging, you can then use it as the rownames. On Mon, Jun 14, 2010 at 9:25 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hi, is it possible to merge two data frames while preserving the row names of the bigger data frame?
I have two data frames which I would like to combine. While doing so I always lose the row names. When I try to append them, I get the error message that I have non-unique names, although I used the unique command on the data frame where the double inputs supposedly are. Thanks for the help, Assa -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] Count of unique factors within another factor
Another possibility: rowSums(table(x) > 0) On Sun, Jun 13, 2010 at 3:08 PM, Erik Iverson er...@ccbr.umn.edu wrote: I think ?tapply will help here. But *please* read the posting guide and provide minimal, reproducible examples! Birdnerd wrote: I have a data frame with two factors (sampling 'unit', 'species'). I want to calculate the number of unique 'species' per 'unit'. I can calculate the number of unique values for each variable separately, but can't get a count for each unit.

data = read.csv("C:/Desktop/sr_sort_practice.csv")
attach(data)
data[1:10, ]
   unit species
1   123    ACMA
2   123    LIDE
3   123    LIDE
4   123    SESE
5   123    SESE
6   123    SESE
7   345    HEAR
8   345    LOHI
9   345    QUAG
10  345    TODI
...
sr.unique <- lapply(data, unique)
$unit
[1] 123 345 216
$species
[1] ACMA LIDE SESE HEAR LOHI QUAG TODI UMCA ARSP LIDE
sapply(sr.unique, length)
   unit species
      3      10

Then I get stuck here, because this unique species count is not given for each unit. What I'd like to get is:

unit species
 123      3
 345      4
 216     --

Thanks -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
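Erik's ?tapply hint can be sketched on data shaped like the poster's (toy values, not the real CSV):

```r
# Count distinct species per sampling unit with tapply().
d <- data.frame(unit = c(123, 123, 123, 123, 345, 345, 345, 216),
                species = c("ACMA", "LIDE", "LIDE", "SESE",
                            "HEAR", "LOHI", "HEAR", "UMCA"))
counts <- tapply(d$species, d$unit, function(s) length(unique(s)))
counts   # a named vector: one unique-species count per unit
```

rowSums(table(d) > 0) gives the same counts, since the unit-by-species contingency table has a positive cell exactly where a species occurs in a unit.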
[R] How we can open the results are saved
Hi all, I saved the result of my code to a file, like save(namefunction, file = "adresse/filename.R"). Now I want to open that file. Could you please help me with how I can open the file and see the result? best Khazaei
Re: [R] How we can open the results are saved
load('adresse/filename.R') On Mon, Jun 14, 2010 at 12:41 PM, khaz...@ceremade.dauphine.fr wrote: Hi all, I saved the result of my code to a file, like save(namefunction, file = "adresse/filename.R"). Now I want to open that file. Could you please help me with how I can open the file and see the result? best Khazaei -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] How we can open the results are saved
No. Binary workspace data are saved by default with the .RData extension and are opened (actually, have their contents added to the current workspace) by load(). .R text files would need to be sourced: source('adresse/filename.R') Bert Gunter Genentech Nonclinical Biostatistics -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jim holtman Sent: Monday, June 14, 2010 9:50 AM To: khaz...@ceremade.dauphine.fr Cc: r-help@r-project.org Subject: Re: [R] How we can open the results are saved load('adresse/filename.R') On Mon, Jun 14, 2010 at 12:41 PM, khaz...@ceremade.dauphine.fr wrote: Hi all, I saved the result of my code to a file, like save(namefunction, file = "adresse/filename.R"). Now I want to open that file. Could you please help me with how I can open the file and see the result? best Khazaei -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
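The save()/load() round trip under discussion, runnable end to end (tempfile() stands in for the poster's path):

```r
# save() writes a binary image; load() restores the objects under their
# original names.  The extension (.RData by convention) does not change
# how load() reads the file, which is why Jim's load() call works even
# on a file named filename.R.
f <- tempfile(fileext = ".RData")
results <- list(coef = c(1.2, -0.4), n = 100)
save(results, file = f)
rm(results)
load(f)          # 'results' is back in the workspace
results$n
```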
Re: [R] unqiue problem
On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote: I thought unique deletes the whole line. I don't really need the row names, but I thought of it as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column. Assa On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com wrote: Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, hence the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column? On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello everybody, I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique():

Workbook5 <- read.delim(file = "Workbook5.txt")
dim(Workbook5)
[1] 27748 2
Workbook5 <- unique(Workbook5)

Jim already showed you one way in another thread, and it is probably more intuitive than this way, but just so you know...

Workbook5 <- Workbook5[!duplicated(Workbook5[, 1]), ]

... should have worked: logical indexing on the first column, returning both columns of the qualifying rows. -- David.

dim(Workbook5)
[1] 20101 2

It removed a lot of lines, but unfortunately not all of them.
I wanted to add the row names to the matrix and got this error message:
rownames(Workbook5) <- Workbook5[,1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names': 'A_51_P102339', 'A_51_P102518', 'A_51_P103435', 'A_51_P103465', 'A_51_P103594', 'A_51_P104409', 'A_51_P104718', 'A_51_P105869', 'A_51_P106428', 'A_51_P106799', 'A_51_P107176', 'A_51_P107959', 'A_51_P108767', 'A_51_P109258', 'A_51_P109708', 'A_51_P110341', 'A_51_P111757', 'A_51_P112427', 'A_51_P112662', 'A_51_P113672', 'A_51_P115018', 'A_51_P116496', 'A_51_P116636', 'A_51_P117666', 'A_51_P118132', 'A_51_P118168', 'A_51_P118400', 'A_51_P118506', 'A_51_P119315', 'A_51_P120093', 'A_51_P120305', 'A_51_P120738', 'A_51_P120785', 'A_51_P121134', 'A_51_P121359', 'A_51_P121412', 'A_51_P121652', 'A_51_P121724', 'A_51_P121829', 'A_51_P122141', 'A_51_P122964', 'A_51_P123422', 'A_51_P123895', 'A_51_P124008', 'A_51_P124719', 'A_51_P125648', 'A_51_P125679', 'A_51_P125779 [... truncated]
Is there a better way to discard the duplications in the text file (an Excel file is the origin)?
R.version
               _
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status         Patched
major          2
minor          11.1
year           2010
month          06
day            03
svn rev        52201
language       R
version.string R version 2.11.1 Patched (2010-06-03 r52201)
THX Assa -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
David Winsemius, MD West Hartford, CT
Re: [R] unqiue problem
On Jun 14, 2010, at 1:10 PM, David Winsemius wrote: On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote: I thought unique deleted the whole line. I don't really need the row names, but I thought of it as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column. Assa On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com wrote: Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, and therefore the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column? On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello everybody, I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique():
Workbook5 <- read.delim(file = "Workbook5.txt")
dim(Workbook5)
[1] 27748 2
Workbook5 <- unique(Workbook5)
Jim already showed you one way in another thread and it is probably more intuitive than this way, but just so you know...
Workbook5 <- Workbook5[ unique(Workbook5[ ,1]) , ]
... should have worked. Logical indexing on the first column with return of both columns of qualifying rows. Actually I was thinking a bit askew, although that would have succeeded. That was not logical indexing, which would have been done with duplicated() ... or rather its negation through the use of the ! unary operator:
str(unique(Workbook5[ ,1]))
 Factor w/ 17209 levels "A_51_P100034",..: 1 2 3 4 5 6 7 8 9 10 ...
str(!duplicated(Workbook5[ ,1]))
 logi [1:20101] TRUE TRUE TRUE TRUE TRUE TRUE ...
So this would have been the way to do it with logical indexing:
Workbook5 <- Workbook5[ !duplicated(Workbook5[ ,1]) , ]
-- David.
dim(Workbook5)
[1] 20101 2
It removed a lot of lines, but unfortunately not all of them. I wanted to add the row names to the matrix and got this error message:
rownames(Workbook5) <- Workbook5[,1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names': 'A_51_P102339', 'A_51_P102518', 'A_51_P103435', 'A_51_P103465', ... [... truncated]
Is there a better way to discard the duplications in the text file (an Excel file is the origin)?
R.version
               _
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status         Patched
major          2
minor          11.1
year           2010
month          06
day            03
svn rev        52201
language       R
version.string R version 2.11.1 Patched (2010-06-03 r52201)
THX Assa -- David Winsemius, MD West Hartford, CT
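Putting the thread's conclusion together, here is a minimal sketch (with made-up IDs) of deduplicating on the first column and then assigning row names:

```r
## Hypothetical data: duplicated IDs in column 1, as in Assa's file.
Workbook5 <- data.frame(ID    = c("A1", "A1", "B2", "B2", "C3"),
                        value = c(10, 10, 20, 30, 40))
## keep only the first row for each ID; row-name assignment then succeeds
Workbook5 <- Workbook5[!duplicated(Workbook5$ID), ]
rownames(Workbook5) <- Workbook5$ID
```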
Re: [R] Overlay of barchart and xyplot
Thanks a lot! Huapeng -Original Message- From: foolish.andr...@gmail.com [mailto:foolish.andr...@gmail.com] On Behalf Of Felix Andrews Sent: Friday, June 11, 2010 8:23 PM To: Chen, Huapeng FOR:EX Cc: r-help@r-project.org Subject: Re: [R] Overlay of barchart and xyplot Hi, I have an example below of adding a key to the merged plot. You cannot have the key on the right-hand side because that viewport is used by the second ylab (ylab2 from doubleYScale). Well, if you really wanted to, you could do it with the grid package, using frameGrob or some such.
NTLST_Dispersal_VAR_00_08$Year <- factor(NTLST_Dispersal_VAR_00_08$Year,
    levels = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007), ordered = TRUE)
dispersal <- barchart(LDP_PER*100 + SPP_PER*100 + SPG_PER*100 ~ Year | District,
    data = NTLST_Dispersal_VAR_00_08, stack = TRUE, layout = c(5,5),
    scales = list(x = list(rot = 90)), xlab = "Year", ylab = "%",
    strip = strip.custom(bg = "light gray"),
    par.settings = simpleTheme(col = c("dark gray", "light gray", "white")),
    auto.key = list(points = FALSE, rectangles = TRUE))
vars <- xyplot(sqrt(Infestation_NUM) + AI ~ Year | District,
    data = NTLST_Dispersal_VAR_00_08, layout = c(5,5), type = "b",
    ylab = "Square roots of number of infested cells/Landscape aggregation index",
    auto.key = list(lines = TRUE))
dblplot <- doubleYScale(dispersal, vars, use.style = FALSE, add.ylab2 = TRUE)
dblplot <- update(dblplot,
    par.settings = simpleTheme(fill = c("white", "dark gray", "black"),
        border = "black", col.line = "black", col.points = "black",
        pch = c(16,17), lty = c(1,1,1,2,1)))
## include second key at the bottom
update(dblplot, legend = list(bottom = vars$legend$top))
## Otherwise you could just include a key argument in the first plot which includes all the items explicitly.
## Or merge the two 'auto.key's at the top:
mergeLegends <- function(a, b, ...)
{
  g <- frameGrob()
  if (!inherits(a, "grob")) {
    a <- eval(as.call(c(as.symbol(a$fun), a$args)), getNamespace("lattice"))
  }
  if (!inherits(b, "grob")) {
    b <- eval(as.call(c(as.symbol(b$fun), b$args)), getNamespace("lattice"))
  }
  g <- packGrob(g, a, side = "left")
  packGrob(g, b, side = "right")
}
update(dblplot, legend = list(top = list(fun = mergeLegends,
    args = list(a = dispersal$legend$top, b = vars$legend$top))))
On 5 June 2010 04:49, Chen, Huapeng FOR:EX huapeng.c...@gov.bc.ca wrote: Hi Felix, Thanks for your help and advice. The following code is close to what I want, but it still fails to customize the lines and add a key in any way. par.settings with the final plot seems not to work somehow, except pch and lty, but they overwrite the par.settings given to barchart. I also attached the data I used (via dput). I appreciate your further help. Thanks, Huapeng
# code #
NTLST_Dispersal_VAR_00_08$Year <- factor(NTLST_Dispersal_VAR_00_08$Year,
    levels = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007), ordered = TRUE)
dispersal <- barchart(NTLST_Dispersal_VAR_00_08$LDP_PER*100 +
    NTLST_Dispersal_VAR_00_08$SPP_PER*100 + NTLST_Dispersal_VAR_00_08$SPG_PER*100 ~
    NTLST_Dispersal_VAR_00_08$Year | NTLST_Dispersal_VAR_00_08$District,
    data = NTLST_Dispersal_VAR_00_08, horizontal = FALSE, stack = TRUE,
    layout = c(5,5), xlab = "Year", ylab = "%",
    strip = strip.custom(bg = "light gray"),
    par.settings = simpleTheme(col = c("dark gray", "light gray", "white")),
    #key=list(space="right", size=10,
    #  rectangles=list(size=1.7, border="black", col = c("white", "dark gray", "black")),
    #  lines=list(pch=c(16,17), lty=c(1,2), col="black", type="b"),
    #  text=list(text=c("SPG","SPP","LDP")))
    #auto.key=TRUE
    )
xyplot(sqrt(NTLST_Dispersal_VAR_00_08$Infestation_NUM) + NTLST_Dispersal_VAR_00_08$AI ~
    NTLST_Dispersal_VAR_00_08$Year | NTLST_Dispersal_VAR_00_08$District,
    data = NTLST_Dispersal_VAR_00_08, layout = c(5,5), type = "b",
    ylab = "Square roots of number of infested cells/Landscape aggregation index",
    #par.settings = simpleTheme(col = c("black", "black"), pch=c(16,17)),
    #key=list(space="right", size=10,
    #  rectangles=list(size=1.7, border="black", col = c("white", "dark gray", "black")),
    #  lines=list(pch=c(16,17), lty=c(1,2), col="black", type="b"),
    #
Re: [R] Prime Numbers Pkgs - Schoolmath is broken
On Mon, Jun 14, 2010 at 7:16 AM, Red Roo redr...@yahoo.com wrote: Looking for a recommended package that handles prime number computations. The gmp package (http://crantastic.org/packages/gmp) has some good tools for prime numbers. I've used the isprime function before; it's stochastic (in the sense that it gives the right answer with a high probability), but very fast for large numbers. I'm pretty sure there are exact functions for testing and generating primes too. # David Smith -- David M Smith da...@revolutionanalytics.com VP of Marketing, Revolution Analytics http://blog.revolutionanalytics.com Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)
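For reference, a short sketch of the gmp calls mentioned above (this assumes the gmp package is installed; its isprime() returns 0 for composite, 1 for probably prime, and 2 for provably prime):

```r
library(gmp)
isprime(7919)                            # small numbers are tested exactly
isprime(as.bigz("2305843009213693951"))  # 2^61 - 1, a Mersenne prime
nextprime(as.bigz(10)^9)                 # next prime above one billion
```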
Re: [R] recursive function
Hi! Do you mean something like this (df is your original data frame)?
--- cut here ---
df1 <- df
df1[[1]] <- paste("R", df[[1]], sep="_")
colnames(df1) <- c("SERIES", "YEAR", "value")
df1$value[df1$YEAR == 2009] <- 5
for (i in c(2009:2007)) {
  df1$value[df1$YEAR == (i-1)] <- df1$value[df1$YEAR == i] - df$DELTA[df$year == i]
}
--- cut here ---
Now the output:
df1
  SERIES YEAR value
1 R_EC01 2006   -19
2 R_EC01 2007    -9
3 R_EC01 2008    -4
4 R_EC01 2009     5
Please let me know if you were looking for a more general approach suitable for larger data frames with e.g. several variable classes (EC01, EC02, etc.). Kind regards, Kimmo -- University of Turku, Finland Dep. of Political Science and Contemporary History
Re: [R] How we can open the results are saved
On 14/06/2010 17:50, jim holtman wrote: load('adresse/filename.R') Or: attach('adresse/filename.R') The difference between 'load' and 'attach' is that 'load' puts the contents of the file into your workspace (global environment, first location on the search list), while 'attach' creates a new location on the search list. But calling this 'filename.R' is likely to lead to trouble. The '.R' suffix is usually used for files containing R commands that can be used by the 'source' function. The usual convention for files created by 'save' is to use a '.rda' suffix. Pat On Mon, Jun 14, 2010 at 12:41 PM, khaz...@ceremade.dauphine.fr wrote: Hi all I saved the result of my code as a file, like save(namefunction, file="adresse/filename.R"). I want to open the file. Could you please help me how I can open the file and see the result. best Khazaei -- Patrick Burns pbu...@pburns.seanet.com http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno')
Re: [R] recursive function
Try this:
transform(x, DELTA = NULL, value = rev(c(5, 5 - cumsum(rev(DELTA[-1])))))
On Mon, Jun 14, 2010 at 12:29 PM, n.via...@libero.it wrote: Dear list, I have the following problem: what I'm trying to do is to build a function which does the following calculation in a recursive way. I have a data frame more or less like this:
variable year DELTA
EC01     2006 /
EC01     2007 10
EC01     2008 5
EC01     2009 9
And then I have at time 2009 a variable called R_EC01(2009)=5. What I have to do is to construct the R_EC01 time series by starting from the 2009 value:
R_EC01(2008)=R_EC01(2009)-DELTA(2009)
R_EC01(2007)=R_EC01(2008)-DELTA(2008)
R_EC01(2006)=R_EC01(2007)-DELTA(2007)
In terms of numbers, the results that I should get are:
R_EC01(2008)=5-9=-4
R_EC01(2007)=-4-5=-9
R_EC01(2006)=-9-10=-19
So my data frame should look like this:
SERIES YEAR value
R_EC01 2006   -19
R_EC01 2007    -9
R_EC01 2008    -4
R_EC01 2009     5
Anyone knows how to do it? My data frame is not set as a time series... Thanks a lot!!! -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
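Tracing Henrique's one-liner on the poster's own numbers shows why it works (a worked check, not part of the original reply):

```r
## DELTA for 2006 is unknown ('/'), hence the NA; 5 is the known 2009 value.
x <- data.frame(variable = "EC01", year = 2006:2009, DELTA = c(NA, 10, 5, 9))
## reverse the deltas, accumulate, subtract from 5, then reverse back:
rev(c(5, 5 - cumsum(rev(x$DELTA[-1]))))   # -19 -9 -4 5
```

Reading inside out: rev(DELTA[-1]) gives (9, 5, 10), cumsum gives (9, 14, 24), and 5 minus those gives the 2008, 2007, 2006 values.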
Re: [R] executable script
In Python, it is literally this easy:
import rpy2.robjects as robjects
robjects.r('source("C:/YOUR R FILE GOES HERE")')
Type the name of your R source file into this script and save it as a Python script (add the suffix .py), and then you can run it by double-clicking. If you want to see the results of your program, you'll have to have R print them to a file. Now, to get Python to run on your computer, you'll have to install the following: http://www.python.org/download/releases/2.6.5/ Python 2.6.5 http://sourceforge.net/projects/numpy/files/ NumPy 1.4.1 http://sourceforge.net/projects/rpy/files/rpy2/ Rpy2 2.0.8 (Note that these are the versions I currently have on my computer; there may be more recent ones out there.) You will also have to add Python to your computer's Path. To do this in Windows: 1. Right-click on My Computer, select Properties 2. On the Advanced tab, click Environment Variables 3. In the System Variables list, select Path and click Edit 4. In the Variable Value line, enter C:\Python26\; at the beginning of the line (or C:\Python30\; if you choose to install Python 3.0) Now you can run Python scripts simply by double-clicking them. This makes it very easy to turn R code into an executable script; the only caveat is that you need to have Python installed on any computer that you want to run your script on. -- View this message in context: http://r.789695.n4.nabble.com/executable-script-tp839859p2254813.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] How we can open the results are saved
On Mon, 14 Jun 2010, Patrick Burns wrote: On 14/06/2010 17:50, jim holtman wrote: load('adresse/filename.R') Or: attach('adresse/filename.R') The difference between 'load' and 'attach' is that 'load' puts the contents of the file into your workspace (global environment, first Not necessarily: see the envir argument. location on the search list), while 'attach' creates a new location on the search list. location = environment here. But calling this 'filename.R' is likely to lead to trouble. The '.R' suffix is usually used for files containing R commands that can be used by the 'source' function. The usual convention for files created by 'save' is to use a '.rda' suffix. Or .RData or .Rdata for those not DOS-centric: see the examples in the help file. Pat On Mon, Jun 14, 2010 at 12:41 PM, khaz...@ceremade.dauphine.fr wrote: Hi all I saved the result of my code as a file, like save(namefunction, file="adresse/filename.R"). I want to open the file. Could you please help me how I can open the file and see the result. best Khazaei -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
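The load()/attach() contrast is easy to see interactively (the file name here is a placeholder):

```r
x <- 42
save(x, file = "results.rda")
rm(x)
attach("results.rda")         # adds "file:results.rda" to search()
x                             # found via the search path; workspace untouched
detach("file:results.rda")
load("results.rda")           # this time 'x' lands in the global environment
```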
Re: [R] logistic regression with 50 varaibales
Thanks Charles for the reproducible code. I started this question because I was asked to take a look at such a dataset, but I have doubts about whether it's meaningful to do a LR with 50 variables. I haven't got the dataset yet, thus have not tried any code. But thanks again for sharing some simulation code. I have one question about your code: when you calculate common standard errors for coefficients using sd(coef(fit)), should you exclude the intercept by doing sd(coef(fit)[-1])? Actually, after removing the intercept, the standard error calculated this way is very similar to the one reported from colMeans(coef(summary(fit))), for both scenarios in your example (coef = 0.14 or 1.0). Another question: the 50 simulated variables have very low correlations (ranging from -0.1 to 0.08), which may contribute to the stable model. If some (not all) of the 50 variables had considerable correlation, like 0.7 or 0.8, how would the LR model behave? Thanks John - Original Message From: Charles C. Berry cbe...@tajo.ucsd.edu To: Joris Meys jorism...@gmail.com Cc: r-help@r-project.org; Marc Schwartz marc_schwa...@me.com Sent: Mon, June 14, 2010 8:32:02 AM Subject: Re: [R] logistic regression with 50 varaibales On Mon, 14 Jun 2010, Joris Meys wrote: Hi, Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play a role rather quickly. Ahem! ... minimal, self-contained, reproducible code ... The OP's situation may not be an impossible one:
set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50), nc=50))
dat$y <- rbinom(1800, 1, plogis(rowSums(dat)/7))
fit <- glm(y ~ ., dat, family=binomial)
1/7  # the true coef
[1] 0.1428571
sd(coef(fit))  # roughly, the common standard error
[1] 0.05605597
colMeans(coef(summary(fit)))  # what glm() got
  Estimate Std. Error    z value   Pr(>|z|)
0.14590836 0.05380666 2.71067328 0.06354820
# a trickier case:
set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50), nc=50))
dat$y <- rbinom(1800, 1, plogis(rowSums(dat)))  # try again with coef==1
fit <- glm(y ~ ., dat, family=binomial)
colMeans(coef(summary(fit)))
   Estimate  Std. Error     z value    Pr(>|z|)
0.982944012 0.119063631 8.222138491 0.008458002
Finer examination of the latter fit will show some values that differ too far from 1.0 to agree with the asymptotic std err.
sd(coef(fit))  # rather bigger than 0.119
[1] 0.1827462
range(coef(fit))
[1] -0.08128586  1.25797057
And near separability may be playing here:
cbind(table(cut(plogis(abs(predict(fit))),
                c(0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1))))
                 [,1]
(0,0.9]           453
(0.9,0.99]        427
(0.99,0.999]      313
(0.999,0.9999]    251
(0.9999,0.99999]  173
(0.99999,1]       183
Recall that the observed information contains a factor of plogis(predict(fit)) * plogis(-predict(fit))
hist(plogis(predict(fit)) * plogis(-predict(fit)))
So the effective sample size here was much reduced. But to the OP's question, whether what you get is reasonable depends on what the setup is. I wouldn't call the first of the above cases 'highly unstable'. Which is not to say that one cannot generate difficult cases (esp. with correlated covariates and/or one or more highly influential covariates) and that the OP's case is not one of them. HTH, Chuck The curse of dimensionality is easily demonstrated by looking at the proximity between your data points. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 data points. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave.
Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, as your data space is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, and then I didn't even start on interactions. No point in trying, I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Cheers Joris On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that
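John's follow-up question (how the fit behaves when covariates are substantially correlated) can be explored with a small variation of Chuck's simulation. The construction below, which induces roughly 0.7 pairwise correlation through a shared factor, is a sketch of one way to try it, not part of the original thread:

```r
## Variation on Chuck's simulation with correlated covariates (a sketch).
## Each column shares a common factor z, giving pairwise correlation ~ rho.
set.seed(54321)
n <- 1800; p <- 50; rho <- 0.7
z <- rnorm(n)
dat <- as.data.frame(sqrt(rho) * z +
                     sqrt(1 - rho) * matrix(rnorm(n * p), ncol = p))
dat$y <- rbinom(n, 1, plogis(rowSums(dat) / 7))
fit <- glm(y ~ ., dat, family = binomial)
sd(coef(fit)[-1])                            # spread of the slope estimates
mean(coef(summary(fit))[-1, "Std. Error"])   # vs. the reported std. errors
```

Comparing the two numbers indicates how much the collinearity inflates the variability of individual coefficients relative to the independent-covariate case.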
[R] installing RExcel package
Was wondering if anyone has any experience installing the RExcel package by hand. I think I have all the files needed, but our firewall here prevents RExcelInstaller from going through the internet to get them as it wants to, and it just gives up. Any ideas? Thanks. --Sam
Re: [R] installing RExcel package
You can normally get through the firewall by using the internet2 option. Use ??internet for the exact function name. I am not at my computer now, so I can't check for you. Rich
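For reference, the function Rich appears to mean is setInternet2() (Windows-only, in the utils package of R versions of that era), which makes R use Internet Explorer's connection and proxy settings:

```r
## Windows only; available in contemporary 2.x versions of R.
setInternet2(TRUE)
install.packages("RExcelInstaller")  # retry the download behind the proxy
```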
[R] Zero counts lost with table() in functions
Hi, I am collecting replies from a survey and counting the replies with the table() function. The function below takes two data frames and counts the observations of the findings in the first parameter vector given the value of the second, as shown in the code below. My trouble is that the vector kp_vec at the end of the inner loop seems to lose the value when table() counts zero occurrences of one of the outcomes. I will have
 y  n
34  0
This is not picked up in the matrix row after the loops, which ends up with something like
y n y Funding Survival 12 5 34
where the last n value is missing. This causes my returned data frame to fail and is, all in all, rather miserable for the plot. I see the point of this in a way, so I believe it is not a bug. I'd love to get my zero back. Is it a subtle point in R I have missed?
kpi_test <- function(paramvec, kpivec) {
  kp_vec <- c()
  res_kpi_y <- c()
  res_kpi_n <- c()
  tmp_param <- c()
  tmp_kpi <- c()
  for (param_no in seq(from=1, to=length(paramvec), by=1)) {
    tmp_param <- paramvec[param_no]
    for (kpi_no in seq(from=1, to=length(kpivec), by=1)) {
      tmp_kpi <- kpivec[kpi_no]
      res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
      res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
      kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
    }
  }
  matrix_vector <- matrix(kp_vec, ncol=6, byrow=TRUE)
  fres <- data.frame(matrix_vector)
  return(fres)
}
ottar
[R] how to make a barplot similar to Excel’s “clustered column chart”.
I have a matrix with 12 rows (one for each month) and 2 columns (baseflow, runoff). I would like to make a barplot similar to Excel's "clustered column chart". Here is my matrix 'x':
 8.258754 13.300710
10.180953 10.760465
11.012184 13.954887
10.910870 13.839839
 9.023519 11.511129
 7.189241 12.519830
 5.925576 17.101491
 5.211613 13.585175
 5.039592 13.506304
 4.462325  9.963006
 5.586521 11.306202
 7.739242 14.669374
If I use barplot(x, beside=T), column 1 appears on the left side of the plot, and then column 2 appears on the right side of the plot. I would rather have them alternate, so that for month 1 I see 2 bars (column 1, then column 2), and so forth for months 2-12. Is there a simple way to do this? (In Excel this is the chart option for "clustered column".)
[R] Subtracting POSIXct data/times
I have two dataframe columns of POSIXct date/times that include seconds. I got them into this format using, for example,
zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")
My problem is that when I subtract the two columns, sometimes the difference is given in seconds, and sometimes it is given in other units. I don't care which it is, but I need to know which one I will get.
DateTime            ETA
2010-05-16 02:19:56 2010-05-16 03:46:35
...
Browse[1] mins = zsort$ETA - zsort$DateTime
Browse[1] mins
Time differences in hours
[1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944 3.360278 3.360278 3.582778 4.57 5.506111 5.857778 6.150278 6.150278 6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
attr(,"tzone")
But sometimes the answer is in seconds.
# make a column with the minutes before landing
zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
zsort$MinBeforeLand
Time differences in secs
[1] -50 136 221 878 1192 2263 3296 3959 4968 5846 8709 11537 12198 12442 12642 15952 18273 19952 20538
How do I specify the resultant units? Thanks, Jim Rome
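To answer the units question directly: the `-` method for POSIXct picks units heuristically, but difftime() lets you fix them (a sketch using made-up times):

```r
t1 <- as.POSIXct("2010-05-16 02:19:56")
t2 <- as.POSIXct("2010-05-16 03:46:35")
t2 - t1                                       # units chosen automatically
difftime(t2, t1, units = "mins")              # always minutes
as.numeric(difftime(t2, t1, units = "secs"))  # plain number of seconds
```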
Re: [R] Zero counts lost with table() in functions
Ottar Kvindesland wrote: Hi, I am collecting replies from a survey and counting the replies with the table() function. The function below takes two data frames and counts the observations of the findings in the first parameter vector given the value of the second, as shown in the code below. My trouble is that the vector kp_vec at the end of the inner loop seems to lose the value when table() counts zero occurrences of one of the outcomes. I will have
 y  n
34  0
This is not picked up in the matrix row after the loops, which ends up with something like
y n y Funding Survival 12 5 34
where the last n value is missing. This causes my returned data frame to fail and is, all in all, rather miserable for the plot. I see the point of this in a way, so I believe it is not a bug. I'd love to get my zero back. Is it a subtle point in R I have missed? I don't know about subtle, but the general idea is that you can only expect to get a zero count if the set of possible values is known in advance to include that value. Otherwise there's just no end to the number of zeros to include! This can be obtained by making the object you tabulate into a factor with the relevant levels set:
table(y)
y
1
table(factor(y, levels = c("y", "n")))
y n
1 0
Incidentally, this is the main reason R doesn't by default drop unused levels when subsetting. I can't make heads or tails of your code, but I hope the above helps solve your issue.
-pd
kpi_test <- function(paramvec, kpivec) {
  kp_vec <- c()
  res_kpi_y <- c()
  res_kpi_n <- c()
  tmp_param <- c()
  tmp_kpi <- c()
  for (param_no in seq(from=1, to=length(paramvec), by=1)) {
    tmp_param <- paramvec[param_no]
    for (kpi_no in seq(from=1, to=length(kpivec), by=1)) {
      tmp_kpi <- kpivec[kpi_no]
      res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
      res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
      kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
    }
  }
  matrix_vector <- matrix(kp_vec, ncol=6, byrow=TRUE)
  fres <- data.frame(matrix_vector)
  return(fres)
}
ottar -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
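Peter's fix in miniature, with a toy reply vector (made up for illustration):

```r
replies <- c("y", "y", "n", "y")
table(replies)                      # counts only the values it sees
f <- factor(replies, levels = c("y", "n"))
table(f[f == "y"])                  # the zero 'n' count is now kept
```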
Re: [R] Zero counts lost with table() in functions
Ottar Kvindesland wrote: Hi, I am collecting replies from a survey and counting the replies with the table() function. The function below takes two data frames and counts the observations of the findings in the first parameter vector given the value of the second, as shown in the code below. My trouble is that the vector kp_vec at the end of the inner loop seems to lose the value when table() counts zero occurrences of one of the outcomes. I will have
 y  n
34  0
This is not picked up in the matrix row after the loops, which ends up with something like
y n y Funding Survival 12 5 34
where the last n value is missing. This causes my returned data frame to fail and is, all in all, rather miserable for the plot. I see the point of this in a way, so I believe it is not a bug. I'd love to get my zero back. Is it a subtle point in R I have missed?
kpi_test <- function(paramvec, kpivec) {
  kp_vec <- c()
  res_kpi_y <- c()
  res_kpi_n <- c()
  tmp_param <- c()
  tmp_kpi <- c()
  for (param_no in seq(from=1, to=length(paramvec), by=1)) {
    tmp_param <- paramvec[param_no]
    for (kpi_no in seq(from=1, to=length(kpivec), by=1)) {
      tmp_kpi <- kpivec[kpi_no]
      res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
      res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
      kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
    }
  }
  matrix_vector <- matrix(kp_vec, ncol=6, byrow=TRUE)
  fres <- data.frame(matrix_vector)
  return(fres)
}
Is it possible to provide test data that shows how your function is not behaving as you'd like? It's probable that your function can be reduced to a line or two of R code while solving your problem, and become much more readable, if we know what it's supposed to do.
Re: [R] how to make a barplot similar to Excel's "clustered column chart"
josef.kar...@phila.gov wrote:

I have a matrix with 12 rows (one for each month) and 2 columns (baseflow, runoff). I would like to make a barplot similar to Excel's "clustered column chart". Here is my matrix 'x':

     8.258754 13.300710
    10.180953 10.760465
    11.012184 13.954887
    10.910870 13.839839
     9.023519 11.511129
     7.189241 12.519830
     5.925576 17.101491
     5.211613 13.585175
     5.039592 13.506304
     4.462325  9.963006
     5.586521 11.306202
     7.739242 14.669374

If I use barplot(x, beside=T), column 1 appears on the left side of the plot and then column 2 appears on the right side of the plot. I would rather that they alternate, so that for month 1 I see 2 bars (column 1, then column 2), and so forth for months 2-12. Is there a simple way to do this? (In Excel this is the chart option for "clustered column".)

Can't you just transpose x?

    barplot(t(x), beside=T)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45) 38153501
Email: pd@cbs.dk   Priv: pda...@gmail.com
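To see why the transpose works, here is a runnable sketch that rebuilds the 12 x 2 matrix from the post (the month and column names are added for illustration; the original matrix was unnamed). barplot() draws one group of bars per *column* of its input, so transposing turns the 12 months into 12 groups of 2 bars each:

```r
# Rebuild the poster's 12 x 2 matrix (values from the post, names assumed)
x <- matrix(c(8.258754, 13.300710, 10.180953, 10.760465,
              11.012184, 13.954887, 10.910870, 13.839839,
              9.023519, 11.511129,  7.189241, 12.519830,
              5.925576, 17.101491,  5.211613, 13.585175,
              5.039592, 13.506304,  4.462325,  9.963006,
              5.586521, 11.306202,  7.739242, 14.669374),
            ncol = 2, byrow = TRUE,
            dimnames = list(month.abb, c("baseflow", "runoff")))

# t(x) is 2 x 12: one group per month, two bars per group
barplot(t(x), beside = TRUE, legend.text = TRUE)
```

With beside = TRUE and the transposed matrix, the legend picks up the row names of t(x) (baseflow/runoff) and the group labels come from its column names (the months), matching Excel's clustered layout.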
[R] Time series data file?
Hello,

I am currently splitting a file into individual files (each time series separated into its own file). The file I read in skips the first four lines and extracts the data columns I need. I was wondering if there is a way for R to automatically scan and separate the files based on the header information? The header delineates each observation with 254, space, time, space, day, space, month, space, year:

    254 0 26 NOV1995

I would like to then use the program I have written to do the analysis. I have attached a small subset text file of data (two observations). Any thoughts would be very helpful.

    # Read in data file
    data <- read.table("file.txt", skip=4, header=FALSE)
    temp <- data$V4
    elev <- data$V3

Thank you,
Doug

--
Douglas M. Hultstrand, MS
Senior Hydrometeorologist
Metstat, Inc., Windsor, Colorado
voice: 720.771.5840
email: dmhul...@metstat.com
web: http://www.metstat.com

    254 0 26 NOV1995
    1 24232 72694 44.92N123.02W61 2302 2100 1960 2680149 9 4 3 SLE9 kt
    9 10071 61106 94180 10 5 10037 89110 82183 12 4 1120106 84186 12 6
    9780304 9 9205 19 6 9500544 74 59221 19
    254 12 26 NOV1995
    1 24232 72694 44.92N123.02W61 2302 2100 1960 2680149 9 4 3 SLE9 kt
    9 10071 61106 94180 10 5 10037 89110 82183 12 4 1120106 84186 12 6
    9780304 9 9205 19 6 9500544 74 59221 19
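One way to split on those "254 ..." header lines is to read the whole file with readLines(), flag the header rows, and use cumsum() of the flags as a grouping variable for split(). A sketch, with a small in-memory stand-in for readLines("file.txt") so it runs as-is (the real column meanings are whatever your existing program expects):

```r
# 'raw' stands in for readLines("file.txt"); each observation starts with 254
raw <- c("254 0 26 NOV1995",
         "1 24232 72694 44.92 123.02",
         "9 10071 61106 94180 10",
         "254 12 26 NOV1995",
         "1 24232 72694 44.92 123.02",
         "9 10071 61106 94180 10")

is_hdr <- grepl("^\\s*254\\b", raw)     # TRUE on each observation's header line
blocks <- split(raw, cumsum(is_hdr))    # one character vector per observation

# Parse each block, dropping its header row, much like read.table() above:
get_obs <- function(block)
  read.table(text = block[-1], header = FALSE, fill = TRUE)
obs <- lapply(blocks, get_obs)
length(obs)   # one data frame per observation
```

Each element of obs can then be written out with write.table(), or fed directly to the existing analysis function, without splitting the file by hand first.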
Re: [R] how to make a barplot similar to Excel's "clustered column chart"
Josef,

I think all you need to do is use the transpose of your data matrix. So if your matrix is called x:

    barplot(t(as.matrix(x)), beside=T)
[R] 3D ncdf file
Hello,

I have an ncdf file with different variables for dimensions and dates. The dimensions are as follows, where X=i and Y=j, creating an 85 by 188 set of cells. For each cell there are 12 readings of DO per day, taken at 2-hour intervals and recorded according to the Julian calendar under the Ordinal_Dates variable.

    DO: [2280x188x85]
    DX: [188x85]
    DY: [188x85]
    X: [85x1]
    Y: [188x1]
    Ordinal_Dates: [2280x1]
    T: [2280x1]

So far I have set up each variable as follows, and I don't know how to work with the 3D DO variable, which contains 12 values per day:

    setwd("C:/Documents and Settings/Desktop/R")
    library(ncdf)
    profile1 <- open.ncdf("2005r1_v3_Ct=500_lerie_out_bot.nc")
    name1 <- get.var.ncdf(profile1, "DO")
    name2 <- get.var.ncdf(profile1, "DX")
    name3 <- get.var.ncdf(profile1, "DY")
    name4 <- get.var.ncdf(profile1, "X")
    name5 <- get.var.ncdf(profile1, "Y")
    name6 <- get.var.ncdf(profile1, "Ordinal_Dates")
    name7 <- get.var.ncdf(profile1, "T")

I want to set a threshold and calculate how many days have an average DO greater than my threshold, and also what the daily averages of DO are. Any help is appreciated!

Emilija
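A sketch of the daily-averaging step, assuming the DO array (name1 above) has time as its first dimension with exactly 12 readings per day, so 2280 time steps = 190 days. A small random array stands in for the real data so the sketch runs; the threshold value is made up:

```r
# Stand-in for the DO array from get.var.ncdf(): [time, 4, 3] with 12/day
readings_per_day <- 12
do_arr <- array(runif(12 * 5 * 4 * 3, 0, 10), dim = c(12 * 5, 4, 3))

n_days <- dim(do_arr)[1] / readings_per_day      # e.g. 2280 / 12 = 190
day_id <- rep(seq_len(n_days), each = readings_per_day)

# Daily mean DO for every cell: result is an [n_days x ny x nx] array
daily_mean <- apply(do_arr, c(2, 3), function(ts) tapply(ts, day_id, mean))

# Days per cell whose daily average exceeds the threshold (value made up)
threshold <- 5
days_above <- apply(daily_mean > threshold, c(2, 3), sum, na.rm = TRUE)
```

apply() over margins c(2, 3) hands each cell's full time series to the inner function, and tapply() collapses it to one mean per day; days_above then ends up with one count per grid cell.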
[R] NPMC
Hi,

I am a new user of R and want to analyse some data using npmc. My data have several levels of factors (Site, Year and Season) and several variables (percentages). I have tried to use npmc but I always get an error message. My data are in a table following this example:

    Site  Year  Season  Var1  Var2
    A     2009  Dry       10    56
    B     ...

Here is the error message I get:

    Erreur dans as.vector(x, mode) : argument 'mode' incorrect
    (Error in as.vector(x, mode): invalid 'mode' argument)

How should I call npmc? Thank you for your help.
Re: [R] Subtracting POSIXct date/times
See the help page for the difftime() function, which will tell you how to specify the units of the differences. (When you don't specify, it chooses the units according to some rules.)

-Don

At 4:24 PM -0400 6/14/10, James Rome wrote:

I have two data frame columns of POSIXct date/times that include seconds. I got them into this format using, for example:

    zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")

My problem is that when I subtract the two columns, sometimes the difference is given in seconds and sometimes it is given in minutes. I don't care which it is, but I need to know which one I will get.

    DateTime             ETA
    2010-05-16 02:19:56  2010-05-16 03:46:35
    ...

    Browse[1]> mins = zsort$ETA - zsort$DateTime
    Browse[1]> mins
    Time differences in hours
    [1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944
        3.360278 3.360278 3.582778 4.57 5.506111 5.857778 6.150278
        6.150278 6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
    attr(,"tzone")

But sometimes the answer is in seconds:

    # make a column with the minutes before landing
    zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
    zsort$MinBeforeLand
    Time differences in secs
    [1] -50 136 221 878 1192 2263 3296 3959 4968 5846 8709 11537 12198
        12442 12642 15952 18273 19952 20538

How do I specify the resultant units?

Thanks,
Jim Rome

--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
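Concretely, using the first pair of times from the post: "-" on POSIXct values picks the units automatically, while difftime() with an explicit units= argument always returns the units you ask for, and as.numeric() strips the difftime class when you want a plain number:

```r
# First DateTime/ETA pair from the post
t1 <- as.POSIXct("2010-05-16 02:19:56")
t2 <- as.POSIXct("2010-05-16 03:46:35")

t2 - t1                                       # units chosen for you (hours here)
difftime(t2, t1, units = "mins")              # always minutes
as.numeric(difftime(t2, t1, units = "mins"))  # plain number: 86.65
```

So for the "minutes before landing" column, difftime(zsort$ETA, zsort$DateTime, units = "mins") gives a result that is in minutes regardless of how large or small the gaps are.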