Re: [R] cointegration analysis

2007-08-09 Thread gyadav

I got this error too; I don't remember what the cause was, but my
workaround was to look at the example in the manual pages of ca.po etc.
and try to make your date in the same format. Also check whether the
functions will take that many columns as a parameter; I have not checked
it. Lastly, check that the data you are using has no missing values or
numbers in 'text' format.
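
For instance, a quick way to run those last checks (a sketch; X stands
for your 8-column data, name assumed):

X <- as.data.frame(X)    # coerce first, so the checks run per column
sapply(X, class)         # every column should be "numeric"
colSums(is.na(X))        # count of missing values per column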


HTH



Dorina LAZAR [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
08/08/2007 10:15 PM

To
r-help@stat.math.ethz.ch
cc

Subject
[R] cointegration analysis







Hello,

I tried to use the urca package (R) for cointegration analysis. The data
matrix to be investigated for cointegration contains 8 columns
(variables). Both procedures, the Phillips & Ouliaris test and Johansen's
procedure, give errors ("error in evaluating the argument 'object' in
selecting a method for function 'summary'" and "too many variables,
critical values cannot be computed", respectively). What can I do?

With regards,

Dorina LAZAR
Department of Statistics, Forecasting, Mathematics
Babes Bolyai University, Faculty of Economic Science
Teodor Mihali 58-60, 400591 Cluj-Napoca, Romania



Re: [R] cointegration analysis

2007-08-09 Thread gyadav

Regrets, a typo: please read 'date' as 'data'.


---
  Regards,
Gaurav Yadav (mobile: +919821286118)
Assistant Manager, CCIL, Mumbai (India)
mailto:[EMAIL PROTECTED]
mailto:[EMAIL PROTECTED]
Profile: http://www.linkedin.com/in/gydec25
  Keep in touch and keep mailing :-)
slow or fast, little or too much






Re: [R] Reading time/date string

2007-08-09 Thread Matthew Walker
Thanks Mark, that was very helpful.  I'm now so close!

Can anyone tell me how to extract the value from an instance of a 
difftime class?  I can see the value, but how can I place it in a 
dataframe?

> time_string1 <- "10:17:07 02 Aug 2007"
> time_string2 <- "13:17:40 02 Aug 2007"
>
> time1 <- strptime(time_string1, format="%H:%M:%S %d %b %Y")
> time2 <- strptime(time_string2, format="%H:%M:%S %d %b %Y")
>
> time_delta <- difftime(time2, time1, unit="sec")
> time_delta
Time difference of 10833 secs    # <--- I'd like this value just here!
>
> data.frame(time1, time2, time_delta)
Error in as.data.frame.default(x[[i]], optional = TRUE) :
        cannot coerce class "difftime" into a data.frame



Thanks again,

Matthew


Mark W Kimpel wrote:
 Look at some of these functions...

 DateTimeClasses(base)   Date-Time Classes
 as.POSIXct(base)Date-time Conversion Functions
 cut.POSIXt(base)Convert a Date or Date-Time Object to a Factor
 format.Date(base)   Date Conversion Functions to and from Character

 Mark
 ---

 Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
 Indiana University School of Medicine

 15032 Hunter Court, Westfield, IN  46074

 (317) 490-5129 Work, Mobile & VoiceMail
 (317) 663-0513 Home (no voice mail please)

 **

 Matthew Walker wrote:
 Hello everyone,

 Can anyone tell me what function I should use to read time/date 
 strings and turn them into a form such that I can easily calculate 
 the difference of two?  The strings I've got look like "10:17:07 02
 Aug 2007".  If I could calculate the number of seconds between them
 I'd be very happy!

 Cheers,

 Matthew



Re: [R] Reading time/date string

2007-08-09 Thread Prof Brian Ripley
On Thu, 9 Aug 2007, Matthew Walker wrote:

 Thanks Mark, that was very helpful.  I'm now so close!

 Can anyone tell me how to extract the value from an instance of a
 difftime class?  I can see the value, but how can I place it in a
 dataframe?

as.numeric(time_delta)

Hint: you want the number, not the value (which is a classed object).
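
For example, a minimal sketch reusing the names from the session in the
previous message (the as.POSIXct() wrapping is added because strptime()
returns POSIXlt, which data.frame() also dislikes):

time_delta_sec <- as.numeric(time_delta)        # plain number: 10833
data.frame(time1 = as.POSIXct(time1),
           time2 = as.POSIXct(time2),
           time_delta_sec)                      # now coerces cleanly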



-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] cointegration analysis

2007-08-09 Thread Pfaff, Bernhard Dr.
Hello Dorina,

if you apply ca.jo to a system with more than five variables, a
*warning* is issued that no critical values are provided. This is not an
error, and it is documented in ?ca.jo. In Johansen's seminal paper,
critical values are provided only for systems of up to five variables.
Hence, you need to refer to a different source of critical values in
order to determine the cointegration rank.

Best,
Bernhard
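
A minimal sketch of what this looks like (X is assumed to be the
8-column data matrix; the arguments shown are illustrative):

library(urca)
vecm <- ca.jo(X, type = "trace", ecdet = "const", K = 2)
summary(vecm)   # the test statistics are still reported, but with more
                # than five variables no critical values are tabulated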





Re: [R] R memory usage

2007-08-09 Thread Prof Brian Ripley
See

?gc
?Memory-limits

On Wed, 8 Aug 2007, Jun Ding wrote:

 Hi All,

 I have two questions in terms of the memory usage in R
 (sorry if the questions are naive, I am not familiar
 with this at all).

 1) I am running R in a linux cluster. By reading the R
 helps, it seems there are no default upper limits for
 vsize or nsize. Is this right? Is there an upper limit
 for whole memory usage? How can I know the default in
 my specific linux environment? And can I increase the
 default?

See ?Memory-limits, but that is principally a Linux question.


 2) I use R to read in several big files (~200Mb each),
 and then I run:

 gc()

 I get:

             used  (Mb) gc trigger   (Mb)  max used   (Mb)
  Ncells  23083130 616.4   51411332 1372.9  51411332 1372.9
  Vcells 106644603 813.7  240815267 1837.3 227550003 1736.1

 What do the columns "used", "gc trigger" and "max used"
 mean? It seems to me I have used 616Mb of Ncells and
 813.7Mb of Vcells. Comparing with the numbers of "max
 used", I still should have enough memory. But when I
 try

 object.size(area.results)   ## area.results is a big
 data.frame

 I get an error message:

 Error: cannot allocate vector of size 32768 Kb

 Why is that? Looks like I am running out of memory. Is
 there a way to solve this problem?

 Thank you very much!

 Jun
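
For reference, the column meanings per ?gc, as a commented sketch
(numbers are illustrative):

gc(reset = TRUE)  # resets the "max used" statistics
## "used":       Ncells/Vcells currently allocated
## "gc trigger": level at which the next automatic collection is triggered
## "max used":   maximum used since startup, or since the last gc(reset=TRUE)
## The error above means one additional 32 Mb allocation failed;
## see ?Memory-limits for the possible causes.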


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] tcltk error on Linux

2007-08-09 Thread Prof Brian Ripley
On Thu, 9 Aug 2007, Mark W Kimpel wrote:

 I am having trouble getting tcltk package to load on openSuse 10.2
 running R-devel. I have specifically put my /usr/share/tcl directory in
 my PATH, but R doesn't seem to see it. I also have installed tk on my
 system. Any ideas on what the problem is?

Whether Tcl/Tk would be available was determined when you installed R.
The relevant information was in the configure output and log, which we
don't have.

You are not running a released version of R: please don't use the 
development version unless you are familiar with the build process and 
know how to debug such things yourself.  The rule is that questions about 
development versions of R should not be asked here but on R-devel (and not 
to R-core which I have deleted from the recipients).

I suggest reinstalling R (preferably R-patched) and if tcltk still is not 
available sending the relevant configure information to the R-devel list.

 Also, note that I have some warning messages on starting up R, not sure
 what they mean or if they are pertinent.

Those are coming from a Bioconductor package: again you must be using 
development versions with R-devel and those are not stable (last time I 
looked even Biobase would not install, and the packages change daily).

If you have all those packages in your startup, please don't -- there will 
be a considerable performance hit so only load them when you need them.
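
A quick check of whether a given R build has Tcl/Tk compiled in:

capabilities("tcltk")   # FALSE means support was not built in, and no
                        # run-time PATH setting will enable it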


 Thanks, Mark

 Warning messages:
 1: In .updateMethodsInTable(fdef, where, attach) :
   Couldn't find methods table for "conditional", package "Category" may
 be out of date
 2: In .updateMethodsInTable(fdef, where, attach) :
   Methods list for generic "conditional" not found
 > require(tcltk)
 Loading required package: tcltk
 Error in firstlib(which.lib.loc, package) :
   Tcl/Tk support is not available on this system
 > sessionInfo()
 R version 2.6.0 Under development (unstable) (2007-08-01 r42387)
 i686-pc-linux-gnu

 locale:
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

 attached base packages:
 [1] splines   tools stats graphics  grDevices utils datasets
 [8] methods   base

 other attached packages:
  [1] affycoretools_1.9.3annaffy_1.9.1  xtable_1.5-0
  [4] gcrma_2.9.1matchprobes_1.9.10 biomaRt_1.11.4
  [7] RCurl_0.8-1XML_1.9-0  GOstats_2.3.8
 [10] Category_2.3.19genefilter_1.15.9  survival_2.32
 [13] KEGG_1.17.0RBGL_1.13.3annotate_1.15.3
 [16] AnnotationDbi_0.0.88   RSQLite_0.6-0  DBI_0.2-3
 [19] GO_1.17.0  limma_2.11.9   affy_1.15.7
 [22] preprocessCore_0.99.12 affyio_1.5.6   Biobase_1.15.23
 [25] graph_1.15.10

 loaded via a namespace (and not attached):
 [1] cluster_1.11.7  rcompgen_0.1-15
 



-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Odp: Successively eliminating most frequent elements

2007-08-09 Thread Petr PIKAL
Hi

your construction is quite complicated, so instead of refining it I tried
to do the task a different way. If I understand what you want to do, you
can use:

> set.seed(1)
> T <- matrix(trunc(runif(20)*10), nrow=10, ncol=2)
> T
      [,1] [,2]
 [1,]    2    2
 [2,]    3    1
 [3,]    5    6
 [4,]    9    3
 [5,]    2    7
 [6,]    8    4
 [7,]    9    7
 [8,]    6    9
 [9,]    6    3
[10,]    0    7

> m <- table(T) # a matrix is a vector with dimensions
> # flag the rows that contain the most frequent element
> todel <- rowSums(T == as.numeric(names(which.max(m)))) > 0

> T[todel,]
     [,1] [,2]
[1,]    2    2
[2,]    2    7

> T[!todel,]
     [,1] [,2]
[1,]    3    1
[2,]    5    6
[3,]    9    3
[4,]    8    4
[5,]    9    7
[6,]    6    9
[7,]    6    3
[8,]    0    7

You can put all of this in a cycle, but you have to decide when to end
the cycle.
 
Regards
Petr
[EMAIL PROTECTED]
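
PS: the reason the loop below never deletes anything is that data.matrix()
turns the factor column produced by as.data.frame(table(m)) into integer
level codes, so F[1,1] is a level index (here 8), not the element itself.
A minimal sketch of putting the step above into a cycle (the stopping
rule, stop when no element occurs twice, is an assumption):

set.seed(1)
T <- matrix(trunc(runif(20) * 10), nrow = 10, ncol = 2)
G <- NULL
repeat {
  if (nrow(T) == 0) break
  m <- table(T)
  if (max(m) < 2) break                    # assumed stopping rule
  val <- as.numeric(names(which.max(m)))   # most frequent element
  G <- rbind(G, c(val, max(m)))            # record element and frequency
  T <- T[rowSums(T == val) == 0, , drop = FALSE]
}
G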

[EMAIL PROTECTED] wrote on 08.08.2007 15:33:58:

 Dear experts,
 
 I have a 10x2 matrix T containing random integers. I would like to
 delete pairs (rows) iteratively, which contain the most frequent element
 either in the first or second column:
 
 T <- matrix(trunc(runif(20)*10), nrow=10, ncol=2)
 
 G <- matrix(0, nrow=6, ncol=2)
 
 for (i in (1:6)){
   print(paste("** Start iteration", i, "***"))
   print("Current matrix:")
   print(T)
 
   m <- append(T[,1], T[,2])
 
   print("Concatenated columns:")
   print(m)
 
   # build frequency table
   F <- data.matrix(as.data.frame(table(m)))
 
   dimnames(F) <- NULL
 
   # pick up the most frequent element: sort decreasing and take it from
   # the top
   F <- F[order(F[,2], decreasing=TRUE),]
 
   print("Freq. table:")
   print(F[1:5,])
 
   todel <- F[1,1]  # rows containing the most frequent element will be deleted
   G[i,1] <- todel
   G[i,2] <- F[1,2]
 
   print(paste("todel=", todel, sep=""))
 
   # eliminate rows containing the most frequent element
   # either the first or the second column contains this element
   id <- which(T[,1]==todel)
   print("Indexes of rows to be deleted:")
   print(id)
   if (length(id) > 0){
     T <- T[-1*id, ]
   }
 
   id <- which(T[,2]==todel)
   print("Indexes of rows to be deleted:")
   print(id)
   if (length(id) > 0){
     T <- T[-1*id, ]
   }
 
   print(paste("nrow(T)=", nrow(T), sep=""))
 
 }
 
 print("Result matrix:")
 print(G)
 
 The output of the first two iterations looks as follows. As one can see,
 the frequency table in the second iteration still contains the element
 deleted in the first iteration! Is this a bug, or what am I doing wrong
 here? Any help greatly appreciated!
 
 [1] "** Start iteration 1 ***"
 [1] "Current matrix:"
       [,1] [,2]
  [1,]    2    2
  [2,]    6    7
  [3,]    9    9
  [4,]    3    5
  [5,]    4    0
  [6,]    7    9
  [7,]    5    7
  [8,]    1    7
  [9,]    9    6
 [10,]    3    3
 [1] "Concatenated columns:"
  [1] 2 6 9 3 4 7 5 1 9 3 2 7 9 5 0 9 7 7 6 3
 [1] "Freq. table:"
      [,1] [,2]
 [1,]    8    4
 [2,]    9    4
 [3,]    4    3
 [4,]    3    2
 [5,]    6    2
 [1] "todel=8"
 [1] "Indexes of rows to be deleted:"
 integer(0)
 [1] "Indexes of rows to be deleted:"
 integer(0)
 [1] "nrow(T)=10"
 [1] "** Start iteration 2 ***"
 [1] "Current matrix:"
       [,1] [,2]
  [1,]    2    2
  [2,]    6    7
  [3,]    9    9
  [4,]    3    5
  [5,]    4    0
  [6,]    7    9
  [7,]    5    7
  [8,]    1    7
  [9,]    9    6
 [10,]    3    3
 [1] "Concatenated columns:"
  [1] 2 6 9 3 4 7 5 1 9 3 2 7 9 5 0 9 7 7 6 3
 [1] "Freq. table:"
      [,1] [,2]
 [1,]    8    4
 [2,]    9    4
 [3,]    4    3
 [4,]    3    2
 [5,]    6    2
 [1] "todel=8"
 [1] "Indexes of rows to be deleted:"
 integer(0)
 [1] "Indexes of rows to be deleted:"
 integer(0)
 [1] "nrow(T)=10"
 [1] "** Start iteration 3 ***"
 [1] "Current matrix:"
 ...
 


[R] using data() and determining data types

2007-08-09 Thread Edna Bell
Hi R Gurus:

I'm using the data() function to get the list of data sets for a package.

I would like to find the class of each data set; i.e., data.frame, etc.

Using str(), I can find the name of the data set.

However, when I try the class function on the str output, I get
"character", since the name in the str output is a character string.

I've also tried this with just the plain results column.  Still no luck.

Any help would be much appreciated.

Sincerely,
Edna Bell



[R] data() problem solved

2007-08-09 Thread Edna Bell
Problem solved:
> sapply(data(package="car")$results[,3], function(x) class(get(x)))
Sorry for the silliness.






Re: [R] Change in R**2 for block entry regression

2007-08-09 Thread Chuck Cleland
David Kaplan wrote:
 Hi all,
 
 I'm demonstrating a block entry regression using R for my regression 
 class.  For each block, I get the R**2 and the associated F.  I do this 
 with separate regressions adding the next block in and then get the 
 results by writing separate summary() statements for each regression. 
 Is there a more convenient way to do this and also to get the change in 
 R**2 and associated F test for the change?
 
 Thanks in advance.
 
 David

  I'm not sure this is the best approach, but you might start with the
data frame returned by applying anova() to several models and extend
that to include the squared multiple correlation and increments:

> mod.0 <- lm(breaks ~ 1, data = warpbreaks)
> mod.1 <- lm(breaks ~ 1 + wool, data = warpbreaks)
> mod.2 <- lm(breaks ~ 1 + wool + tension, data = warpbreaks)
> mod.3 <- lm(breaks ~ 1 + wool * tension, data = warpbreaks)
> BlockRegSum <- anova(mod.0, mod.1, mod.2, mod.3)
> BlockRegSum$R2 <- 1 - (BlockRegSum$RSS / BlockRegSum$RSS[1])
> BlockRegSum$IncR2 <- c(NA, diff(BlockRegSum$R2))
> BlockRegSum$R2[1] <- NA

> BlockRegSum

Analysis of Variance Table

Model 1: breaks ~ 1
Model 2: breaks ~ 1 + wool
Model 3: breaks ~ 1 + wool + tension
Model 4: breaks ~ 1 + wool * tension
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)        R2     IncR2
1     53 9232.8
2     52 8782.1  1     450.7 3.7653    0.0578 0.0488114 0.0488114
3     50 6747.9  2    2034.3 8.4980 0.0006926 0.2691407 0.2203293
4     48 5745.1  2    1002.8 4.1891 0.0210442 0.3777509 0.1086102

> BlockRegSum$R2
[1]         NA 0.04881141 0.26914067 0.37775086

> BlockRegSum$IncR2
[1]         NA 0.04881141 0.22032926 0.10861019

> summary(mod.1)$r.squared
[1] 0.04881141

> summary(mod.2)$r.squared
[1] 0.2691407

> summary(mod.3)$r.squared
[1] 0.3777509

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



[R] substrings

2007-08-09 Thread Edna Bell
Hello again!

I have a set of character results.  If one of the characters is a
blank space, followed by other characters, I want to end at the blank
space.

I tried strsplit, but it picks up again after the blank.

Any help would be much appreciated.

TIA,
Edna



[R] Countvariable for id by date

2007-08-09 Thread David Gyllenberg
Best R-users, 
   
  Here's a newbie question. I have tried to find an answer to this via
help and the ave(x, factor(), FUN=function(y) rank(z, tie='first'))
function, but without success.

  I have a dataframe (~8000 observations, register data) with four
columns of interest: id, dg1, dg2 and date (YYYY-MM-DD):

  id;dg1;dg2;date;
  1;F28;;1997-11-04;
  1;F20;F702;1998-11-09;
  1;F20;;1997-12-03;
  1;F208;;2001-03-18;
  2;F32;;1999-03-07;
  2;F29;F32;2000-01-06;
  2;F32;;2003-07-05;
  2;F323;F2800;2000-02-05;
  ...

  I would like to have two additional columns:
  1. "countF20": a count variable that shows which in order (by date) the
observation is for the id if it fulfils the following logical expression:
dg1 = F20* OR dg2 = F20*,
  where * means F201, F202, ... F2001, F2002, ... F20001, F20002, ...
  2. "countF2129": another count variable that shows which in order (by
date) the observation is for the id if it fulfils the following logical
expression: dg1 = F21*-F29* OR dg2 = F21*-F29*,
  where F21*-F29* means F21*, F22*, ... F29* and
  where * means F211, F212, ... F2101, F2102, ... F21001, F21002, ...

  ... so the dataframe would look like this, where 1 is the first
observation for the id with the right condition, 2 is the second etc.:

  id;dg1;dg2;date;countF20;countF2129;
  1;F28;;1997-11-04;;1;
  1;F20;F702;1998-11-09;2;;
  1;F20;;1997-12-03;1;;
  1;F208;;2001-03-18;3;;
  2;F32;;1999-03-07;;;
  2;F29;F32;2000-01-06;;1;
  2;F32;;2003-07-05;;;
  2;F323;F2800;2000-02-05;;2;
  ...

  Do you know a convenient way to create these kinds of count variables?
Thank you in advance!

  / David (david.gyllenberg at yahoo.com)


Re: [R] substrings

2007-08-09 Thread Vladimir Eremeev

Is this what you want?

> a <- c("a b c", "1 2 3", "q - 5")
> a
[1] "a b c" "1 2 3" "q - 5"
> sapply(strsplit(a, "[[:blank:]]"), function(x) x[1])
[1] "a" "1" "q"






Re: [R] substrings

2007-08-09 Thread Vladimir Eremeev

one more, shorter, solution:

> a
[1] "a b c" "1 2 3" "q- 5"
> gsub("\\s.+", "", a)
[1] "a"  "1"  "q-"





[R] Help on R performance using aov function

2007-08-09 Thread Francoise PFIFFELMANN
Hi, 
I'm trying to replace some SAS statistical functions by R (batch calling).
But I've seen that calling R in batch mode (under Unix) takes about 2 or 3
times longer than SAS. So performance is a big problem for me.
Here is an extract of the calculation:

stoutput<-file("res_oneWayAnova.dat","w"); 
cat("Param|F|Prob",file=stoutput,"\n"); 
for (i in 1:n) { 
p<-list_param[[i]] 
aov_<-aov(A[,p]~ A[,"wafer"],data=A); 
anova_<-summary(aov_); 
if (!is.na(anova_[[1]][1,5]) && anova_[[1]][1,5]<=0.0001)
res_aov<-cbind(p,anova_[[1]][1,4],0.0001) else
res_aov<-cbind(p,anova_[[1]][1,4],anova_[[1]][1,5]); 
cat(res_aov, file=stoutput, append = TRUE,sep = "|","\n"); 
}; 
close(stoutput); 


A is a data.frame (about 400 lines and 1800 parameters).
I'm a new user of R and I don't know if it's a problem in my code or if
there are some tips I can use to optimise my treatment.

Thanks a lot for your help.

Françoise Pfiffelmann
Engineering Data Analysis Group
--
Crolles2 Alliance
860 rue Jean Monnet
38920 Crolles, France
Tel: +33 438 92 29 84
Email: [EMAIL PROTECTED]



Re: [R] Help on R performance using aov function

2007-08-09 Thread Prof Brian Ripley
aov() will handle multiple responses and that would be considerably more 
efficient than running separate fits as you seem to be doing.

Your code is nigh unreadable: please use your spacebar and remove the 
redundant semicolons; `Writing R Extensions' shows you how to tidy up 
your code to make it presentable.  But I think anova_[[1]] is really
coef(summary(aov_)), which is a lot more intelligible.
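
A minimal sketch of the multiple-response approach, reusing A, list_param
and "wafer" from the original post (untested against the poster's data):

resp <- as.matrix(A[, unlist(list_param)])   # all responses at once
fit  <- aov(resp ~ A[, "wafer"])             # one fit instead of a loop
sf   <- summary(fit)                         # one ANOVA table per response
sf[[1]]                                      # table for the first response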

On Thu, 9 Aug 2007, Francoise PFIFFELMANN wrote:





--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


[R] Discriminant scores plot

2007-08-09 Thread Dani Valverde
Hello,
How can I plot the discriminant scores resulting from prediction using a
dataset and an lda model, together with its decision boundaries, using
GGobi and rggobi?
Best regards,

Dani

Daniel Valverde Saubí
Grup d'Aplicacions Biomèdiques de la Ressonància Magnètica Nuclear
(GABRMN)
Departament de Bioquímica i Biologia Molecular
Edifici C, Facultat de Biociències, Campus Universitat Autònoma de
Barcelona
08193 Cerdanyola del Vallès, Spain
Tlf. (0034) 935814126
[EMAIL PROTECTED]



[R] GLMM: MEEM error due to dichotomous variables

2007-08-09 Thread Elva Robinson
I am trying to run a GLMM on some binomial data. My fixed factors include 2 
dichotomous variables, day, and distance. When I run the model:

modelA <- glmmPQL(Leaving ~ Trial*Day*Dist, random = ~1|Indiv, family = binomial)

I get the error:

iteration 1
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1

From looking at previous help topics
(http://tolstoy.newcastle.edu.au/R/help/02a/4473.html), I gather this is
because of the dichotomous predictor variables - what approach should I
take to avoid this problem?

Thanks, Elva.



[R] Subject: Re: how to include bar values in a barplot?

2007-08-09 Thread Ted . Harding
Greg, I'm going to join issue with you here! Not that I'll go near
advocating Excel-style graphics (abominable, and the Patrick Burns
URL which you cite is remarkable in its restraint). Also, I'm aware
that this is potential flame-war territory --  again, I want to avoid
that too.

However, this is the second time you have intervened on this theme
(previously Mon 6 August), along with John Kane on Wed 1 August and
again today on similar lines, and I think it's time an alternative
point of view was presented, to counteract (I hope usefully) what
seems to be a draconianly prescriptive approach to the presentation
of information.

On 07-Aug-07 21:37:50, Greg Snow wrote:
 Generally adding the numbers to a graph accomplishes 2 things:

 1) it acts as an admission that your graph is a failure

Generally, I disagree. Different elements in a display serve different
purposes, according to the psychological aspects of visual perception.

Sizes, proportions, colours etc. of shapes (bars in a histogram, the
marks representing points in a scatterplot, ... ) are interpreted, so
to speak, intuitively -- the resulting perception is formed by
processes which are hard to ascertain consciously, and the overall
effect can only be ascertained by looking at it, and noting what
impression one has formed. They stimulate mental responses in the
domain of perception of spatial relationships.

Numbers, and text, on the other hand, while still shapes from the
optical point of view, up to the point of their impact on the retina,
provoke different perceptions. They are interpreted analytically,
stimulating mental responses in the domains of language and number.

There is no Law whatever which requires that the two must be separated.

It may be that adding any annotation to a graph or diagram will
interfere with the intuitive interpretation that the diagram is
intended to stimulate, with no associated benefit.

It may be that presenting numerical/textual information within a
graphical/diagrammatic context will interfere with the analytic
interpretation which is desired, with no associated benefit.

In such cases, it is clearly (and as a matter of fact to be decided
in each case) better to separate the two aspects.

It may, however, be that both can be combined in such a way that
each enhances the other; and also the simultaneous perception of
both aspects induces a cartesian-product richness of interpretation
where each element of the graphical presentation combines with
each element of the textual/numerical presentation to generate
a perception which could not possibly have been realised if they
had been presented separately. This, too, is a matter to be decided
in each case.

On that basis, if a graph without numbers fails to stimulate a
desired impression which could have been stimulated by adding the
numbers to the graph, then the graph without numbers is a failure.

 2) it converts the graph into a poorly laid out table (with a
 colorful and distracting background)

 In general it is better to find an appropriate graph that does
 convey the information that is intended or if a table is more
 appropriate, then replace it with a well laid out table (or both).

There is an implication here that the information conveyed by a graph,
and the information conveyed by a table, are mutually exclusive.
And that it then follows: Thou Shalt Not Allow The One To Corrupt
The Other. While this has the appearance of a Law, it is (for reasons
I have sketched above) a Law which is not *generally* applicable.

 Remember that the role of tables is to look up specific values
 and the role of graphs is to give a good overview.

I would agree with this only to the following extent:

Tables allow *only* the look-up of values.
Graphs (modulo the capacity of the eye/brain to more or less precisely
judge relative magnitudes) only allow a good overview.

I would not agree that these are their exclusive roles.

The role of Hamlet is to agonise over revenge for his father's death.
The role of Ophelia is to embody the love interest in the play.

This does not imply that there should be parallel performances of
Hamlet on two different  stages, with the audience trooping from
one to the other according to which character is currently at the
centre of the action. It actually works better when they're all up
there at once, interacting!

 The books by William Cleveland and Tufte have a lot of good advice
 on these issues.

Since you mention Tufte, I commend the admiring discussion in his
book "The Visual Display of Quantitative Information", Chapter 1
(Graphical Excellence), section "Narrative Graphics of Space and
Time" (pp. 40-41 in the edition which I have) of Minard's graphical
representation of what happened to Napoleon's army in the course
of its advance on, and retreat from, Moscow.

An impression of the original can be formed from the rather small
version displayed on Tufte's website at the top of
  http://www.edwardtufte.com/tufte/posters
The version in the book 

[R] Term Structure Estimation using Kalman Filter

2007-08-09 Thread Bernardo Ribeiro
Long time reader, first time poster,

I'm working on a paper regarding term structure estimation using the
Kalman filter algorithm. The model in question is the generalized
Vasicek, and since coupon bonds are being estimated, I'm supposed to make
some changes to the Kalman filter.

Has anyone already used R for these purposes? Any tips?

Does anyone have Kalman filter code I could use as a starting point for
an extended Kalman filter approach?

Thanks a lot for the patience and time,

Bernardo Ribeiro



Re: [R] Countvariable for id by date

2007-08-09 Thread jim holtman
This should do what you want:

> x <- read.table(textConnection("id;dg1;dg2;date;
+  1;F28;;1997-11-04;
+  1;F20;F702;1998-11-09;
+  1;F20;;1997-12-03;
+  1;F208;;2001-03-18;
+  2;F32;;1999-03-07;
+  2;F29;F32;2000-01-06;
+  2;F32;;2003-07-05;
+  2;F323;F2800;2000-02-05;"), header=TRUE, sep=";", as.is=TRUE)
> # convert dates
> x$dateP <- unclass(as.POSIXct(x$date))
> # matches for F20
> F20 <- grep("F20", paste(x$dg1, x$dg2))
> # matches for F21 - F29
> F21 <- grep("F2[1-9]", paste(x$dg1, x$dg2))
> # grouping
> x$F20 <- x$F21 <- NA
> x$F20[F20] <- rank(x$dateP[F20])
> x$F21[F21] <- rank(x$dateP[F21])
> x
  id  dg1   dg2       date  X      dateP F21 F20
1  1  F28       1997-11-04 NA  878601600   1  NA
2  1  F20  F702 1998-11-09 NA  910569600  NA   2
3  1  F20       1997-12-03 NA  881107200  NA   1
4  1 F208       2001-03-18 NA  984873600  NA   3
5  2  F32       1999-03-07 NA  920764800  NA  NA
6  2  F29   F32 2000-01-06 NA  947116800   2  NA
7  2  F32       2003-07-05 NA 1057363200  NA  NA
8  2 F323 F2800 2000-02-05 NA  949708800   3  NA


On 8/9/07, David Gyllenberg [EMAIL PROTECTED] wrote:




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



[R] ARIMA fitting

2007-08-09 Thread laura
Hello,
I'm trying to fit an ARIMA process, using the arima function from the
stats package. Can I expect that the fitted model, whatever the
parameters, is stationary, causal and invertible?

Thanks



[R] interior branch test

2007-08-09 Thread Nora Muda
Dear R users,

Does anyone know which package provides an interior branch test for
phylogenetic trees with distance-based methods?

Any help is really appreciated.

Thank you.

Nora.



[R] Lo and MacKinlay variance ratio test (Lo.Mac)

2007-08-09 Thread Cassius2

Hi all,

I am trying to calculate the variance ratio of a time series under
heteroskedasticity. I know that the variance ratio should be calculated
as a weighted average of autocorrelations, but I don't get the same
results when I calculate the variance ratio manually as when I compute
the M2 (M2 for heteroskedasticity) variance ratio using the Lo.Mac
function in R.

Does anybody know what formula R uses to calculate the M2 statistic?
Anybody knows what formula R is using to calculate the M2 statistics?
-- 
View this message in context: 
http://www.nabble.com/Lo-and-MacKinlay-variance-ratio-test-%28Lo.Mac%29-tf4232129.html#a12040466
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Countvariable for id by date

2007-08-09 Thread Gabor Grothendieck
Try this:

Lines <- "id;dg1;dg2;date;
1;F28;;1997-11-04;
1;F20;F702;1998-11-09;
1;F20;;1997-12-03;
1;F208;;2001-03-18;
2;F32;;1999-03-07;
2;F29;F32;2000-01-06;
2;F32;;2003-07-05;
2;F323;F2800;2000-02-05;"

# replace textConnection(Lines) with actual file name
DF <- read.csv2(textConnection(Lines), as.is = TRUE,
 colClasses = list("numeric", "character", "character", "Date", "NULL"))

rk <- function(x, pat) {
  z <- regexpr(pat, x$dg1) > 0 | regexpr(pat, x$dg2) > 0
  rank(ifelse(z, x$date, NA), na.last = "keep")
}

DF$countF20 <- unlist(by(DF, DF$id, rk, pat = "^F20"))
DF$countF2129 <- unlist(by(DF, DF$id, rk, pat = "^F2[1-9]"))
DF




On 8/9/07, David Gyllenberg [EMAIL PROTECTED] wrote:





Re: [R] Rcmdr window border lost

2007-08-09 Thread Andy Weller
OK, I tried completely removing and reinstalling R, but this has not 
worked - I am still missing window borders for Rcmdr. I am certain that 
everything is installed correctly and that all dependencies are met - 
there must be something trivial I am missing?!

Thanks in advance, Andy

Andy Weller wrote:
 Dear all,
 
 I have recently lost my Rcmdr window borders (all my other programs have 
 borders)! I am unsure of what I have done, although I have recently 
 update.packages() in R... How can I reclaim them?
 
 I am using:
 Ubuntu Linux (Feisty)
 R version 2.5.1
 R Commander Version 1.3-5
 
 I have deleted the folder: /usr/local/lib/R/site-library/Rcmdr and 
 reinstalled Rcmdr with: install.packages("Rcmdr", dep=TRUE)
 
 This has not solved my problem though.
 
 Maybe I need to reinstall something else as well?
 
 Thanks in advance, Andy



Re: [R] Help using gPath

2007-08-09 Thread Paul Murrell
Hi


Emilio Gagliardi wrote:
 Hi everyone, I'm trying to figure out how to use gPath and the
 documentation is not very helpful :(
 
 I have the following plot object:
 plot-surrounds::
  background
  plot.gTree.378::
   background
   guide.gTree.355:: (background.rect.345, minor-horizontal.segments.347,
 minor-vertical.segments.349, major-horizontal.segments.351,
 major-vertical.segments.353)
   guide.gTree.356:: (background.rect.345, minor-horizontal.segments.347,
 minor-vertical.segments.349, major-horizontal.segments.351,
 major-vertical.segments.353)
   yaxis.gTree.338::
ticks.segments.321
labels.gTree.335:: (label.text.324, label.text.326, label.text.328,
 label.text.330, label.text.332, label.text.334)
   xaxis.gTree.339::
ticks.segments.309
labels.gTree.315:: (label.text.312, label.text.314)
   xaxis.gTree.340::
ticks.segments.309
labels.gTree.315:: (label.text.312, label.text.314)
   strip.gTree.364:: (background.rect.361, label.text.363)
   strip.gTree.370:: (background.rect.367, label.text.369)
   guide.rect.357
   guide.rect.358
   boxplots.gTree.283::
geom_boxplot.gTree.273:: (GRID.segments.267, GRID.segments.268,
 geom_bar.rect.270, geom_bar.rect.272)
geom_boxplot.gTree.281:: (GRID.segments.275, GRID.segments.276,
 geom_bar.rect.278, geom_bar.rect.280)
   boxplots.gTree.301::
geom_boxplot.gTree.291:: (GRID.segments.285, GRID.segments.286,
 geom_bar.rect.288, geom_bar.rect.290)
geom_boxplot.gTree.299:: (GRID.segments.293, GRID.segments.294,
 geom_bar.rect.296, geom_bar.rect.298)
   geom_jitter.points.303
   geom_jitter.points.305
   guide.rect.357
   guide.rect.358
  ylabel.text.382
  xlabel.text.380
  title


It would be easier to help if we also had the code used to produce this 
plot, but in the meantime ...


 Could someone be so kind and create the proper call to grid.gedit() to
 access a couple of different aspects of this graph?
 I tried:
 grid.gedit(gPath("ylabel.text.382", "labels"), gp=gpar(fontsize=16)) # error


That is looking for a grob called "labels" that is the child of a grob 
called "ylabel.text.382".  I can see a grob called "ylabel.text.382", 
but it has no children.  Try just ...

grid.gedit(gPath("ylabel.text.382"), gp=gpar(fontsize=16))


 I'd like to change the margins on the label for the yaxis (not the tick
 marks) to put more space between the label and the tick marks.  I'd also


Margins may be tricky because it likely depends on a layout generated by 
ggplot;   Hadley Wickham may have to help us out with a ggplot argument 
here ... (?)


 like to remove the left border on the first panel.  I'd like to adjust the


I'd guess you'd have to remove the grob background.rect.345 and then 
draw in just the sides you want, which would require getting to the 
right viewport, for which you'll need to study the viewport tree (see 
current.vpTree())
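
A sketch of that remove-and-redraw idea (the grob name comes from the
tree above; the viewport name is hypothetical, read the real one off
current.vpTree()):

grid.remove("background.rect.345")
downViewport("panel.vp")                 # hypothetical viewport name
grid.lines(x = unit(c(0, 1), "npc"),
           y = unit(c(0, 0), "npc"))     # redraw only the bottom side
upViewport(0)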


 size of the font for the axis labels independently of the tick marks. I'd


That's the one we've already done, right?


 like to change the color of the lines that make up the boxplots.  Plus, I'd


Something like ...

grid.gedit("geom_bar.rect", gp=gpar(col="green"))

...?

Again, it would really help to have some code to run.

Paul


 like to change the margins of the strip labels. If you could show me a
 couple of examples I'm sure I cold get the rest working.
 
 Thanks so much,
 emilio
 


-- 
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
[EMAIL PROTECTED]
http://www.stat.auckland.ac.nz/~paul/



Re: [R] Subject: Re: how to include bar values in a barplot?

2007-08-09 Thread Frank E Harrell Jr
[EMAIL PROTECTED] wrote:

---snip---

Ted,

You make many excellent points and provide much food for thought.  I 
still think that Greg's points are valid too, and in this particular 
case, bar plots are a bad choice and adding numbers at variable heights 
causes a perception error as I wrote previously.

Thanks for your elaboration on this important subject.

Frank

 
 On 07-Aug-07 21:37:50, Greg Snow wrote:
 Generally adding the numbers to a graph accomplishes 2 things:

 1) it acts as an admission that your graph is a failure
 
 Generally, I disagree. Different elements in a display serve different
 purposes, according to the psychological aspects of visual perception.

. . .

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regsubsets statistics

2007-08-09 Thread Thomas Lumley
On Wed, 8 Aug 2007, Markus Brugger wrote:


 Dear R-help,

 I have used the regsubsets function from the leaps package to do subset
 selection of a logistic regression model with 6 independent variables and
 all possible two-way interactions. As I want to get information about the
 statistics behind the selection output, I've intensively searched the
 mailing list to find answers to the following questions:

 1. What should I do to get the statistics behind the selection (e.g. BIC)?
 summary.regsubsets(object) just returns "*" meaning in or " " meaning out.
 Since the plot function generates BICs, it is obvious that these values must
 be computed and available somewhere, but where? Is it possible to directly
 get AIC values instead of BIC?

These statistics are in the object returned by summary(). Using the first 
example from the help page
> names(summary(a))
[1] "which"  "rsq"    "rss"    "adjr2"  "cp"     "bic"    "outmat" "obj"
> summary(a)$bic
[1] -19.60287 -28.61139 -35.65643 -37.23388 -34.55301
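
AIC values are not stored, but since AIC differs from BIC only in the
penalty term, a hedged sketch along these lines could recover Gaussian AIC
(up to an additive constant) from the stored RSS; 'n' is assumed to be the
number of observations used in the fit:

s <- summary(a)
p <- rowSums(s$which)               # parameters per model, incl. intercept
aic <- n * log(s$rss / n) + 2 * p   # use log(n) * p instead of 2 * p for BIC
aic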


 2. As to the plot function, I've encountered a problem with setting the ylim
 argument. I fear that this (nice!) particular plot function ignores many of
 these additional arguments. How can I nevertheless change this setting?

You can't (without modifying the plot function). The ... argument is required 
for inheritance [ie, required for R CMD check] but it doesn't take graphical 
parameters.

 3. Since it is not explicitly mentioned in the manual, can I really use
 regsubsets for logistic regression?


No.  If your data set is large enough relative to the number of variables, you 
can fit a model with all variables and then apply regsubsets() to the weighted 
linear model arising from the IWLS algorithm.  This will give an approximate 
ranking of models that you can then refit exactly.  That is useful if you want 
to summarize the best few thousand models on 30 variables, but regsubsets() 
isn't useful if you want a single model anyway.
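
A hedged sketch of that approximation ('dat' and 'y' are hypothetical names,
and the top-ranked models should be refit exactly with glm()):

library(leaps)
full  <- glm(y ~ ., data = dat, family = binomial)
zwork <- full$linear.predictors + residuals(full, type = "working")
a2 <- regsubsets(x = model.matrix(full)[, -1], y = zwork,
                 weights = weights(full, type = "working"), nvmax = 10)
summary(a2)$bic   # approximate ranking only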


 -thomas



Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subject: Re: how to include bar values in a barplot?

2007-08-09 Thread Gabor Grothendieck
You could put the numbers inside the bars in which
case it would not add to the height of the bar:

x <- 1:5
names(x) <- letters[1:5]
bp <- barplot(x)
text(bp, x - .02 * diff(par("usr")[3:4]), x)

On 8/9/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote:
  Greg, I'm going to join issue with you here! Not that I'll go near
  advocating Excel-style graphics (abominable, and the Patrick Burns
  URL which you cite is remarkable in its restraint). Also, I'm aware
  that this is potential flame-war territory --  again, I want to avoid
  that too.
 
  However, this is the second time you have intervened on this theme
  (previously Mon 6 August), along with John Kane on Wed 1 August and
  again today on similar lines, and I think it's time an alternative
  point of view was presented, to counteract (I hope usefully) what
  seems to be a draconianly prescriptive approach to the presentation
  of information.


 ---snip---

 Ted,

 You make many excellent points and provide much food for thought.  I
 still think that Greg's points are valid too, and in this particular
 case, bar plots are a bad choice and adding numbers at variable heights
 causes a perception error as I wrote previously.

 Thanks for your elaboration on this important subject.

 Frank

 
  On 07-Aug-07 21:37:50, Greg Snow wrote:
  Generally adding the numbers to a graph accomplishes 2 things:
 
  1) it acts as an admission that your graph is a failure
 
  Generally, I disagree. Different elements in a display serve different
  purposes, according to the psychological aspects of visual perception.

 . . .

 --
 Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with Filtering (interest rates related)

2007-08-09 Thread Bernardo Ribeiro
Dear, r-help,

Long time reader, first time poster,

I'm working on a paper regarding term structure estimation using the
Kalman Filter algorithm. The model in question is the Generalized Vasicek,
and since there are coupon bonds being estimated, I'm supposed to make some
changes to the Kalman Filter.

Has anyone already used R for these purposes? Any tips?

Does anyone have Kalman Filter code I could use as a starting point for an
Extended Kalman Filter Approach?
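
As a hedged starting point only, a minimal scalar linear Kalman filter
(the dlm and sspir packages on CRAN may also be worth a look):

## state:  x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)
## obs:    y_t = H x_t     + v_t,  v_t ~ N(0, R)
kfilter <- function(y, F, H, Q, R, x0 = 0, P0 = 1e6) {
  n <- length(y); xf <- numeric(n); Pf <- numeric(n)
  x <- x0; P <- P0
  for (t in seq_len(n)) {
    xp <- F * x                       # predicted state
    Pp <- F * P * F + Q               # predicted variance
    K  <- Pp * H / (H * Pp * H + R)   # Kalman gain
    x  <- xp + K * (y[t] - H * xp)    # filtered state
    P  <- (1 - K * H) * Pp            # filtered variance
    xf[t] <- x; Pf[t] <- P
  }
  list(xf = xf, Pf = Pf)
}
y <- cumsum(rnorm(100)) + rnorm(100)   # toy local-level series
str(kfilter(y, F = 1, H = 1, Q = 1, R = 1))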

Thanks a lot for the patience and time,

Bernardo Ribeiro

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subject: Re: how to include bar values in a barplot?

2007-08-09 Thread Frank E Harrell Jr
Gabor Grothendieck wrote:
 You could put the numbers inside the bars in which
 case it would not add to the height of the bar:

I think the Cleveland/Tufte prescription would be much different: 
horizontal dot charts with the numbers in the right margin.  I do this 
frequently with great effect.  The Hmisc dotchart2 function makes this easy.
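
For instance, a minimal sketch of that layout with hypothetical counts
(auxdata puts the values, vertically aligned, in the right margin):

library(Hmisc)
x <- c(a = 4, b = 7, c = 9, d = 5, e = 2)   # hypothetical values
dotchart2(x, auxdata = x, auxtitle = "n")   # dot chart, numbers at right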

Frank

 
 x <- 1:5
 names(x) <- letters[1:5]
 bp <- barplot(x)
 text(bp, x - .02 * diff(par("usr")[3:4]), x)
 
 On 8/9/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote:
 Greg, I'm going to join issue with you here! Not that I'll go near
 advocating Excel-style graphics (abominable, and the Patrick Burns
 URL which you cite is remarkable in its restraint). Also, I'm aware
 that this is potential flame-war territory --  again, I want to avoid
 that too.

 However, this is the second time you have intervened on this theme
 (previously Mon 6 August), along with John Kane on Wed 1 August and
 again today on similar lines, and I think it's time an alternative
 point of view was presented, to counteract (I hope usefully) what
 seems to be a draconianly prescriptive approach to the presentation
 of information.

 ---snip---

 Ted,

 You make many excellent points and provide much food for thought.  I
 still think that Greg's points are valid too, and in this particular
 case, bar plots are a bad choice and adding numbers at variable heights
 causes a perception error as I wrote previously.

 Thanks for your elaboration on this important subject.

 Frank

 On 07-Aug-07 21:37:50, Greg Snow wrote:
 Generally adding the numbers to a graph accomplishes 2 things:

 1) it acts as an admission that your graph is a failure
 Generally, I disagree. Different elements in a display serve different
 purposes, according to the psychological aspects of visual perception.
 . . .


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ARIMA fitting

2007-08-09 Thread Prof Brian Ripley

On Tue, 7 Aug 2007, [EMAIL PROTECTED] wrote:


Hello,
I'm trying to fit an ARIMA process, using the stats package's arima function.
Can I expect that the fitted model, with any parameters, is stationary, causal
and invertible?


Please read ?arima: it answers all your questions, and points out that the 
answer depends on the arguments passed to arima().


The posting guide did ask you to do this *before* posting: please study it 
more carefully.
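
For a fitted model, a hedged check of those conditions (in R's
parameterization the AR polynomial governs stationarity/causality and the
MA polynomial invertibility; both need all roots outside the unit circle):

fit <- arima(lh, order = c(1, 0, 1))    # 'lh' is a built-in example series
Mod(polyroot(c(1, -coef(fit)["ar1"])))  # AR root: > 1 => stationary/causal
Mod(polyroot(c(1,  coef(fit)["ma1"])))  # MA root: > 1 => invertible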


--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Interpret impulse response functions from irf in MSBVAR library

2007-08-09 Thread sj
Hello,

I am wondering if anyone knows how to interpret the values returned by irf
function in the MSBVAR library. Some of the literature I have read indicates
that impulse responses in the dependent variables are often based on a 1
unit change in the independent variable, but other sources suggest that they
are based on a change of 1 standard deviation. Any ideas which irf uses to
compute the irf? The documentation is not very clear.

Thanks,


Spencer

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] AlgDesign expand.formula()

2007-08-09 Thread S Ellison
Can anyone explain why AlgDesign's expand.formula help and output differ?


#From help:
#  quad(A,B,C) makes ~(A+B+C)^2+I(A^2)+I(B^2)+I(C^2)

expand.formula(~quad(A+B+C))
#actually gives ~(A + B + C)^2 + I(A + B + C^2)

They don't _look_ the same...

Steve E

***
This email contains information which may be confidential and/or privileged, 
and is intended only for the individual(s) or organisation(s) named above. If 
you are not the intended recipient, then please note that any disclosure, 
copying, distribution or use of the contents of this email is prohibited. 
Internet communications are not 100% secure and therefore we ask that you 
acknowledge this. If you have received this email in error, please notify the 
sender or contact +44(0)20 8943 7000 or [EMAIL PROTECTED] immediately, and 
delete this email and any attachments and copies from your system. Thank you. 

LGC Limited. Registered in England 2991879. 
Registered office: Queens Road, Teddington, Middlesex TW11 0LY, UK

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mac OSX fonts in R plots

2007-08-09 Thread Fernando Diaz
I had been looking for information about including OSX fonts in R
plots for a long time and never quite found the answer.  I spent an
hour or so gathering together the following solution which, as far as
I have tested, works.  I'm posting this for feedback and
archiving.  I'd be interested in any caveats about the brittleness of
the technique.

Thanks.

F

-

1. Find the font
system font path: /Library/Fonts/
2. Extract ttf (if necessary) with fondu [http://fondu.sourceforge.net/]
eg, fondu -force Optima.dfont
3. Run ttf2afm on each ttf file, stripping the Copyright and warning
4. Copy the resulting .afm files to
RHOME/library/grDevices/afm
(usually,

/Library/Frameworks/R.framework/Versions/2.5/Resources/library/grDevices/afm
)
5. R code to use the font; eg,
newfont <- Type1Font("Optima",
    c("OptimaRegular.afm", "OptimaBold.afm",
      "OptimaItalic.afm", "OptimaBoldItalic.afm"))
pdf("newfont.pdf", version = "1.4", family = newfont)
plot(rnorm, col = "red")
dev.off()

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Memory problem

2007-08-09 Thread Gang Chen
I got a long list of error message repeating with the following 3  
lines when running the loop at the end of this mail:

R(580,0xa000ed88) malloc: *** vm_allocate(size=327680) failed (error  
code=3)
R(580,0xa000ed88) malloc: *** error: can't allocate region
R(580,0xa000ed88) malloc: *** set a breakpoint in szone_error to debug

There are 2 big arrays, IData (54x64x50x504) and Stat (54x64x50x8), in  
the code. They would only use about 0.8GB of memory. However when I  
check the memory usage during the looping, the memory usage keeps  
growing and finally reaches the memory limit of my computer, 4GB, and  
spills the above error message.

Is there something in the loop involving lme that is causing a memory  
leak? How can I clean up the memory usage in the loop?

Thank you very much for your help,
Gang




tag <- 0; dimx <- 54; dimy <- 64; dimz <- 50; NoF <- 8; NoFile <- 504;

IData <- array(data=NA, dim=c(dimx, dimy, dimz, NoFile));
Stat <- array(data=NA, dim=c(dimx, dimy, dimz, NoF));

for (i in 1:NoFile) {
IData[,,,i] <- ...  # fill in the data for array IData here
}

for (i in 1:dimx) {
for (j in 1:dimy) {
for (k in 1:dimz) {
for (m in 1:NoFile) {
Model$Beta[m] <- IData[i, j, k, m];
}
try(fit.lme <- lme(Beta ~ group*session*difficulty+FTND, random =
~1|Subj, Model), tag <- 1);
if (tag != 1) {
Stat[i, j, k,] <- anova(fit.lme)$F[-1];
}
else {
Stat[i, j, k,] <- rep(0, NoF-1);
}
tag <- 0;
}
}
}
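
One hedged adjustment to the inner loop above: keep only the F statistics,
drop the fitted object each pass, and garbage-collect now and then (assumes
library(nlme) is loaded; the scalar 0 recycles across all slots of
Stat[i, j, k, ]):

res <- try(anova(lme(Beta ~ group * session * difficulty + FTND,
                     random = ~ 1 | Subj, data = Model))$F[-1],
           silent = TRUE)
Stat[i, j, k, ] <- if (inherits(res, "try-error")) 0 else res
rm(res)   # and call gc() every few hundred iterations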

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rcmdr window border lost

2007-08-09 Thread Peter Dalgaard
Andy Weller wrote:
 OK, I tried completely removing and reinstalling R, but this has not 
 worked - I am still missing window borders for Rcmdr. I am certain that 
 everything is installed correctly and that all dependencies are met - 
 there must be something trivial I am missing?!

 Thanks in advance, Andy

 Andy Weller wrote:
   
 Dear all,

 I have recently lost my Rcmdr window borders (all my other programs have 
 borders)! I am unsure of what I have done, although I have recently 
 update.packages() in R... How can I reclaim them?

 I am using:
 Ubuntu Linux (Feisty)
 R version 2.5.1
 R Commander Version 1.3-5

 
This sort of behaviour is usually the fault of the window manager, not 
R/Rcmdr/tcltk. It's the WM's job to supply the various window 
decorations on a new window, so either it never got told that there was 
a window, or it somehow got into a confused state. Did you try 
restarting the WM (i.e., log out/in or reboot)? And which WM are we 
talking about?

Same combination works fine on Fedora 7, except for a load of messages 
saying

Warning: X11 protocol error: BadWindow (invalid Window parameter)


 I have deleted the folder: /usr/local/lib/R/site-library/Rcmdr and 
  reinstalled Rcmdr with: install.packages("Rcmdr", dep=TRUE)

 This has not solved my problem though.

 Maybe I need to reinstall something else as well?

 Thanks in advance, Andy
 



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tcltk error on Linux

2007-08-09 Thread Seth Falcon
Hi Mark,

Prof Brian Ripley [EMAIL PROTECTED] writes:
 On Thu, 9 Aug 2007, Mark W Kimpel wrote:

 I am having trouble getting tcltk package to load on openSuse 10.2
 running R-devel. I have specifically put my /usr/share/tcl directory in
 my PATH, but R doesn't seem to see it. I also have installed tk on my
 system. Any ideas on what the problem is?

Any chance you are running R on a remote server using an ssh session?
If that is the case, you may have an ssh/X11 config issue that
prevents using tcl/tk from such a session.

Rerun the configure script for R and verify that tcl/tk support is
listed in the summary.
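
From inside R itself, a quick hedged check:

capabilities("tcltk")   # TRUE only if R was built with tcl/tk support
Sys.getenv("DISPLAY")   # empty over ssh unless X11 forwarding is enabled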

 Also, note that I have some warning messages on starting up R, not sure
 what they mean or if they are pertinent.

 Those are coming from a Bioconductor package: again you must be using 
 development versions with R-devel and those are not stable (last time I 
 looked even Biobase would not install, and the packages change
 daily).

BioC devel tracks R-devel, but not on a daily basis -- because R
changes daily.  The recent issues with Biobase are a result of changes
to R and have already been fixed.

 If you have all those packages in your startup, please don't -- there will 
 be a considerable performance hit so only load them when you need
 them.

Presumably, that's why they are there in the first place.  The warning
messages are a problem and suggest some needed improvements to the
methods packages.  These are being worked on.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Systematically biased count data regression model

2007-08-09 Thread Matthew and Kim Bowser
Dear all,

I am attempting to explain patterns of arthropod family richness
(count data) using a regression model.  It seems to be able to do a
pretty good job as an explanatory model (i.e. demonstrating
relationships between dependent and independent variables), but it has
systematic problems as a predictive model:  It is biased high at low
observed values of family richness and biased low at high observed
values of family richness (see attached pdf).  I have tried diverse
kinds of reasonable regression models mostly as in Zeileis, et al.
(2007), as well as transforming my variables, both with only small
improvements.

Do you have suggestions for making a model that would perform better
as a predictive model?

Thank you for your time.

Sincerely,

Matthew Bowser

STEP student
USFWS Kenai National Wildlife Refuge
Soldotna, Alaska, USA

M.Sc. student
University of Alaska Fairbanks
Fairbanks, Alaska, USA

Reference

Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
count data in R. Technical Report 53, Department of Statistics and
Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

Code

`data` <-
structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232,
0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318,
0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2,
0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38,
0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288,
0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162,
0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306,
0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208,
0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25,
0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462,
0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275,
0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135,
0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325,
0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047,
-0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0,
0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467,
-0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228,
0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 

Re: [R] Subject: Re: how to include bar values in a barplot?

2007-08-09 Thread Greg Snow
Ted, 

Thanks for your thoughts.  I don't take it as the start of a flame war
(I don't want that either).

My original intent was to get the original posters out of the mode of
thinking they want to match what the spreadsheet does and into thinking
about what message they are trying to get across.  To get them (and
possibly others) thinking I made the statements a bit more bold than my
actual position (I did include a couple of qualifiers).  Now that there
has been a couple of days to think about it, your post adds some good
depth to the discussion.

I think the most important point (which I think we agree on) is not to
just add something to a graph because you can (or someone else did), but
to think through if it is beneficial or not (which will depend on the
graph, data, questions, etc.).

There are ways to combine graphs and tables, sparklines are an upcoming
way of including the power of graphs into a table.  Another approach for
the bar graph example would be to first replace the bargraph with a
dotplot, then put the numbers into the margin so that they are properly
lined up and not distracting from the points.

I still think that anytime anyone is tempted to add data values to a
graph they should ask themselves if that is an admission that the graph
is not appropriate and would be better replaced by either a table (if
the goal really is to look up specific values) or a better graph.
Sometimes the answer will be yes, the question of interest, or the
obvious follow-up question, will be answered by adding some additional
information.  Then the next question should be: which information to
include? And where to put it?

Can you imagine what Minard's graph would have looked like if he had
included the numbers every time the total changed by 100, and put the
temperatures as numbers instead of a line graph in the main plot at
every 1 degree change?  

Thanks for adding depth to the discussion,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 [EMAIL PROTECTED]
 Sent: Wednesday, August 08, 2007 3:53 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Subject: Re: how to include bar values in a barplot?
 
  Greg, I'm going to join issue with you here! Not that I'll 
 go near advocating Excel-style graphics (abominable, and 
 the Patrick Burns URL which you cite is remarkable in its 
 restraint). Also, I'm aware that this is potential flame-war 
 territory --  again, I want to avoid that too.
 
 However, this is the second time you have intervened on this 
 theme (previously Mon 6 August), along with John Kane on Wed 
 1 August and again today on similar lines, and I think it's 
 time an alternative point of view was presented, to 
 counteract (I hope usefully) what seems to be a draconianly 
 prescriptive approach to the presentation of information.
 
 On 07-Aug-07 21:37:50, Greg Snow wrote:
  Generally adding the numbers to a graph accomplishes 2 things:
 
  1) it acts as an admission that your graph is a failure
 
 Generally, I disagree. Different elements in a display serve 
 different purposes, according to the psychological aspects of 
  visual perception.
 
 Sizes, proportions, colours etc. of shapes (bars in a 
 histogram, the marks representing points in a scatterplot, 
 ... ) are interpreted, so to speak, intuitively -- the 
 resulting perception is formed by processes which are hard to 
 ascertain consciously, and the overall effect can only be 
 ascertained by looking at it, and noting what impression one 
 has formed. They stimulate mental responses in the domain of 
 perception of spatial relationships.
 
 Numbers, and text, on the other hand, while still shapes from 
 the optical point of view, up to the point of their impact on 
 the retina, provoke different perceptions. They are 
 interpreted analytically
 stimulating mental responses in the domains of language and number.
 
 There is no Law whatever which requires that the two must be 
 separated.
 
 It may be that adding any annotation to a graph or diagram 
  will interfere with the intuitive interpretation that the 
 diagram is intended to stimulate, with no associated benefit.
 
 It may be that presenting numerical/textual information 
 within a graphical/diagrammatic context will interfere with 
 the analytic
  interpretation which is desired, with no associated benefit.
 
 In such cases, it is clearly (and as a matter of fact to be 
  decided in each case) better to separate the two aspects.
 
 It may, however, be that both can be combined in such a way 
 that each enhances the other; and also the simultaneous 
 perception of both aspects induces a cartesian-product 
 richness of interpretation where each element of the 
 graphical presentation combines with each element of the 
 textual/numerical presentation to generate a perception which 
 could not possibly have been realised 

Re: [R] Subject: Re: how to include bar values in a barplot?

2007-08-09 Thread Greg Snow
Gabor,

Putting the numbers in the bars is an improvement over putting them over
the bars, but if the numbers are large relative to the bars, this could
still create a fuzzy top to the bars making them harder to compare.
This also has the problem of the poorly laid out table: numbers are
easiest to compare if they are aligned (and vertical comparisons are
easier than horizontal).

There is also the issue of scale.  You can shrink a barplot quite a bit
and still get a good overview of the relationships, but if you need the
numbers inside the plot, then either the numbers become too small to
easily read, or the numbers stay big and overwhelm the plot.

The best approach is to switch to a dotplot with the numbers in the
margin (Frank has suggested this as well).  If you need to stay with the
bar plot (some lay people are still more comfortable with them until we
can educate them to prefer the dot plots) then I would suggest doing
horizontal bars with the numbers in the margin (vertically aligned).  If
the vertical bars are necessary, then putting the numbers below the bars
(but separated enough that they don't interfere with a clear zero point)
seems the safest approach.
 

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Gabor 
 Grothendieck
 Sent: Thursday, August 09, 2007 6:55 AM
 To: Frank E Harrell Jr
 Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
 Subject: Re: [R] Subject: Re: how to include bar values in a barplot?
 
 You could put the numbers inside the bars in which case it 
 would not add to the height of the bar:
 
 x <- 1:5
 names(x) <- letters[1:5]
 bp <- barplot(x)
 text(bp, x - .02 * diff(par("usr")[3:4]), x)
 
 On 8/9/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] wrote:
   Greg, I'm going to join issue with you here! Not that 
 I'll go near 
   advocating Excel-style graphics (abominable, and the 
 Patrick Burns 
   URL which you cite is remarkable in its restraint). Also, 
 I'm aware 
   that this is potential flame-war territory --  again, I want to 
   avoid that too.
  
   However, this is the second time you have intervened on 
 this theme 
   (previously Mon 6 August), along with John Kane on Wed 1 
 August and 
   again today on similar lines, and I think it's time an 
 alternative 
   point of view was presented, to counteract (I hope usefully) what 
   seems to be a draconianly prescriptive approach to the 
 presentation 
   of information.
 
 
  ---snip---
 
  Ted,
 
  You make many excellent points and provide much food for 
 thought.  I 
  still think that Greg's points are valid too, and in this 
 particular 
  case, bar plots are a bad choice and adding numbers at variable 
  heights causes a perception error as I wrote previously.
 
  Thanks for your elaboration on this important subject.
 
  Frank
 
  
   On 07-Aug-07 21:37:50, Greg Snow wrote:
   Generally adding the numbers to a graph accomplishes 2 things:
  
   1) it acts as an admission that your graph is a failure
  
   Generally, I disagree. Different elements in a display serve 
   different purposes, according to the psychological 
  aspects of visual perception.
 
  . . .
 
  --
  Frank E Harrell Jr   Professor and Chair   School 
 of Medicine
   Department of Biostatistics   
 Vanderbilt University
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Systematically biased count data regression model

2007-08-09 Thread paulandpen
Matthew

it is possible that your results are suffering from heterogeneity: it may be 
that your model performs well at the aggregate level, and this would explain 
good aggregate fit levels and decent predictive performance etc.

You could perhaps look at a 'latent' approach to modelling your data; in 
other words, see whether there is something unique in the cases/observations 
in the lower and upper ranges of the model (where prediction is poor), and 
whether it is justified to model these count ranges as separate and unique 
from the generic aggregate-level model. In other words, there may be 
something unobserved/unmeasured (latent) in your population of observations 
that is causing some observations to behave uniquely.
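
A hedged sketch of one such latent-class approach, using the flexmix
package and the D, Day and NDVI variables from the original post:

library(flexmix)
fit <- flexmix(D ~ Day + NDVI, data = data, k = 2,
               model = FLXMRglm(family = "poisson"))
summary(fit)   # do the components split off the poorly predicted cases?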

hth

thanks Paul
- Original Message - 
From: Matthew and Kim Bowser [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Friday, August 10, 2007 1:43 AM
Subject: [R] Systematically biased count data regression model


 Dear all,

 I am attempting to explain patterns of arthropod family richness
 (count data) using a regression model.  It seems to be able to do a
 pretty good job as an explanatory model (i.e. demonstrating
 relationships between dependent and independent variables), but it has
 systematic problems as a predictive model:  It is biased high at low
 observed values of family richness and biased low at high observed
 values of family richness (see attached pdf).  I have tried diverse
 kinds of reasonable regression models mostly as in Zeileis, et al.
 (2007), as well as transforming my variables, both with only small
 improvements.

 Do you have suggestions for making a model that would perform better
 as a predictive model?

 Thank you for your time.

 Sincerely,

 Matthew Bowser

 STEP student
 USFWS Kenai National Wildlife Refuge
 Soldotna, Alaska, USA

 M.Sc. student
 University of Alaska Fairbanks
 Fairbanks, Alaska, USA

 Reference

 Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
 count data in R. Technical Report 53, Department of Statistics and
 Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
 http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

 Code

 `data` <-
 structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232,
 0.23, 0.239, -0.12, 

Re: [R] small sample techniques

2007-08-09 Thread Nair, Murlidharan T
Thanks, that discussion was helpful. Well, I have another question: 
I am comparing two proportions for deviation from the hypothesized
difference of zero. My manually calculated z ratio is 1.94. 
But when I calculate it using prop.test, it uses Pearson's chi-squared
test, and the X-squared value that it gives is 0.74. Is there a function
in R where I can calculate the z ratio? Which is 


    Z = (('p1 - 'p2) - (p1 - p2)) / S('p1 - 'p2)

where S('p1 - 'p2) is the standard error estimate of the difference between
two independent proportions, and 'p denotes a sample proportion

Dummy example 
This is how I use it 
prop.test(c(30,23),c(300,300))
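
A minimal sketch of such a z ratio, using the unpooled standard error from
the formula above; note also that without the continuity correction,
prop.test's X-squared is the square of the pooled-SE z:

prop.z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1; p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # S('p1 - 'p2)
  (p1 - p2) / se
}
prop.z(30, 300, 23, 300)   # the dummy example above
sqrt(prop.test(c(30, 23), c(300, 300), correct = FALSE)$statistic)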


Cheers../Murli





-Original Message-
From: Moshe Olshansky [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 09, 2007 12:01 AM
To: Rolf Turner; r-help@stat.math.ethz.ch
Cc: Nair, Murlidharan T; Moshe Olshansky
Subject: Re: [R] small sample techniques

Well, this is an explanation of what is done in the
paired t-test (and why the number of df is as it is).
I was too lazy to write all this.
It is nice that some list members are less lazy!

--- Rolf Turner [EMAIL PROTECTED] wrote:

 
 On 9/08/2007, at 2:57 PM, Moshe Olshansky wrote:
 
  As Thomas Lumley noted, there exist several
 versions
  of t-test.
 
   snip
 
  If you use t3 <- t.test(x,y,paired=TRUE) then
 equal
  sample sizes are assumed and the number of degrees
 of
  freedom is 4 (5-1).
 
   This is seriously misleading.  The assumption is
 not that the sample  
 sizes
   are equal, but rather that there is ***just one
 sample***, namely  
 the sample of differences.
 
   More explicitly the assumptions are that
 
   x_i - y_i
 
   are i.i.d. Gaussian with mean mu and variance
 sigma^2.
 
   One is trying to conduct inference about mu, of
 course.
 
   It should also be noted that it is a crucial
 assumption for the  
 ``non-paired''
   t-test that the two samples be ***independent*** of
 each other, as  
 well as
   being Gaussian.
 
   None of this is however germane to Nair's original
 question; it is  
 clear
   that he is interested in a two-independent-sample
 t-test.
 
   cheers,
 
   Rolf Turner
 

##
 Attention: 
 This e-mail message is privileged and confidential.
 If you are not the 
 intended recipient please delete the message and
 notify the sender. 
 Any views or opinions presented are solely those of
 the author.
 
 This e-mail has been scanned and cleared by
 MailMarshal 
 www.marshalsoftware.com

##


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help using gPath

2007-08-09 Thread Emilio Gagliardi
Hi Paul,

I'm sorry for not posting code, I wasn't sure if it would be helpful without
the data...should I post the code and a sample of the data?  I will remember
to do that next time!


  grid.gedit(gPath("ylabel.text.382"), gp=gpar(fontsize=16))


OK, I think my confusion comes from the notation that current.grobTree()
produces and what strings are required in order to make changes to the
underlying grobs.
But, from what you've provided, it looks like I can access each grob with
its unique name, regardless of which parent it is nested in...that helps


 like to remove the left border on the first panel.  I'd like to adjust the


  I'd guess you'd have to remove the grob "background.rect.345" and then
 draw in just the sides you want, which would require getting to the
 right viewport, for which you'll need to study the viewport tree (see
 current.vpTree())


I did some digging into this and it seems pretty complicated, is there an
example anywhere that makes sense to the beginner? The whole viewport/grob
relationship is not clear to me. So, accessing viewports and removing
objects and drawing new ones is beyond me at this point. I can get my mind
around your example below because I can see the object I want to modify in
the viewer, and the code changes a property of that object, click enter, and
bang the object changes.  When you start talking external pointers and
finding viewports and pushing and popping grobs I just get lost. I found the
viewports for the grobTree; it looks like this:

viewport[ROOT]-(viewport[layout]-(viewport[axis_h_1_1]-(viewport[bottom_axis]-(viewport[labels],
viewport[ticks])),
viewport[axis_h_1_2]-(viewport[bottom_axis]-(viewport[labels],
viewport[ticks])),
viewport[axis_v_1_1]-(viewport[left_axis]-(viewport[labels],
viewport[ticks])), viewport[panel_1_1], viewport[panel_1_2],
viewport[strip_h_1_1], viewport[strip_h_1_2], viewport[strip_v_1_1]))

at that point I was like, ok, I'm done. :S


 Something like ...

  grid.gedit("geom_bar.rect", gp=gpar(col="green"))


 Again, it would really help to have some code to run.


My apologies, I thought the grobTree was sufficient in this case.  Thanks
very much for your help.

emilio

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to apply functions over rows of multiple matrices

2007-08-09 Thread Johannes Hüsing
Dear ExpRts,
I would like to perform a function with two arguments
over the rows of two matrices. There are a couple of
*applys (including mApply in Hmisc) but I haven't found
out how to do it straightforward.

Applying to row indices works, but looks like a poor hack
to me:

sens <- function(test, gold) {
  if (any(gold==1)) {
sum(test[which(gold==1)]/sum(which(gold==1)))
  } else NA
}

numtest - 6
numsubj - 20

newtest <- array(rbinom(numtest*numsubj, 1, .5),
dim=c(numsubj, numtest))
goldstandard <- array(rbinom(numtest*numsubj, 1, .5),
dim=c(numsubj, numtest))

t(sapply(1:nrow(newtest), function(i) {
sens(newtest[i,], goldstandard[i,])}))

Is there any shortcut to sapply over the indices?

Best wishes


Johannes

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Seasonality

2007-08-09 Thread Alberto Monteiro
I have a time series x = f(t), where t is taken for each
month. What is the best function to detect whether _x_ has seasonal
variation? If there is such a seasonal effect, what is the
best function to estimate it?

Function arima has a seasonal parameter, but I guess this is
too complex to be useful.
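
A hedged first look, with the built-in monthly AirPassengers series
standing in for x = f(t):

x <- AirPassengers                   # monthly series, frequency 12
monthplot(x)                         # month-by-month subseries profiles
plot(stl(x, s.window = "periodic"))  # seasonal / trend / remainder split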

Alberto Monteiro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to apply functions over rows of multiple matrices

2007-08-09 Thread Gabor Grothendieck
Is sens really what you want?  The denominator is the indexes,
e.g. if a row in goldstandard were c(0, 0, 1, 1) then you would
be dividing by 3+4.  Also test[which(gold == 1)] is the same
as test[gold == 1], which is the same as test * gold since gold
has only 0's and 1's in it.  Perhaps what you really intend is to
take the average over those elements in each row of the first matrix
which correspond to 1's in the corresponding row of the second.
In that case it's just:

rowSums(newtest * goldstandard) / rowSums(goldstandard)
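
For an arbitrary two-argument row function, one hedged generic alternative
is to split both matrices into rows and mapply over them:

mapply(sens,
       split(newtest,      row(newtest)),
       split(goldstandard, row(goldstandard)))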

On 8/9/07, Johannes Hüsing [EMAIL PROTECTED] wrote:
 Dear ExpRts,
 I would like to perform a function with two arguments
 over the rows of two matrices. There are a couple of
 *applys (including mApply in Hmisc) but I haven't found
 out how to do it straightforward.

 Applying to row indices works, but looks like a poor hack
 to me:

 sens <- function(test, gold) {
  if (any(gold==1)) {
sum(test[which(gold==1)]/sum(which(gold==1)))
  } else NA
 }

 numtest - 6
 numsubj - 20

 newtest <- array(rbinom(numtest*numsubj, 1, .5),
dim=c(numsubj, numtest))
 goldstandard <- array(rbinom(numtest*numsubj, 1, .5),
dim=c(numsubj, numtest))

 t(sapply(1:nrow(newtest), function(i) {
sens(newtest[i,], goldstandard[i,])}))

 Is there any shortcut to sapply over the indices?

 Best wishes


 Johannes

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Need Help: Installing/Using xtable package

2007-08-09 Thread M. Jankowski
Hi all,

Let me know if I need to ask this question of the bioconductor group.
I used the bioconductor utility to install this package and also the
CRAN install.packages function.

My computer crashed a week ago. Today I reinstalled all my
bioconductor/R packages. One of my scripts is giving me the following
error:

in my script I set:
library(xtable)
print.xtable(

and receive this error:
Error : could not find function "print.xtable"

This is a new error and I cannot find the source.

I reinstalled xtable with the messages below(which are the same
whether I use CRAN or bioconductor):

Any help is appreciated! Thanks!
Matt


> biocLite("xtable")
Running biocinstall version 2.0.8 with R version 2.5.1
Your version of R requires version 2.0 of Bioconductor.
Warning in install.packages(pkgs = pkgs, repos = repos, dependencies = dependencies, :
 argument 'lib' is missing: using '/home/mdj/R/i486-pc-linux-gnu-library/2.5'
trying URL 'http://cran.fhcrc.org/src/contrib/xtable_1.5-1.tar.gz'
Content type 'application/x-gzip' length 134758 bytes
opened URL
==
downloaded 131Kb

* Installing *source* package 'xtable' ...
** R
** data
** inst
** preparing package for lazy loading
** help
  Building/Updating help pages for package 'xtable'
 Formats: text html latex example
  print.xtable          text    html    latex
  string                text    html    latex
  table.attributes      text    html    latex
  tli                   text    html    latex
  xtable                text    html    latex   example
** building package indices ...
* DONE (xtable)

The downloaded packages are in
/tmp/RtmpGThCuI/downloaded_packages


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Michael Cassin
Hi,

I've been having similar experiences and haven't been able to
substantially improve the efficiency using the guidance in the I/O
Manual.

Could anyone advise on how to improve the following scan()?  It is not
based on my real file, please assume that I do need to read in
characters, and can't do any pre-processing of the file, etc.

## Create Sample File
write.csv(matrix(as.character(1:1e6),ncol=10,byrow=TRUE),"big.csv",row.names=FALSE)
q()

**New Session**
#R
system("ls -l big.csv")
system("free -m")
big1 <- matrix(scan("big.csv",sep=",",what=character(0),skip=1,n=1e6),ncol=10,byrow=TRUE)
system("free -m")

The file is approximately 9MB, but approximately 50-60MB is used to
read it in.

object.size(big1) is 56MB, or 56 bytes per string, which seems excessive.

Regards, Mike

Configuration info:
> sessionInfo()
R version 2.5.1 (2007-06-27)
x86_64-redhat-linux-gnu
locale:
C
attached base packages:
[1] stats graphics  grDevices utils datasets  methods
[7] base

# uname -a
Linux ***.com 2.6.9-023stab044.4-smp #1 SMP Thu May 24 17:20:37 MSD
2007 x86_64 x86_64 x86_64 GNU/Linux



== Quoted Text 
From: Prof Brian Ripley ripley_at_stats.ox.ac.uk
 Date: Tue, 26 Jun 2007 17:53:28 +0100 (BST)




 The R Data Import/Export Manual points out several ways in which you
can use read.csv more efficiently.

 On Tue, 26 Jun 2007, ivo welch wrote:

  dear R experts:
 
 I am of course no R expert, but use it regularly.  I thought I would
 share some experimentation  with memory use.  I run a linux machine
 with about 4GB of memory, and R 2.5.0.

 upon startup, gc() reports

 used (Mb) gc trigger (Mb) max used (Mb)
 Ncells 268755 14.4 407500 21.8   35 18.7
 Vcells 139137  1.1 786432  6.0   444750  3.4

 This is my baseline.  linux 'top' reports 48MB as baseline.  This
 includes some of my own routines that are always loaded.  Good..


 Next, I created a s.csv file with 22 variables and 500,000
 observations, taking up an uncompressed disk space of 115MB.  The
 resulting object.size() after a read.csv() is 84,002,712 bytes (80MB).

 s = read.csv("s.csv");
 object.size(s);

 [1] 84002712


 here is where things get more interesting.  after the read.csv() is
 finished, gc() reports

   used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   270505 14.5    8349948 446.0 11268682 601.9
 Vcells 10639515 81.2   34345544 262.1 42834692 326.9

 I was a bit surprised by this---R had 928MB intermittent memory in
 use.  More interestingly, this is also similar to what linux 'top'
 reports as memory use of the R process (919MB, probably 1024 vs. 1000
 B/MB), even after the read.csv() is finished and gc() has been run.
 Nothing seems to have been released back to the OS.

 Now,

 rm(s)
 gc()
 used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 270541 14.5    6679958 356.8 11268755 601.9
 Vcells 139481  1.1   27476536 209.7 42807620 326.6

 linux 'top' now reports 650MB of memory use (though R itself uses only
 15.6Mb).  My guess is that It leaves the trigger memory of 567MB plus
 the base 48MB.


 There are two interesting observations for me here:  first, to read a
 .csv file, I need to have at least 10-15 times as much memory as the
 file that I want to read---a lot more than the factor of 3-4 that I
 had expected.  The moral is that IF R can read a .csv file, one need
 not worry too much about running into memory constraints later on.  {R
 Developers---reducing read.csv's memory requirement a little would be
 nice.  of course, you have more than enough on your plate, already.}

 Second, memory is not returned fully to the OS.  This is not
 necessarily a bad thing, but good to know.

 Hope this helps...

 Sincerely,

 /iaw

 __
 R-help_at_stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 --
Brian D. Ripley,  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
If we add quote = FALSE to the write.csv statement it's twice as fast
reading it in.
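
On the reading side, a hedged analogue: if the file is known to contain no
quoted fields, telling scan() so skips the quote processing entirely:

big2 <- matrix(scan("big.csv", sep = ",", what = character(0),
                    skip = 1, n = 1e6, quote = ""),
               ncol = 10, byrow = TRUE)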

On 8/9/07, Michael Cassin [EMAIL PROTECTED] wrote:
 Hi,

 I've been having similar experiences and haven't been able to
 substantially improve the efficiency using the guidance in the I/O
 Manual.

 Could anyone advise on how to improve the following scan()?  It is not
 based on my real file, please assume that I do need to read in
 characters, and can't do any pre-processing of the file, etc.

 ## Create Sample File
  write.csv(matrix(as.character(1:1e6),ncol=10,byrow=TRUE),"big.csv",row.names=FALSE)
 q()

 **New Session**
 #R
  system("ls -l big.csv")
  system("free -m")
  big1 <- matrix(scan("big.csv",sep=",",what=character(0),skip=1,n=1e6),ncol=10,byrow=TRUE)
  system("free -m")

 The file is approximately 9MB, but approximately 50-60MB is used to
 read it in.

 object.size(big1) is 56MB, or 56 bytes per string, which seems excessive.

 Regards, Mike

 Configuration info:
  > sessionInfo()
 R version 2.5.1 (2007-06-27)
 x86_64-redhat-linux-gnu
 locale:
 C
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods
 [7] base

 # uname -a
 Linux ***.com 2.6.9-023stab044.4-smp #1 SMP Thu May 24 17:20:37 MSD
 2007 x86_64 x86_64 x86_64 GNU/Linux



 == Quoted Text 
 From: Prof Brian Ripley ripley_at_stats.ox.ac.uk
  Date: Tue, 26 Jun 2007 17:53:28 +0100 (BST)




  The R Data Import/Export Manual points out several ways in which you
 can use read.csv more efficiently.

  On Tue, 26 Jun 2007, ivo welch wrote:

   dear R experts:
  
  I am of course no R expert, but use it regularly.  I thought I would
  share some experimentation  with memory use.  I run a linux machine
  with about 4GB of memory, and R 2.5.0.
 
  upon startup, gc() reports
 
  used (Mb) gc trigger (Mb) max used (Mb)
  Ncells 268755 14.4 407500 21.8   35 18.7
  Vcells 139137  1.1 786432  6.0   444750  3.4
 
  This is my baseline.  linux 'top' reports 48MB as baseline.  This
  includes some of my own routines that are always loaded.  Good..
 
 
  Next, I created a s.csv file with 22 variables and 500,000
  observations, taking up an uncompressed disk space of 115MB.  The
  resulting object.size() after a read.csv() is 84,002,712 bytes (80MB).
 
  s = read.csv("s.csv");
  object.size(s);
 
  [1] 84002712
 
 
  here is where things get more interesting.  after the read.csv() is
  finished, gc() reports
 
used (Mb) gc trigger  (Mb) max used  (Mb)
  Ncells   270505 14.5    8349948 446.0 11268682 601.9
  Vcells 10639515 81.2   34345544 262.1 42834692 326.9
 
  I was a bit surprised by this---R had 928MB intermittent memory in
  use.  More interestingly, this is also similar to what linux 'top'
  reports as memory use of the R process (919MB, probably 1024 vs. 1000
  B/MB), even after the read.csv() is finished and gc() has been run.
  Nothing seems to have been released back to the OS.
 
  Now,
 
  rm(s)
  gc()
  used (Mb) gc trigger  (Mb) max used  (Mb)
  Ncells 270541 14.5    6679958 356.8 11268755 601.9
  Vcells 139481  1.1   27476536 209.7 42807620 326.6
 
  linux 'top' now reports 650MB of memory use (though R itself uses only
  15.6Mb).  My guess is that It leaves the trigger memory of 567MB plus
  the base 48MB.
 
 
  There are two interesting observations for me here:  first, to read a
  .csv file, I need to have at least 10-15 times as much memory as the
  file that I want to read---a lot more than the factor of 3-4 that I
  had expected.  The moral is that IF R can read a .csv file, one need
  not worry too much about running into memory constraints later on.  {R
  Developers---reducing read.csv's memory requirement a little would be
  nice.  of course, you have more than enough on your plate, already.}
 
  Second, memory is not returned fully to the OS.  This is not
  necessarily a bad thing, but good to know.
 
  Hope this helps...
 
  Sincerely,
 
  /iaw
 
  __
  R-help_at_stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  --
 Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need Help: Installing/Using xtable package

2007-08-09 Thread Peter Dalgaard
M. Jankowski wrote:
 Hi all,

 Let me know if I need to ask this question of the bioconductor group.
 I used the bioconductor utility to install this package and also the
 CRAN package.install function.

 My computer crashed a week ago. Today I reinstalled all my
 bioconductor/R packages. One of my scripts is giving me the following
 error:

 in my script I set:
 library(xtable)
 print.xtable(

 and receive this error:
 Error : could not find function "print.xtable"

 This is a new error and I cannot find the source.
   
Looks like the current xtable is no longer exporting its print methods. 
Why were you calling print.xtable explicitly in the first place?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Michael Cassin
Thanks for looking, but my file has quotes.  It's also 400MB, and I don't
mind waiting, but don't have 6x the memory to read it in.

On 8/9/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:

 If we add quote = FALSE to the write.csv statement it's twice as fast
 reading it in.

 On 8/9/07, Michael Cassin [EMAIL PROTECTED] wrote:
  Hi,
 
  I've been having similar experiences and haven't been able to
  substantially improve the efficiency using the guidance in the I/O
  Manual.
 
  Could anyone advise on how to improve the following scan()?  It is not
  based on my real file; please assume that I do need to read in
  characters, and can't do any pre-processing of the file, etc.
 
  ## Create Sample File
  write.csv(matrix(as.character(1:1e6), ncol=10, byrow=TRUE), "big.csv",
 row.names=FALSE)
  q()
 
  **New Session**
  #R
  system("ls -l big.csv")
  system("free -m")
  big1 <- matrix(scan("big.csv",
 sep=",", what=character(0), skip=1, n=1e6), ncol=10, byrow=TRUE)
  system("free -m")
 
  The file is approximately 9MB, but approximately 50-60MB is used to
  read it in.
 
  object.size(big1) is 56MB, or 56 bytes per string, which seems
 excessive.
 
  Regards, Mike
 
  Configuration info:
  > sessionInfo()
  R version 2.5.1 (2007-06-27)
  x86_64-redhat-linux-gnu
  locale:
  C
  attached base packages:
  [1] stats graphics  grDevices utils
 datasets  methods
  [7] base
 
  # uname -a
  Linux ***.com 2.6.9-023stab044.4-smp #1 SMP Thu May 24 17:20:37 MSD
  2007 x86_64 x86_64 x86_64 GNU/Linux
 
 
 
  ======== Quoted Text ========
  From: Prof Brian Ripley ripley_at_stats.ox.ac.uk
   Date: Tue, 26 Jun 2007 17:53:28 +0100 (BST)

  [...]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need Help: Installing/Using xtable package

2007-08-09 Thread Seth Falcon
Peter Dalgaard [EMAIL PROTECTED] writes:

 M. Jankowski wrote:
 Hi all,

 Let me know if I need to ask this question of the bioconductor group.
 I used the bioconductor utility to install this package and also the
 CRAN package.install function.

 My computer crashed a week ago. Today I reinstalled all my
 bioconductor/R packages. One of my scripts is giving me the following
 error:

 in my script I set:
 library(xtable)
 print.xtable(

 and receive this error:
 Error : could not find function "print.xtable"

 This is a new error and I cannot find the source.
   
 Looks like the current xtable is no longer exporting its print methods. 
 Why were you calling print.xtable explicitly in the first place?

Indeed, xtable now has a namespace.  The S3 methods are not exported
because they should not be called directly; rather, the generic
function (in this case print) should be called.
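
A minimal sketch of the generic-dispatch style (with a toy, purely
illustrative data frame; it assumes only that xtable() and its print
method behave as described above):

library(xtable)
d <- data.frame(x = 1:3, y = letters[1:3])  # hypothetical data
tab <- xtable(d)      # an object of class "xtable"
print(tab)            # dispatches to the unexported print.xtable method
## print.xtable(tab)  # fails now: the method is no longer exported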

The addition of the namespace is really a good thing.  Yes, it will cause
some hiccups for folks who were calling the methods directly (tsk
tsk).  But the addition fixes breakage that was occurring due to
internal xtable helper functions being masked.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need Help: Installing/Using xtable package

2007-08-09 Thread M. Jankowski
Ok, I got it now.

Just: print(xtable(...), ...)

Thanks!
Matt

On 8/9/07, Seth Falcon [EMAIL PROTECTED] wrote:
 [...]


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] small sample techniques

2007-08-09 Thread Nordlund, Dan (DSHS/RDA)
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Nair, 
 Murlidharan T
 Sent: Thursday, August 09, 2007 9:19 AM
 To: Moshe Olshansky; Rolf Turner; r-help@stat.math.ethz.ch
 Subject: Re: [R] small sample techniques
 
 Thanks, that discussion was helpful. Well, I have another question: 
 I am comparing two proportions for their deviation from the hypothesized
 difference of zero. My manually calculated z ratio is 1.94. 
 But, when I calculate it using prop.test, it uses Pearson's 
 chi-squared
 test, and the X-squared value that it gives is 0.74. Is there 
 a function
 in R where I can calculate the z ratio? Which is 
 
 
           ('p1 - 'p2) - (p1 - p2)
       Z = -----------------------
                S('p1 - 'p2)
 
 Where S is the standard error estimate of the difference between two
 independent proportions
 
 Dummy example 
 This is how I use it 
 prop.test(c(30,23),c(300,300))
 
 
 Cheers../Murli
 
 

Murli,

I think you need to recheck your computations.  You can run a t-test on your 
data in a variety of ways.  Here is one: 

> x <- c(rep(1,30), rep(0,270))
> y <- c(rep(1,23), rep(0,277))
> t.test(x,y)

Welch Two Sample t-test

data:  x and y 
t = 1.0062, df = 589.583, p-value = 0.3147
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.02221086  0.06887752 
sample estimates:
 mean of x  mean of y 
0.10000000 0.07666667 

Hope this is helpful,

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA  98504-5204

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
Another thing you could try would be reading it into a data base and then
from there into R.

The devel version of sqldf has this capability.  That is, it will use RSQLite
to read the file directly into the database without going through R at all,
and then read it from there into R, so it is a completely different process.
The RSQLite software has no capability of dealing with quotes (they will
be regarded as ordinary characters) but a single gsub can remove them
afterwards.  This won't work if there are commas within the quotes, but
in that case you could read each row as a single record and then
split it yourself in R.
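
A sketch of that cleanup step (it assumes DF has already been read in
via sqldf as below, and that no field contains embedded commas):

DF[] <- lapply(DF, function(col) gsub('"', '', col, fixed = TRUE))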

Try this

library(sqldf)
# next statement grabs the devel version software that does this
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

gc()
f <- file("big.csv")
DF <- sqldf("select * from f", file.format = list(header = TRUE,
row.names = FALSE))
gc()

For more info see the man page from the devel version and the home page:

http://sqldf.googlecode.com/svn/trunk/man/sqldf.Rd
http://code.google.com/p/sqldf/


On 8/9/07, Michael Cassin [EMAIL PROTECTED] wrote:
 Thanks for looking, but my file has quotes.  It's also 400MB, and I don't
 mind waiting, but don't have 6x the memory to read it in.


  [...]

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
Just one other thing.

The command in my prior post reads the data into an in-memory database.
If you find that is a problem then you can read it into a disk-based
database by adding the dbname argument to the sqldf call,
naming the database.  The database need not exist.  It will
be created by sqldf and then deleted when it's through:

DF <- sqldf("select * from f", dbname = tempfile(),
  file.format = list(header = TRUE, row.names = FALSE))


On 8/9/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 [...]

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Michael Cassin
I really appreciate the advice and this database solution will be useful to
me for other problems, but in this case I  need to address the specific
problem of scan and read.* using so much memory.

Is this expected behaviour? Can the memory usage be explained, and can it be
made more efficient?  For what it's worth, I'd be glad to try to help if the
code for scan is considered to be worth reviewing.

Regards, Mike

On 8/9/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:

 [...]

Re: [R] small sample techniques

2007-08-09 Thread Nair, Murlidharan T
n=300 in each group
30% taking A report relief from pain
23% taking B report relief from pain
Question: If there is no difference, are we likely to get a 7% difference?

Hypothesis
H0: p1-p2=0
H1: p1-p2!=0 (not equal to)

1. Weighted average of the two sample proportions:
   (300(0.30) + 300(0.23)) / (300 + 300) = 0.265

2. Std error estimate of the difference between two independent proportions:
   sqrt((0.265 * 0.735) * ((1/300) + (1/300))) = 0.03603

3. Evaluation of the difference between sample proportions as a deviation from
   the hypothesized difference of zero:
   ((0.30 - 0.23) - 0) / 0.03603 = 1.94


z did not reach 1.96, hence H0 is not rejected. 
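
The same steps in R (a sketch that just transcribes the hand
calculation above, not a built-in function):

p1 <- 0.30; p2 <- 0.23; n1 <- 300; n2 <- 300
p.bar <- (n1*p1 + n2*p2)/(n1 + n2)              # 1. pooled proportion: 0.265
se    <- sqrt(p.bar*(1 - p.bar)*(1/n1 + 1/n2))  # 2. std error: 0.03603
z     <- (p1 - p2)/se                           # 3. z ratio: 1.94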

This is what I was trying to do using prop.test. 

prop.test(c(30,23),c(300,300)) 

What function should I use? 


-Original Message-
From: [EMAIL PROTECTED] on behalf of Nordlund, Dan (DSHS/RDA)
Sent: Thu 8/9/2007 1:26 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] small sample techniques
 
 [...]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
One other idea.  Don't use byrow = TRUE.  Matrices are stored in column
order so that might be more efficient.  You can always transpose it later.
Haven't tested it to see if it helps.
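
For example (a sketch of that idea -- untested here, as noted above --
using the same hypothetical big.csv):

big1 <- matrix(scan("big.csv", sep = ",", what = character(0),
               skip = 1, n = 1e6), nrow = 10)  # fill by column
big1 <- t(big1)  # transpose so file rows become matrix rows again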

On 8/9/07, Michael Cassin [EMAIL PROTECTED] wrote:

  [...]

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Charles C. Berry
On Thu, 9 Aug 2007, Michael Cassin wrote:

 I really appreciate the advice and this database solution will be useful to
 me for other problems, but in this case I  need to address the specific
 problem of scan and read.* using so much memory.

 Is this expected behaviour? Can the memory usage be explained, and can it be
 made more efficient?  For what it's worth, I'd be glad to try to help if the
 code for scan is considered to be worth reviewing.

Mike,

This does not seem to be an issue with scan() per se.

Notice the difference in size of big2, big3, and bigThree here:

> big2 <- rep(letters, length=1e6)
> object.size(big2)/1e6
[1] 4.000856
> big3 <- paste(big2, big2, sep='')
> object.size(big3)/1e6
[1] 36.2

> cat(big2, file='lotsaletters.txt', sep='\n')
> bigTwo <- scan('lotsaletters.txt', what='')
Read 1000000 items
> object.size(bigTwo)/1e6
[1] 4.000856
> cat(big3, file='moreletters.txt', sep='\n')
> bigThree <- scan('moreletters.txt', what='')
Read 1000000 items
> object.size(bigThree)/1e6
[1] 4.000856
> all.equal(big3, bigThree)
[1] TRUE


Chuck

p.s.
> version
_
platform   i386-pc-mingw32
arch   i386
os mingw32
system i386, mingw32
status
major  2
minor  5.1
year   2007
month  06
day27
svn rev42083
language   R
version.string R version 2.5.1 (2007-06-27)



 Regards, Mike

 [...]

[R] depreciation of $ for atomic vectors

2007-08-09 Thread Ido M. Tamir
Dear All,

I would like to know why $ was deprecated for atomic vectors and
what I can use instead.

I got used to the following idiom for working with
data frames:

df <- data.frame(start=1:5, end=10:6)
apply(df, 1, function(row){ return(row$start + row$end) })

I have a data.frame with named columns and
use each row to do something. I would like the
named index ($) because the column position
in the data frame changes from time to time. 
The data frame is read from files.

thank you very much,

ido 




'$' returns 'NULL' (with a warning) except for recursive
 objects, and is only discussed in the section below on recursive
 objects.  Its use on non-recursive objects was deprecated in R
 2.5.0.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] depreciation of $ for atomic vectors

2007-08-09 Thread Gabor Grothendieck
Try this:

DF <- data.frame(start=1:5, end=10:6)
# apply(DF, 1, function(row){ return(row$start + row$end) })

DF$start + DF$end

apply(DF, 1, function(row) row[["start"]] + row[["end"]])

apply(DF, 1, function(row) row["start"] + row["end"])
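
(Why $ fails there, as a side note: apply() coerces the data frame to a
matrix, so each row arrives as an atomic named vector, and $ is only
defined for recursive objects such as lists.  A small sketch:)

apply(DF, 1, is.list)   # FALSE for every row: rows are atomic vectors
with(DF, start + end)   # row-wise sum without apply() at all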


On 8/9/07, Ido M. Tamir [EMAIL PROTECTED] wrote:
 [...]


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
Try it as a factor:

> big2 <- rep(letters, length=1e6)
> object.size(big2)/1e6
[1] 4.000856
> object.size(as.factor(big2))/1e6
[1] 4.001184

> big3 <- paste(big2, big2, sep='')
> object.size(big3)/1e6
[1] 36.2
> object.size(as.factor(big3))/1e6
[1] 4.001184


On 8/9/07, Charles C. Berry [EMAIL PROTECTED] wrote:
 [...]

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Charles C. Berry

I do not see how this helps Mike's case:

> res <- as.character(1:1e6)
> object.size(res)
[1] 3624
> object.size(as.factor(res))
[1] 4224


Anyway, my point was that if two character vectors for which all.equal() 
yields TRUE can differ by almost an order of magnitude in object.size(), 
and the smaller of the two was read in by scan(), then Mike will have to 
dig deeper than scan() to see how to reduce the size of a character vector 
in R.


On Thu, 9 Aug 2007, Gabor Grothendieck wrote:

 [...]

Re: [R] tcltk error on Linux

2007-08-09 Thread Mark W Kimpel
Seth and Brian,

Today I downloaded and installed the latest R-devel, and tcltk now 
works. My suspicion is that Tcl was not on my path when R-devel was 
installed previously.

BTW, I had thought that it was a courtesy to cc: the maintainers of a 
package when writing either R-devel or R-help about a specific package. 
For tcltk, I see:
Maintainer: R Core Team [EMAIL PROTECTED]

If it is not appropriate to write R-core regarding packages they 
maintain, would it perhaps not be better to remove them as maintainers, 
or not to suggest that people cc: maintainers of packages? Just an idea.

As for the suggestion that I not use R-devel, I do that because I 
sometimes use BioC packages that have just been published and are only 
available in the devel versions of BioC. Are you suggesting that only 
people who can debug things themselves, and thus who do not need to 
write to R-devel, use R-devel? As an open-source user, I thought the 
philosophy was that it was useful to have users willing to test beta 
versions of software and have those users report problems to developers. 
If that is not the case, please put a stronger warning on R-devel and 
warn users not to use it unless they are willing to debug and take 
care of all problems themselves.

Thanks,

Mark

---

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work,  Mobile  VoiceMail
(317) 663-0513 Home (no voice mail please)

**

Seth Falcon wrote:
 Hi Mark,
 
 Prof Brian Ripley [EMAIL PROTECTED] writes:
 On Thu, 9 Aug 2007, Mark W Kimpel wrote:

 I am having trouble getting tcltk package to load on openSuse 10.2
 running R-devel. I have specifically put my /usr/share/tcl directory in
 my PATH, but R doesn't seem to see it. I also have installed tk on my
 system. Any ideas on what the problem is?
 
 Any chance you are running R on a remote server using an ssh session?
 If that is the case, you may have an ssh/X11 config issue that
 prevents using tcl/tk from such a session.
 
 Rerun the configure script for R and verify that tcl/tk support is
 listed in the summary.
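 
 (A quick check from within R itself -- a sketch; capabilities()
 reports what the build found:)
 
 capabilities("tcltk")   # TRUE if this build of R includes Tcl/Tk support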
 
 Also, note that I have some warning messages on starting up R, not sure
 what they mean or if they are pertinent.
 Those are coming from a Bioconductor package: again you must be using 
 development versions with R-devel and those are not stable (last time I 
 looked even Biobase would not install, and the packages change
 daily).
 
 BioC devel tracks R-devel, but not on a daily basis -- because R
 changes daily.  The recent issues with Biobase are a result of changes
 to R and have already been fixed.
 
 If you have all those packages in your startup, please don't -- there will 
 be a considerable performance hit so only load them when you need
 them.
 
 Presumably, that's why they are there in the first place.  The warning
 messages are a problem and suggest some needed improvements to the
 methods packages.  These are being worked on.
 
 + seth


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RMySQL loading error

2007-08-09 Thread Clara Anton
Hi,

I am having problems loading RMySQL.

I am using MySQL 5.0,  R version 2.5.1, and RMySQL with Windows XP.
When I try to load rMySQL I get the following error:

> require(RMySQL)
Loading required package: RMySQL
Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library 
'C:/PROGRA~1/R/R-25~1.1/library/RMySQL/libs/RMySQL.dll':
  LoadLibrary failure:  Invalid access to memory location.


I did not get any errors while installing MySQL or RMySQL. It seems that 
there are other people with similar problems, although I could not find 
any hint on how to try to solve the problem.
Any help, hint or advice would be greatly appreciated.

Thanks,

Clara Anton

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Prof Brian Ripley
On Thu, 9 Aug 2007, Charles C. Berry wrote:

 On Thu, 9 Aug 2007, Michael Cassin wrote:

 I really appreciate the advice and this database solution will be useful to
 me for other problems, but in this case I  need to address the specific
 problem of scan and read.* using so much memory.

 Is this expected behaviour?

Yes, and documented in the 'R Internals' manual.  That is basic reading 
for people wishing to comment on efficiency issues in R.

 Can the memory usage be explained, and can it be
 made more efficient?  For what it's worth, I'd be glad to try to help if the
 code for scan is considered to be worth reviewing.

 Mike,

 This does not seem to be an issue with scan() per se.

 Notice the difference in size of big2, big3, and bigThree here:

 > big2 <- rep(letters, length=1e6)
 > object.size(big2)/1e6
 [1] 4.000856
 > big3 <- paste(big2, big2, sep='')
 > object.size(big3)/1e6
 [1] 36.2

On a 32-bit computer every R object has an overhead of 24 or 28 bytes. 
Character strings are R objects, but in some functions such as rep (and 
scan for up to 10,000 distinct strings) the objects can be shared.  More 
string objects will be shared in 2.6.0 (but factors are designed to be 
efficient at storing character vectors with few values).

On a 64-bit computer the overhead is usually double.  So I would expect 
just over 56 bytes/string for distinct short strings (and that is what 
big3 gives).

But 56Mb is really not very much (tiny on a 64-bit computer), and 1 
million items is a lot.

[...]


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] S4 based package giving strange error at install time, but not at check time

2007-08-09 Thread Rajarshi Guha
Hi, I have an S4-based package that was loading fine on R 2.5.0 on both
OS X and Linux. I was checking the package against 2.5.1, and R CMD check
does not give any warnings. So I next built the package and installed it.
Though the package installed fine, I noticed the following message:

Loading required package: methods
Error in loadNamespace(package, c(which.lib.loc, lib.loc),  
keep.source = keep.source) :
 in 'fingerprint' methods specified for export, but none  
defined: fold, euc.vector, distance, random.fingerprint,  
as.character, length, show
During startup - Warning message:
package 'fingerprint' in options("defaultPackages") was not found

However, I can load the package in R with no errors being reported and
it seems that the functions are working fine.

Looking at the sources I see that my NAMESPACE file contains the
following:

importFrom(methods)
exportClasses(fingerprint)
exportMethods(fold, euc.vector, distance, random.fingerprint,
as.character, length, show)
export(fp.sim.matrix, fp.to.matrix, fp.factor.matrix,
fp.read.to.matrix, fp.read, moe.lf, bci.lf, cdk.lf)

and all the exported methods are defined. As an example consider the
'fold' method. It's defined as

setGeneric("fold", function(fp) standardGeneric("fold"))
setMethod("fold", "fingerprint",
   function(fp) {
     ## code for the function snipped
   })

Since the method has been defined I can't see why I should see the
error during install time, but nothing when the package is checked.

Any pointers would be appreciated.

---
Rajarshi Guha  [EMAIL PROTECTED]
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04  06F7 1BB9 E634 9B87 56EE
---
Bus error -- driver executed.



Re: [R] small sample techniques

2007-08-09 Thread Greg Snow
30 is not 30% of 300 (it is 10%), so your prop.test below is testing
something different from your hand calculations.  Try:

> prop.test(c(.30, .23)*300, c(300, 300), correct=FALSE)

2-sample test for equality of proportions without continuity
correction

data:  c(0.3, 0.23) * 300 out of c(300, 300) 
X-squared = 3.7736, df = 1, p-value = 0.05207
alternative hypothesis: two.sided 
95 percent confidence interval:
 -0.000404278  0.140404278 
sample estimates:
prop 1 prop 2 
  0.30   0.23 

> sqrt(3.7736)
[1] 1.942576

Notice that the square root of the X-squared value matches your hand
calculation (with rounding error).  This is true if the Yates continuity
correction is not used (the correct=FALSE in the call to prop.test).
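
The hand calculation itself can be reproduced directly from the numbers
above (a minimal sketch):

p1 <- 0.30; p2 <- 0.23; n <- 300
pbar <- (n*p1 + n*p2) / (n + n)            # pooled proportion, 0.265
se <- sqrt(pbar*(1 - pbar)*(1/n + 1/n))    # 0.03603
z <- (p1 - p2)/se                          # about 1.94
2*pnorm(-abs(z))                           # two-sided p-value, about 0.052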

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Nair, 
 Murlidharan T
 Sent: Thursday, August 09, 2007 1:02 PM
 To: Nordlund, Dan (DSHS/RDA); r-help@stat.math.ethz.ch
 Subject: Re: [R] small sample techniques
 
 n=300
 30% taking A relief from pain
 23% taking B relief from pain
 Question; If there is no difference are we likely to get a 7% 
 difference?
 
 Hypothesis
 H0: p1-p2=0
 H1: p1-p2!=0 (not equal to)
 
 1. Weighted average of the two sample proportions:
    (300(0.30) + 300(0.23)) / (300 + 300) = 0.265
 2. Standard error estimate of the difference between two independent
    proportions:
    sqrt((0.265 * 0.735) * ((1/300) + (1/300))) = 0.03603
 3. Evaluation of the difference between sample proportions as a
    deviation from the hypothesized difference of zero:
    ((0.30 - 0.23) - 0) / 0.03603 = 1.94
 
 
 z did not reach 1.96, hence H0 is not rejected.
 
 This is what I was trying to do using prop.test. 
 
 prop.test(c(30,23),c(300,300)) 
 
 What function should I use? 
 
 
 -Original Message-
 From: [EMAIL PROTECTED] on behalf of Nordlund, 
 Dan (DSHS/RDA)
 Sent: Thu 8/9/2007 1:26 PM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] small sample techniques
  
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of Nair, 
  Murlidharan T
  Sent: Thursday, August 09, 2007 9:19 AM
  To: Moshe Olshansky; Rolf Turner; r-help@stat.math.ethz.ch
  Subject: Re: [R] small sample techniques
  
 Thanks, that discussion was helpful. Well, I have another question. I
 am comparing two proportions for their deviation from the hypothesized
 difference of zero. My manually calculated z ratio is 1.94.
 But, when I calculate it using prop.test, it uses Pearson's
 chi-squared test, and the X-squared value that it gives is 0.74. Is
 there a function in R where I can calculate the z ratio? Which is
  
  
  Z = (('p1 - 'p2) - (p1 - p2)) / S('p1 - 'p2)
  
  Where S is the standard error estimate of the difference 
 between two 
  independent proportions
  
  Dummy example
  This is how I use it
  prop.test(c(30,23),c(300,300))
  
  
  Cheers../Murli
  
  
 
 Murli,
 
I think you need to recheck your computations.  You can run a
t-test on your data in a variety of ways.  Here is one:
 
> x <- c(rep(1,30), rep(0,270))
> y <- c(rep(1,23), rep(0,277))
> t.test(x,y)
 
 Welch Two Sample t-test
 
 data:  x and y
t = 1.0062, df = 589.583, p-value = 0.3147
alternative hypothesis: true difference in means is not equal to 0
 95 percent confidence interval:
  -0.02221086  0.06887752
 sample estimates:
  mean of x  mean of y
 0.1000 0.0767 
 
 Hope this is helpful,
 
 Dan
 
 Daniel J. Nordlund
 Research and Data Analysis
 Washington State Department of Social and Health Services 
 Olympia, WA  98504-5204
 




Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
The examples were just artificially created data.  We don't know what the
real case is, but if each entry is distinct then factors won't help;
however, if they are not distinct then there is a huge potential savings.
Also, if they are really numeric, as in your example, then storing them as
numeric rather than character or factor could give substantial savings.
So it all depends on the nature of the data, but the way it's stored does
seem to make a potentially large difference.

> # distinct elements
> res <- as.character(1:1e6)
> object.size(res)/1e6
[1] 36.2
> object.size(as.factor(res))/1e6
[1] 40.00022
> object.size(as.numeric(res))/1e6
[1] 8.24

> # non-distinct elements
> res2 <- as.character(rep(1:100, length = 1e6))
> object.size(res2)/1e6
[1] 36.2
> object.size(as.factor(res2))/1e6
[1] 4.003824
> object.size(as.numeric(res2))/1e6
[1] 8.24




On 8/9/07, Charles C. Berry [EMAIL PROTECTED] wrote:

 I do not see how this helps Mike's case:

 > res <- (as.character(1:1e6))
 > object.size(res)
 [1] 3624
 > object.size(as.factor(res))
 [1] 4224


 Anyway, my point was that if two character vectors for which all.equal()
 yields TRUE can differ by almost an order of magnitude in object.size(),
 and the smaller of the two was read in by scan(), then Mike will have to
 dig deeper than scan() to see how to reduce the size of a character vector
 in R.


 On Thu, 9 Aug 2007, Gabor Grothendieck wrote:

  Try it as a factor:
 
  > big2 <- rep(letters, length=1e6)
  > object.size(big2)/1e6
  [1] 4.000856
  > object.size(as.factor(big2))/1e6
  [1] 4.001184

  > big3 <- paste(big2, big2, sep='')
  > object.size(big3)/1e6
  [1] 36.2
  > object.size(as.factor(big3))/1e6
  [1] 4.001184
 
 
  On 8/9/07, Charles C. Berry [EMAIL PROTECTED] wrote:
  On Thu, 9 Aug 2007, Michael Cassin wrote:
 
  I really appreciate the advice and this database solution will be useful
  to me for other problems, but in this case I need to address the specific
  problem of scan and read.* using so much memory.

  Is this expected behaviour? Can the memory usage be explained, and can it
  be made more efficient?  For what it's worth, I'd be glad to try to help
  if the code for scan is considered to be worth reviewing.
 
  Mike,
 
  This does not seem to be an issue with scan() per se.
 
  Notice the difference in size of big2, big3, and bigThree here:
 
  > big2 <- rep(letters, length=1e6)
  > object.size(big2)/1e6
  [1] 4.000856
  > big3 <- paste(big2, big2, sep='')
  > object.size(big3)/1e6
  [1] 36.2

  > cat(big2, file='lotsaletters.txt', sep='\n')
  > bigTwo <- scan('lotsaletters.txt', what='')
  Read 1000000 items
  > object.size(bigTwo)/1e6
  [1] 4.000856
  > cat(big3, file='moreletters.txt', sep='\n')
  > bigThree <- scan('moreletters.txt', what='')
  Read 1000000 items
  > object.size(bigThree)/1e6
  [1] 4.000856
  > all.equal(big3, bigThree)
  [1] TRUE
 
 
  Chuck
 
  p.s.
  version
 _
  platform   i386-pc-mingw32
  arch   i386
  os mingw32
  system i386, mingw32
  status
  major  2
  minor  5.1
  year   2007
  month  06
  day27
  svn rev42083
  language   R
  version.string R version 2.5.1 (2007-06-27)
 
 
 
  Regards, Mike
 
  On 8/9/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 
  Just one other thing.
 
  The command in my prior post reads the data into an in-memory database.
  If you find that is a problem then you can read it into a disk-based
  database by adding the dbname argument to the sqldf call
  naming the database.  The database need not exist.  It will
  be created by sqldf and then deleted when its through:
 
  DF <- sqldf("select * from f", dbname = tempfile(),
    file.format = list(header = TRUE, row.names = FALSE))
 
 
  On 8/9/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
  Another thing you could try would be reading it into a data base and
  then
  from there into R.
 
  The devel version of sqldf has this capability.  That is, it will use
  RSQLite to read the file directly into the database without going
  through R at all, and then read it from there into R, so it's a
  completely different process.  The RSQLite software has no capability
  of dealing with quotes (they will be regarded as ordinary characters),
  but a single gsub can remove them afterwards.  This won't work if there
  are commas within the quotes, but in that case you could read each row
  as a single record and then split it yourself in R.
 
  Try this
 
  library(sqldf)
  # next statement grabs the devel version software that does this
  source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

  gc()
  f <- file("big.csv")
  DF <- sqldf("select * from f", file.format = list(header = TRUE,
    row.names = FALSE))
  gc()
 
  For more info see the man page from the devel version and the home page:
 
  http://sqldf.googlecode.com/svn/trunk/man/sqldf.Rd
  http://code.google.com/p/sqldf/
 
 
  On 8/9/07, Michael Cassin [EMAIL PROTECTED] wrote:
  Thanks for looking, 

Re: [R] RMySQL loading error

2007-08-09 Thread Prof Brian Ripley
On Thu, 9 Aug 2007, Clara Anton wrote:

 Hi,

 I am having problems loading RMySQL.

 I am using MySQL 5.0,  R version 2.5.1, and RMySQL with Windows XP.

More exact versions would be helpful.
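
For example, a quick way to report them (a sketch):

R.version.string                       # exact R version
packageDescription("RMySQL")$Version   # exact RMySQL version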

 When I try to load rMySQL I get the following error:

  require(RMySQL)
 Loading required package: RMySQL
 Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library
 'C:/PROGRA~1/R/R-25~1.1/library/RMySQL/libs/RMySQL.dll':
  LoadLibrary failure:  Invalid access to memory location.


 I did not get any errors while installing MySQL or RMySQL. It seems that
 there are other people with similar problems, although I could not find
 any hint on how to try to solve the problem.

It is there, unfortunately along with a lot of uninformed speculation.

 Any help, hint or advice would be greatly appreciated.

The most likely solution is to update (or downdate) your MySQL.  You 
possibly got RMySQL from the CRAN Extras site, and if so this is covered 
in the ReadMe there:

   The build of RMySQL_0.6-0 is known to work with MySQL 5.0.21 and 5.0.45,
   and known not to work (it crashes on startup) with 5.0.41.

Usually the message is the one you show, but I have seen R crash.  The
issue is the MySQL client DLL: the DLL from 5.0.21 or 5.0.45 works with 5.0.41.

All the reports of problems I have seen are for MySQL versions strictly 
between 5.0.21 and 5.0.45.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] RMySQL loading error

2007-08-09 Thread Gabor Grothendieck
This was just discussed:

https://www.stat.math.ethz.ch/pipermail/r-help/2007-August/138142.html

On 8/9/07, Clara Anton [EMAIL PROTECTED] wrote:
 Hi,

 I am having problems loading RMySQL.

 I am using MySQL 5.0,  R version 2.5.1, and RMySQL with Windows XP.
 When I try to load rMySQL I get the following error:

   require(RMySQL)
 Loading required package: RMySQL
 Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library
 'C:/PROGRA~1/R/R-25~1.1/library/RMySQL/libs/RMySQL.dll':
  LoadLibrary failure:  Invalid access to memory location.


 I did not get any errors while installing MySQL or RMySQL. It seems that
 there are other people with similar problems, although I could not find
 any hint on how to try to solve the problem.
 Any help, hint or advice would be greatly appreciated.

 Thanks,

 Clara Anton





[R] Tukey HSD

2007-08-09 Thread Kurt Debono

Hi,
I was wondering if you could help me:
The following are the first few lines of my data set:

subject group condition depvar
s1  c   ver 114.87
s1  c   feet114.87
s1  c   body114.87
s2  c   ver 73.54
s2  c   feet64.32
s2  c   body61.39
s3  a   ver 114.87
s3  a   feet97.21
s3  a   body103.31 etc.

I entered the following ANOVA command:


dat <- read.table("mydata.txt", header=T)
summary(aov(depvar ~ group * condition + Error(subject), data=dat))


Error: subject
 Df  Sum Sq Mean Sq F value Pr(>F)
group  1   443.3   443.3  1.0314 0.3185
Residuals 28 12035.3   429.8

Error: Within
   Df  Sum Sq Mean Sq F value   Pr(>F)
condition2  615.82  307.91  6.6802 0.002501 **
group:condition  2   61.51   30.75  0.6672 0.517168
Residuals   56 2581.18   46.09
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1





I cannot find a way to perform a Tukey HSD on the main effect of condition,
since the ANOVA formula contains the Error() term.


Could you help me please?
Kurt
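
One route that is often suggested is to refit the within-subjects model
with lme() and get Tukey contrasts from glht() in the multcomp package --
a sketch, assuming both packages are installed:

library(nlme)
library(multcomp)
fit <- lme(depvar ~ group * condition, random = ~1 | subject, data = dat)
summary(glht(fit, linfct = mcp(condition = "Tukey")))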




Re: [R] S4 based package giving strange error at install time, but not at check time

2007-08-09 Thread Prof Brian Ripley
On Thu, 9 Aug 2007, Rajarshi Guha wrote:

 Hi, I have a S4 based package package that was loading fine on R
 2.5.0 on both OS X and
 Linux. I was checking the package against 2.5.1 and doing R CMD check
 does not give any warnings. So I next built the package and installed
 it. Though the package installed fine I noticed the following message:

 Loading required package: methods
 Error in loadNamespace(package, c(which.lib.loc, lib.loc),
 keep.source = keep.source) :
 in 'fingerprint' methods specified for export, but none
 defined: fold, euc.vector, distance, random.fingerprint,
 as.character, length, show
 During startup - Warning message:
 package fingerprint in options(defaultPackages) was not found
   ^^^

Do you have this package in your startup files or the environment variable 
R_DEFAULT_PACKAGES?  R CMD check should not look there: whatever you are 
quoting above seems to.

 However, I can load the package in R with no errors being reported and
 it seems that the functions are working fine.

 Looking at the sources I see that my NAMESPACES file contains the
 following:

 importFrom(methods)

That should specify what to import, or be import(methods).  See
'Writing R Extensions'.
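
For example, either of these directives would do (the importFrom() list
is illustrative, naming only what the package actually uses):

import(methods)
importFrom(methods, setGeneric, setMethod, setClass)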

 exportClasses(fingerprint)
 exportMethods(fold, euc.vector, distance, random.fingerprint,
 as.character, length, show)
 export(fp.sim.matrix, fp.to.matrix, fp.factor.matrix,
 fp.read.to.matrix, fp.read, moe.lf, bci.lf, cdk.lf)

 and all the exported methods are defined. As an example consider the
 'fold' method. It's defined as

  setGeneric("fold", function(fp) standardGeneric("fold"))
  setMethod("fold", "fingerprint",
    function(fp) {
      ## code for the function snipped
    })

 Since the method has been defined I can't see why I should see the
 error during install time, but nothing when the package is checked.

 Any pointers would be appreciated.

 ---
 Rajarshi Guha  [EMAIL PROTECTED]
 GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04  06F7 1BB9 E634 9B87 56EE
 ---
 Bus error -- driver executed.



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Memory problem

2007-08-09 Thread Gang Chen
It seems the problem lies in this line:

 try(fit.lme <- lme(Beta ~ group*session*difficulty+FTND, random =
  ~1|Subj, Model), tag <- 1);

As lme fails for most iterations in the loop, the 'try' function
catches one error message for each failed iteration. But the puzzling
part is: why does the memory usage keep accumulating? Is each error
message stored cumulatively in a buffer, or is something else going on?
Or is something wrong with the way I'm using 'try'?
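
For what it's worth, the second argument of try() is 'silent', so the
'tag <- 1' above is effectively being passed as that flag.  The usual
idiom tests the value returned by try() instead -- a sketch:

fit.lme <- try(lme(Beta ~ group*session*difficulty + FTND,
                   random = ~1|Subj, data = Model), silent = TRUE)
if (!inherits(fit.lme, "try-error")) {
  Stat[i, j, k, ] <- anova(fit.lme)$F[-1]
} else {
  Stat[i, j, k, ] <- rep(0, NoF - 1)
}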

Thanks,
Gang


On Aug 9, 2007, at 10:36 AM, Gang Chen wrote:

 I got a long list of error message repeating with the following 3
 lines when running the loop at the end of this mail:

 R(580,0xa000ed88) malloc: *** vm_allocate(size=327680) failed (error
 code=3)
 R(580,0xa000ed88) malloc: *** error: can't allocate region
 R(580,0xa000ed88) malloc: *** set a breakpoint in szone_error to debug

 There are 2 big arrays, IData (54x64x50x504) and Stat (4x64x50x9), in
 the code. They would only use about 0.8GB of memory. However when I
 check the memory usage during the looping, the memory usage keeps
 growing and finally reaches the memory limit of my computer, 4GB, and
 spills the above error message.

 Is there something in the loop about lme that is causing memory
 leaking? How can I clean up the memory usage in the loop?

 Thank you very much for your help,
 Gang

 


 tag <- 0; dimx <- 54; dimy <- 64; dimz <- 50; NoF <- 8; NoFile <- 504;

 IData <- array(data=NA, dim=c(dimx, dimy, dimz, NoFile));
 Stat <- array(data=NA, dim=c(dimx, dimy, dimz, NoF));

 for (i in 1:NoFile) {
   IData[,,,i] <- ...  # fill in the data for array IData here
 }

 for (i in 1:dimx) {
   for (j in 1:dimy) {
     for (k in 1:dimz) {
       for (m in 1:NoFile) {
         Model$Beta[m] <- IData[i, j, k, m];
       }
       try(fit.lme <- lme(Beta ~ group*session*difficulty+FTND, random =
             ~1|Subj, Model), tag <- 1);
       if (tag != 1) {
         Stat[i, j, k,] <- anova(fit.lme)$F[-1];
       }
       else {
         Stat[i, j, k,] <- rep(0, NoF-1);
       }
       tag <- 0;
     }
   }
 }




Re: [R] Systematically biased count data regression model

2007-08-09 Thread Matthew and Kim Bowser
Dear Paul,

Thank you very much for your comment.  I will apply the 'latent'
approach you suggested.

Sincerely,

Matthew Bowser

On 8/9/07, paulandpen [EMAIL PROTECTED] wrote:
 Matthew

 It is possible that your results are suffering from heterogeneity. It may
 be that your model performs well at the aggregate level, and this would
 explain good aggregate fit levels and decent predictive performance etc.

 You could perhaps look at a 'latent' approach to modelling your data; in
 other words, see if there is something unique in the cases/data/observations
 in the lower and upper levels of the model (where prediction is poor) and
 whether it is justified that you model these count areas as separate and
 unique from the generic aggregate-level model (in other words, there is
 something unobserved/unmeasured or latent etc. in your population of
 observations that could be causing some observations to behave uniquely
 overall).

 hth

 thanks Paul
 - Original Message -
 From: Matthew and Kim Bowser [EMAIL PROTECTED]
 To: r-help@stat.math.ethz.ch
 Sent: Friday, August 10, 2007 1:43 AM
 Subject: [R] Systematically biased count data regression model


  Dear all,
 
  I am attempting to explain patterns of arthropod family richness
  (count data) using a regression model.  It seems to be able to do a
  pretty good job as an explanatory model (i.e. demonstrating
  relationships between dependent and independent variables), but it has
  systematic problems as a predictive model:  It is biased high at low
  observed values of family richness and biased low at high observed
  values of family richness (see attached pdf).  I have tried diverse
  kinds of reasonable regression models mostly as in Zeileis, et al.
  (2007), as well as transforming my variables, both with only small
  improvements.
 
  Do you have suggestions for making a model that would perform better
  as a predictive model?
 
  Thank you for your time.
 
  Sincerely,
 
  Matthew Bowser
 
  STEP student
  USFWS Kenai National Wildlife Refuge
  Soldotna, Alaska, USA
 
  M.Sc. student
  University of Alaska Fairbanks
  Fairbanks, Alaska, USA
 
  Reference
 
  Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
  count data in R. Technical Report 53, Department of Statistics and
  Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
  http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.
 
  Code
 
  `data` <-
  structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
  9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
  12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
  1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
  5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
  5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
  7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
  10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
  3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
  16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
  4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
  6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
  0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
  2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
  159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
  175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
  161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
  165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
  165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
  175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
  167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
  178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
  173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
  170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
  170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
  162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
  166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
  172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
  171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
  171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
  160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
  176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
  168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
  166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
  0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
  0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
  0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
  0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
  0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
  0.289, 0.475, 

[R] a question on lda{MASS}

2007-08-09 Thread Weiwei Shi
hi,

assume
val is the test data while m is the lda model fitted using CV=FALSE

x = predict(m, val)

val2 = val[, 1:(ncol(val)-1)] # the last column is class label

# col is sample, row is variable

then I am wondering if

x$x == apply(val2 * m$scaling, 2, sum)

i.e., does the scaling (is it the coefficient vector?) times the val data,
summed over variables, give the discriminant result $x?

Thanks.
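
If memory serves, predict.lda also centers the data before applying the
scaling, so a plain sum of val2 * m$scaling will be off by a constant
shift.  A sketch of the check (this assumes rows of val2 are samples,
and MASS's prior-weighted centering convention):

ctr <- colSums(m$prior * m$means)   # prior-weighted mean of group means
manual <- scale(as.matrix(val2), center = ctr, scale = FALSE) %*% m$scaling
all.equal(unname(predict(m, val)$x), unname(manual))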

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III



Re: [R] plot table with sapply - labeling problems

2007-08-09 Thread jim holtman
Here is a modified script that should work.  In many cases where you
want the names of the element of the list you are processing, you
should work with the names:

test <- as.data.frame(cbind(round(runif(50,0,5)), round(runif(50,0,3)), round(runif(50,0,4))))
sapply(test, table) -> vardist
sapply(test, function(x) round(table(x)/sum(table(x))*100, 1)) -> vardist1
  par(mfrow=c(1,3))
# you need to use the 'names' and then index into the variable
# your original 'x' did not have a names associated with it
sapply(names(vardist1), function(x) barplot(vardist1[[x]],
  ylim=c(0,100), main="Varset1", xlab=x))
  par(mfrow=c(1,1))
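
For the single-title part of the question, one option is to reserve an
outer margin and write the title there (a sketch):

par(mfrow=c(1,3), oma=c(0,0,2,0))     # leave room at the top
sapply(names(vardist1), function(x) barplot(vardist1[[x]],
  ylim=c(0,100), xlab=x))
mtext("Varset", outer=TRUE, cex=1.2)  # one title over all three panels
par(mfrow=c(1,1))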



On 8/9/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hi List,

 I am trying to label a barplot group with variable names when using
 sapply unsucessfully.
 I can't seem to extract the names for the indiviual plots:

 test <- as.data.frame(cbind(round(runif(50,0,5)), round(runif(50,0,3)),
   round(runif(50,0,4))))
 sapply(test, table) -> vardist
 sapply(test, function(x) round(table(x)/sum(table(x))*100, 1)) -> vardist1
   par(mfrow=c(1,3))
 sapply(vardist1, function(x) barplot(x,
   ylim=c(0,100), main="Varset1", xlab=names(x)))
   par(mfrow=c(1,1))

 Names don't show up although names(vardist) works.

 Also I would like to put a single Title on this plot instead of
 repeating Varset three times.

 Any hints appreciated.

 Thanx
 Herry




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Systematically biased count data regression model

2007-08-09 Thread Matthew and Kim Bowser
Dear all,

I received a very helpful response from someone who requested
anonymity, but to whom I am grateful.

PLEASE do not quote my name or email (I am trying to stay off spam lists)

Matthew:  I think this is just a reflection of
the fact the model does not fit perfectly.  The
example below is a simple linear regression that
is highly significant but has R-square of
16%.  This model as well is biased high at low
observed values of y and biased low at high observed values of y.

set.seed(1)
n <- 200
m <- data.frame(x=rnorm(n, mean=10, sd=2))
m$y <- m$x + rnorm(n, sd=4)   # simulate using intercept 0, slope 1
f <- lm(y ~ x, data=m)
print(summary(f))
#
# Call:
# lm(formula = y ~ x, data = m)
#
# Residuals:
#  Min   1Q   Median   3Q  Max
# -11.7310  -2.1709  -0.1009   2.6733  10.3446
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)     0.6274     1.5830   0.396    0.692
# x               0.9538     0.1546   6.170 3.77e-09 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 4.052 on 198 degrees of freedom
# Multiple R-Squared: 0.1613, Adjusted R-squared: 0.157
# F-statistic: 38.07 on 1 and 198 DF,  p-value: 3.773e-09
#
plot(m$y, f$fitted.values, xlab="Observed", ylab="Predicted")
lines(lowess(m$y, f$fitted.values), col="red", lty=2)
abline(c(0,1))
legend("topleft", lty=c(2,1), col=c("red","black"), legend=c("Loess","45-degree"))

At 2007-08-09  08:43, Matthew and Kim Bowser wrote:
Dear all,

I am attempting to explain patterns of arthropod family richness
(count data) using a regression model.  It seems to be able to do a
pretty good job as an explanatory model (i.e. demonstrating
relationships between dependent and independent variables), but it has
systematic problems as a predictive model:  It is biased high at low
observed values of family richness and biased low at high observed
values of family richness (see attached pdf).  I have tried diverse
kinds of reasonable regression models mostly as in Zeileis, et al.
(2007), as well as transforming my variables, both with only small
improvements.

Do you have suggestions for making a model that would perform better
as a predictive model?

Thank you for your time.

Sincerely,

Matthew Bowser

STEP student
USFWS Kenai National Wildlife Refuge
Soldotna, Alaska, USA

M.Sc. student
University of Alaska Fairbanks
Fairbanks, Alaska, USA

Reference

Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
count data in R. Technical Report 53, Department of Statistics and
Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

[snip]


#This appears to be a decent explanatory model, but as a predictive
model it is systematically biased.  It is biased high at low observed
values of D and biased low at high values observed values of D.


On 8/9/07, Matthew and Kim Bowser [EMAIL PROTECTED] wrote:
 Dear all,

 I am attempting to explain patterns of arthropod family richness
 (count data) using a regression model.  It seems to be able to do a
 pretty good job as an explanatory model (i.e. demonstrating
 relationships between dependent and independent variables), but it has
 systematic problems as a predictive model:  It is biased high at low
 observed values of family richness and biased low at high observed
 values of family richness (see attached pdf).  I have tried diverse
 kinds of reasonable regression models mostly as in Zeileis, et al.
 (2007), as well as transforming my variables, both with only small
 improvements.

 Do you have suggestions for making a model that would perform better
 as a predictive model?

 Thank you for your time.

 Sincerely,

 Matthew Bowser

 STEP student
 USFWS Kenai National Wildlife Refuge
 Soldotna, Alaska, USA

 M.Sc. student
 University of Alaska Fairbanks
 Fairbanks, Alaska, USA

 Reference

 Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
 count data in R. Technical Report 53, Department of Statistics and
 Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
 http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

 Code

 `data` <-
 structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 

[R] odfWeave processing error, file specific

2007-08-09 Thread Aric Gregson
Hello, 

I hope there is a simple explanation for this. I have been using
odfWeave with great satisfaction in R 2.5.0. Unfortunately, I cannot
get beyond the following error message with a particular file. I have
copied and pasted into new files and the same error pops up. It looks
like the error is occurring before any of the R code is run (?).

Any suggestions on how to track this down and fix it?

odfWeave('balf.odt', 'balfout.odt')
  Copying  balf.odt
  Setting wd to  /tmp/Rtmpz0aWPf/odfWeave09155238949
  Unzipping ODF file using unzip -o balf.odt
Archive:  balf.odt
 extracting: mimetype
   creating: Configurations2/statusbar/
  inflating: Configurations2/accelerator/current.xml
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/progressbar/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
  inflating: layout-cache
  inflating: content.xml
  inflating: styles.xml
  inflating: meta.xml
  inflating: Thumbnails/thumbnail.png
  inflating: settings.xml
  inflating: META-INF/manifest.xml

  Removing  balf.odt
  Creating a Pictures directory

  Pre-processing the contents
Error: cc$parentId == parentId is not TRUE

Thanks,

aric




[R] Subsetting by number of observations in a factor

2007-08-09 Thread Ron Crump
Hi,

I generally do my data preparation externally to R, so I
this is a bit unfamiliar to me, but a colleague has asked
me how to do certain data manipulations within R.

Anyway, basically I can get his large file into a dataframe.
One of the columns is a management group code (mg). There may be
varying numbers of observations per management group, and
he would like to subset the dataframe such that there are
always at least n per management group.

I presume I can get to this using table or tapply, then
(and I'm not sure how on this bit) creating a column nmg
containing the number of observations that corresponds to
mg for that row, then simply subsetting.

So, am I on the right track? If so how do I actually do it, and
is there an easier method than I am considering.

Thanks for your help,
Ron



Re: [R] Systematically biased count data regression model

2007-08-09 Thread Gabor Grothendieck
Perhaps you don't really need to predict the precise count.
Maybe it's good enough to predict whether the count is above
or below average.  In that case the model is 74% correct on a
holdout sample of the last 54 points, based on a model of the
first 200 points.

 > # create model on first 200 and predict on rest
 > DD <- data$D > mean(data$D)
 > mod <- glm(DD ~ ., data[-1], family = binomial, subset = 1:200)
 > tab <- table(predict(mod, data[201:254, -1], type = "resp") > .5, DD[201:254])
 > sum(tab * diag(2)) / sum(tab)
[1] 0.7407407


On 8/9/07, Matthew and Kim Bowser [EMAIL PROTECTED] wrote:
 Dear all,

 I am attempting to explain patterns of arthropod family richness
 (count data) using a regression model.  It seems to be able to do a
 pretty good job as an explanatory model (i.e. demonstrating
 relationships between dependent and independent variables), but it has
 systematic problems as a predictive model:  It is biased high at low
 observed values of family richness and biased low at high observed
 values of family richness (see attached pdf).  I have tried diverse
 kinds of reasonable regression models mostly as in Zeileis, et al.
 (2007), as well as transforming my variables, both with only small
 improvements.

 Do you have suggestions for making a model that would perform better
 as a predictive model?

 Thank you for your time.

 Sincerely,

 Matthew Bowser

 STEP student
 USFWS Kenai National Wildlife Refuge
 Soldotna, Alaska, USA

 M.Sc. student
 University of Alaska Fairbanks
 Fairbankse, Alaska, USA

 Reference

 Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
 count data in R. Technical Report 53, Department of Statistics and
 Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
 http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

 Code

 `data` <-
 structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232,
 0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318,
 0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2,
 0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38,
 0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288,
 0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162,
 0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306,
 0.28, 0.279, 0.208, 

Re: [R] Systematically biased count data regression model

2007-08-09 Thread paulandpen
Matthew,

In response to that post, I am afraid I have to disagree.  I think a poor
model fit (e.g. 16%) is a reflection of a lot of unmeasured factors and
therefore random error in the model.  This would explain why overall
predictive performance is poor (e.g. a lot of error in the model).  Your
situation is different: you are having trouble predicting extreme values,
so there is something systematic (your model works well in the middle and
worse at the tails), not poor overall.

As the post does reflect, you are suffering from error in prediction, that is a 
fact of life, as others have stated, and most of us who suffer from prediction 
error experience it at the more extreme values.
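
In R, one way to probe for such latent classes is a finite mixture of
count regressions -- a sketch, assuming the flexmix package (the
predictors named are purely illustrative):

library(flexmix)
## two latent classes of Poisson regressions on the posted data
mix <- flexmix(D ~ Day + NDVI, data = data, k = 2,
               model = FLXMRglm(family = "poisson"))
summary(mix)   # component sizes; parameters(mix) gives per-class fits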

Thanks Paul   



 Matthew and Kim Bowser [EMAIL PROTECTED] wrote:
 
 Dear all,
 
 I received a very helpful response from someone who requested
 anonymity, but to whom I am grateful.
 
 PLEASE do not quote my name or email (I am trying to stay off spam 
 lists)
 
 Matthew:  I think this is just a reflection of
 the fact the model does not fit perfectly.  The
 example below is a simple linear regression that
 is highly significant but has R-square of
 16%.  This model as well is biased high at low
 observed values of y and biased low at high values observed values of y
 
 set.seed(1)
 n <- 200
 m <- data.frame(x=rnorm(n, mean=10, sd=2))
 m$y <- m$x + rnorm(n, sd=4)   # simulate using intercept 0, slope 1
 f <- lm(y ~ x, data=m)
 print(summary(f))
 #
 # Call:
 # lm(formula = y ~ x, data = m)
 #
 # Residuals:
 #  Min   1Q   Median   3Q  Max
 # -11.7310  -2.1709  -0.1009   2.6733  10.3446
 #
 # Coefficients:
 #               Estimate Std. Error t value Pr(>|t|)
 # (Intercept)     0.6274     1.5830   0.396    0.692
 # x               0.9538     0.1546   6.170 3.77e-09 ***
 # ---
 # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 #
 # Residual standard error: 4.052 on 198 degrees of freedom
 # Multiple R-Squared: 0.1613, Adjusted R-squared: 0.157
 # F-statistic: 38.07 on 1 and 198 DF,  p-value: 3.773e-09
 #
 plot(m$y, f$fitted.values, xlab="Observed", ylab="Predicted")
 lines(lowess(m$y, f$fitted.values), col="red", lty=2)
 abline(c(0,1))
 legend("topleft", lty=c(2,1), col=c("red","black"),
   legend=c("Loess","45-degree"))
 
 At 2007-08-09  08:43, Matthew and Kim Bowser wrote:
 Dear all,
 
 I am attempting to explain patterns of arthropod family richness
 (count data) using a regression model.  It seems to be able to do a
 pretty good job as an explanatory model (i.e. demonstrating
 relationships between dependent and independent variables), but it has
 systematic problems as a predictive model:  It is biased high at low
 observed values of family richness and biased low at high observed
 values of family richness (see attached pdf).  I have tried diverse
 kinds of reasonable regression models mostly as in Zeileis, et al.
 (2007), as well as transforming my variables, both with only small
 improvements.
 
 Do you have suggestions for making a model that would perform better
 as a predictive model?
 
 Thank you for your time.
 
 Sincerely,
 
 Matthew Bowser
 
 STEP student
 USFWS Kenai National Wildlife Refuge
 Soldotna, Alaska, USA
 
 M.Sc. student
 University of Alaska Fairbanks
 Fairbanks, Alaska, USA
 
 Reference
 
 Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
 count data in R. Technical Report 53, Department of Statistics and
 Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
 http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.
 
 [snip]
 
 
 #This appears to be a decent explanatory model, but as a predictive
 model it is systematically biased.  It is biased high at low observed
 values of D and biased low at high values observed values of D.
 
 
 On 8/9/07, Matthew and Kim Bowser [EMAIL PROTECTED] wrote:
  Dear all,
 
  I am attempting to explain patterns of arthropod family richness
  (count data) using a regression model.  It seems to be able to do a
  pretty good job as an explanatory model (i.e. demonstrating
  relationships between dependent and independent variables), but it has
  systematic problems as a predictive model:  It is biased high at low
  observed values of family richness and biased low at high observed
  values of family richness (see attached pdf).  I have tried diverse
  kinds of reasonable regression models mostly as in Zeileis, et al.
  (2007), as well as transforming my variables, both with only small
  improvements.
 
  Do you have suggestions for making a model that would perform better
  as a predictive model?
 
  Thank you for your time.
 
  Sincerely,
 
  Matthew Bowser
 
  STEP student
  USFWS Kenai National Wildlife Refuge
  Soldotna, Alaska, USA
 
  M.Sc. student
  University of Alaska Fairbanks
  Fairbanks, Alaska, USA
 
  Reference
 
  Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
  count data in R. Technical Report 53, Department of Statistics and
  Mathematics, 

Re: [R] Systematically biased count data regression model

2007-08-09 Thread Gabor Grothendieck
I guess I should not have been so quick to make that conclusion, since
it seems that 74% of the values in the holdout set are FALSE, so simply
guessing FALSE for each one would give us 74% accuracy:

> table(DD[201:254])

FALSE  TRUE
   40    14

> 40/54
[1] 0.7407407


On 8/9/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Perhaps you don't really need to predict the precise count.
 Maybe its good enough to predict whether the count is above
 or below average.  In that case the model is 74% correct on a
 holdout sample of the last 54 points based on a model of the
 first 200 points.

  > # create model on first 200 and predict on rest
  > DD <- data$D > mean(data$D)
  > mod <- glm(DD ~ ., data[-1], family = binomial, subset = 1:200)
  > tab <- table(predict(mod, data[201:254, -1], type = "resp") > .5,
    DD[201:254])
  > sum(tab * diag(2)) / sum(tab)
 [1] 0.7407407


 On 8/9/07, Matthew and Kim Bowser [EMAIL PROTECTED] wrote:
  Dear all,
 
  I am attempting to explain patterns of arthropod family richness
  (count data) using a regression model.  It seems to be able to do a
  pretty good job as an explanatory model (i.e. demonstrating
  relationships between dependent and independent variables), but it has
  systematic problems as a predictive model:  It is biased high at low
  observed values of family richness and biased low at high observed
  values of family richness (see attached pdf).  I have tried diverse
  kinds of reasonable regression models mostly as in Zeileis, et al.
  (2007), as well as transforming my variables, both with only small
  improvements.
 
  Do you have suggestions for making a model that would perform better
  as a predictive model?
 
  Thank you for your time.
 
  Sincerely,
 
  Matthew Bowser
 
  STEP student
  USFWS Kenai National Wildlife Refuge
  Soldotna, Alaska, USA
 
  M.Sc. student
  University of Alaska Fairbanks
  Fairbankse, Alaska, USA
 
  Reference
 
  Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
  count data in R. Technical Report 53, Department of Statistics and
  Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
  http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.
 
  Code
 
  `data` <-
  structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
  9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
  12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
  1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
  5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
  5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
  7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
  10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
  3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
  16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
  4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
  6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
  0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
  2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
  159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
  175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
  161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
  165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
  165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
  175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
  167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
  178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
  173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
  170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
  170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
  162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
  166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
  172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
  171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
  171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
  160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
  176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
  168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
  166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
  0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
  0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
  0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
  0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
  0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
  0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
  0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
  0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
  0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 

Re: [R] small sample techniques

2007-08-09 Thread Moshe Olshansky
Hi Murli,

First of all, regarding prop.test, you made a typo:
you should have used prop.test(c(69,90),c(300,300)),
which gives an X-squared value of 3.4228; its square
root is 1.85, which is not too far from 1.94.

I would use the Fisher Exact Test (fisher.test).  The two-sided
test has a p-value of 0.06411, so you do not reject H0; the
one-sided test (i.e. H1 is that the first probability of success
is smaller than the second) has a p-value of 0.03206, so you
reject H0 (at the 95% confidence level).
You get similar results with two-sided and one-sided t-tests.

Moshe.

P.S. if you use paired t-test you get nonsense since
it uses pairwise differences, and in your case only 21
of 300 differences are non-zero!
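
For reference, the 2x2 table implied above (69/300 vs 90/300 successes)
can be fed to fisher.test directly -- a sketch that should reproduce the
p-values quoted:

tab <- matrix(c(69, 231, 90, 210), nrow = 2)  # rows: success/failure
fisher.test(tab)                              # two-sided
fisher.test(tab, alternative = "less")        # H1: first proportion smaller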

--- Nair, Murlidharan T [EMAIL PROTECTED] wrote:

 n=300
 30% taking A relief from pain
 23% taking B relief from pain
 Question; If there is no difference are we likely to
 get a 7% difference?
 
 Hypothesis
 H0: p1-p2=0
 H1: p1-p2!=0 (not equal to)
 
  1. Weighted average of the two sample proportions:
     (300(0.30) + 300(0.23)) / (300 + 300) = 0.265
  2. Standard error estimate of the difference between two
     independent proportions:
     sqrt((0.265 * 0.735) * ((1/300) + (1/300))) = 0.03603
  3. Evaluation of the difference between sample proportions
     as a deviation from the hypothesized difference of zero:
     ((0.30 - 0.23) - 0) / 0.03603 = 1.94
 
 
 z did not approach 1.96 hence H0 is not rejected. 
 
 This is what I was trying to do using prop.test. 
 
 prop.test(c(30,23),c(300,300)) 
 
 What function should I use? 
 
 
 -Original Message-
 From: [EMAIL PROTECTED] on behalf of
 Nordlund, Dan (DSHS/RDA)
 Sent: Thu 8/9/2007 1:26 PM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] small sample techniques
  
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On
 Behalf Of Nair, 
  Murlidharan T
  Sent: Thursday, August 09, 2007 9:19 AM
  To: Moshe Olshansky; Rolf Turner;
 r-help@stat.math.ethz.ch
  Subject: Re: [R] small sample techniques
  
  Thanks, that discussion was helpful. Well, I have
 another question 
  I am comparing two proportions for its deviation
 from the hypothesized
  difference of zero. My manually calculated z ratio
 is 1.94. 
  But, when I calculate it using prop.test, it uses
 Pearson's 
  chi-squared
  test and the X-squared value that it gives it
 0.74. Is there 
  a function
  in R where I can calculate the z ratio? Which is 
  
  
   Z = (('p1 - 'p2) - (p1 - p2)) / S('p1 - 'p2)
  
  Where S is the standard error estimate of the
 difference between two
  independent proportions
  
  Dummy example 
  This is how I use it 
  prop.test(c(30,23),c(300,300))
  
  
  Cheers../Murli
  
  
 
 Murli,
 
 I think you need to recheck your computations.  You
 can run a t-test on your data in a variety of ways.
 Here is one:
 
  > x <- c(rep(1,30), rep(0,270))
  > y <- c(rep(1,23), rep(0,277))
  > t.test(x,y)
 
 Welch Two Sample t-test
 
 data:  x and y 
 t = 1.0062, df = 589.583, p-value = 0.3147
 alternative hypothesis: true difference in means is
 not equal to 0 
 95 percent confidence interval:
  -0.02221086  0.06887752 
 sample estimates:
  mean of x  mean of y 
 0.1000 0.0767 
 
 Hope this is helpful,
 
 Dan
 
 Daniel J. Nordlund
 Research and Data Analysis
 Washington State Department of Social and Health
 Services
 Olympia, WA  98504-5204
 




Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread jim holtman
Does this do what you want?  It creates a new dataframe with those
'mg' that have at least a certain number of observations.

> set.seed(2)
> # create some test data
> x <- data.frame(mg=sample(LETTERS[1:4], 20, TRUE), data=1:20)
> # split the data into subsets based on 'mg'
> x.split <- split(x, x$mg)
> str(x.split)
List of 4
 $ A:'data.frame':  7 obs. of  2 variables:
  ..$ mg  : Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1
  ..$ data: int [1:7] 1 4 7 12 14 18 20
 $ B:'data.frame':  3 obs. of  2 variables:
  ..$ mg  : Factor w/ 4 levels "A","B","C","D": 2 2 2
  ..$ data: int [1:3] 9 15 19
 $ C:'data.frame':  4 obs. of  2 variables:
  ..$ mg  : Factor w/ 4 levels "A","B","C","D": 3 3 3 3
  ..$ data: int [1:4] 2 3 10 11
 $ D:'data.frame':  6 obs. of  2 variables:
  ..$ mg  : Factor w/ 4 levels "A","B","C","D": 4 4 4 4 4 4
  ..$ data: int [1:6] 5 6 8 13 16 17
> # only choose subsets with at least 5 observations
> x.5 <- lapply(x.split, function(a) {
+     if (nrow(a) >= 5) return(a)
+     else return(NULL)
+ })
> # create new dataframe with these observations
> x.new <- do.call('rbind', x.5)
> x.new
 mg data
A.1   A1
A.4   A4
A.7   A7
A.12  A   12
A.14  A   14
A.18  A   18
A.20  A   20
D.5   D5
D.6   D6
D.8   D8
D.13  D   13
D.16  D   16
D.17  D   17




On 8/9/07, Ron Crump [EMAIL PROTECTED] wrote:
 Hi,

 I generally do my data preparation externally to R, so
 this is a bit unfamiliar to me, but a colleague has asked
 me how to do certain data manipulations within R.

 Anyway, basically I can get his large file into a dataframe.
 One of the columns is a management group code (mg). There may be
 varying numbers of observations per management group, and
 he would like to subset the dataframe such that there are
 always at least n per management group.

 I presume I can get to this using table or tapply, then
 (and I'm not sure how to do this bit) creating a column nmg
 containing the number of observations that corresponds to
 mg for that row, then simply subsetting.

 So, am I on the right track? If so, how do I actually do it,
 and is there an easier method than the one I am considering?

 Thanks for your help,
 Ron

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Seasonality

2007-08-09 Thread Felix Andrews
?monthplot

?stl
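
A minimal illustration with a built-in monthly series (a sketch;
co2 is one of R's example datasets, standing in for your x):

fit <- stl(co2, s.window = "periodic")  # seasonal/trend/remainder decomposition
plot(fit)       # a large, regular seasonal panel indicates seasonality
monthplot(co2)  # month-by-month subseries; parallel profiles suggest a stable seasonal pattern

The estimated seasonal component can be read off the fit with
fit$time.series[, "seasonal"].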


On 8/10/07, Alberto Monteiro [EMAIL PROTECTED] wrote:
 I have a time series x = f(t), where t is taken for each
 month. What is the best function to detect whether _x_ has
 seasonal variation? If there is such a seasonal effect, what
 is the best function to estimate it?

 Function arima has a seasonal parameter, but I guess this is
 too complex to be useful.

 Alberto Monteiro

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Felix Andrews / 安福立
PhD candidate
Integrated Catchment Assessment and Management Centre
The Fenner School of Environment and Society
The Australian National University (Building 48A), ACT 0200
Beijing Bag, Locked Bag 40, Kingston ACT 2604
http://www.neurofractal.org/felix/
voice:+86_1051404394 (in China)
mobile:+86_13522529265 (in China)
mobile:+61_410400963 (in Australia)
xmpp:[EMAIL PROTECTED]
3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Systematically biased count data regression model

2007-08-09 Thread Gabor Grothendieck
Here is one other idea.  Since we are not doing that well with
the entire data set, let's look at a portion and see if we can do
better there.  This line of code seems to show that D is related
to T:

plot(data)

so let's try conditioning D ~ T on all combos of the factor levels:

library(lattice)
xyplot(D ~ T | Hemlock * Snow * Alpine, data, layout = c(2, 4))

from which it appears there is a much clearer association between
D and T when Alpine = 1.  Thus let's condition on Alpine = 1, run
it over again and eliminate the non-significant variables:

library(MASS)   # provides glm.nb
mod <- glm.nb(D ~ Day + NDVI + T, data = data, subset = Alpine == 1)
summary(mod)

plot(data$D[data$Alpine == 1], mod$fitted.values)
lines(lowess(data$D[data$Alpine == 1], mod$fitted.values), lty = 2)
abline(a = 0, b = 1)

This time it's still slightly biased at the low end but not elsewhere,
although we have paid a price for this by looking at only the 40 Alpine
points (out of 254).



On 8/9/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 I guess I should not have been so quick to make that conclusion, since
 it seems that 74% of the values in the holdout set are FALSE, so simply
 guessing FALSE for each one would give us 74% accuracy:

  > table(DD[201:254])

 FALSE  TRUE
    40    14

  > 40/54
 [1] 0.7407407


 On 8/9/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
  Perhaps you don't really need to predict the precise count.
  Maybe it's good enough to predict whether the count is above
  or below average.  In that case the model is 74% correct on a
  holdout sample of the last 54 points based on a model of the
  first 200 points.
 
   > # create model on first 200 and predict on rest
   > DD <- data$D > mean(data$D)
   > mod <- glm(DD ~ ., data[-1], family = binomial, subset = 1:200)
   > tab <- table(predict(mod, data[201:254, -1], type = "resp") > .5,
   +              DD[201:254])
   > sum(tab * diag(2)) / sum(tab)
   [1] 0.7407407
 
 
  On 8/9/07, Matthew and Kim Bowser [EMAIL PROTECTED] wrote:
   Dear all,
  
   I am attempting to explain patterns of arthropod family richness
   (count data) using a regression model.  It seems to be able to do a
   pretty good job as an explanatory model (i.e. demonstrating
   relationships between dependent and independent variables), but it has
   systematic problems as a predictive model:  It is biased high at low
   observed values of family richness and biased low at high observed
   values of family richness (see attached pdf).  I have tried diverse
   kinds of reasonable regression models mostly as in Zeileis, et al.
   (2007), as well as transforming my variables, both with only small
   improvements.
  
   Do you have suggestions for making a model that would perform better
   as a predictive model?
  
   Thank you for your time.
  
   Sincerely,
  
   Matthew Bowser
  
   STEP student
   USFWS Kenai National Wildlife Refuge
   Soldotna, Alaska, USA
  
   M.Sc. student
   University of Alaska Fairbanks
    Fairbanks, Alaska, USA
  
   Reference
  
   Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
   count data in R. Technical Report 53, Department of Statistics and
   Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
   http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.
  
   Code
  
   `data` -
   structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
   9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
   12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
   1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
   5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
   5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
   7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
   10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
   3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
   16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
   4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
   6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
   0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
   2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
   159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
   175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
   161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
   165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
   165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
   175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
   167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
   178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
   173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
   170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
   170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
   162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
   166, 172, 174, 

Re: [R] odfWeave processing error, file specific

2007-08-09 Thread Kuhn, Max
Aric,

Can you send me a reproducible example (code and odt file) plus the
results of sessionInfo()?

Thanks,

Max

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Aric Gregson
Sent: Thursday, August 09, 2007 6:56 PM
To: r-help@stat.math.ethz.ch
Subject: [R] odfWeave processing error, file specific

Hello, 

I hope there is a simple explanation for this. I have been using
odfWeave with great satisfaction in R 2.5.0. Unfortunately, I cannot
get beyond the following error message with a particular file. I have
copied and pasted into new files and the same error pops up. It looks
like the error is occurring before any of the R code is run (?).

Any suggestions on how to track this down and fix it?

odfWeave('balf.odt', 'balfout.odt')
  Copying  balf.odt
  Setting wd to  /tmp/Rtmpz0aWPf/odfWeave09155238949
  Unzipping ODF file using unzip -o balf.odt
Archive:  balf.odt
 extracting: mimetype
   creating: Configurations2/statusbar/
  inflating: Configurations2/accelerator/current.xml
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/progressbar/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
  inflating: layout-cache
  inflating: content.xml
  inflating: styles.xml
  inflating: meta.xml
  inflating: Thumbnails/thumbnail.png
  inflating: settings.xml
  inflating: META-INF/manifest.xml

  Removing  balf.odt
  Creating a Pictures directory

  Pre-processing the contents
Error: cc$parentId == parentId is not TRUE

Thanks,

aric


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Systematically biased count data regression model

2007-08-09 Thread Steven McKinney
Hi Matthew,

You may be experiencing the classic
'regression towards the mean' phenomenon,
in which case shrinkage estimation may help
with prediction (extremely low and high values
need to be shrunk back towards the mean).

Here's a reference that discusses the issue
in a manner somewhat related to your situation,
and it has plenty of good references:


Steyerberg, E. W. (2001). Application of shrinkage techniques in
logistic regression analysis: a case study. Statistica Neerlandica,
55(1), 76-88.
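
To make the idea concrete, here is a crude sketch of shrinking
predictions back towards the mean (mod and data as in the earlier
posts in this thread; the factor k is hypothetical and should be
tuned on held-out data, not on the training set):

k <- 0.8                           # hypothetical shrinkage factor in [0, 1]
m <- mean(data$D)                  # grand mean of the response
pred <- mod$fitted.values          # raw model predictions
pred.shrunk <- m + k * (pred - m)  # pull extreme predictions towards the mean
plot(data$D, pred.shrunk); abline(a = 0, b = 1)

Penalized (e.g. ridge-type) fits achieve a similar effect in a more
principled way; the Steyerberg paper above discusses the options.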




Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Matthew and Kim Bowser
Sent: Thu 8/9/2007 8:43 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Systematically biased count data regression model
 
Dear all,

I am attempting to explain patterns of arthropod family richness
(count data) using a regression model.  It seems to be able to do a
pretty good job as an explanatory model (i.e. demonstrating
relationships between dependent and independent variables), but it has
systematic problems as a predictive model:  It is biased high at low
observed values of family richness and biased low at high observed
values of family richness (see attached pdf).  I have tried diverse
kinds of reasonable regression models mostly as in Zeileis, et al.
(2007), as well as transforming my variables, both with only small
improvements.

Do you have suggestions for making a model that would perform better
as a predictive model?

Thank you for your time.

Sincerely,

Matthew Bowser

STEP student
USFWS Kenai National Wildlife Refuge
Soldotna, Alaska, USA

M.Sc. student
University of Alaska Fairbanks
Fairbanks, Alaska, USA

Reference

Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
count data in R. Technical Report 53, Department of Statistics and
Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

Code

`data` -
structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232,
0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318,

[R] compute ROC curve?

2007-08-09 Thread gallon li
Hello,

I have continuous test results for diseased and nondiseased subjects, say X
and Y. Both are vectors of numbers.

Is there any R function which can generate the step function of the ROC
curve automatically?

Thanks!
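
For later readers: the empirical ROC step function takes only a few
lines of base R (a sketch; X holds the diseased scores and Y the
nondiseased ones, with higher values indicating disease, and the
rnorm data are hypothetical stand-ins):

set.seed(1)
X <- rnorm(50, mean = 1)                       # hypothetical diseased scores
Y <- rnorm(50)                                 # hypothetical nondiseased scores
cuts <- sort(unique(c(X, Y)), decreasing = TRUE)
tpr <- sapply(cuts, function(k) mean(X >= k))  # sensitivity at each cutoff
fpr <- sapply(cuts, function(k) mean(Y >= k))  # 1 - specificity at each cutoff
plot(c(0, fpr, 1), c(0, tpr, 1), type = "s",
     xlab = "False positive rate", ylab = "True positive rate")

The ROCR package also builds ROC curves (and many related measures)
directly.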

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread Ron Crump
Jim,

 Does this do what you want?  It creates a new dataframe with those
 'mg' that have at least a certain number of observation.

Looks good. I also have an alternative solution which appears to work,
so I'll see which is quicker on the big data set in question.

My solution (note that 'in' is a reserved word in R, so the
dataframe needs another name, say 'dat'):

mgsize <- as.data.frame(table(dat$mg))
dat2 <- merge(dat, mgsize, by.x = "mg", by.y = "Var1")
out <- subset(dat2, Freq > 1, select = -Freq)

Thanks for your help.

Ron.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error message: image not found

2007-08-09 Thread Jungeun Song
I had R version 2.4 and then installed R version 2.5 (the current
version) on Mac OS X 10.4.10. I tried dyn.load to load an object file
compiled from C source. I got the following error message:

Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library '/Users/jusong/Desktop/BPM/R/group.so':
   dlopen(/Users/jusong/Desktop/BPM/R/group.so, 6): Library not loaded:
   /Library/Frameworks/R.framework/Versions/2.4/Resources/lib/libR.dylib
   Referenced from: /Users/jusong/Desktop/BPM/R/group.so
   Reason: image not found


How can I fix this problem?


Best,
Jungeun Song
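
The dlopen message shows that group.so is still linked against the
R 2.4 framework, so it needs to be rebuilt against the new R. A sketch
(assuming group.c is the C source that produced group.so):

# from a shell, with the R 2.5 binary first on the PATH:
#   R CMD SHLIB group.c
# then, in a fresh R 2.5 session:
dyn.load("/Users/jusong/Desktop/BPM/R/group.so")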




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread jim holtman
Here is an even faster way:

 > # faster way
 > x.mg.size <- table(x$mg)                    # count occurrences per 'mg'
 > x.mg.5 <- names(x.mg.size)[x.mg.size >= 5]  # select groups with at least 5
 > x.new1 <- subset(x, x$mg %in% x.mg.5)       # use them in the subset
 > x.new1
   mg data
1   A1
4   A4
5   D5
6   D6
7   A7
8   D8
12  A   12
13  D   13
14  A   14
16  D   16
17  D   17
18  A   18
20  A   20
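
Another base R idiom for the same job (a sketch, reusing the test
dataframe x from the earlier post): ave() attaches each group's size
to every row, so no intermediate table is needed and the original row
order is kept:

 > x.new2 <- subset(x, ave(seq_along(mg), mg, FUN = length) >= 5)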


On 8/9/07, Ron Crump [EMAIL PROTECTED] wrote:
 Jim,

  Does this do what you want?  It creates a new dataframe with those
  'mg' that have at least a certain number of observation.

 Looks good. I also have an alternative solution which appears to work,
 so I'll see which is quicker on the big data set in question.

 My solution (note that 'in' is a reserved word in R, so the
 dataframe needs another name, say 'dat'):

 mgsize <- as.data.frame(table(dat$mg))
 dat2 <- merge(dat, mgsize, by.x = "mg", by.y = "Var1")
 out <- subset(dat2, Freq > 1, select = -Freq)

 Thanks for your help.

 Ron.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tukey HSD

2007-08-09 Thread Richard M. Heiberger
Please see the R-help message
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/105165.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] small sample techniques

2007-08-09 Thread Daniel Nordlund
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Nair, 
 Murlidharan T
 Sent: Thursday, August 09, 2007 12:02 PM
 To: Nordlund, Dan (DSHS/RDA); r-help@stat.math.ethz.ch
 Subject: Re: [R] small sample techniques
 
 n=300
 30% taking A report relief from pain
 23% taking B report relief from pain
 Question: If there is no difference, how likely are we to get a 7%
 difference?
 
 Hypothesis
 H0: p1-p2=0
 H1: p1-p2!=0 (not equal to)
 
 1. Weighted average of the two sample proportions:
    (300(0.30) + 300(0.23)) / (300 + 300) = 0.265
 2. Std error estimate of the difference between two
    independent proportions:
    sqrt((0.265 * 0.735) * ((1/300) + (1/300))) = 0.03603
 
 3. Evaluation of the difference between the sample proportions
    as a deviation from the hypothesized difference of zero:
    ((0.30 - 0.23) - 0) / 0.03603 = 1.94
 
 
 z does not reach 1.96, hence H0 is not rejected. 
 
 This is what I was trying to do using prop.test. 
 
 prop.test(c(30,23),c(300,300)) 
 
 What function should I use? 
 
 

I sent this from work but it seems to have disappeared into the luminiferous 
ether.

The proportion test above indicates that p1=0.1 and p2=0.0767.  But in your 
t-test you specify p1=0.3 and p2=0.23.  Which is correct?  If p1=0.3 and 
p2=0.23, then use

prop.test(c(.30*300,.23*300),c(300,300))
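
The z ratio itself takes only a few lines of base R (a sketch using
the numbers above; pnorm() supplies the two-sided p-value):

p1 <- 0.30; p2 <- 0.23; n1 <- n2 <- 300
pbar <- (n1 * p1 + n2 * p2) / (n1 + n2)        # pooled proportion, 0.265
se <- sqrt(pbar * (1 - pbar) * (1/n1 + 1/n2))  # pooled std error, 0.03603
z <- (p1 - p2) / se                            # about 1.94
2 * pnorm(-abs(z))                             # two-sided p-value, about 0.052

Note also that prop.test(c(90, 69), c(300, 300), correct = FALSE)
reports X-squared equal to z^2 (about 3.77), so the chi-squared and
z approaches agree once the continuity correction is switched off.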

Hope this is helpful,

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA  98504-5204

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.