Re: [R] Awk and Vilno

2007-06-13 Thread Tim Churches
Rogerio Porto wrote:
 Hey,
 
 What we should really compare is the four situations:
 R alone
 R + awk
 R + vilno
 R + awk + vilno
 and maybe R + SAS Data step
 and see which scripts are more elegant (read 'short and understandable')

I don't think that short and understandable necessarily go hand-in-hand.
Sometimes longer scripts which are more explicit and use fewer tricky
syntax shortcuts are much easier to understand a year or two later. Ease
and speed of script writing (taking into account the learning curve and
time taken to consult scripting language documentation) are important,
as is the ability to re-visit a script, or examine someone else's, and
work out what it does and how it works; speed of execution also counts
with large datasets. The ubiquity of the tool, and whether it is freely
available on many platforms, either pre-installed or in an
easy-to-install form, are also considerations.

 what do you guys think of creating a R-wiki page for syntax
 comparisons among the various options to enhance R use?
 
 I already have two suggestions:
 
 1) syntax examples for using R and other tools to manipulate
 and analyze large datasets (with a concise description of the
 datasets);
 
 2) syntax examples for using R and other tools (or R alone) to clean
 and prepare datasets (simple and very small datasets, for didactic
 purposes).

The ability of the tools to scale to large or very large datasets is
also a consideration, as is their speed when dealing with such large data.

 I think this could be interesting for R users and to promote other
 software tools, since it seems there are a lot of R users who use
 other tools as well.
 
 Besides that, questions on the two subjects above come up frequently
 on this list. Thus a wiki page seems to be the right place to discuss
 and teach this to other users.
 
 What do you think?

Yes, happy to contribute R + Python examples to such wiki pages. Please
post the URL.

Tim C

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate similar to SPSS

2007-04-25 Thread Tim Churches
Andrew Robinson [EMAIL PROTECTED] wrote:
 can I suggest, without offending, that you purchase and read Peter
 Dalgaard's Introductory Statistics with R or Michael Crawley's
 Statistics: An Introduction using R or Venables and Ripley's Modern
 Applied Statistics with S or Maindonald and Braun's Data Analysis
 and Graphics Using R: An Example-based Approach,
 or download and read An Introduction to R 
 http://cran.r-project.org/doc/manuals/R-intro.pdf
 or one of the numerous contributed documents at
 http://cran.r-project.org/other-docs.html

For Natalie, who is an SPSS user, may I strongly recommend R FOR SAS AND SPSS 
USERS by Bob Muenchen at http://oit.utk.edu/scc/RforSASSPSSusers.pdf

This is a really, really excellent document which has proven to be an
invaluable resource in introducing my SAS- and SPSS-using colleagues to
the delights of R.

And it is free (as in available at no cost).
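For Natalie's specific example below, a minimal R sketch (the vector name `variable` follows her posting; in practice the data would be read in from a file):

```r
# Natalie's coding: smoke = 1, non-smoke = 2
variable <- c(1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# frequency table, analogous to SPSS FREQUENCIES
table(variable)

# proportions for each code, and the percentage of smokers
prop.table(table(variable))
pct.smokers <- 100 * mean(variable == 1)
pct.smokers  # 6 smokers out of 14, i.e. about 42.9
```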

Tim C

 On Wed, Apr 25, 2007 at 03:32:11PM -0600, Natalie O'Toole wrote:
  Hi,
  
  Does anyone know if: with R can you take a set of numbers and 
 aggregate
  them like you can in SPSS? For example, could you calculate the 
 percentage
  of people who smoke based on a dataset like the following:
  
  smoke = 1
  non-smoke = 2
  
  variable
  1
  1
  1
  2
  2
  1
  1
  1
  2
  2
  2
  2
  2
  2
  
  
  When aggregated, SPSS can tell you what percentage of persons are 
 smokers
  based on the frequency of 1's and 2's. Can R statistical package do a
  similar thing?
  
  Thanks,
  
  Nat
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 -- 
 Andrew Robinson  
 Department of Mathematics and StatisticsTel: +61-3-8344-9763
 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
 http://www.ms.unimelb.edu.au/~andrewpr
 http://blogs.mbs.edu/fishing-in-the-bay/
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reduced Error Logistic Regression, and R?

2007-04-25 Thread Tim Churches
This news item in a data mining newsletter makes various claims for a technique 
called Reduced Error Logistic Regression: 
http://www.kdnuggets.com/news/2007/n08/12i.html

In brief, are these (ambitious) claims justified and if so, has this technique 
been implemented in R (or does anyone have any plans to do so)? 

Tim C

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sas.get problem

2007-04-11 Thread Tim Churches
John Kane wrote:
 How do I make this change? I naively have tried by
 a) list sas.get and copy to editor
 b) reload R without loading Hmisc
 c) made recommended changes to sas.get
 d) stuck a sas.get <- in front of the function and
 ran it.

Here is what I do, until Frank fixes the problem in the Hmisc package
itself:

a) list sas.get and copy it into an editor
b) make the change to line 127 as described
c) preface the function with sas.get <-
d) save that as sas_get_fixed.R
e) reload R and load Hmisc
f) source("sas_get_fixed.R")

The final step will mask the original, broken sas.get function with the
fixed version.
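In script form, the masking workaround amounts to the following (the file name is mine; the key point is that source() runs after library(Hmisc), so the patched copy in the global environment is found first):

```r
library(Hmisc)              # load the package first

# sas_get_fixed.R contains the patched function, prefaced with "sas.get <-",
# so sourcing it defines sas.get in .GlobalEnv
source("sas_get_fixed.R")

# the global (patched) copy now precedes package:Hmisc on the search path
find("sas.get")
```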

Tim C

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sas.get problem

2007-04-10 Thread Tim Churches
John Kane wrote:
 I  have 3 SAS files all in the directory F:/sas, two
 data files
 and a format file :
 form.ea1.sas7bdat
 form.ea2.sas7bdat
 sas.fmts.sas7bdat
 
 F is a USB.
 
 I am trying import them to R using sas.get.
 
 I have not used SAS since I was downloading data from 
 mainframe
 and having to write JCL.  I had forgotten how bizarre
 SAS can be.
 I currently have not even figured out how to load the
 files into SAS but
 they look fine since I can import them with no problem
 into SPSS.
 
 I am using R2.4.1 under Windows XP
 SAS files were created with SAS 9.x
 They convert easily into SPSS 14
 
 In the example below I have tried various versions of the file names
 with no luck.
 Can anyone suggest some approach(es) that I might take?
 
 Example.
 
 library(Hmisc)
 mydata <- sas.get(library="F:/sas", mem="form.ea1",
  format.library="sas.fmts.sas7bdat",
    sasprog='C:/Program Files/SAS/SAS 9.1/sas.exe')
 
 Error message  (one of several that I have gotten
 while trying various things.)
 The filename, directory name, or volume label syntax
 is incorrect.
 Error in sas.get(library = F:/sas, mem = form.ea1,
 format.library = sas.fmts.sas7bdat,  :
 SAS job failed with status 1
 In addition: Warning messages:
 1: sas.fmts.sas7bdat/formats.sc? or formats.sas7bcat 
 not found. Formatting ignored.
  in: sas.get(library = F:/sas, mem = form.ea1,
 format.library = sas.fmts.sas7bdat,
 2: 'cmd' execution failed with error code 1 in:
 shell(cmd, wait = TRUE, intern = output)

The sas.get function in the Hmisc library is broken under Windows.

Change line 127 from:

status <- sys(paste(shQuote(sasprog), shQuote(sasin), "-log",
shQuote(log.file)), output = FALSE)

to:

status <- system(paste(shQuote(sasprog), shQuote(sasin), "-log",
shQuote(log.file)))

I found this fix in the R-help archives, sorry, don't have the original
to hand so I can't give proper attribution, but the fix is not due to
me. But it does work for me. I believe Frank Harrell has been notified
of the problem and the fix. Once patched and working correctly, the
sas.get function in the Hmisc library is fantastic - thanks Frank!

Tim C

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Datamining-package rattle() Errors

2007-02-27 Thread Tim Churches
j.joshua thomas wrote:
 Dear Group
 
 I have few errors while installing package rattle from CRAN
 
 i do the installing from the local zip files...
 
  I am using R 2.4.0 do i have to upgrade to R2.4.1 ?

You *do* have to read the r-help posting guide and take exact heed of
what it suggests: http://www.r-project.org/posting-guide.html

Tim C

 ~~
 
 utils:::menuInstallLocal()
 package 'rattle' successfully unpacked and MD5 sums checked
 updating HTML package descriptions
 help(rattle)
 No documentation for 'rattle' in specified packages and libraries:
 you could try 'help.search(rattle)'
 library(rattle)
 Rattle, Graphical interface for data mining using R, Version 2.2.0.
 Copyright (C) 2006 [EMAIL PROTECTED], GPL
 Type rattle() to shake, rattle, and roll your data.
 Warning message:
 package 'rattle' was built under R version 2.4.1
 rattle()
 Error in rattle() : could not find function gladeXMLNew
 In addition: Warning message:
 there is no package called 'RGtk2' in: library(package, lib.loc = lib.loc,
 character.only = TRUE, logical = TRUE,
 local({pkg <- select.list(sort(.packages(all.available = TRUE)))
 + if(nchar(pkg)) library(pkg, character.only=TRUE)})
 update.packages(ask='graphics')
 
 
 On 2/28/07, Roberto Perdisci [EMAIL PROTECTED] wrote:
 Hi,
 out of curiosity, what is the name of the package you found?

 Roberto

 On 2/27/07, j.joshua thomas [EMAIL PROTECTED] wrote:
 Dear Group,

 I have found the package.

 Thanks very much


 JJ
 ---


 On 2/28/07, j.joshua thomas [EMAIL PROTECTED] wrote:

 I couldn't locate package rattle. Need someone's help.


 JJ
 ---



 On 2/28/07, Daniel Nordlund [EMAIL PROTECTED] wrote:
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:
 [EMAIL PROTECTED]
 On Behalf Of j.joshua thomas
 Sent: Tuesday, February 27, 2007 5:52 PM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] Datamining-package-?

 Hi again,
 The idea of preprocessing is mainly based on the need to prepare the
 data before they are actually used in pattern extraction, or to feed
 the data into EAs (genetic algorithms). There is no standard practice
 yet; however, the frequently used ones are:

 1. the extraction of derived attributes, that is, quantities that
 accompany but are not directly related to the data patterns and may
 prove meaningful or increase the understanding of the patterns

 2. the removal of some existing attributes that are of no concern to
 the mining process because of their insignificance

 So I am looking for a package that can do the two things mentioned
 above. Initially I would like to visualize the data as patterns and
 understand the patterns.


 snip

 Joshua,

 You might take a look at the package rattle on CRAN for initially
 looking at your data and doing some basic data mining.

 Hope this is helpful,

 Dan

 Daniel Nordlund
 Bothell, WA, USA




 --
 Lecturer J. Joshua Thomas
 KDU College Penang Campus
 Research Student,
 University Sains Malaysia



 --
 Lecturer J. Joshua Thomas
 KDU College Penang Campus
 Research Student,
 University Sains Malaysia


 
 




Re: [R] RPy and the evil underscore

2007-02-25 Thread Tim Churches
Alberto Vieira Ferreira Monteiro wrote:
 It seems like I will join two threads :-)

Please address RPy-specific questions to the Rpy mailing list, where
they will be answered swiftly and without annoyance to everyone else on
this general r-help mailing list.

 Ok, RPy was installed (in Fedora Core 4, yum -y install rpy), and it 
 is running. However, I have a doubt, and the (meagre) documentation
 doesn't seem to address it.

 In python, when I do this:
 
 import rpy
 rpy.r.setwd("/mypath")
 rpy.r.source("myfile.r")
 
 Everything happens as expected. But now, there's
 a problem if I try to use a function in myfile:
 
 x = my_function(1)
 x = r.my_function(1)
 x = rpy.my_function(1)
 x = rpy.r.my_function(1)
 
 None of them work: the problem is that the _ is mistreated.
 If the function has . instead of _, it works:
 
 x = rpy.r.my_function(1)
 
 This is weird: I must write the R routine with a '.', but then
 rpy translates it to '_'!

Object identifiers cannot begin with an underscore in R, but they can in
Python. To avoid having to confusingly special-case this difference, the
RPy designers elected to translate underscores in Python object names to
dots in R object names.

All this is clearly documented in the RPy manual at
http://rpy.sourceforge.net/rpy/doc/rpy_html/R-objects-look-up.html#R-objects-look-up
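To illustrate on the R side (a hypothetical function name, not from the original post): if Python code calls `rpy.r.my_function(1)`, the object RPy looks up in R is `my.function`:

```r
# Define the function with a dot in its name, the idiomatic R style.
# RPy will find this object when Python code calls rpy.r.my_function(1),
# because RPy converts '_' in Python attribute names to '.' in R names.
my.function <- function(x) {
  x + 1
}

my.function(1)  # returns 2
```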

Tim C



[R] Google, hard disc drives and R

2007-02-18 Thread Tim Churches
A recent paper from Google Labs, interesting in many respects, not the
least the exclusive use of R for data analysis and graphics (alas not
cited in the approved manner):

http://labs.google.com/papers/disk_failures.pdf

Perhaps some of the eminences grises of the R Foundation could prevail
upon Google to make some of the data reported in the paper available for
inclusion in an R library or two, for pedagogical purposes?

Tim C



[R] How to speed up or avoid the for-loops in this example?

2007-02-14 Thread Tim Churches
Any advice, tips, clues or pointers to resources on how best to speed up
or, better still, avoid the loops in the following example code much
appreciated. My actual dataset has several tens of thousands of rows and
lots of columns, and these loops take a rather long time to run.
Everything else which I need to do is done using vectors and those parts
all run very quickly indeed. I spent quite a while doing searches on
r-help and re-reading the various manuals, but couldn't find any
existing relevant advice. I am sure the solution is obvious, but it
escapes me.

Tim C

# create an example data frame, multiple events per subject

year <- c(1980,1982,1996,1985,1987,1990,1991,1992,1999,1972,1983)
event.of.interest <- c(F,T,T,F,F,F,T,F,T,T,F)
subject <- c(1,1,1,2,2,3,3,3,3,4,4)
df <- data.frame(cbind(subject,year,event.of.interest))

# add a per-subject sequence number

df$subject.seq <- 1
for (i in 2:nrow(df)) {
 if (df$subject[i-1] == df$subject[i]) df$subject.seq[i] <-
df$subject.seq[i-1] + 1
}
df

# add an event sequence number which is zero until the first
# event of interest for that subject happens, and then increments
# thereafter

df$event.seq <- 0
for (i in 1:nrow(df)) {
 if (df$subject.seq[i] == 1 ) {
    current.event.seq <- 0
 }
 if (event.of.interest[i] == 1 | current.event.seq > 0)
    current.event.seq <- current.event.seq + 1
 df$event.seq[i] <- current.event.seq
}
df



Re: [R] How to speed up or avoid the for-loops in this example?

2007-02-14 Thread Tim Churches
jim holtman wrote:
 On 2/14/07, Tim Churches [EMAIL PROTECTED] wrote:
 Any advice, tips, clues or pointers to resources on how best to speed up
 or, better still, avoid the loops in the following example code much
 appreciated. My actual dataset has several tens of thousands of rows and
 lots of columns, and these loops take a rather long time to run.
 Everything else which I need to do is done using vectors and those parts
 all run very quickly indeed. I spent quite a while doing searches on
 r-help and re-reading the various manuals, but couldn't find any
 existing relevant advice. I am sure the solution is obvious, but it
 escapes me.

 Tim C

 # create an example data frame, multiple events per subject

 year <- c(1980,1982,1996,1985,1987,1990,1991,1992,1999,1972,1983)
 event.of.interest <- c(F,T,T,F,F,F,T,F,T,T,F)
 subject <- c(1,1,1,2,2,3,3,3,3,4,4)
 df <- data.frame(cbind(subject,year,event.of.interest))

 # add a per-subject sequence number

 df$subject.seq <- 1
 for (i in 2:nrow(df)) {
 if (df$subject[i-1] == df$subject[i]) df$subject.seq[i] <-
 df$subject.seq[i-1] + 1
 }
 df
 
 # add an event sequence number which is zero until the first
 # event of interest for that subject happens, and then increments
 # thereafter

 df$event.seq <- 0
 for (i in 1:nrow(df)) {
 if (df$subject.seq[i] == 1 ) {
    current.event.seq <- 0
 }
 if (event.of.interest[i] == 1 | current.event.seq > 0)
 current.event.seq <- current.event.seq + 1
 df$event.seq[i] <- current.event.seq
 }
 df
 
 
 
 try:
 
 df <- data.frame(cbind(subject,year,event.of.interest))
 df <- do.call(rbind, by(df, df$subject, function(z){z$subject.seq <-
 seq(nrow(z)); z}))
 df
  subject year event.of.interest subject.seq
 1.11 1980 0   1
 1.21 1982 1   2
 1.31 1996 1   3
 2.42 1985 0   1
 2.52 1987 0   2
 3.63 1990 0   1
 3.73 1991 1   2
 3.83 1992 0   3
 3.93 1999 1   4
 4.10   4 1972 1   1
 4.11   4 1983 0   2
 # determine first event
 df <- do.call(rbind, by(df, df$subject, function(x){
 + # determine first event
 + .first <- cumsum(x$event.of.interest)
 + # create sequence after first non-zero
 + .first <- cumsum(.first > 0)
 + x$event.seq <- .first
 + x
 + }))
 df
subject year event.of.interest subject.seq event.seq
 1.1.11 1980 0   1 0
 1.1.21 1982 1   2 1
 1.1.31 1996 1   3 2
 2.2.42 1985 0   1 0
 2.2.52 1987 0   2 0
 3.3.63 1990 0   1 0
 3.3.73 1991 1   2 1
 3.3.83 1992 0   3 2
 3.3.93 1999 1   4 3
 4.4.10   4 1972 1   1 1
 4.4.11   4 1983 0   2 2

Thanks Jim, that works a treat, over an order of magnitude faster than
the for-loops.

Anders Nielsen also provided this solution:

  df$subject.seq <- unlist(tapply(df$subject, df$subject,
                                  function(x) 1:length(x)))

Doing it that way is about 5 times faster than using rbind(). But Jim's
use of cumsum on the logical vector is very nifty.

I have now combined Jim's function with Anders' column-oriented approach
and the result is that my code now runs about two orders of magnitude
faster.
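For the record, the combined column-oriented version looks roughly like this (my reconstruction of the combination, not the exact production code; it assumes rows are already grouped by subject, as in the example):

```r
# per-subject sequence number, column-wise (Anders' tapply idea)
df$subject.seq <- unlist(tapply(df$subject, df$subject, seq_along))

# per-subject event sequence using Jim's double-cumsum trick:
# cumsum(x) > 0 is FALSE until the first event and TRUE thereafter,
# and a second cumsum turns that into 0, 0, ..., 1, 2, 3, ...
df$event.seq <- unlist(tapply(df$event.of.interest, df$subject,
                              function(x) cumsum(cumsum(x) > 0)))
```

Note that tapply() returns groups in sorted order of the grouping values, so this matches row order only when subjects are contiguous and sorted, as here.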

Many thanks,

Tim C



Re: [R] How to speed up or avoid the for-loops in this example?

2007-02-14 Thread Tim Churches
Marc Schwartz wrote:
 OK, here is one possible solution, though perhaps with a bit more time,
 there may be more optimal approaches. 
 
 Using your example data above, but first noting that you do not want to
 use:
 
   df <- data.frame(cbind(subject,year,event.of.interest))
 
 Using cbind() first, creates a matrix and causes all columns to be
 coerced to a common data type, obviating the benefit of data frames to
 be able to handle multiple data types. 

Yes, quite right, the cbind() was unnecessary. I'm not making my real
data frame that way, however.

 So, now on to the solution:
 
 # First, order the data frame by increasing order of
 # subject number and decreasing order for event.of.interest
 # This ensures that these columns are properly sorted
 # to facilitate the subsequent code. 
 
 df <- df[order(df$subject, -df$event.of.interest), ]
 
 
 So, 'df' will look like:
 
 df
subject year event.of.interest
 21 1982  TRUE
 31 1996  TRUE
 11 1980 FALSE
 42 1985 FALSE
 52 1987 FALSE
 73 1991  TRUE
 93 1999  TRUE
 63 1990 FALSE
 83 1992 FALSE
 10   4 1972  TRUE
 11   4 1983 FALSE
 
 
 # Now use the combinations of sapply(), rle(), seq() and unlist() to
 # generate per subject sequences. Note that rle() returns:
 #
 #  rle(df$subject)
 # Run Length Encoding
 #   lengths: int [1:4] 3 2 4 2
 #   values : num [1:4] 1 2 3 4
 #
 # See ?rle, ?seq, ?sapply and ?unlist
 
 df$subject.seq <- unlist(sapply(rle(df$subject)$lengths, 
 function(x) seq(x)))
 
 
 So, 'df' now looks like:
 
 df
subject year event.of.interest subject.seq
 21 1982  TRUE   1
 31 1996  TRUE   2
 11 1980 FALSE   3
 42 1985 FALSE   1
 52 1987 FALSE   2
 73 1991  TRUE   1
 93 1999  TRUE   2
 63 1990 FALSE   3
 83 1992 FALSE   4
 10   4 1972  TRUE   1
 11   4 1983 FALSE   2
 
 
 # Now set event.seq to all 0's
 
 df$event.seq <- 0
 
 
 So, 'df' now looks like:
 
 df
subject year event.of.interest subject.seq event.seq
 21 1982  TRUE   1 0
 31 1996  TRUE   2 0
 11 1980 FALSE   3 0
 42 1985 FALSE   1 0
 52 1987 FALSE   2 0
 73 1991  TRUE   1 0
 93 1999  TRUE   2 0
 63 1990 FALSE   3 0
 83 1992 FALSE   4 0
 10   4 1972  TRUE   1 0
 11   4 1983 FALSE   2 0
 
 
 # Get the unique subject id's
 # See ?unique
 
 subj.id <- unique(df$subject)
 
 
 # Now get the indices for each subject where event.of.interest
 # is TRUE.  See ?which
 
 events <- sapply(subj.id, 
  function(x) which(df$subject == x & df$event.of.interest))
 
 
 So, 'events' looks like:
 
 events
 [[1]]
 [1] 1 2
 
 [[2]]
 integer(0)
 
 [[3]]
 [1] 6 7
 
 [[4]]
 [1] 10
 
 
 # Now use sapply() on the above list to create
 # individual sequences per list element:
 
 seq <- sapply(events, function(x) seq(along = x))
 
 
 So 'seq' looks like:
 
 seq
 [[1]]
 [1] 1 2
 
 [[2]]
 integer(0)
 
 [[3]]
 [1] 1 2
 
 [[4]]
 [1] 1
 
 
 # So, for the final step, assign the event sequence values in 'seq' to
 # the row indices in 'events':
 
 df$event.seq[unlist(events)] <- unlist(seq)
 
 
 So, 'df' now looks like this:
 
 df
subject year event.of.interest subject.seq event.seq
 21 1982  TRUE   1 1
 31 1996  TRUE   2 2
 11 1980 FALSE   3 0
 42 1985 FALSE   1 0
 52 1987 FALSE   2 0
 73 1991  TRUE   1 1
 93 1999  TRUE   2 2
 63 1990 FALSE   3 0
 83 1992 FALSE   4 0
 10   4 1972  TRUE   1 1
 11   4 1983 FALSE   2 0
 
 
 HTH,
 
 Marc Schwartz

Wow, that's very tricky. It works, but it is a bit slower and more
complex than the Holtman/Nielsen approach. Still, there are some
interesting ideas there which I shall bear in mind.

Tim C


Re: [R] Adding error bars to a trellis barchart display

2006-05-13 Thread Tim Churches
Chris Bergstresser wrote:
 Hi all --
 
I'm using trellis to generate bar charts, but there's no built-in
 function to generate error bars or confidence intervals, as far as I
 can tell.  I assumed I could just write my own panel function to add
 them, so I searched the archive, and found a posting from the author
 of the package stating ... placing multiple bars side by side needs
 specialized calculations, which are done within panel.barchart. To add
 bars to these, you will need to reproduce those calculations.
Just so I'm clear on this -- there's no capacity to add bars to the
 plot, nor to find out the coordinates of the bars in the graphs
 themselves.  If you want them, you have to completely rewrite
 panel.barchart.  Is this correct?  Are there really so few people
 using error bars with bar charts?

One of our projects adds confidence intervals to bar charts produced
using the lattice library. It is quite feasible without too much effort
- see:
http://members.optusnet.com.au/tchur/NetEpi-Analysis-0-8-Screenshot-5.png

Sorry I don't have time to extract the code which does this right now,
but you can dissect it out yourself from the NetEpi-Analysis-0.8 tarball
at http://sourceforge.net/project/showfiles.php?group_id=123700 -
although the R code is embedded in Python classes, which might make
extrication a bit more difficult (and which is why I don't have time to
do it right now). But from memory the chunk of R code which overrides
the default panel function is fairly self-contained and you should be
able to identify it fairly easily - just grep the source code for likely
strings such as panel.barchart to discover where it is.
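As a rough self-contained illustration of the approach (not the NetEpi code itself; the data and interval values are invented), a custom panel can call panel.barchart() and then draw the intervals on top. For the simple one-bar-per-group case the bar centres sit at the numeric factor positions, so no reproduction of panel.barchart's internal offset calculations is needed:

```r
library(lattice)

# invented example data: one bar per group, with lower/upper CI limits
d <- data.frame(grp = factor(c("A", "B", "C")),
                y   = c(3.0, 5.0, 4.0),
                lo  = c(2.5, 4.2, 3.1),
                hi  = c(3.5, 5.8, 4.9))

barchart(y ~ grp, data = d, ylim = c(0, 6.5), horizontal = FALSE,
         panel = function(x, y, ...) {
           panel.barchart(x, y, ...)
           # for a single series, bar centres are at as.numeric(x)
           panel.arrows(as.numeric(x), d$lo, as.numeric(x), d$hi,
                        angle = 90, code = 3, length = 0.05)
         })
```

Multiple side-by-side bars per group would still require reproducing the offset arithmetic from panel.barchart, as the package author noted.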

Other screenshots can be downloaded from
http://sourceforge.net/project/showfiles.php?group_id=123700 if anyone
is interested.

Tim C



Re: [R] using GDD fonts

2006-04-12 Thread Tim Churches
Luiz Rodrigo Tozzi wrote:
 Hi
 
 I was searching for some X replacement for my job in R and I found GDD.
 
 I installed it and I meet all the system requirements. My problem
 (maybe a dumb one) is that every plot comes with no font, and I can't
 find a single example of a plot WITH FONT DETAIL in the list.
 
 can anybody help me?
 
 a simple example:
 
 library(GDD)
 GDD("something.png", type="png", width = 700, height = 500)
 par(cex.axis=0.65, lab=c(12,12,0), mar=c(2.5, 2.5, 2.5, 2.5))
 plot(rnorm(100))
 mtext("Something", side=3, padj=-0.33, cex=1)
 dev.off()
 
 thanks in advance!

This might help - we found that we needed to install the MS TT fonts and
make sure that GDD could find them, as per the README:

Simon Urbanek [EMAIL PROTECTED] wrote:
 Tim,

 On Jun 9, 2005, at 3:51 AM, Tim CHURCHES wrote:

 I tried GDD 0.1-7 with Lattice graphs in R 2.1.0 (on Linux). It
 doesn't segfault now but it is still not producing any usable output
 - the output png file is produced but nly with a few lines on it.
 Still the alpha channel problem? Have you been able to produce any
 Lattice graphs with it?

 I know of no such problem, I tested a few lattice graphics and they
 worked. Can you, please, send me reproducible example and your output?
 Also send me, please output of
 library(GDD)
 .Call("gdd_look_up_font", NULL)

Sorry, my laziness. GDD was unable to find any fonts. After I installed
the MS TT fonts and set their location as per the GDD README, it worked
perfectly with both old-style R graphics and lattice graphics. The
output looks very nice indeed. We'll do a bit more testing (and let you
know if we find any problems), but it looks like we can at last drop the
requirement for Xvfb when using R in a Web application. Great work! From
our point of view, GDD solves one of the biggest problems with R for Web
applications.

Cheers,

Tim C



Re: [R] Histogram over a Large Data Set (DB): How?

2005-11-18 Thread Tim Churches
Eric Eide wrote:
 Sean == Sean Davis [EMAIL PROTECTED] writes:
 
   Sean Have you tried just grabbing the whole column using dbGetQuery?
   Sean Try doing this:
   Sean 
   Sean spams <- dbGetQuery(con, "select unixtime from email limit
   Sean 1000000")
   Sean 
   Sean Then increase from 1,000,000 to 1.5 million, to 2 million, etc.
   Sean until you break something (run out of memory), if you do at all.
 
 Yes, you are right.  For the example problem that I posed, R can indeed 
 process
 the entire query result in memory.  (The R process grows to 240MB, though!)
 
   Sean However, the BETTER way to do this, if you already have the data
   Sean in the database is to allow the database to do the histogram for
   Sean you.  For example, to get a count of spams by day, in MySQL do
   Sean something like: [...]
 
 Yes, again you are right --- the particular problem that I posed is probably
 better handled by formulating a more sophisticated SQL query.
 
 But really, my goal isn't to solve the the example problem that I posed ---
 rather, it is to understand how people use R to process very large data sets.
 The research project that I'm working on will eventually need to deal with
 query results that cannot fit in main memory, and for which the built-in
 statistical facilities of most DBMSs will be insufficient.
 
 Some of my colleagues have previously written their analyses by hand, using
 various scripting languages to read and process records from a DB in chunks.
 Writing things in this way, however, can be tedious and error-prone.  Instead
 of taking this approach, I would like to be able to use existing statistics
 packages that have the ability to deal with large datasets in good ways.
 
 So, I seek to understand the ways that people deal with these sorts of
 situations in R.  Your advice is very helpful --- one should solve problems in
 the simplest ways available! --- but I would still like to understand the
 harder cases, and how one can use general R functions in combination with
 DBI's `dbApply' and `fetch' interfaces, which divide results into chunks.

You might be interested in our project: NetEpi Analysis, which aims to
provide interactive exploratory data analysis and basic epidemiological
analysis via both a Web front end and a Python programmatic API (forgive
the redundancy in programmatic API) for datests up to around 30
million rows (and as many columns as you like) on 32 bit platforms -
hundreds of millions of rows should be feasible on 64-bit platforms. It
stores data column-wise in memory-mapped on-disc arrays, and uses set
operations on ordinal indexes to permit rapid subsetting and
cross-tabulation of categorical (factored) data. It is written in
Python, but uses R for graphics and some (but not all) statistical
calculations (and for model fitting when we get round to providing
facilities for same).

See http://www.netepi.org - still in alpha, with an update coming out by
December. Although it is aimed at epidemiological analysis (of large
administrative health datasets), I dare say it might be useful for
exploring large databases of spam too.
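For the simpler chunked approach Eric describes, incremental aggregation over DBI fetches is straightforward if the histogram breaks are fixed in advance, since per-chunk counts can just be summed. A minimal sketch (connection details, table and column names, and break range are all hypothetical):

```r
library(DBI)
library(RMySQL)  # or any other DBI driver

con <- dbConnect(MySQL(), dbname = "maildb")   # hypothetical connection
res <- dbSendQuery(con, "SELECT unixtime FROM email")

# fixed breaks, chosen to span all values in the column
breaks <- seq(1e9, 1.2e9, length.out = 101)
counts <- rep(0, length(breaks) - 1)

repeat {
  chunk <- fetch(res, n = 100000)   # process 100,000 rows at a time
  if (nrow(chunk) == 0) break
  # hist() errors on values outside 'breaks', hence the fixed range above
  h <- hist(chunk$unixtime, breaks = breaks, plot = FALSE)
  counts <- counts + h$counts
}
dbClearResult(res)
dbDisconnect(con)

# 'counts' now holds the full histogram without the whole column in memory
```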

Tim C



[R] mid-p CIs for common odds ratio

2005-10-19 Thread Tim Churches
mantelhaen.test() gives the exact conditional p-value (for independence) and 
confidence intervals (CIs) for the common odds ratio for a stratified 2x2 table. 
The epitools package by Tomas Aragon (available via CRAN) contains functions 
which use fisher.test() to calculate mid-p exact p-values and CIs for the CMLE 
odds ratio for a single 2x2 table. The mid-p p-value for independence for a 
stratified 2x2 table is easy to calculate using mantelhaen.test(), but can 
anyone suggest a method for calculation of mid-p CIs for the common odds ratio? 
A search in the usual places draws a blank (but I am sure someone will 
immediately prove me wrong on that point...). Thanks to Andy Dean (of Epi-Info 
fame), I have a copy of public domain Pascal code from 1991 by David Martin and 
Harland Austin which calculates mid-p CIs for the common odds ratio by finding 
polynomial roots. Before trying to replicate that code in R (or C), I was 
wondering if anyone could suggest a better or easier way?
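For a single 2x2 table, the mid-p idea itself is simple to compute from the
hypergeometric distribution. The sketch below (plain Python, using only
math.comb; the function names are mine, and the two-sided convention shown,
twice the smaller mid-corrected tail, is only one of several in use) computes
the mid-p p-value for independence; the CI question above is the much harder
inversion problem:

```python
from math import comb

def hypergeom_pmf(k, n1, n2, m1):
    """P(X = k) for the central hypergeometric distribution, conditioning
    on the margins of a 2x2 table (row totals n1, n2; first column total m1)."""
    return comb(n1, k) * comb(n2, m1 - k) / comb(n1 + n2, m1)

def midp_value(a, b, c, d):
    """Mid-p exact p-value for independence in the 2x2 table [[a, b], [c, d]],
    using twice the smaller mid-corrected tail (conventions vary)."""
    n1, n2, m1 = a + b, c + d, a + c
    lo, hi = max(0, m1 - n2), min(n1, m1)
    pmf = {k: hypergeom_pmf(k, n1, n2, m1) for k in range(lo, hi + 1)}
    p_le = sum(p for k, p in pmf.items() if k <= a)
    p_ge = sum(p for k, p in pmf.items() if k >= a)
    # the mid-p correction: count only half the probability of the observed table
    mid_lower = p_le - 0.5 * pmf[a]
    mid_upper = p_ge - 0.5 * pmf[a]
    return min(1.0, 2 * min(mid_lower, mid_upper))

print(round(midp_value(10, 2, 3, 15), 4))
```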

Tim C

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Tim Churches
MJ Price, Social Medicine wrote:
 I have been trying to find a function to calculate the Breslow-Day test for 
 homogeneity of the odds ratio in R. I know the test can be performed in SAS 
 but I was wondering if anyone could help me to perform this in R.

I don't recall seeing the Breslow-Day test anywhere in an R package, but
the vcd package (available via CRAN) has a function called woolf_test()
to calculate Woolf's test for homogeneity of ORs.
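For reference, Woolf's test is also straightforward to compute by hand. The
hedged sketch below (plain Python, my own function names, using the common
0.5 continuity correction) returns the chi-square statistic and its degrees
of freedom, to be compared against a chi-square distribution with K - 1 df:

```python
from math import log

def woolf_test(tables):
    """Woolf's chi-square test for homogeneity of odds ratios across K
    stratified 2x2 tables (each given as (a, b, c, d)), with the usual 0.5
    continuity correction. Returns (chi-square statistic, df = K - 1)."""
    lors, weights = [], []
    for a, b, c, d in tables:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        lors.append(log(a * d / (b * c)))                  # log odds ratio
        weights.append(1.0 / (1/a + 1/b + 1/c + 1/d))      # inverse variance
    pooled = sum(w * l for w, l in zip(weights, lors)) / sum(weights)
    stat = sum(w * (l - pooled) ** 2 for w, l in zip(weights, lors))
    return stat, len(tables) - 1

stat, df = woolf_test([(10, 20, 30, 40), (12, 18, 28, 42)])
```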

Tim C



Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Tim Churches
Marc Schwartz (via MN) wrote:

 There is also code for the Woolf test in ?mantelhaen.test

Is there? How is it obtained? The documentation on mantelhaen.test in R
2.2.0 contains the note "Currently, no inference on homogeneity of the
odds ratios is performed.", and a quick scan of the source code for the
function didn't reveal any mention of Woolf's test.

Tim C



Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Tim Churches
Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:
 
 On Wed, 2005-10-19 at 06:47 +1000, Tim Churches wrote:
  Marc Schwartz (via MN) wrote:
  
   There is also code for the Woolf test in ?mantelhaen.test
  
  Is there? How is it obtained? The documentation on mantelhaen.test in R
  2.2.0 contains the note "Currently, no inference on homogeneity of the
  odds ratios is performed.", and a quick scan of the source code for the
  function didn't reveal any mention of Woolf's test.
  
  Tim C
 
 
 Review the code in the examples on the cited help page...
 
 :-)

OK, I see it now, thanks.

Tim C

 
 HTH,
 
 Marc



Re: [R] running JPEG device on R 1.9.1 using xvfb-run on Linux

2005-10-13 Thread Tim Churches
Prof Brian Ripley wrote:
 On Wed, 12 Oct 2005, David Zhao wrote:
 
 
Does anybody have experience in running jpeg device using xvfb-run on
Linux? I've been having a sporadic problem with: /usr/X11/bin/xvfb-run
/usr/bin/R --no-save < Rinput.txt, with an error saying: error in X11
connection. Especially when I run it from a Perl script.
 
 
 Not sure what `xvfb-run on Linux' is, as it is not on my Linux (FC3).
 If you Google it you will find a number of problems reported on Debian 
 lists.  Here I would suspect timing.
 
 What I do is to run Xvfb on screen 5 by
 
 Xvfb :5 &
 setenv DISPLAY :5
 
 and do not have a problem with the jpeg() or png() devices.  I do have a 
 problem with the rgl() package, but then I often do on-screen (on both 32- 
 and even more so 64-bit FC3).

For R-embedded-in-Python (via RPy) on a Web server, we have been using a
Python programme to automatically start Xvfb if it is not already
running. You can find a copy of the programme in the NetEpi-Analysis
tarball available at
http://sourceforge.net/project/showfiles.php?group_id=123700

The tricky bit is managing the permissions for the Xvfb session,
particularly in a Web server context - you need to take care. However,
this use of Xvfb has been perfectly reliable (on Red Hat Enterprise
Linux 2.1 and 3 with R 2.0 and R 2.1).
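The essential logic of such a starter can be sketched as follows (an
illustrative Python sketch under my own naming, not the actual NetEpi code;
"-nolisten tcp" is the Xvfb switch that keeps the server off the network,
which matters in a Web server context):

```python
import os
import subprocess

def xvfb_command(display=5, screen="1024x768x24"):
    """Build the command line to start Xvfb on the given display number.
    Hypothetical helper for illustration only."""
    return ["Xvfb", ":%d" % display, "-screen", "0", screen,
            "-nolisten", "tcp"]

def ensure_xvfb(display=5):
    """Start Xvfb on :display unless an X server already holds that display
    (detected via the conventional lock file), then point DISPLAY at it."""
    lockfile = "/tmp/.X%d-lock" % display
    if not os.path.exists(lockfile):
        # launch in the background; assumes Xvfb is on the PATH
        subprocess.Popen(xvfb_command(display),
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
    os.environ["DISPLAY"] = ":%d" % display
```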
 
Is there a better way of doing this? Or how can I fix the problem?
 
 You really should update your R.

Yes. We now use GDD, which is an alternative R graphics driver for
raster graphics (JPEG and PNG), available via CRAN. It allows R to
generate JPEG and PNG files directly on a Linux or Unix machine without
the need for an X server to be running (not even Xvfb). The quality of
the output is also better than that of the standard R X11/png/jpeg
graphics devices, due to GDD's use of anti-aliased fonts. Earlier
versions of GDD were a bit buggy, but so far we have found the latest
version (0.1.7) to be fine. It is a bit fiddly to install all the
libraries it requires, as well as the recommended (no-cost) Microsoft
TrueType fonts, but the effort is worth it. Many thanks to Simon Urbanek
for his work on GDD.

Tim C



[R] Leading in line-wrapped Lattice value and panel labels

2005-09-06 Thread Tim Churches
Version 2.1.1
Platforms: all

What is the trellis parameter (or is there a trellis parameter) to set the 
leading (the gap between lines) when long axis value labels or panel header 
labels wrap over more than one line? By default, there is a huge gap between 
lines, and much looking and experimentation has not revealed to me a suitable 
parameter to adjust this.

Tim C



Re: [R] The Perils of PowerPoint

2005-09-02 Thread Tim Churches
(Ted Harding) wrote:

By the way, the Washington Post/Minneapolis Star Tribune article is
somewhat reminiscent of a short (15 min) broadcast on BBC Radio 4
back on October 18 2004 15:45-16:00 called

  Microsoft Powerpoint and the Decline of Civilisation

which explores similar themes and also frequently quotes Tufte.
Unfortunately it lapsed for ever from Listen Again after the
statutory week, so I can't point you to a replay. (However, I
have carefully preserved the cassette recording I made).
  

Try http://sooper.org/misc/powerpoint.mp3 (copyright law notwithstanding...)

Tim C



[R] Confidence interval bars on Lattice barchart with groups

2005-06-25 Thread Tim Churches
I am trying to add confidence (error) bars to lattice barcharts (and
dotplots, and xyplots). I found this helpful message from Deepayan
Sarkar and based the code below on it:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/50299.html

However, I can't get it to work with groups, as illustrated. I am sure I
am missing something elementary, but I am unsure what.

Using R 2.1.1 on various platforms. I am aware of xYplot in the Hmisc
library but would prefer to avoid any dependency on a non-core R
library, if possible.

Tim C

##
# set up dummy test data
testdata <- data.frame(
dsr=c(1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,
  2,3,4,5,6,7,8,9,10,11,3,4,5,6,7,8,9,10,11,12),
year=as.factor(c(1998,1998,1998,1998,1998,1998,1998,1998,1998,1998,
 1999,1999,1999,1999,1999,1999,1999,1999,1999,1999,
 2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
 2001,2001,2001,2001,2001,2001,2001,2001,2001,2001)),
geog_area=c('North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle'),
sex=c('Male','Male','Male','Male','Male',
  'Female','Female','Female','Female','Female',
  'Male','Male','Male','Male','Male',
  'Female','Female','Female','Female','Female',
  'Male','Male','Male','Male','Male',
  'Female','Female','Female','Female','Female',
  'Male','Male','Male','Male','Male',
  'Female','Female','Female','Female','Female'),
age=c('Old','Old','Old','Old','Old',
  'Young','Young','Young','Young','Young',
  'Old','Old','Old','Old','Old',
  'Young','Young','Young','Young','Young',
  'Old','Old','Old','Old','Old',
  'Young','Young','Young','Young','Young',
  'Old','Old','Old','Old','Old',
  'Young','Young','Young','Young','Young'))

# add dummy lower and upper confidence limits
testdata$dsr_ll <- testdata$dsr - 0.7
testdata$dsr_ul <- testdata$dsr + 0.5

# examine the test data
testdata

# check that a normal barchart with groups works OK - it does
barchart(geog_area ~ dsr | year, testdata, groups=sex, origin = 0)

# this works as expected, but not sure what the error messages mean
with(testdata,barchart(geog_area ~ dsr | year + sex,
  origin = 0,
  dsr_ul = dsr_ul,
  dsr_ll = dsr_ll,
  panel = function(x, y, ..., dsr_ll, dsr_ul, subscripts) {
  panel.barchart(x, y, subscripts, ...)
  dsr_ll <- dsr_ll[subscripts]
  dsr_ul <- dsr_ul[subscripts]
  panel.segments(dsr_ll,
 as.numeric(y),
 dsr_ul,
 as.numeric(y),
 col = 'red', lwd = 2)}
  ))

# no idea what I am doing wrong here, but there is not one bar per
group... something
# to do with panel.groups???
with(testdata,barchart(geog_area ~ dsr | year, groups=sex,
  origin = 0,
  dsr_ul = dsr_ul,
  dsr_ll = dsr_ll,
  panel = function(x, y, ..., dsr_ll, dsr_ul, subscripts,
groups) {
  panel.barchart(x, y, subscripts, groups, ...)
  dsr_ll <- dsr_ll[subscripts]
  dsr_ul <- dsr_ul[subscripts]
  panel.segments(dsr_ll,
 as.numeric(y),
 dsr_ul,
 as.numeric(y),
 col = 'red', lwd = 2)}
  ))
##



Re: [R] Runnning R remotely

2005-02-02 Thread Tim Churches
Laura Quinn wrote:
I wasn't aware that it was possible to use postscript in the same fashion
as png, eg:
png(file, width=x, height=y, ...)
image(map)
text(text)
title(title)
box()
dev.off()
As there are a large number of iterations png has been working nicely
(when not working remotely!), especially as it has proven easy to convert
into gifs and then into movie gifs. Could anyone suggest an alternative
approach in this case?
 

Start an Xvfb (X11 virtual frame buffer) server in your remote ssh 
session. R will then use that as an X11 device to produce the PNG 
output. If you are running in a hostile network environment, consider 
using authentication and/or switching off network access to the Xvfb 
session - see the man pages for Xvfb. Xvfb is installed by default on 
most recent Linux distributions - if not, there should be an installable 
package available for it for your flavour of Linux.

Tim C


Re: [R] Plotting with Statistics::R, Perl/R

2005-01-24 Thread Tim Churches
Peter Dalgaard wrote:
d) Use bitmap(). It requires a working Ghostscript install, but is
otherwise much more convenient. Newer versions of Ghostscript have
some quite decent antialiasing built into some of the png devices.
Currently you need a small hack to pass the extra options to
Ghostscript -- we should probably add a gsOptions argument in due
course. This works for me on FC3 (Ghostscript 7.07):
mybitmap(file="foo.png", type="png16m", gsOptions="-dTextAlphaBits=4
-dGraphicsAlphaBits=4")
where mybitmap() is a modified bitmap() that just sticks the options
into the command line. There are definitely better ways...
[The antialiasing is not quite perfect. In particular, the axes stand
out from the box around plots, presumably because an additive model is
used (so that if you draw a line on top of itself, the result becomes
darker). Also, text gets a little muddy at the default 9pt @ 72dpi, so
you probably want to increase the pointsize or the resolution.]
 

Apart from the significant quality issues which you mention, the other 
problem with using bitmap() in a Web server environment is the speed 
issue - it takes much longer to produce the output. Whether it takes too 
long depends on the users of your Web application, and how many 
simultaneous users there are. However, most users are more worried by 
the poor quality of the fonts in output produced by bitmap().

Tim C


Re: [R] Plotting with Statistics::R, Perl/R

2005-01-21 Thread Tim Churches
Dirk Eddelbuettel wrote:
Plotting certain formats requires the X11 server to be present, as the font
metrics for those formats can be supplied only by the X11 server. Other drivers
don't need the font metrics from X11 -- I think pdf is a good counterexample.
When you run in 'batch' via a Perl script, you don't have the X11 server --
even though it may be on the machine and running, it is not associated with
the particular session running your Perl job.  There are two common fixes:
a) if you must have png() as a format, you can start a virtual X11 server
  with the xvfb server -- this is a bit involved, but doable;
 

An example of a Python programme which manages the starting of an Xvfb 
server when one is required can be found in the xvfb_spawn.py file in the 
SOOMv0 directory of the tarball for NetEpi Analysis, which can be 
downloaded by following the links at http://www.netepi.org

xvfb_spawn.py was written for use with RPy, which is a Python-to-R 
bridge, when used in a Web server setting (hence no X11 display server 
available). It should be possible to translate the programme to Perl, or 
to write something similar in Perl. Comments in the code note some 
potential security traps for the unwary.

Hopefully one day the dependency of the R raster graphics devices on an 
X11 server will be removed. R on Win32 doesn't have that dependency (but 
then, Windows machines, even servers, have displays running all the time 
as part of their kernel, and who would wish that on any other operating 
system?). However, there are several graphics back-ends which produce 
very high quality raster graphics on POSIX platforms without the need 
for an X11 device to be present - Agg (Anti-grain geometry, see 
http://www.antigrain.com/) and Cairo (see http://cairographics.org/) 
spring to mind (usually disclaimers about the foregoing comments not 
meaning to seem like ingratitude to the R development team etc apply).

Tim C


Re: [R] Plotting with Statistics::R, Perl/R

2005-01-21 Thread Tim Churches
Joe Conway wrote:
Dirk Eddelbuettel wrote:
On Fri, Jan 21, 2005 at 06:06:45PM -0800, Leah Barrera wrote:
I am trying to plot in R from a perl script using the Statistics::R
 package as my bridge.  The following are the conditions:
0. I am running from a Linux server.
Plotting certain formats requires the X11 server to be present, as the
font metrics for those formats can be supplied only by the X11 server.
Other drivers don't need the font metrics from X11 -- I think pdf is a
good counterexample. When you run in 'batch' via a Perl script, you
don't have the X11 server -- even though it may be on the machine and
running, it is not associated with the particular session running
your Perl job.  There are two common fixes:
a) if you must have png() as a format, you can start a virtual X11
server with the xvfb server -- this is a bit involved, but doable;

Attached is an init script I use to start up xvfb on Linux.
HTH,
Joe

#!/bin/bash
#
# Xvfb          Starts Xvfb.
#
#
# chkconfig: 2345 12 88
# description: Xvfb is a facility that applications requiring an X frame buffer \
#              can use in place of actually running X on the server.

# Source function library.
. /etc/init.d/functions
[ -f /usr/X11R6/bin/Xvfb ] || exit 0
XVFB="/usr/X11R6/bin/Xvfb :5 -screen 0 1024x768x16"
RETVAL=0
umask 077
start() {
    echo -n $"Starting Xvfb: "
    $XVFB &
    RETVAL=$?
    echo_success
    echo
    [ $RETVAL = 0 ] && touch /var/lock/subsys/Xvfb
    return $RETVAL
}
stop() {
    echo -n $"Shutting down Xvfb: "
    killproc Xvfb
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f /var/lock/subsys/Xvfb
    return $RETVAL
}
restart() {
    stop
    start
}
case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart|reload)
    restart
    ;;
  condrestart)
    [ -f /var/lock/subsys/Xvfb ] && restart || :
    ;;
  *)
    echo $"Usage: $0 {start|stop|restart|condrestart}"
    exit 1
esac
exit $RETVAL
 

Hmm, the only problem with that is that, if I am not mistaken, you are 
starting Xvfb without any authentication, and I am told by people who 
know about such things that in the context of an Internet-accessible Web 
server, having an X server accepting unauthenticated connections is not 
a good idea. In other, less hostile environments, it might be OK. Maybe 
such concerns are unreasonable paranoia, but my motto is better safe 
than sorry when it comes to Internet-facing servers. I think there are 
also switches you can pass to Xvfb (such as -nolisten tcp) to stop it 
listening on various TCP/IP ports.

Tim C


Re: [R] Lattice graph with segments

2004-12-04 Thread Tim Churches
Andrew Robinson wrote:
Ruud,
try something like the following (not debugged, no coffee yet):
xyplot(coupon.period ~ median, data=prepayment,
 subscripts=T,	 
 panel=function(x,y,subscripts,...){
   panel.xyplot(x,y)
   panel.segments(deel1$lcl[subscripts], deel$ucl[subscripts])
 }
)

Not quite:
library(lattice)
prepayment <- data.frame(median=c(10.89,12.54,10.62,8.46,7.54,4.39),
 ucl=c(NA,11.66,9.98,8.05,7.27,4.28),
 lcl=c(14.26,13.34,11.04,8.72,7.90,4.59),
 coupon.period=c('a','b','c','d','e','f'))
xyplot(coupon.period ~ median, data=prepayment,
 subscripts=T,  
 panel=function(x,y,subscripts,...){
   panel.xyplot(x,y)
   panel.segments(prepayment$lcl[subscripts], prepayment$ucl[subscripts])
 }
)
throws the error:
Error in max(length(x0), length(x1), length(y0), length(y1)) :
Argument x1 is missing, with no default
Tim C


Re: [R] How about a mascot for R?

2004-12-02 Thread Tim Churches
Damian Betebenner wrote:
R users,
How come R doesn't have a mascot? 
Perhaps someone with artistic flair could create a mascot based on this 
image? It would help to give newcomers to R-help the right idea:

http://www.accesscom.com/~alvaro/alien/thepics/ripley1__.jpg
Tim C




[R] Unable to understand strptime() behaviour

2004-12-01 Thread Tim Churches
R V2.0.1 on Windows XP.
I have read the help pages on strptime() over and over, but can't
understand why strptime() is producing the following results.
> v <- format("2002-11-31", format="%Y-%m-%d")
> v
[1] "2002-11-31"
> factor(v, levels=v)
[1] 2002-11-31
Levels: 2002-11-31
> x <- strptime("2002-11-31", format="%Y-%m-%d")
> x
[1] "2002-12-01"
> factor(x, levels=x)
[1] NA NA NA NA NA NA NA NA NA
Levels: 2002-12-01 NA NA NA NA NA NA NA NA
Tim C


Re: Re: [R] draft of posting guide. Sorry.

2003-12-22 Thread Tim Churches
A.J. Rossini [EMAIL PROTECTED] wrote:
 However, the amount (and quality) of
 (freely-available, at least for the cost of download, which might not
 be free) documentation for R is simply incredible.  The closest that
 I've seen, for freely available languages, is Python, for actual
 quality of documentation.

The Python documentation is truly excellent, but I agree, the R documentation is 
even better. Sometimes the R help is a bit terse, which simply means that one 
has to think a bit to work out what is meant, but I have never found it to be 
insufficient.

Tim C



Re: RE: [R] R and Memory

2003-12-02 Thread Tim Churches
Mulholland, Tom [EMAIL PROTECTED] wrote:
 
 I would suggest that you make a more thorough search of the
 R-Archives.
 (http://finzi.psych.upenn.edu/search.html) If you do you will find
 this
 discussion has been had several times and that the type of machine
 you
 are running will have an impact upon what you can do. My feeling is
 that
 you are going have to knuckle down with the documentation and
 understand
 how R works and then when you have specific issues that show you have
 read all the appropriate documentation, you might try another message
 to
 the list.
 
 Ciao, Tom

Another approach is to not try to bring all your data into R at once - it is unlikely 
that you actually need every column of every row in your dataset to undertake any 
particular analysis. The trick is to bring into R only those rows and columns which 
you need at a particular moment, and then discard them.

The best way to do this is to manage your data in an SQL database such as 
MySQL or PostgreSQL, and then use one of the R database interfaces to issue 
queries against this database and to surface the query results as a data frame. 
Remember to compose your queries in such a way as to retrieve only the rows 
and columns you really need at any particular moment, and don't forget to delete 
these data frames as soon as you have finished with them (or at least, as soon 
as you need more space in your R session).
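The pattern looks like this in miniature (a sketch using Python's built-in
sqlite3 driver in place of MySQL/PostgreSQL, with an invented toy table; from
R one would issue the equivalent query through one of the database interface
packages and receive a small data frame):

```python
import sqlite3

# Keep the full dataset in the database; pull into memory only the rows
# and columns the current analysis needs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (id INTEGER, year INTEGER, age REAL, cost REAL)")
con.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)",
                [(1, 1998, 34.0, 120.0), (2, 1999, 41.0, 95.0),
                 (3, 1999, 29.0, 60.0), (4, 2000, 55.0, 210.0)])

# Select just two columns and one year's rows -- only this subset ever
# enters the analysis environment's memory.
rows = con.execute(
    "SELECT age, cost FROM visits WHERE year = 1999 ORDER BY id").fetchall()
print(rows)
```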

There is also an (experimental I think) package which allows lazy or virtual 
loading of database queries into data frames, so that the query results are paged 
into memory as they are needed. But I doubt you will need that.

Tim C

 
 _
  
 Tom Mulholland
 Senior Policy Officer
 WA Country Health Service
 Tel: (08) 9222 4062
  
 
 -Original Message-
 From: Edward McNeil [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, 2 December 2003 8:45 AM
 To: [EMAIL PROTECTED]
 Subject: [R] R and Memory
 
 
 Dear all,
 This is my first post.
 We have started to use R here and have also started teaching it to
 our
 PhD students. Our unit will be the HQ for developing R throughout
 Thailand.
 
 I would like some help with a problem we are having. We have one
 sample
 of data that is quite large in fact - over 2 million records (ok ok
 it's
 more like a population!). The data is stored in SPSS. The file is
 over
 350Mb but SPSS happily stores this much data. Now when I try to read
 it
 into R it grunts and groans for a few seconds and then reports that
 there is not enough memory (the computer has 250MB RAM). I have tried
 setting the memory in the command line (--max-vsize and
 --max-mem-size)
 but all to no avail.
 
 Any help would be muchly appreciated!
 
 Edward McNeil (son of Don)
 Epidemiology Unit
 Faculty of Medicine
 Prince of Songkhla University
 Hat Yai  90110
 THAILAND
 



Re: [R] SAS transport files and the foreign package

2003-01-18 Thread Tim Churches
On Sat, 2003-01-18 at 07:45, Frank E Harrell Jr wrote:
 I had no idea how strange the XPORT format really is.

Like the fact that the IBM double precision representation used in XPORT
uses 7 bits for the exponent and 56 bits for the mantissa, whereas IEEE
format uses 11 bits for the exponent and 52 bits for the mantissa.
  
 Following Duncan Temple Lang's suggestion I am contacting one of our
 clients to see what they think about moving towards XML for this.
 My guess is that XML will take a while to be used routinely for 
 this and that the sometimes huge datasets involved will cause XML 
 files to be monstrous (compression will help but will tax memory
 usage of R at least temporarily during processing).  

The nice things about the SAS XML engine are: 
a) all the metadata associated with a dataset is included in the
generated XML file, including not just the names of the formats
for each variable (column), but the actual format value labels
themselves. 
b) more than one dataset can be included in a single generated XML
export file
c) like the XPORT format, close to foolproof from the SAS user's
point of view, because the SAS XML engine does all the work.

The generated files are indeed huge (relative to the
amount of actual data they contain). For our purposes,
this is not likely to be a huge problem - we select
and/or summarise data in SAS, and then pass the subset or 
summary set to R. At the moment, we are experimenting with 
parsing the SAS XML files with Python and then passing the 
data to R via RPy (the Python-to-R bridge) - mainly because
I am slightly more adept at writing Python than R. However, the
ability of R to read SAS XML files directly, and to set
up categorical SAS variables which have formats as factor
columns in R data.frames, would be fabulous.
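The parsing step can be sketched like so (Python standard library only; note
that the XML element names below are invented for illustration and do NOT
follow the actual SAS XML engine schema):

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of an XML data export -- element names are
# invented, not the real SAS XML engine layout.
doc = """
<TABLE name="trial">
  <ROW><ARM>placebo</ARM><OUTCOME>1</OUTCOME></ROW>
  <ROW><ARM>drug</ARM><OUTCOME>0</OUTCOME></ROW>
</TABLE>
"""

root = ET.fromstring(doc)
# One dict per row, keyed by column name -- ready to hand across to R
# (e.g. via RPy) as the columns of a data frame.
records = [{child.tag: child.text for child in row} for row in root.iter("ROW")]
print(records)
```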

Tim C
