Re: [R] matching-case sensitivity

2003-08-27 Thread Spencer Graves
Alternatively, you could use casefold.  This would make your code more 
compatible with S-Plus.  For me, toupper and tolower are easier 
names to remember and easier to read.  However, if you think that 
someone might want to try using your code with S-Plus, then casefold 
might be the better choice.

hope this helps.  spencer graves

Marc Schwartz wrote:
On Tue, 2003-08-26 at 15:09, Jablonsky, Nikita wrote:

Hi All,

I am trying to match two character arrays (email lists) using either
pmatch(), match() or charmatch() functions. However the function is
missing some matches due to differences in the cases of some letters
between the two arrays. Is there any way to disable case sensitivity or is
there an entirely better way to match two character arrays that have
identical entries but written in different case?
Thanks
Nikita


At least two options for case insensitive matching:

1. use grep(), which has an 'ignore.case' argument that you can set to
TRUE. See ?grep
2. use the function toupper() to convert both character vectors to all
upper case. See ?toupper.  Conversely, tolower() would do the opposite.
A quick solution using the second option would be:

Vector1[toupper(Vector1) %in% toupper(Vector2)]

which would return the elements that match in both vectors.

A more formal example with some data:

Vector1 <- letters[1:10]
Vector1
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
Vector2 <- c(toupper(letters[5:8]), letters[9:15])
Vector2
[1] "E" "F" "G" "H" "i" "j" "k" "l" "m" "n" "o"
Vector1[toupper(Vector1) %in% toupper(Vector2)]
[1] "e" "f" "g" "h" "i" "j"
HTH,

Marc Schwartz

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Package for Numerical Integral?

2003-08-27 Thread Spencer Graves
Did you consider integrate?
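
For instance, a minimal sketch (integrate() is in base R; it takes the
integrand function and the limits of integration):

```r
# Integrate the standard normal density over the whole real line;
# the result should be very close to 1.
integrate(dnorm, lower = -Inf, upper = Inf)

# An ad hoc integrand works too, e.g. integrating x^2 on [0, 1]
# (the exact answer is 1/3):
integrate(function(x) x^2, lower = 0, upper = 1)
```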

hope this helps.  spencer graves

Yao, Minghua wrote:
Dear all,

Is there any package for numerically calculating an integral? Thanks in
advance.
-Minghua



Re: [R] R on Linux/Opteron?

2003-08-27 Thread Peter Dalgaard BSA
Dirk Eddelbuettel [EMAIL PROTECTED] writes:

 On Tue, Aug 26, 2003 at 03:17:19PM -0400, Liaw, Andy wrote:
  Has anyone tried using R on the AMD Opteron in either 64- or 32-bit
  mode?  If so, any good/bad experiences, comments, etc?  We are considering
  getting this hardware, and would like to know if R can run smoothly on such
  a beast.  Any comment much appreciated.
 
 http://buildd.debian.org/build.php?pkg=r-base&arch=ia64&file=log
 
 has logs of R builds on ia64 since Nov 2001, incl. the outcome of make
 check. We do not run the torture tests -- though I guess we could on some of
 the beefier hardware such as ia64. 

I don't think that's quite the same beast, though. Opterons are the
x86-64 (or amd64) architecture and ia64 is Intel's, aka Itanium.
Debian appears to be just warming up to including this architecture:
http://lists.debian.org/debian-x86-64/2003/debian-x86-64-200308/threads.html
whereas they have had ia64 out for a while.

SuSE has an Opteron option and Luke said he tried it. Apparently it
has a functioning 64-bit compiler toolchain - I wasn't sure earlier
whether they were just running a 64bit kernel and 32bit applications,
but when Luke says so, I believe it...

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] R on Linux/Opteron?

2003-08-27 Thread Dirk Eddelbuettel
On Tue, Aug 26, 2003 at 03:17:19PM -0400, Liaw, Andy wrote:
 Has anyone tried using R on the AMD Opteron in either 64- or 32-bit
 mode?  If so, any good/bad experiences, comments, etc?  We are considering
 getting this hardware, and would like to know if R can run smoothly on such
 a beast.  Any comment much appreciated.

http://buildd.debian.org/build.php?pkg=r-base&arch=ia64&file=log

has logs of R builds on ia64 since Nov 2001, incl. the outcome of make
check. We do not run the torture tests -- though I guess we could on some of
the beefier hardware such as ia64. 

Dirk

-- 
Those are my principles, and if you don't like them... well, I have others.
-- Groucho Marx



Re: [R] matching-case sensitivity

2003-08-27 Thread Thomas Lumley
On Tue, 26 Aug 2003, Jablonsky, Nikita wrote:

 Hi All,

 I am trying to match two character arrays (email lists) using either
 pmatch(), match() or charmatch() functions. However the function is
 missing some matches due to differences in the cases of some letters
 between the two arrays. Is there any way to disable case sensitivity or is
 there an entirely better way to match two character arrays that have
 identical entries but written in different case?


You could use tolower() or toupper() to remove case differences.
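
For instance, a minimal sketch with match() on case-folded copies (the
addresses are made up):

```r
# Hypothetical addresses for illustration:
x <- c("Alice@Example.COM", "bob@example.com")
y <- c("alice@example.com", "carol@example.com")
match(tolower(x), tolower(y))
# 1 NA  (the first address matches despite the case difference)
```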

-thomas



Re: [R] matching-case sensitivity

2003-08-27 Thread Marc Schwartz
On Tue, 2003-08-26 at 15:09, Jablonsky, Nikita wrote:
 Hi All,
 
 I am trying to match two character arrays (email lists) using either
 pmatch(), match() or charmatch() functions. However the function is
 missing some matches due to differences in the cases of some letters
 between the two arrays. Is there any way to disable case sensitivity or is
 there an entirely better way to match two character arrays that have
 identical entries but written in different case?
 
 Thanks
 Nikita


At least two options for case insensitive matching:

1. use grep(), which has an 'ignore.case' argument that you can set to
TRUE. See ?grep

2. use the function toupper() to convert both character vectors to all
upper case. See ?toupper.  Conversely, tolower() would do the opposite.


A quick solution using the second option would be:

Vector1[toupper(Vector1) %in% toupper(Vector2)]

which would return the elements that match in both vectors.


A more formal example with some data:

Vector1 <- letters[1:10]
Vector1
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"


Vector2 <- c(toupper(letters[5:8]), letters[9:15])
Vector2
[1] "E" "F" "G" "H" "i" "j" "k" "l" "m" "n" "o"


Vector1[toupper(Vector1) %in% toupper(Vector2)]
[1] "e" "f" "g" "h" "i" "j"
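
For completeness, option 1 (grep()) could look something like this on the
same data - a sketch only, since grep() treats its pattern as a regular
expression and so is only safe when the strings contain no regex
metacharacters:

```r
# Case-insensitive matching via grep(); anchoring with ^ and $
# forces whole-string matches rather than substring matches.
Vector1 <- letters[1:10]
Vector2 <- c(toupper(letters[5:8]), letters[9:15])
keep <- sapply(Vector1, function(x)
  length(grep(paste("^", x, "$", sep = ""), Vector2,
              ignore.case = TRUE)) > 0)
Vector1[keep]
# "e" "f" "g" "h" "i" "j"
```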


HTH,

Marc Schwartz



[R] GWplot

2003-08-27 Thread Jason Fisher
For anyone wishing to see R-Tcl/Tk-MySQL in action (Windows XP)...

http://moffett.isis.ucla.edu/gwplot/

Examples were by far the most useful learning tool during my programming
endeavors, so I hope this may help in your own projects.

Thanks,
Jason



[R] Seeking Packaging advice

2003-08-27 Thread Ross Boylan
I have two questions about packaging up code.

1) Weave/tangle advisable?
In the course of extending some C code already in S, I had to work out
the underlying math.  It seems to me useful to keep this information
with the code, using Knuth's tangle/weave type tools.  I know there is
some support for this in R code, but my question is about the wisdom of
doing this with C (or Fortran, or other source) code.

Against the advantage of having the documentation and code nicely
integrated are the drawbacks of added complexity in the build process
and portability concerns.  Some of this is mitigated by the existing
dependence on TeX.

An intermediate approach would be to provide both the web (in the Knuth
sense) source and the C output; the latter could be used directly by
those not wishing to hassle with web.  This isn't ideal, since the
resulting C is likely to be a bit cryptic, and if someone edits the C
without changing the web source confusion will reign.

So do people have any thoughts about whether introducing this is a step
forward or back?

2) Modifications of existing packages.
I modified the survival package (I'm not sure if that's properly called
a base package, but it's close).  I know in this particular case, if
I'm serious, I probably should contact the package maintainer.  But this
kind of operation will probably be pretty common for me; I imagine many
on this list have already done it.  In general, is the best thing to do
a) package the new routines as a small additional package, with a
dependence on the base package if necessary (the particular change I've
made actually produces a few distinct files, slight tweaks of existing
ones, that can stand on their own)
b) package the new things in with the old under the same name as the old
(obviously requires working with package maintainer)
c) package the new things with the old and give it a new name.

I'm also curious about what development strategy is best; I did b), and
it seemed to work OK.  But I kept expecting it to cause disaster (it
probably helped that I usually didn't load the baseline survival
packages; clearly that wouldn't be an option if working with one of the
automatically loaded packages).

Thanks.
-- 
Ross Boylan  wk:  (415) 502-4031
530 Parnassus Avenue (Library) rm 115-4  [EMAIL PROTECTED]
Dept of Epidemiology and Biostatistics   fax: (415) 476-9856
University of California, San Francisco
San Francisco, CA 94143-0840 hm:  (415) 550-1062



[R] Generating routine for Poisson random numbers

2003-08-27 Thread Paul Meagher
 You can generate Poisson random numbers from a Poisson process like this:

 rfishy <- function(lambda) {
     t <- 0
     i <- -1
     while (t <= lambda) {
         t <- t - log(runif(1))
         i <- i + 1
     }
     return(i)
 }

This is a nice compact algorithm for generating Poisson random numbers.  It
appears to work.  I am used to seeing a Poisson counting process implemented
using a "num frames" parameter, a success probability per frame, and
counting the uniform numbers in the 0 to 1 range that fall below the success
probability over "num frames".  It is interesting to see an implementation
that bypasses the num_frames parameter, using just the lambda value, and
relies instead on incrementing a t value on each iteration and counting
the number of times you can do so (while t <= lambda).
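
A quick sanity check of the generator (restating the definition so the
snippet is self-contained):

```r
# Exponential inter-arrival implementation of a Poisson draw:
rfishy <- function(lambda) {
    t <- 0
    i <- -1
    while (t <= lambda) {
        t <- t - log(runif(1))
        i <- i + 1
    }
    return(i)
}

set.seed(1)                           # for reproducibility
draws <- replicate(10000, rfishy(4))
mean(draws)   # should be close to lambda = 4
var(draws)    # for a Poisson, the variance should also be close to 4
```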

 The name of this generator is descriptive, not a pun. It is very slow for
 large lambda, and incorrect for extremely large lambda (and possibly for
 extremely small lambda). If you only wanted a fairly small number of
 random variates with, say, 1e-6 < lambda < 100, then it's not too bad.

One could impose a lambda range check so that you can only invoke the
function using a lambda range where the Poisson RNG is expected to be
reasonably accurate.  The range you are giving is probably the most commonly
used range where a Poisson random number generator might be used?  Brian
Ripley also mentioned that the counting process based implementation would
not work well for large lambdas.  Do you encounter such large lambdas in
practice?   Can't you always, in theory, avoid such large lambdas by
changing the size of the time interval you want to consider?

 But why would anyone *want* to code their own Poisson random number
generator, except perhaps as an interesting student exercise?

Yes this is meant as an interesting exercise for someone who wants to
understand how to implement probability distributions in an object oriented
way (I am writing an article introducing people to probability modelling).
I am looking for a compact algorithm that I can easily explain to people and
which will be a good enough rpois() approximation in many
cases.  I don't want to be blown out of the water for suggesting such an
algorithm to represent a Poisson RNG, so if you think it is inappropriate to
learn about how a Poisson RNG works using the above-described
generating process, then I would be interested in your views.

Thank you for your thoughts on this matter.

Regards,
Paul Meagher




 -thomas





Re: [R] Viewing function source

2003-08-27 Thread Thomas Lumley
On Tue, 26 Aug 2003 [EMAIL PROTECTED] wrote:

 Thomas Lumley wrote:

  The name of this generator is descriptive, not a pun. It is very slow
  for large lambda, and incorrect for extremely large lambda (and
  possibly for extremely small lambda).

 Not sure why it should be incorrect. It's certainly not theoretically
 incorrect. Rounding errors? Problem with runif()?

Yes. Both.  But it should work as long as lambda is much smaller than
2^53 and much larger than 2^-32 and much smaller than the period of
whatever generator you are using, so it's going to be too slow before it's
inaccurate.


  If you only wanted a fairly small number of random variates with, say,
  1e-6 < lambda < 100, then it's not too bad.

 In the above vectorised form, if it's fast for n=1 it stays pretty fast
 for large n: try it with z <- rfishy(1,5); even rfishy(10,5) only
 takes a few seconds.

If you were going to implement in another language (which I thought was
the point) then vectorising won't help.  In R, yes.

-thomas



[R] Re: Generating routine for Poisson random numbers

2003-08-27 Thread Thomas Lumley

 One could impose a lambda range check so that you can only invoke the
 function using a lambda range where the Poisson RNG is expected to be
 reasonably accurate.  The range you are giving is probably the most commonly
 used range where a Poisson random number generator might be used?  Brian
 Ripley also mentioned that the counting process based implementation would
 not work well for large lambdas.  Do you encounter such large lambdas in
 practice?   Can't you always, in theory, avoid such large lambdas by
 changing the size of the time interval you want to consider?

Personally, I'd probably change the question by approximating by a Normal
for large lambda and a Bernoulli for very small lambda.

The algorithm gets slow well before it gets inaccurate, though.
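
The hybrid idea sketched above might look like this (a sketch only, not R's
actual rpois() algorithm; the cutoffs 1e-6 and 100 are illustrative):

```r
rpois_hybrid <- function(lambda) {
  if (lambda < 1e-6) {
    # Almost surely 0 or 1 event: a Bernoulli draw is adequate.
    rbinom(1, size = 1, prob = lambda)
  } else if (lambda > 100) {
    # Normal approximation, rounded and truncated at zero.
    max(0, round(rnorm(1, mean = lambda, sd = sqrt(lambda))))
  } else {
    # Exponential inter-arrival counting, as in the quoted generator.
    t <- 0
    i <- -1
    while (t <= lambda) {
      t <- t - log(runif(1))
      i <- i + 1
    }
    i
  }
}
```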

  But why would anyone *want* to code their own Poisson random number
 generator, except perhaps as an interesting student exercise?

 Yes this is meant as an interesting exercise for someone who wants to
 understand how to implement probability distributions in an object oriented
 way (I am writing an article introducing people to probability modelling).
 I am looking for a compact algorithm that I can easily explain to people how
 it works and which will be a good enough rpois() approximation in many
 cases.  I don't want to be blown out of the water for suggesting such an
 algorithm to represent a Poisson RNG so if you think it is inappropriate to
 learn about how a Poisson RNG works using the above-described
 generating process,  then I would be interested in your views.

No, that's why I gave that as the exception.  There are lots of things
worth doing as a learning exercise that aren't worth doing otherwise.


I do think that in an article you should also point out to people that
there is a lot of numerical code available out there, written by people
who know a lot more than we do about what they are doing. It's often
easier than writing your own code and the results are better. One
advantage of an object-oriented approach is that you can just rip out your
implementation and slot in a new one if it is better.


-thomas



Re: [R] R tools for large files

2003-08-27 Thread Richard A. O'Keefe
Duncan Murdoch [EMAIL PROTECTED] wrote:
For example, if you want to read lines 1000 through 1100, you'd do it
like this:

 lines <- readLines("foo.txt", 1100)[1000:1100]

I created a dataset thus:
# file foo.awk:
BEGIN {
    s = "01"
    for (i = 2; i <= 41; i++) s = sprintf("%s %02d", s, i)
    n = (27 * 1024 * 1024) / (length(s) + 1)
    for (i = 1; i <= n; i++) print s
    exit 0
}
# shell command:
mawk -f foo.awk </dev/null >BIG

That is, each record contains 41 2-digit integers, and the number
of records was chosen so that the total size was approximately
27 dimegabytes.  The number of records turns out to be 230,175.

> system.time(v <- readLines("BIG"))
[1] 7.75 0.17 8.13 0.00 0.00
# With BIG already in the file system cache...
> system.time(v <- readLines("BIG", 200000)[199001:200000])
[1] 11.73  0.16 12.27  0.00  0.00

What's the importance of this?
First, experiments I shall not weary you with showed that the
time to read N lines grows faster than N.
Second, if you want to select the _last_ thousand lines,
you have to read _all_ of them into memory.

For real efficiency here, what's wanted is a variant of readLines
where n is an index vector (a vector of non-negative integers,
a vector of non-positive integers, or a vector of logicals) saying
which lines should be kept.

The function that would need changing is do_readLines() in
src/main/connections.c; unfortunately, I don't understand R internals
well enough to do it myself (yet).

As a matter of fact, that _still_ wouldn't yield real efficiency,
because every character would still have to be read by the modified
readLines(), and it reads characters using Rconn_fgetc(), which is
what gives readLines() its power and utility, but certainly doesn't
give it wings.  (One of the fundamental laws of efficient I/O library
design is to base it on block- or line- at-a-time transfers, not
character-at-a-time.)

The AWK program
NR <= 199000 { next }
{ print }
NR == 200000 { exit }
extracts lines 199001:200000 in just 0.76 seconds, about 15 times
faster.  A C program to the same effect, using fgets(), took 0.39
seconds, or about 30 times faster than R.

There are two fairly clear sources of overhead in the R code:
(1) the overhead of reading characters one at a time through Rconn_fgetc()
instead of a block or line at a time.  mawk doesn't use fgets() for
reading, and _does_ have the overhead of repeatedly checking a
regular expression to determine where the end of the line is,
which it is sensible enough to fast-path.
(2) the overhead of allocating, filling in, and keeping, a whole lot of
memory which is of no use whatever in computing the final result.
mawk is actually fairly careful here, and only keeps one line at
a time in the program shown above.  Let's change it:
NR <= 199000 { next }
{ a[NR] = $0 }
NR == 200000 { exit }
END { for (i in a) print a[i] }
That takes the time from 0.76 seconds to 0.80 seconds.

The simplest thing that could possibly work would be to add a function
skipLines(con, n) which simply reads and discards n lines.

 result <- scan(textConnection(lines), list(""))

> system.time(m <- scan(textConnection(v), integer(41)))
Read 41000 items
[1] 0.99 0.00 1.01 0.00 0.00

One whole second to read 41,000 numbers on a 500 MHz machine?

> vv <- rep(v, 240)

Is there any possibility of storing the data in (platform) binary form?
Binary connections (R-data.pdf, section 6.5 "Binary connections") can be
used to read binary-encoded data.

I wrote a little C program to save out the 230175 records of 41 integers
each in native binary form.  Then in R I did

> system.time(m <- readBin("BIN", integer(), n=230175*41, size=4))
[1] 0.57 0.52 1.11 0.00 0.00
> system.time(m <- matrix(data=m, ncol=41, byrow=TRUE))
[1] 2.55 0.34 2.95 0.00 0.00

Remember, this doesn't read a *sample* of the data, it reads *all*
the data.  It is so much faster than the alternatives in R that it
just isn't funny.  Trying scan() on the file took nearly 10 minutes
before I killed it the other day; using readBin() is a thousand times
faster than a simple scan() call on this particular data set.

There has *got* to be a way of either generating or saving the data
in binary form, using only approved Windows tools.  Heck, it can
probably be done using VBA.
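
From the R side at least, writeBin() can produce such a native-binary file
(a sketch with made-up data; the file name BIN matches the example above):

```r
# Write a 1000 x 41 integer matrix in native binary form, row by row,
# so it can be reread with readBin() and reshaped with byrow = TRUE.
m <- matrix(sample.int(99L, 1000 * 41, replace = TRUE), ncol = 41)
con <- file("BIN", open = "wb")
writeBin(as.integer(t(m)), con, size = 4)   # t() gives row-major order
close(con)
```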


By the way, I've read most of the .pdf files I could find on the CRAN site,
but haven't noticed any description of the R save-file format.  Where should
I have looked?  (Yes, I know about src/main/saveload.c; I was hoping for
some documentation, with maybe some diagrams.)



[R] Re: Generating routine for Poisson random numbers

2003-08-27 Thread Paul Meagher
 I do think that in an article you should also point out to people that
 there is a lot of numerical code available out there, written by people
 who know a lot more than we do about what they are doing. It's often
 easier than writing your own code and the results are better. One
 advantage of an object-oriented approach is that you can just rip out your
 implementation and slot in a new one if it is better.

Exactly.  That is another reason I do not feel the need to implement the RNG
algorithm perfectly.  If someone really wants a more foolproof rpois-like
algorithm with better running-time characteristics, they can reimplement the
method using the rpois.c normal-deviates approach.

BTW, here is what my final Poisson RNG method looks like coded in PHP - it
is modelled after the JSci library approach.  I am only showing the
constructor and the RNG method:

class PoissonDistribution extends ProbabilityDistribution {

  var $lambda;

  function PoissonDistribution($lambda=1) {
    if ($lambda <= 0.0) {
      die("Lambda parameter should be positive.");
    }
    $this->lambda = $lambda;
  }

  function RNG($num_vals=1) {
    if ($num_vals < 1) {
      die("Number of random numbers to return must be 1 or greater");
    }
    for ($i=0; $i < $num_vals; $i++) {
      $temp  = 0;
      $count = -1;
      while ($temp <= $this->lambda) {
        $rand_val = mt_rand() / mt_getrandmax();
        $temp  = $temp - log($rand_val);
        $count++;
      }
      // record count value(s)
      if ($num_vals == 1) {
        $counts = $count;
      } else {
        $counts[$i] = $count;
      }
    }
    return $counts;
  }
}

My simple eyeball tests indicate that the algorithm appears to generate
unbiased estimates of the expected mean and variance given lambdas in the
range of 0.02 to 900.  I guess to confirm the unbiasedness I would need to
generate a bunch of sample estimates of the mean and variance from my
Poisson random number sequences, plot the relative frequency of these
estimates, and see if the central tendency of the estimates corresponds to
the mean and variance expected theoretically for a Poisson random variable
(i.e., mean and variance = lambda).

I see why the performance characteristics get bad when lambda is big - the
counting process involves more iterations.  Most of the textbook examples
never use a lambda this big; often lambda is less than 100 and not
less than 0.02 or so.  In other words, the typical parameter space for the
algorithm may be such that the areas where it breaks down are not that
common in practice.

I think this will be a perfectly acceptable RNG for a Poisson random
variable provided you don't use unusually large or small lambda values - if
I knew the breakdown range, I could implement a range check to
disallow usage of the function for that range.

I am not sure yet exactly which characteristics of the algorithm would lead
it to behave incorrectly at extremely small or large lambda values.

BTW, is this simple method of generating a Poisson random number discussed
in detail in any other books or papers that I might consult?

Regards,
Paul Meagher



[R] How to do leave-n-out cross validation in R?

2003-08-27 Thread Lily
Seems crossval from library(bootstrap) can only be
used for leave-one-out and k-fold cross validation?
Here is a dumb question: suppose n=80, how do I do
exactly leave-50-out cross-validation?  K-fold cross-
validation is not applicable for this case since
n/ngroup is not an integer.  Thanks!
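
One workaround is repeated random splitting (Monte Carlo cross-validation):
draw many random leave-50-out splits rather than a single partition.  A
sketch with a toy model (the "fit" step here just predicts the training
mean; substitute your own fit/predict code):

```r
set.seed(42)
n <- 80; n.out <- 50; B <- 200
y <- rnorm(n)                          # toy response
cv.err <- numeric(B)
for (b in 1:B) {
  test  <- sample(n, n.out)            # leave these 50 out
  train <- setdiff(seq_len(n), test)
  pred  <- mean(y[train])              # "fit" on the remaining 30
  cv.err[b] <- mean((y[test] - pred)^2)
}
mean(cv.err)   # averaged leave-50-out estimate of prediction error
```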



RE: [R] R tools for large files

2003-08-27 Thread Adaikalavan Ramasamy
If we are going to use unix tools to create a new dataset before calling
into R, why not simply use 

cat my_big_bad_file | tail +1001 | head -100

to read lines 1000-1100 (assuming one header row).

Or if you have the shortlisted rownames in one file, you can use join
after sort. A working example follows.


#

#!/bin/bash

# match.sh last modified 10/07/03
# Does the same thing as egrep 'a|b|c|...' file but in batch mode
# A script that matches all occurrences of shortlist in data using
# the first column as common key

if [ $# -ne 2 ]; then
   echo "Usage: ${0/*\/} shortlist data"
   exit
fi

TEMP1=/tmp/temp1.`date +%y%m%d-%H%M%S`
TEMP2=/tmp/temp2.`date +%y%m%d-%H%M%S`
TEMP3=/tmp/temp3.`date +%y%m%d-%H%M%S`
TEMP4=/tmp/temp4.`date +%y%m%d-%H%M%S`
TEMP5=/tmp/temp5.`date +%y%m%d-%H%M%S`

grep -n . $1 | cut -f1 -d: | paste - $1 > $TEMP1
sort -k 2 $TEMP1 > $TEMP2

tail +2 $2 | sort -k 1 > $TEMP3  # Assume data file has header

headerRow=`head -1 $2`

join -j1 2 -j2 1 -a 1 -t"$(printf '\t')" $TEMP2 $TEMP3 > $TEMP4
sort -n -k 2 $TEMP4 > $TEMP5

/bin/echo "$headerRow"
cut -f1,3- $TEMP5    # column 2 contains orderings

rm $TEMP1 $TEMP2 $TEMP3 $TEMP4 $TEMP5


#




Re: [R] R tools for large files

2003-08-27 Thread Prof Brian Ripley
I'm bored, but just to point out the obvious fact: to skip n lines in a
text file you have to read *all* the characters in between to find the
line separators.

I have known for 30 years that reading text files of numbers is slow and
inefficient.  So do it only once and dump the results to a binary format,
or an RDBMS, or ...

On Wed, 27 Aug 2003, Richard A. O'Keefe wrote:

 Duncan Murdoch [EMAIL PROTECTED] wrote:
   For example, if you want to read lines 1000 through 1100, you'd do it
   like this:
   
lines - readLines(foo.txt, 1100)[1000:1100]
 
 I created a dataset thus:
 # file foo.awk:
 BEGIN {
 s = 01
 for (i = 2; i = 41; i++) s = sprintf(%s %02d, s, i)
 n = (27 * 1024 * 1024) / (length(s) + 1)
 for (i = 1; i = n; i++) print s
 exit 0
 }
 # shell command:
 mawk -f foo.awk /dev/null BIG
 
 That is, each record contains 41 2-digit integers, and the number
 of records was chosen so that the total size was approximately
 27 dimegabytes.  The number of records turns out to be 230,175.
 
  system.time(v - readLines(BIG))
 [1] 7.75 0.17 8.13 0.00 0.00
   # With BIG already in the file system cache...
  system.time(v - readLines(BIG, 20)[199001:20])
 [1] 11.73  0.16 12.27  0.00  0.00
 
 What's the importance of this?
 First, experiments I shall not weary you with showed that the
 time to read N lines grows faster than N.
 Second, if you want to select the _last_ thousand lines,
 you have to read _all_ of them into memory.
 
 For real efficiency here, what's wanted is a variant of readLines
 where n is an index vector (a vector of non-negative integers,
 a vector of non-positive integers, or a vector of logicals) saying
 which lines should be kept.
 
 The function that would need changing is do_readLines() in
 src/main/connections.c, unfortunately I don't understand R internals
 well enough to do it myself (yet).
 
 As a matter of fact, that _still_ wouldn't yield real efficiency,
 because every character would still have to be read by the modified
 readLines(), and it reads characters using Rconn_fgetc(), which is
 what gives readLines() its power and utility, but certainly doesn't
 give it wings.  (One of the fundamental laws of efficient I/O library
 design is to base it on block- or line- at-a-time transfers, not
 character-at-a-time.)
 
 The AWK program
 NR = 199000 { next }
 {print}
 NR == 20 { exit }
 extracts lines 199001:2 in just 0.76 seconds, about 15 times
 faster.  A C program to the same effect, using fgets(), took 0.39
 seconds, or about 30 times faster than R.
 
 There are two fairly clear sources of overhead in the R code:
 (1) the overhead of reading characters one at a time through Rconn_fgetc()
 instead of a block or line at a time.  mawk doesn't use fgets() for
 reading, and _does_ have the overhead of repeatedly checking a
 regular expression to determine where the end of the line is,
 which it is sensible enough to fast-path.
 (2) the overhead of allocating, filling in, and keeping, a whole lot of
 memory which is of no use whatever in computing the final result.
 mawk is actually fairly careful here, and only keeps one line at
 a time in the program shown above.  Let's change it:
    NR <= 199000 {next}
    {a[NR] = $0}
    NR == 200000 {exit}
    END {for (i in a) print a[i]}
  That takes the time from 0.76 seconds to 0.80 seconds.
 
 The simplest thing that could possibly work would be to add a function
 skipLines(con, n) which simply read and discarded n lines.
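Until such a function exists, the idea can be emulated in plain R. The sketch below is an assumption, not existing API: skipLines() and its chunk size are made up here, and it simply reads and discards lines in chunks so that at most one chunk is ever held in memory.

```r
## Hypothetical skipLines(): read and discard n lines in chunks,
## keeping at most one chunk in memory at a time.
skipLines <- function(con, n, chunk = 10000) {
  while (n > 0) {
    got <- length(readLines(con, min(n, chunk)))
    if (got == 0) break              # hit end of file early
    n <- n - got
  }
  invisible(n)
}

con <- file("BIG", "r")
skipLines(con, 199000)               # discard lines 1..199000
v <- readLines(con, 1000)            # keep lines 199001..200000
close(con)
```

This still pays the Rconn_fgetc() cost per character, but avoids keeping the 199000 unwanted lines.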
 
result <- scan(textConnection(lines), list(  ))
   
  system.time(m <- scan(textConnection(v), integer(41)))
 Read 41000 items
 [1] 0.99 0.00 1.01 0.00 0.00
 
 One whole second to read 41,000 numbers on a 500 MHz machine?
 
 vv <- rep(v, 240)
 
 Is there any possibility of storing the data in (platform) binary form?
 Binary connections (R-data.pdf, section 6.5 Binary connections) can be
 used to read binary-encoded data.
 
 I wrote a little C program to save out the 230175 records of 41 integers
 each in native binary form.  Then in R I did
 
 system.time(m <- readBin("BIN", integer(), n=230175*41, size=4))
 [1] 0.57 0.52 1.11 0.00 0.00
 system.time(m <- matrix(data=m, ncol=41, byrow=TRUE))
 [1] 2.55 0.34 2.95 0.00 0.00
 
 Remember, this doesn't read a *sample* of the data, it reads *all*
 the data.  It is so much faster than the alternatives in R that it
  just isn't funny.  Trying scan() on the file took nearly 10 minutes
  before I killed it the other day; using readBin() is a thousand times
  faster than a simple scan() call on this particular data set.
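For anyone wanting to try this without the external C program, a self-contained round trip can be done from R itself with writeBin(). This is only a sketch: the file name "BIN" and the toy dimensions are illustrative, not the original poster's setup.

```r
## Write a small integer matrix in native binary form, then read it
## back with readBin() and reshape, mirroring the timings above.
m0 <- matrix(1:(10 * 41), ncol = 41)
con <- file("BIN", "wb")
writeBin(as.integer(t(m0)), con, size = 4)   # row-major on disk
close(con)

m <- readBin("BIN", integer(), n = 10 * 41, size = 4)
m <- matrix(m, ncol = 41, byrow = TRUE)
stopifnot(identical(m, m0))                  # round trip is exact
```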
 
 There has *got* to be a way of either generating or saving the data
 in binary form, using only approved Windows tools.  Heck, it can
 probably be done using VBA.
 
 
 By the way, I've read most of the .pdf files I could find on the CRAN site,
 but haven't noticed any description of the R save-file format.  Where should
 I have looked?  (Yes, I 

Re: [R] How to do leave-n-out cross validation in R?

2003-08-27 Thread Prof Brian Ripley
On Tue, 26 Aug 2003, Lily wrote:

 Seems crossval from library(bootstrap) can only be
 used for leave-one-out and k-fold cross validation?
 Here is a dumb question, suppose n=80, how to do
 exactly leave-50-out cross validation? K-fold cross
 validation is not eligible for this case since
 n/ngroup is not an integer. Thanks!

First, _you_ have to say exactly what _you_ mean by leave-n-out CV. If you
can specify the algorithm, you can program it in R (or we may be able to 
help).

As I have never encountered this, I don't know the definition, nor do I
see the point.  I suspect it is not really cross-validation at all (the
term is widely misused in the machine-learning/neural nets communities to
mean the use of a validation set).

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Seeking Packaging advice

2003-08-27 Thread Prof Brian Ripley
On Tue, 26 Aug 2003, Ross Boylan wrote:

 I have two questions about packaging up code.
 
 1) Weave/tangle advisable?
 In the course of extending some C code already in S, I had to work out
 the underlying math.  It seems to me useful to keep this information
 with the code, using Knuth's tangle/weave type tools.  I know there is
some support for this in R code, but my question is about the wisdom of
 doing this with C (or Fortran, or other source) code.
 
 Against the advantage of having the documentation and code nicely
 integrated are the drawbacks of added complexity in the build process
 and portability concerns.  Some of this is mitigated by the existing
 dependence on TeX.

There is none. We don't assume a working latex/tex, although some manuals 
will not be produced without working (pdf)latex (or texinfo-pdf).

One quick comment: the pre-compiled packages (for Windows now and MacOS X
for the next release) are produced automatically without user
intervention.  So if you want to have a package on CRAN, it needs to work
out of the box, and there is no dependence on TeX, let alone weave/tangle,
in the standard procedure.

 An intermediate approach would be to provide both the web (in the Knuth
 sense) source and the C output; the latter could be used directly by
 those not wishing to hassle with web.  This isn't ideal, since the
 resulting C is likely to be a bit cryptic, and if someone edits the C
 without changing the web source confusion will reign.
 
 So do people have any thoughts about whether introducing this is a step
 forward or back?

A useful analogue: we now distribute Fortran code, not the original Ratfor.


 2) Modifications of existing packages.
 I modified the survival package (I'm not sure if that's properly called
 a base package, but it's close).  I know in this particular case, if

It's a `recommended' package, as the DESCRIPTION file says.  There is a 
base package, and several standard packages bundled with R, which have
priority `base' and are often called `base packages'.

 I'm serious, I probably should contact the package maintainer.  But this
 kind of operation will probably be pretty common for me; I imagine many
 on this list have already done it.  In general, is the best thing to do
 a) package the new routines as a small additional package, with a
 dependence on the base package if necessary (the particular change I've
 made actually produces a few distinct files, slight tweaks of existing
 ones, that can stand on their own)
 b) package the new things in with the old under the same name as the old
 (obviously requires working with package maintainer)
 c) package the new things with the old and give it a new name.
 
 I'm also curious about what development strategy is best; I did b), and
 it seemed to work OK.  But I kept expecting it to cause disaster (it
 probably helped that I usually didn't load the baseline survival
 packages; clearly that wouldn't be an option if working with one of the
 automatically loaded packages).

I think a) is the best, including changing the names of any R functions 
you alter, and changing the entry points in any compiled code you alter.

Package maintainers may have very good reasons not to go along with b), 
including their not being the original authors (true for survival), 
workload, lack of interest in the proposed changes, complications of 
ownership and copyright, 

c) is, I believe, unwise.  It may be allowed by the licence (or may not) but
in the couple of cases where I have seen it done it did not give anything
like adequate credit to the original authors (who were never consulted)
and the modified code distributed was out-of-date when originally 
released, let alone now.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] selecting by variable

2003-08-27 Thread Eugene Salinas
Hi,

I'm a recent R convert so I haven't quite figured out
the details yet...

How do I select one variable by another one? Ie if I
want to draw the histogram of variable X only for
those individuals that also have a value Y in a
certain range?

In STATA I would give something like:

histogram X if ((Y >= A & Y <= B))

(The data is for individuals and each individual has a
number of characteristics including X and Y).

thanks, eugene.



RE: [R] selecting by variable

2003-08-27 Thread Hotz, T.
Eugene,

R allows indexing with logical vectors, so your example would look
like

hist(X[(Y >= A) & (Y <= B)])

See the manual `An Introduction to R' for details.
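For instance, with made-up data (A and B here stand for whatever range limits you use):

```r
## Logical indexing: keep only the X values whose Y falls in [A, B].
set.seed(1)
X <- rnorm(200)                # variable to plot
Y <- runif(200, 0, 10)         # variable to select on
A <- 3; B <- 7
hist(X[Y >= A & Y <= B])       # histogram of X where A <= Y <= B
```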

HTH

Thomas

 -Original Message-
 From: Eugene Salinas [mailto:[EMAIL PROTECTED]
 Sent: 27 August 2003 09:49
 To: [EMAIL PROTECTED]
 Subject: [R] selecting by variable
 
 
 Hi,
 
 I'm a recent R convert so I haven't quite figured out
 the details yet...
 
 How do I select one variable by another one? Ie if I
 want to draw the histogram of variable X only for
 those individuals that also have a value Y in a
 certain range?
 
 In STATA I would give something like:
 
 histogram X if ((Y >= A & Y <= B))
 
 (The data is for individuals and each individual has a
 number of characteristics including X and Y).
 
 thanks, eugene.
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 

---

Thomas Hotz
Research Associate in Medical Statistics
University of Leicester
United Kingdom

Department of Epidemiology and Public Health
22-28 Princess Road West
Leicester
LE1 6TP
Tel +44 116 252-5410
Fax +44 116 252-5423

Division of Medicine for the Elderly
Department of Medicine
The Glenfield Hospital
Leicester
LE3 9QP
Tel +44 116 256-3643
Fax +44 116 232-2976



Re: [R] matching-case sensitivity

2003-08-27 Thread Petr Pikal
Hallo

On 26 Aug 2003 at 13:09, Jablonsky, Nikita wrote:

 Hi All,
 
 I am trying to match two character arrays (email lists) using either
 pmatch(), match() or charmatch() functions. However the function is
 missing some matches due to differences in the cases of some letters
try toupper or tolower

 ttt <- toupper("differences in the cases")
 ttt
[1] "DIFFERENCES IN THE CASES"
 tolower(ttt)
[1] "differences in the cases"
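Applied to the matching problem itself, case-folding both vectors before match() does the job (the addresses below are invented for illustration):

```r
## Case-insensitive match(): fold both sides to lower case first.
emails1 <- c("Alice@Example.COM", "bob@example.com")
emails2 <- c("alice@example.com", "carol@example.com")
match(tolower(emails1), tolower(emails2))   # 1 NA
```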



 between the two arrays. Is there any way to disable case sensitivity
 or is there an entirely better way to match two character arrays that
 have identical entries but written in different case?
 
 Thanks
 Nikita
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Petr Pikal
[EMAIL PROTECTED]



Re: [R] selecting by variable

2003-08-27 Thread Philippe Glaziou
Eugene Salinas [EMAIL PROTECTED] wrote:
 How do I select one variable by another one? Ie if I
 want to draw the histogram of variable X only for
 those individuals that also have a value Y in a
 certain range?
 
 In STATA I would give something like:
 
 histogram X if ((Y >= A & Y <= B))

hist(x[Y >= A & Y <= B])

See:
?Subscript
?

-- 
Philippe Glaziou



Re: [R] selecting by variable

2003-08-27 Thread Ko-Kang Kevin Wang
On Wed, 27 Aug 2003, Eugene Salinas wrote:

 I'm a recent R convert so I haven't quite figured out
 the details yet...

Usually it is good to read the manuals when you use unfamiliar 
software...

 How do I select one variable by another one? Ie if I
 want to draw the histogram of variable X only for
 those individuals that also have a value Y in a
 certain range?

e.g.
  x <- rnorm(100)
  y <- 1:100
  x[y >= 20 & y <= 50]
will give you the values of x where y is between 20 and 50.  To do a 
histogram, type:
  ?hist

-- 
Cheers,

Kevin

--
On two occasions, I have been asked [by members of Parliament],
'Pray, Mr. Babbage, if you put into the machine wrong figures, will
the right answers come out?' I am not able to rightly apprehend the
kind of confusion of ideas that could provoke such a question.

-- Charles Babbage (1791-1871) 
 From Computer Stupidities: http://rinkworks.com/stupid/

--
Ko-Kang Kevin Wang
Master of Science (MSc) Student
SLC Tutor and Lab Demonstrator
Department of Statistics
University of Auckland
New Zealand
Homepage: http://www.stat.auckland.ac.nz/~kwan022
Ph: 373-7599
x88475 (City)
x88480 (Tamaki)



Re: [R] selecting by variable

2003-08-27 Thread Prof Brian Ripley
hist(X[Y >= A & Y <= B])

`An Introduction to R' explains such things, as do (in more detail) the 
introductory texts (see the R FAQ).


On Wed, 27 Aug 2003, Eugene Salinas wrote:

 I'm a recent R convert so I haven't quite figured out
 the details yet...
 
 How do I select one variable by another one? Ie if I
 want to draw the histogram of variable X only for
 those individuals that also have a value Y in a
 certain range?
 
 In STATA I would give something like:
 
 histogram X if ((Y >= A & Y <= B))
 
 (The data is for individuals and each individual has a
 number of characteristics including X and Y).

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] GWplot

2003-08-27 Thread Martin Maechler
 Jason == Jason Fisher [EMAIL PROTECTED]
 on Tue, 26 Aug 2003 14:08:26 -0700 writes:

Jason For anyone wishing to see R-Tcl/Tk-MySQL in action (Windows XP)...
Jason http://moffett.isis.ucla.edu/gwplot/

Jason Examples were by far the most useful learning tool
Jason during my programming endeavors so I hope this may
Jason help in your own projects.

Thank you, Jason.  I like this spirit of sharing!
This looks very interesting for a project we will start here in
a few weeks.  From reading the above web page, it's not clear
why you say
 "The current version of GWplot is designed for Windows 2000/XP"
when all the tools you say you are using are typically part of
every Linux distribution (and also available probably for every
platform R runs apart from classic MacOS).

What problems do you see using this outside of Win-Xp?
Regards,
Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   



Re: [R] R tools for large files

2003-08-27 Thread Duncan Murdoch
On Wed, 27 Aug 2003 13:03:39 +1200 (NZST), you wrote:

For real efficiency here, what's wanted is a variant of readLines
where n is an index vector (a vector of non-negative integers,
a vector of non-positive integers, or a vector of logicals) saying
which lines should be kept.

I think that's too esoteric to be worth doing.  Most often in cases
where you aren't reading every line, you don't know which lines to
read until you've read earlier ones.

There are two fairly clear sources of overhead in the R code:
(1) the overhead of reading characters one at a time through Rconn_fgetc()
instead of a block or line at a time.  mawk doesn't use fgets() for
reading, and _does_ have the overhead of repeatedly checking a
regular expression to determine where the end of the line is,
which it is sensible enough to fast-path.

One complication with reading a block at a time is what to do when you
read too far.  Not all connections can use seek() to reposition to the
beginning, so you'd need to read them one character at a time, (or
attach a buffer somehow, but then what about rw connections?)

The simplest thing that could possibly work would be to add a function
skipLines(con, n) which simply read and discarded n lines.

result <- scan(textConnection(lines), list(  ))

That's probably worth doing.

Duncan Murdoch



[R] seeking help with with()

2003-08-27 Thread Simon Fear
I tried to define a function like:

fnx <- function(x, by.vars = Month)
  print(by(x, by.vars, summary))

But this doesn't work (it does not find x$Month; unlike other functions,
such as subset(), the INDICES argument to by() does not look for
variables in dataset x.  This is fully documented, but I forget every
time).  So I tried using with():

fnxx <- function(x, by.vars = Month)
  print(with(x, by(x, by.vars, summary)))

Still fails to find object x$Month. 

I DO have a working solution (below) - this post is just to ask: can
anyone explain what happened to the with()?



FYI solutions are to call like this:

fnx(airquality, airquality$Month)

but this will not work generically - e.g. in my real application the
dataset gets subsetted and by.vars needs to refer to the subsets.  So
redefine like this:

fny <- function(x, by.vars = Month) {
  attach(x)
  print(by(x, by.vars, summary))
  detach(x)
}
 

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 69
Fax: +44 (0) 1379 65
email: [EMAIL PROTECTED]
web: http://www.synequanon.com
 
Number of attachments included with this message: 0
 
This message (and any associated files) is confidential and\...{{dropped}}



Re: [R] seeking help with with()

2003-08-27 Thread Prof Brian Ripley
On Wed, 27 Aug 2003, Simon Fear wrote:

 I tried to define a function like:
 
 fnx <- function(x, by.vars = Month)
   print(by(x, by.vars, summary))
 
 But this doesn't work (does not find x$Month; unlike other functions,
 such as
 subset(), the INDICES argument to by does not look for variables in
 dataset
 x. Is fully documented, but I forget every time). So I tried using
 with:
 
 fnxx <- function(x, by.vars = Month)
   print(with(x, by(x, by.vars, summary)))
 
 Still fails to find object x$Month. 

That's not the actual error message, is it?

 I DO have a working solution (below) - this post is just to ask: Can
 anyone
 explain what happened to the with()?

Nothing!

by.vars is a variable passed to fnxx, so despite lazy evaluation, it is
going to be evaluated in the environment calling fnxx().  If that fails to
find it, it looks for the default value, and evaluates that in the
environment of the body of fnxx.  It didn't really get as far as with.

(I often forget where default args are evaluated, but I believe that is 
correct in R as well as in S.)
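A minimal illustration of the rule (toy function, run at top level): the default is evaluated in the function's own frame, a supplied argument in the caller's.

```r
## Default args see the function's frame; supplied args see the caller's.
a <- 10
f <- function(a, b = a + 1) b
f(1)        # 2: default b = a + 1 uses a = 1 inside f
f(1, a + 1) # 11: supplied a + 1 is evaluated in the caller, where a = 10
```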

I think you intended Month to be a name and not a variable.  With

X <- data.frame(z = rnorm(20), Month = factor(rep(1:2, each = 10)))

fnx <- function(x, by.vars = "Month")
   print(by(x, x[by.vars], summary))

will work, as will

fnx <- function(x, by.vars = Month)
   print(by(x, x[deparse(substitute(by.vars))], summary))


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] seeking help with with()

2003-08-27 Thread Peter Dalgaard BSA
Simon Fear [EMAIL PROTECTED] writes:

 I tried to define a function like:
 
 fnx <- function(x, by.vars = Month)
   print(by(x, by.vars, summary))
 
 But this doesn't work (does not find x$Month; unlike other functions,
 such as
 subset(), the INDICES argument to by does not look for variables in
 dataset
 x. Is fully documented, but I forget every time). So I tried using
 with:
 
 fnxx <- function(x, by.vars = Month)
   print(with(x, by(x, by.vars, summary)))
 
 Still fails to find object x$Month. 
 
 I DO have a working solution (below) - this post is just to ask: Can
 anyone
 explain what happened to the with()?
 

Nothing, but by.vars is evaluated in the function frame where it is
not defined. I think you're looking for something like

function(x, by.vars) {
  if (missing(by.vars)) by.vars - as.name(Month)
  print(eval.parent(substitute(with(x, by(x, by.vars, summary)
}

(Defining the default arg requires a bit of sneakiness...)
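A usage sketch of the trick above (the function name fny2 is arbitrary; airquality ships with R):

```r
## Peter's pattern: substitute() captures the caller's expression for
## by.vars, and eval.parent() evaluates the whole with() in the caller.
fny2 <- function(x, by.vars) {
  if (missing(by.vars)) by.vars <- as.name("Month")
  print(eval.parent(substitute(with(x, by(x, by.vars, summary)))))
}
fny2(airquality)            # grouped by the default, Month
fny2(airquality, Day > 15)  # any expression visible inside x works too
```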
-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



RE: [R] seeking help with with()

2003-08-27 Thread Simon Fear
Thank you so much for that fix (to my understanding).

I would be willing to add such an example to the help
page for future releases - though I'm sure others would
do it better - there are currently no examples where
INDICES is a name.

In fact, in my real application it is more or less essential
that INDICES is a name, or at least deparse(substitute())'d
as a subscript; in a slight elaboration of my previous fix

fnz <- function(dframe, by.vars = treat)
  for (pop in 1:2) {
    dframe.pop <- subset(dframe, ITT == pop)
    attach(dframe.pop)
    print(by(dframe.pop, by.vars, summary))
    detach(dframe.pop)
  }

the second call (when pop=2) to by() will crash because by.vars 
is not re-evaluated afresh - it retains its value 
from the first loop.

So, my fix was wrong and I am happy to stand corrected.


 -Original Message-
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
 Sent: 27 August 2003 14:08
 To: Simon Fear
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] seeking help with with()
 
 
 Security Warning:
 If you are not sure an attachment is safe to open please contact 
 Andy on x234. There are 0 attachments with this message.
 
 
 On Wed, 27 Aug 2003, Simon Fear wrote:
 
  I tried to define a function like:
  
  fnx <- function(x, by.vars = Month)
    print(by(x, by.vars, summary))
  
  But this doesn't work (does not find x$Month; unlike other 
 functions,
  such as
  subset(), the INDICES argument to by does not look for 
 variables in
  dataset
  x. Is fully documented, but I forget every time). So I tried using
  with:
  
  fnxx <- function(x, by.vars = Month)
    print(with(x, by(x, by.vars, summary)))
  
  Still fails to find object x$Month. 
 
 That's not the actual error message, is it?
 
  I DO have a working solution (below) - this post is just to ask: Can
  anyone
  explain what happened to the with()?
 
 Nothing!
 
 by.vars is a variable passed to fnxx, so despite lazy 
 evaluation, it is
 going to be evaluated in the environment calling fnxx().  If 
 that fails
 to
 find it, it looks for the default value, and evaluates that in the
 environment of the body of fnxx.  It didn't really get as far as with.
 
 (I often forget where default args are evaluated, but I 
 believe that is 
 correct in R as well as in S.)
 
 I think you intended Month to be a name and not a variable.  With
 
 X <- data.frame(z = rnorm(20), Month = factor(rep(1:2, each = 10)))
 
 fnx <- function(x, by.vars = "Month")
    print(by(x, x[by.vars], summary))
 
 will work, as will
 
 fnx <- function(x, by.vars = Month)
    print(by(x, x[deparse(substitute(by.vars))], summary))
 
 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595

 

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 69
Fax: +44 (0) 1379 65
email: [EMAIL PROTECTED]
web: http://www.synequanon.com
 
Number of attachments included with this message: 0
 
This message (and any associated files) is confidential and\...{{dropped}}



Re: [R] how to calculate Rsquare

2003-08-27 Thread Ronaldo Reis Jr.
Can anybody send these articles for me? 

 Nagelkerke, N. J. D. (1991) "A note on a general definition of the
 coefficient of determination", Biometrika 78: 691-2.

 Cox, D. R. and Wermuth, N. (1992) "A comment on the coefficient of
 determination for binary responses", The American Statistician 46: 1-4.


Thanks
Ronaldo
-- 
Of __course it's the murder weapon.  Who would frame someone with a 
fake?
--
|   // | \\   [***]
|   ( õ   õ )  [Ronaldo Reis Júnior]
|  V  [UFV/DBA-Entomologia]
|/ \   [36571-000 Viçosa - MG  ]
|  /(.''`.)\  [Fone: 31-3899-2532 ]
|  /(: :'  :)\ [EMAIL PROTECTED]]
|/ (`. `'` ) \[ICQ#: 5692561 | LinuxUser#: 205366 ]
|( `-  )   [***]
|  _/   \_Powered by GNU/Debian Woody/Sarge



Re: [R] Seeking Packaging advice

2003-08-27 Thread Roger Koenker

On Wed, 27 Aug 2003, Prof Brian Ripley wrote:

 On Tue, 26 Aug 2003, Ross Boylan wrote:

  So do people have any thoughts about whether introducing this is a step
  forward or back?

 A useful analogue: we now distribute Fortran code not the original Ratfor.

As a footnote to Brian's comment, I would just say that those hardy
few of us who still write ratfor can and do include it in a subdirectory
under src since it tends to be vastly more readable than its automatically
produced fortran translation.  But we have also learned from hard
experience that one can't always rely on  the ratfor preprocessing
that is provided by systems even when it exists.

url:www.econ.uiuc.edu/~roger/my.htmlRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820



Re: [R] Seeking Packaging advice

2003-08-27 Thread Thomas Lumley
On Tue, 26 Aug 2003, Ross Boylan wrote:


 2) Modifications of existing packages.
 I modified the survival package (I'm not sure if that's properly called
 a base package, but it's close).  I know in this particular case, if
 I'm serious, I probably should contact the package maintainer.  But this
 kind of operation will probably be pretty common for me; I imagine many
 on this list have already done it.  In general, is the best thing to do
 a) package the new routines as a small additional package, with a
 dependence on the base package if necessary (the particular change I've
 made actually produces a few distinct files, slight tweaks of existing
 ones, that can stand on their own)

I think that's best.

 b) package the new things in with the old under the same name as the old
 (obviously requires working with package maintainer)

The problem in this case is that the package maintainer is not the author.
Additional functionality might well be ok, but that could easily be done
with method (a).  Substantial changes to existing functions are going to
cause problems when the next few thousand lines of diffs arrive from Mayo
Clinic.

 c) package the new things with the old and give it a new name.

Keeping this in sync is hard.

-thomas



RE: [R] how to calculate Rsquare

2003-08-27 Thread Paul, David A
I think you've badly misinterpreted the purpose 
of the R listserv with this request:


https://www.stat.math.ethz.ch/mailman/listinfo/r-help says

The `main' R mailing list, for announcements about the 
development of R and the availability of new code, questions 
and answers about problems and solutions using R, enhancements 
and patches to the source code and documentation of R, 
comparison and compatibility with S and S-plus, and for 
the posting of nice examples and benchmarks.



-Original Message-
From: Ronaldo Reis Jr. [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 27, 2003 10:04 AM
To: R-Help
Subject: Re: [R] how to calculate Rsquare


Can anybody send these articles for me? 

 NagelKerke, N. J. D. (1991) A note on a general definition of the

 coefficient of determination, Biometrika 78: 691-2.

 Cox, D. R. and Wermuth, N. (1992) A comment on the coefficient of

 determination for binary responses, The American Statistician 46:  
 1-4.


Thanks
Ronaldo
-- 
Of __course it's the murder weapon.  Who would frame someone with a 
fake?
--
|   // | \\   [***]
|   ( õ   õ )  [Ronaldo Reis Júnior]
|  V  [UFV/DBA-Entomologia]
|/ \   [36571-000 Viçosa - MG  ]
|  /(.''`.)\  [Fone: 31-3899-2532 ]
|  /(: :'  :)\ [EMAIL PROTECTED]]
|/ (`. `'` ) \[ICQ#: 5692561 | LinuxUser#: 205366 ]
|( `-  )   [***]
|  _/   \_Powered by GNU/Debian Woody/Sarge



Re: [R] Exporting R graphs (review)

2003-08-27 Thread Marc Schwartz
On Wed, 2003-08-27 at 09:21, [EMAIL PROTECTED] wrote:
 Hi guys,
 
Yesterday I posted my first couple of questions (see bottom of this
 message) to this forum and I would like to thank you guys for all the
 useful feedback I got. I just would like to make some comments:
 
 1. Exporting R graphs as vector graphics:
 
 The best answer came from Thomas Lumley [EMAIL PROTECTED]
 He suggested using the RSvgDevice package. As far as I know SVG graphics
 can be manipulated with OpenOffice and also with sodipodi, I'll check this
 package out asap. This should apply to linux and win users.


Just a quick heads up that you can export SVG format files from OOo
Draw. However, there is no present ability to import them into the OOo
apps.

According to OOo's IssueZilla, there are no plans to support SVG import
prior to version 2.0. 

There is however a fair amount of pressure to do so as SVG formats
become more prevalent as a cross-platform vector format, especially now
that web apps like Mozilla/Firebird are building support for it.

This is one of the reasons that I have stayed with bitmaps for screen
display and EPS for printing when using OOo.

Also, you may be aware that OOo V1.1 (which is at RC3 right now) can
export PDF files directly. However, if you have EPS images embedded in a
document or slide show, they print as you see them on the screen (blank
objects with the embedded title). Thus you need to print them to a PS
file and then use ps2pdf if you want a proper PDF file generated.

Lastly, for those interested, there is a java based OOo Writer to LaTeX
CLI conversion utility and Writer export filter in development.
Amazingly, it is called Writer2LaTex... ;-)

Info is available at http://www.hj-gym.dk/~hj/writer2latex/

HTH,

Marc Schwartz



Re: [R] how to calculate Rsquare

2003-08-27 Thread Spencer Graves
The Battelle Institute surely should have access to a library with such 
popular and prestigious journals as Biometrika and The American 
Statistician.  If you don't have time for that, you surely should have 
money to purchase a copy from, e.g., www.lindahall.org/docserv.

hope this helps.  spencer graves

Paul, David A wrote:
I think you've badly misinterpreted the purpose 
of the R listserv with this request:

https://www.stat.math.ethz.ch/mailman/listinfo/r-help says

The `main' R mailing list, for announcements about the 
development of R and the availability of new code, questions 
and answers about problems and solutions using R, enhancements 
and patches to the source code and documentation of R, 
comparison and compatibility with S and S-plus, and for 
the posting of nice examples and benchmarks.



-Original Message-
From: Ronaldo Reis Jr. [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 27, 2003 10:04 AM
To: R-Help
Subject: Re: [R] how to calculate Rsquare

Can anybody send these articles for me? 


	  NagelKerke, N. J. D. (1991) A note on a general definition of the


coefficient of determination, Biometrika 78: 691-2.

	  Cox, D. R. and Wermuth, N. (1992) A comment on the coefficient of


determination for binary responses, The American Statistician 46:  
1-4.



Thanks
Ronaldo


Re: [R] Exporting R graphs (review)

2003-08-27 Thread Peter Dalgaard BSA
Marc Schwartz [EMAIL PROTECTED] writes:

 On Wed, 2003-08-27 at 09:21, [EMAIL PROTECTED] wrote:
  Hi guys,
  
 Yesterday I posted my first couple of questions (see bottom of this
  message) to this forum and I would like to thank you guys for all the
  useful feedback I got. I just would like to make some comments:
  
  1. Exporting R graphs as vector graphics:
  
  The best answer came from Thomas Lumley [EMAIL PROTECTED]
  He suggested using the RSvgDevice package. As far as I know SVG graphics
  can be manipulated with OpenOffice and also with sodipodi, I'll check this
  package out asap. This should apply to linux and win users.
 
 
 Just a quick heads up that you can export SVG format files from OOo
 Draw. However, there is no present ability to import them into the OOo
 apps.
 
 According to OOo's IssueZilla, there are no plans to support SVG import
 prior to version 2.0. 
 
 There is however a fair amount of pressure to do so as SVG formats
 become more prevalent as a cross-platform vector format, especially now
 that web apps like Mozilla/Firebird are building support for it.

There is also the option of writing a driver specifically for oodraw's
format (zipped XML files, mainly). AFAICT, this is mainly a whole lot
of red tape, with the actual plotting specified in sections like

<draw:polyline draw:style-name="gr6" draw:layer="layout"
svg:width="6.272cm" svg:height="5.269cm" draw:transform="rotate
(-0.767770337952161) translate (16.15cm 9.809cm)" svg:viewBox="0 0
6272 5269" draw:points="0,3261 325,6 3206,0 3755,5268 6271,3261"/>

I.e. it doesn't look impossible, but might require a bit of stamina...

In principle you could also try xfig() -> CGM -> oodraw and maybe other
routes using fig2dev but I can't vouch for the quality.

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] selecting by variable

2003-08-27 Thread Salvador Alcaraz Carrasco
On Wed, 27 Aug 2003, Petr Pikal wrote:

 Hallo

 On 27 Aug 2003 at 1:49, Eugene Salinas wrote:

  Hi,
 
  I'm a recent R convert so I haven't quite figured out
  the details yet...
 
  How do I select one variable by another one? Ie if I
  want to draw the histogram of variable X only for
  those individuals that also have a value Y in a
  certain range?
 
  In STATA I would give something like:
 
  histogram X if ((Y>=A) & (Y<=B))

 hist(X[(Y>=A) & (Y<=B)])

 if A and B are objects storing your limits

 ?Logic
 ?[
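A slightly fuller sketch of the same idea, with made-up data (not from the original post; A and B are placeholder limits standing in for the Stata example's bounds):

```r
## Sketch: histogram of X restricted to observations where Y lies in [A, B]
set.seed(1)
X <- rnorm(100)
Y <- rnorm(100)
A <- -0.5
B <-  0.5
hist(X[Y >= A & Y <= B], main = "X where A <= Y <= B")
```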

 
  (The data is for individuals and each individual has a
  number of characteristics including X and Y).
 
  thanks, eugene.
 
  __
  [EMAIL PROTECTED] mailing list
  https://www.stat.math.ethz.ch/mailman/listinfo/r-help

 Cheers
 Petr Pikal
 [EMAIL PROTECTED]

 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help


__
Salvador Alcaraz Carrasco  http://www.umh.es
Arquitectura y Tecnología de Computadores  http://obelix.umh.es
Dpto. Física y Arquitectura de Computadores[EMAIL PROTECTED]
Universidad Miguel Hernández   [EMAIL PROTECTED]
Avda. del ferrocarril, s/n Telf. +34 96 665 8495
Elche, Alicante (Spain)

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] How to test a model with two unknown constants

2003-08-27 Thread Sven Garbade
Hi all,

suppose I've got a vector y with some data (from a repeated measures
design) observed given the conditions in f1 and f2. I've got a model
with two unknown fixed constants a and b which tries to predict y with
respect to the values in f1 and f2. Here is an example

# data
y <- c(runif(10, -1, 0), runif(10, 0, 1))
# f1
f1 <- rep(c(-1.4, 1.4), rep(10, 2))
# f2
f2 <- rep(c(-.5, .5), rep(10, 2))

Suppose my simple model looks like

y = a/f1 + b*f2

Is there a function in R which can compute the estimates for a and b?
And is it possible to test the model, e.g. how good the fit of the
model is?

Thanks, Sven

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] how to calculate Rsquare

2003-08-27 Thread Frank E Harrell Jr
On Wed, 27 Aug 2003 11:04:21 -0300
Ronaldo Reis Jr. [EMAIL PROTECTED] wrote:

 Can anybody send these articles for me? 
 
 	  Nagelkerke, N. J. D. (1991) A note on a general definition of the
  coefficient of determination, Biometrika 78: 691-2.

The fitting functions lrm, psm, cph in the Design package compute Nagelkerke's 
measures.  -F Harrell
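For readers without journal access, here is a minimal sketch (my own illustration, not code from the thread) of Nagelkerke's R-squared computed by hand from a binomial glm, assuming the standard definition: the Cox-Snell R-squared divided by its maximum attainable value.

```r
## Nagelkerke's R^2 from deviances (sketch, assuming the usual formula):
## CoxSnell = 1 - exp((D - D0)/n),  maximum = 1 - exp(-D0/n)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(2 * x))
f  <- glm(y ~ x, family = binomial)
n  <- length(y)
D0 <- f$null.deviance   # deviance of the intercept-only model
D  <- f$deviance        # deviance of the fitted model
R2.nagelkerke <- (1 - exp((D - D0)/n)) / (1 - exp(-D0/n))
R2.nagelkerke           # lies between 0 and 1
```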

 
Cox, D. R. and Wermuth, N. (1992) A comment on the coefficient of
  determination for binary responses, The American Statistician 46:  1-4.
 
 
 Thanks
 Ronaldo
 -- 
 Of __course it's the murder weapon.  Who would frame someone with a 
 fake?
 --
 |   // | \\   [***]
 |   ( õ   õ )  [Ronaldo Reis Júnior]
 |  V  [UFV/DBA-Entomologia]
 |/ \   [36571-000 Viçosa - MG  ]
 |  /(.''`.)\  [Fone: 31-3899-2532 ]
 |  /(: :'  :)\ [EMAIL PROTECTED]]
 |/ (`. `'` ) \[ICQ#: 5692561 | LinuxUser#: 205366 ]
 |( `-  )   [***]
 |  _/   \_Powered by GNU/Debian Woody/Sarge
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help


---
Frank E Harrell Jr  Prof. of Biostatistics  Statistics
Div. of Biostatistics  Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Basic GLM: residuals definition

2003-08-27 Thread Martin Hoyle
Dear R Users,

I suppose this is a schoolboy question, but here it is anyway. I'm trying to 
re-create the residuals for a Poisson GLM with simulated data;

x <- rpois(1000, 5)
model <- glm(x ~ 1, poisson)
my.resids <- log(x) - summary(model)$coefficients[1]
plot(my.resids,residuals(model))

This shows that my calculated residuals (my.resids) are not the same as 
residuals(model).
p 65 of Annette Dobson's book says that GLM (unstandardised) residuals are calculated 
by analogy with the Normal case.
So where am I going wrong?

Thanks for your attention.

Martin.


Martin Hoyle,
School of Life and Environmental Sciences,
University of Nottingham,
University Park,
Nottingham,
NG7 2RD,
UK
Webpage: http://myprofile.cos.com/martinhoyle

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] How to test a model with two unknown constants

2003-08-27 Thread Prof Brian Ripley
That's the linear model lm(y ~ I(1/f1) + f2), so yes, yes and
fuller answers can be found in most of the books and guides mentioned in 
R's FAQ.

Note that how `good' the fit is will have to be relative, unless you
really can assume a uniform error with range 1, when you could do a 
maximum-likelihood fit (and watch out for the non-standard distribution 
theory).
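As a sketch of that lm() call (my own made-up data with a and b known; note Peter Dalgaard's point elsewhere in the thread that the posted f1 and f2 are exactly collinear, so different values are used here):

```r
## Sketch (made-up data, not from the post): fit y = a/f1 + b*f2 by
## least squares.  The posted f1, f2 were proportional (1/f1 = f2/0.7),
## so non-collinear values are used here instead.
set.seed(1)
f1 <- runif(20, 0.5, 2)
f2 <- rnorm(20)
y  <- 2/f1 + 3*f2 + rnorm(20, sd = 0.1)
fit <- lm(y ~ I(1/f1) + f2 - 1)   # "-1": the model has no intercept
coef(fit)                         # estimates of a and b, here near 2 and 3
summary(fit)$r.squared            # one rough measure of fit
```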

On 27 Aug 2003, Sven Garbade wrote:

 Hi all,
 
 suppose I've got a vector y with some data (from a repeated measures
 design) observed given the conditions in f1 and f2. I've got a model
 with two unknown fixed constants a and b which tries to predict y with
 respect to the values in f1 and f2. Here is an example
 
 # data
 y <- c(runif(10, -1, 0), runif(10, 0, 1))
 # f1
 f1 <- rep(c(-1.4, 1.4), rep(10, 2))
 # f2
 f2 <- rep(c(-.5, .5), rep(10, 2))
 
 Suppose my simple model looks like
 
 y = a/f1 + b*f2

 Is there a function in R which can compute the estimates for a and b?
 And is it possible to test the model, eg how good the fits of the
 model are?

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] How to test a model with two unknown constants

2003-08-27 Thread Peter Dalgaard BSA
Sven Garbade [EMAIL PROTECTED] writes:

 Hi all,
 
 suppose I've got a vector y with some data (from a repeated measures
 design) observed given the conditions in f1 and f2. I've got a model
 with two unknown fixed constants a and b which tries to predict y with
 respect to the values in f1 and f2. Here is an example
 
 # data
 y <- c(runif(10, -1, 0), runif(10, 0, 1))
 # f1
 f1 <- rep(c(-1.4, 1.4), rep(10, 2))
 # f2
 f2 <- rep(c(-.5, .5), rep(10, 2))
 
 Suppose my simple model looks like
 
 y = a/f1 + b*f2
 
 Is there a function in R which can compute the estimates for a and b?
 And is it possible to test the model, eg how good the fits of the
 model are?

f2 and 1/f1 are exactly collinear, so no, not in R, nor any other way.

Apart from that, the model is linear in a and b so lm() can fit it
(with different f1 and f2) if you're not too squeamish about the error
distribution.
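A sketch making the collinearity concrete (my own check, not from the post):

```r
## With the posted values, 1/f1 is an exact multiple of f2 (1/1.4 = 0.5/0.7),
## so the two regressors carry the same information and lm() aliases one.
f1 <- rep(c(-1.4, 1.4), rep(10, 2))
f2 <- rep(c(-0.5, 0.5), rep(10, 2))
all.equal(1/f1, f2/0.7)               # TRUE: exact collinearity
set.seed(1)
y <- c(runif(10, -1, 0), runif(10, 0, 1))
coef(lm(y ~ I(1/f1) + f2))            # the f2 coefficient comes back NA
```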

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] how to calculate Rsquare

2003-08-27 Thread Peter Dalgaard BSA
Spencer Graves [EMAIL PROTECTED] writes:

 The Battelle Institute surely should have access to a library with
 such popular and prestigious journals as Biometrika and The American
 Statistician.  If you don't have time for that, you surely should have
 money to purchase a copy from, e.g., www.lindahall.org/docserv.

Battelle is not the issue, Entomology Dept. at Univ.Fed.de Viçosa is.
That is presumably a somewhat poorer place. Still, you (Ronaldo)
should check whether there is JSTOR access from somewhere around you,
as I'm sure those recipients of r-help who have it will be unsure of
what licences they might break by sending you free copies. And David's
right: This is outside the scope of r-help.

  From: Ronaldo Reis Jr. [mailto:[EMAIL PROTECTED] Sent:
  Wednesday, August 27, 2003 10:04 AM
  To: R-Help
  Subject: Re: [R] how to calculate Rsquare
  Can anybody send these articles for me?
   Nagelkerke, N. J. D. (1991) A note on a general definition of the
 
 coefficient of determination, Biometrika 78: 691-2.
 
   Cox, D. R. and Wermuth, N. (1992) A comment on the coefficient of
 
  determination for binary responses, The American Statistician 46:
  1-4.


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] read.spss (package foreign) and character columns

2003-08-27 Thread RINNER Heinrich
Dear R users!

I am using R Version 1.7.1, Windows XP, package foreign (Version: 0.6-1),
SPSS 11.5.1.

There is one thing I noticed with read.spss, and I'd like to ask if this
is considered to be a feature, or possibly a bug:
When reading character columns, character strings seem to get filled with
blanks at the end.

Simple example:
In SPSS, create a file with one variable called "xchar" of type A5
(character of length 5), and 3 values ("a", "ab", "abcde"), save it as
"test.sav".

In R:
 library(foreign)
 test <- read.spss("test.sav", to.data.frame = TRUE)
 test
  XCHAR
1 a
2 ab   
3 abcde
 levels(test$XCHAR)
[1] "a    " "ab   " "abcde"

Shouldn't it rather be "a" "ab" "abcde" (no blanks)?

-Heinrich.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] how to calculate Rsquare

2003-08-27 Thread Spencer Graves
Please excuse my reference to Batelle:  I confused who was asking and 
who answering the question.  Thanks, Peter, for the clarification and 
for the alternative suggestions.

Spencer Graves

Peter Dalgaard BSA wrote:
Spencer Graves [EMAIL PROTECTED] writes:


The Battelle Institute surely should have access to a library with
such popular and prestigious journals as Biometrika and The American
Statistician.  If you don't have time for that, you surely should have
money to purchase a copy from, e.g., www.lindahall.org/docserv.


Battelle is not the issue, Entomology Dept. at Univ.Fed.de Viçosa is.
That is presumably a somewhat poorer place. Still, you (Ronaldo)
should check whether there is JSTOR access from somewhere around you,
as I'm sure those recipients of r-help who have it will be unsure of
what licences they might break by sending you free copies. And David's
right: This is outside the scope of r-help.

From: Ronaldo Reis Jr. [mailto:[EMAIL PROTECTED] Sent:
Wednesday, August 27, 2003 10:04 AM
To: R-Help
Subject: Re: [R] how to calculate Rsquare
Can anybody send these articles for me?
	  Nagelkerke, N. J. D. (1991) A note on a general definition of the

coefficient of determination, Biometrika 78: 691-2.

	  Cox, D. R. and Wermuth, N. (1992) A comment on the coefficient of

determination for binary responses, The American Statistician 46:
1-4.



__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] Basic GLM: residuals definition

2003-08-27 Thread Hotz, T.
As ?residuals.glm reveals, it's got an argument type:

type: the type of residuals which should be returned. The
  alternatives are: `deviance' (default), `pearson',
  `working', `response', and `partial'.

You calculated response residuals; R gives deviance residuals 
by default. The different types are covered by most books on generalized
linear models.
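A sketch contrasting the two types on the poster's example (my own code, assuming the standard Poisson deviance formula; not from the thread):

```r
## Response vs deviance residuals for an intercept-only Poisson GLM
set.seed(42)
x <- rpois(1000, 5)
model <- glm(x ~ 1, family = poisson)
mu <- fitted(model)                              # exp(intercept), constant here
r.resp <- residuals(model, type = "response")    # simply x - mu
r.dev  <- residuals(model)                       # "deviance", the default
all.equal(r.resp, x - mu, check.attributes = FALSE)   # TRUE
## deviance residuals follow sign(x - mu) * sqrt(2 * d_i) instead:
d <- ifelse(x == 0, 0, x * log(x/mu)) - (x - mu)
all.equal(r.dev, sign(x - mu) * sqrt(2 * d), check.attributes = FALSE)  # TRUE
```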

HTH

Thomas


 -Original Message-
 From: Martin Hoyle [mailto:[EMAIL PROTECTED]
 Sent: 27 August 2003 17:01
 To: [EMAIL PROTECTED]
 Subject: [R] Basic GLM: residuals definition
 
 
 Dear R Users,
 
 I suppose this is a school boy question, but here it is 
 anyway. I'm trying to re-create the residuals for a poisson 
 GLM with simulated data;
 
  x <- rpois(1000, 5)
  model <- glm(x ~ 1, poisson)
  my.resids <- (log(x) - summary(model)$coefficients[1])
 plot(my.resids,residuals(model))
 
 This shows that my calculated residuals (my.resids) are not 
 the same as residuals(model).
 p 65 of Annette Dobson's book says that GLM (unstandardised) 
 residuals are calculated by analogy with the Normal case.
 So where am I going wrong?
 
 Thanks for your attention.
 
 Martin.
 
 
 Martin Hoyle,
 School of Life and Environmental Sciences,
 University of Nottingham,
 University Park,
 Nottingham,
 NG7 2RD,
 UK
 Webpage: http://myprofile.cos.com/martinhoyle
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 

---

Thomas Hotz
Research Associate in Medical Statistics
University of Leicester
United Kingdom

Department of Epidemiology and Public Health
22-28 Princess Road West
Leicester
LE1 6TP
Tel +44 116 252-5410
Fax +44 116 252-5423

Division of Medicine for the Elderly
Department of Medicine
The Glenfield Hospital
Leicester
LE3 9QP
Tel +44 116 256-3643
Fax +44 116 232-2976

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Basic GLM: residuals definition

2003-08-27 Thread Prof Brian Ripley
On Wed, 27 Aug 2003, Martin Hoyle wrote:

 Dear R Users,
 
  I suppose this is a schoolboy question, but here it is anyway. I'm trying to 
  re-create the residuals for a Poisson GLM with simulated data;
 
  x <- rpois(1000, 5)
  model <- glm(x ~ 1, poisson)
  my.resids <- (log(x) - summary(model)$coefficients[1])
 plot(my.resids,residuals(model))
 
 This shows that my calculated residuals (my.resids) are not the same as 
 residuals(model).
 p 65 of Annette Dobson's book says that GLM (unstandardised) residuals are 
 calculated by analogy with the Normal case.
 So where am I going wrong?

Not reading the help page.  Hint: what is the default for the type 
argument of the glm method for residuals?

A much better reference for this is

Davison, A.~C. and Snell, E.~J. (1991) Residuals and diagnostics.
\newblock Chapter~4 of \cite{Hinkley.ZZ.91}.

Hinkley, D.~V., Reid, N. and Snell, E.~J. eds (1991) \emph{Statistical Theory
  and Modelling. In Honour of Sir David Cox, {FRS}}.
\newblock London: Chapman \& Hall.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] how to calculate Rsquare

2003-08-27 Thread Spencer Graves
Hi, Ronaldo:

	  Have you talked with anyone in the Math Department in the Univ.Fed.de 
Viçosa?  They offer courses in Statistics there, and I would expect that 
someone there could help you get copies of the articles of interest.  I 
wonder if such contacts might help you with other statistics-related 
issues as well.

hope this helps.
Spencer Graves
Peter Dalgaard BSA wrote:
Spencer Graves [EMAIL PROTECTED] writes:


The Battelle Institute surely should have access to a library with
such popular and prestigious journals as Biometrika and The American
Statistician.  If you don't have time for that, you surely should have
money to purchase a copy from, e.g., www.lindahall.org/docserv.


Battelle is not the issue, Entomology Dept. at Univ.Fed.de Viçosa is.
That is presumably a somewhat poorer place. Still, you (Ronaldo)
should check whether there is JSTOR access from somewhere around you,
as I'm sure those recipients of r-help who have it will be unsure of
what licences they might break by sending you free copies. And David's
right: This is outside the scope of r-help.

From: Ronaldo Reis Jr. [mailto:[EMAIL PROTECTED] Sent:
Wednesday, August 27, 2003 10:04 AM
To: R-Help
Subject: Re: [R] how to calculate Rsquare
Can anybody send these articles for me?
	  Nagelkerke, N. J. D. (1991) A note on a general definition of the

coefficient of determination, Biometrika 78: 691-2.

	  Cox, D. R. and Wermuth, N. (1992) A comment on the coefficient of

determination for binary responses, The American Statistician 46:
1-4.



__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] read.spss (package foreign) and character columns

2003-08-27 Thread Prof Brian Ripley
On Wed, 27 Aug 2003, RINNER Heinrich wrote:

 Dear R users!
 
 I am using R Version 1.7.1, Windows XP, package foreign (Version: 0.6-1),
 SPSS 11.5.1.
 
 There is one thing I noticed with read.spss, and I'd like to ask if this
 is considered to be a feature, or possibly a bug:
 When reading character columns, character strings seem to get filled with
 blanks at the end.
 
 Simple example:
 In SPSS, create a file with one variable called xchar of type A5
 (character of length 5), and  3 values (a, ab, abcde), save it as
 test.sav.
 
 In R:
  library(foreign)
  test <- read.spss("test.sav", to.data.frame = TRUE)
  test
   XCHAR
 1 a
 2 ab   
 3 abcde
  levels(test$XCHAR)
 [1] "a    " "ab   " "abcde"
 
 Shouldn't it rather be "a" "ab" "abcde" (no blanks)?

You said it was a character string of length 5, not <=5.

It's easy to strip trailing blanks (?sub has several ways).
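For example (a sketch, not from the thread):

```r
## Strip the trailing blanks from the padded SPSS strings with sub()
x <- c("a    ", "ab   ", "abcde")
sub("[[:space:]]+$", "", x)
# -> c("a", "ab", "abcde")
```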

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] read.spss (package foreign) and character columns

2003-08-27 Thread Douglas Bates
RINNER Heinrich [EMAIL PROTECTED] writes:

 Dear R users!
 
 I am using R Version 1.7.1, Windows XP, package foreign (Version: 0.6-1),
 SPSS 11.5.1.
 
 There is one thing I noticed with read.spss, and I'd like to ask if this
 is considered to be a feature, or possibly a bug:
 When reading character columns, character strings seem to get filled with
 blanks at the end.
 
 Simple example:
 In SPSS, create a file with one variable called xchar of type A5
 (character of length 5), and  3 values (a, ab, abcde), save it as
 test.sav.
 
 In R:
  library(foreign)
  test <- read.spss("test.sav", to.data.frame = TRUE)
  test
   XCHAR
 1 a
 2 ab   
 3 abcde
  levels(test$XCHAR)
 [1] "a    " "ab   " "abcde"
 
 Shouldn't it rather be "a" "ab" "abcde" (no blanks)?

I believe they are being saved as fixed length strings in the SPSS
file and R is just reading what it was given.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] discriminant function

2003-08-27 Thread Stefan Böhringer
Thank you all for the quick responses.
However, I'm not sure I understand the scaling matrix (denote it S
henceforth) correctly. An observation x will be transformed by Sx into a
new vector space with the properties given by the description. What is
now the direction perpendicular to the separating plane as estimated in
the process of the lda? That direction is what I'm primarily interested
in. When plotting an lda object I see diagrams with observations when the
plane (here the line) of separation is chosen canonically to be {ax | a
\in R}.

Thanks, best wishes,

Stefan
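A sketch of extracting that direction (my own two-class example on iris using MASS; not code from the thread): for two groups, the single column of `scaling` is the discriminant direction, i.e. the normal of the separating hyperplane in the within-group-sphered space.

```r
library(MASS)   # for lda()
## Two-class example: fit$scaling has one column, the direction onto
## which observations are projected; predict()$x gives those projections.
two <- subset(iris, Species != "setosa")
two$Species <- droplevels(two$Species)
fit <- lda(Species ~ ., data = two)
fit$scaling            # 4 coefficients: the discriminant direction
head(predict(fit)$x)   # LD1 scores: data projected onto that direction
```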


On Tue, 2003-08-26 at 15:59, Torsten Hothorn wrote: 
 On 26 Aug 2003, Stefan Böhringer wrote:
 
  How can I extract the linear discriminant functions resulting from a LDA
  analysis?
  
  The coefficients are listed as a result from the analysis but I have not
  found a way to extract these programmatically. No references in the
  archives were found.
 
 ?lda tells you about the object returned by `lda', especially its element:
 
 scaling: a matrix which transforms observations to discriminant
   functions, normalized so that within groups covariance matrix
   is spherical.
 
 Torsten
 
  
  Thank you very much,
  
  Stefan
  
  __
  [EMAIL PROTECTED] mailing list
  https://www.stat.math.ethz.ch/mailman/listinfo/r-help
  
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Re: diamond graphs

2003-08-27 Thread Alvaro Muñoz
Drs. Harrell and O'Keefe,



Thank you for your suggestions.



Regarding your comments about the content of the paper, I respectfully
disagree that categorizing continuous variables is a fundamental violation
of statistical graphics, nor are you to assume that all categorizations are
arbitrary. In any case, the discussion section of our paper contains text
acknowledging that contour plots are a preferred option when the continuity
of variables is desired to be preserved. The hexagons we proposed seem, at
first glance, to be unnecessarily complex but they fulfill properties that
none of the other considered alternatives do (Table 1 and Figure 1 in paper
and Figure 6 using Trellis).



It is unfortunate that the comments from Dr. O'Keefe were based on a press
release and not on the manuscript itself. I apologize for the press release
implying no graphical progress in the 20th century. Many of his points are
addressed in the manuscript. Regarding the extension of the methods to
outcomes taking negative values (e.g., changes in markers), the use of two
colors is an alternative but the plotting of 0.5*[1+(outcome/max(|outcome|))]
and using the option E of Figure 1 in the paper will result in negative and
positive values having opposite topology (much as the contrast of
negative/positive bars in the unidimensional case). I will be happy to
expedite a reprint to Dr. O'Keefe. If you so desire, please email the
address to which it should be sent.



Although it is at odds with your beliefs, University staff working on
licensing and technology transfer believe that a patent may be a vehicle to
achieve a wide use. The audience of the proposed methods would be the end
users who are not sophisticated programmers and, therefore, the hope is that
it would be available in widely used software which is not the case of the
high end software (e.g., R). The proposed graph of 2D equiponderant display
of two predictors is just a display procedure, not an inferential tool. The
sophisticated analyst has little or no need for the proposed method. It does
overcome the pitfalls of 3D bar graphs and, therefore, has the potential of
improving the way we communicate our findings. Needless to say, were the
predictions of Dr. Harrell to be on target, we will change course as the
staff working on the licensing have planned from the start.



We will be happy to share the code we wrote to produce the figures in The
American Statistician paper with individuals wanting to use the software for
academic purposes. Please send request for it to [EMAIL PROTECTED]



In summary, our idea is a simple one (one that I refer to as needing only 8th
grade geometry) and it is its simplicity which has been fun to pursue.



Alvaro Muñoz

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Re: diamond graphs

2003-08-27 Thread Frank E Harrell Jr
On Wed, 27 Aug 2003 13:40:59 -0400
Alvaro Muñoz [EMAIL PROTECTED] wrote:

 Drs. Harrell and O'Keefe,
 
 
 
 Thank you for your suggestions.
 
 
 
 Regarding your comments about the content of the paper, I respectfully
 disagree that categorizing continuous variables is a fundamental violation
 of statistical graphics, nor are you to assume that all categorizations are
 arbitrary. In any case, the discussion section of our paper contains text
 acknowledging that contour plots are a preferred option when the continuity
 of variables is desired to be preserved. The hexagons we proposed seem, at
 first glance, to be unnecessarily complex but they fulfill properties that
 none of the other considered alternatives do (Table 1 and Figure 1 in paper
 and Figure 6 using Trellis).

I appreciate your reply Dr Munoz.  I will have to disagree with you about the above 
although I think you made some good points.  I have seen many, many examples where 
categorization results in low-precision estimates and slight changes in the bins 
result in a significantly different landscape.  I have also seen many examples in 
epidemiology where stratified estimates have been misinterpreted.

Even though thermometer and similar plots have defects that you mentioned in your 
paper, they much more intuitively and precisely map values into the human brain.  The 
same is true of Cleveland's dot plots although one has to be careful, as you said in 
your article, about the ordering of stratifiers.

 
 
 
 It is unfortunate that the comments from Dr. O'Keefe were based on a press
 release and not on the manuscript itself. I apologize for the press release
 implying no graphical progress in the 20th century. Many of his points are
 addressed in the manuscript. Regarding the extension of the methods to
 outcomes taking negative values (e.g., changes in markers), the use of two
 colors is an alternative but the plotting of 0.5*[1+(outcome/max(|outcome|))]
 and using the option E of Figure 1 in the paper will result in negative and
 positive values having opposite topology (much as the contrast of
 negative/positive bars in the unidimensional case). I will be happy to
 expedite a reprint to Dr. O'Keefe. If you so desire, please email the
 address to which it should be sent.
 
 
 
 Although it is at odds with your beliefs, University staff working on
 licensing and technology transfer believe that a patent may be a vehicle to
 achieve a wide use. The audience of the proposed methods would be the end
 users who are not sophisticated programmers and, therefore, the hope is that
 it would be available in widely used software which is not the case of the
 high end software (e.g., R). The proposed graph of 2D equiponderant display
 of two predictors is just a display procedure, not an inferential tool. The
 sophisticated analyst has little or no need for the proposed method. It does
 overcome the pitfalls of 3D bar graphs and, therefore, has the potential of
 improving the way we communicate our findings. Needless to say, were the
 predictions of Dr. Harrell to be on target, we will change course as the
 staff working on the licensing have planned from the start.

Their belief that a patent on an idea may help achieve a wide use is sadly mistaken 
and is almost comical.  The statement that "it would be available in widely used 
software which is not the case of the high end software" is very difficult to 
comprehend (especially in view of easy-to-use GUIs such as Rcmdr now available for R, 
as well as web interfaces).  There are several books I could recommend to your 
university staff.


 
 
 
 We will be happy to share the code we wrote to produce the figures in The
 American Statistician paper with individuals wanting to use the software for
 academic purposes. Please send request for it to [EMAIL PROTECTED]

Unfortunately, I think that once the patent announcement was made, the number of 
individuals interested in the method lessened considerably.

 
 
 
 In summary, our idea is a simple one (one that I refer as needing only 8th
 grade geometry) and it is its simplicity which has been fun to peruse.
 
 
 
 Alvaro Muñoz

Again I do thank you for your note.

Sincerely,

Frank Harrell

---
Frank E Harrell Jr  Prof. of Biostatistics  Statistics
Div. of Biostatistics  Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] RMySQL crashing R

2003-08-27 Thread David James
Hi,

There have been a number of reports of RMySQL crashing R when
attempting to connect to a MySQL server using dbConnect().
The problem appears to be in some binary versions of the MySQL
client library.  Known instances include
  (1) Red Hat MySQL binary RPM client library 3.23.32, but 
  updating to 3.23.56 solved the problem.
  (2) Debian MySQL binary client library 3.23.49, but updating 
  to 3.23.56 solved the problem.
Moreover, the change logs in Appendix D of the MySQL manual
(www.mysql.com) indicate that two bugs consistent with the crashes
we've seen were fixed in 3.23.50, namely, a buffer overflow problem
when reading startup parameters and a memory allocation bug in
the glibc library used to build Linux binaries.

If you experience this problem, could you let me know the version
information (R, MySQL, and the operating system)?  Also, I'd like to
know if updating the MySQL client library and re-installing RMySQL
fix your problem.

Thanks,

-- 
David 

PS Thanks to Deepayan Sarkar, John Heuer, and Matthew Kelly for helping
   me track this problem.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Minard's Challenge: Re-Visioning Minard Contest

2003-08-27 Thread Michael Friendly
In a recent talk ('Visions of the Past, Present & Future of Statistical 
Graphics'),
I talked about, among other things, the lessons Minard's March on Moscow 
graphic had
for modern statistical graphics, and illustrated aspects of power and 
simplicity
in several programming languages where this graphic had been recreated.
I referred to 'elegance factors' of various programming languages in 
terms of
the power, simplicity and transparency of data representations and 
procedural
or declarative specifications required to program a re-creation (or 
extension)
of this famous graph.

It occurred to me that it might be of interest, perhaps fun, and 
hopefully illuminating
to pose this as a formal challenge to the R community and others.

Several existing exemplars are shown on my 'Re-visions of Minard' web page
http://www.math.yorku.ca/SCS/Gallery/re-minard.html
(in the Gallery of Data Visualization, ../)
These include programming examples in Mathematica, SAS/IML Workshop, 
Wilkinson's
Grammar of Graphics, images created in other data visualization systems, 
raw materials
(images, data), etc.

There are no formal rules for this Re-Visioning Minard Contest, but 
each entry should ideally include:
(a) an image file in web-friendly format (.jpg, .gif, .png, etc),
(b) the program and data used to draw the image,
(c) a 'what they were thinking' description of the process used in
constructing the graph.

To save bandwidth on r-help, I'll ask responders to reply to the list 
only with reactions to this
challenge and what they deem useful to share with all readers.  Other 
ways to reply include
posting a web URL where readers can view the details or a direct email 
reply to me.

--
Michael Friendly Email: [EMAIL PROTECTED] 
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Streethttp://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] R on Linux/Opteron?

2003-08-27 Thread Luke Tierney
On 26 Aug 2003, Peter Dalgaard BSA wrote:

 Dirk Eddelbuettel [EMAIL PROTECTED] writes:
 
  On Tue, Aug 26, 2003 at 03:17:19PM -0400, Liaw, Andy wrote:
   Has anyone tried using R on the AMD Opteron in either 64- or 32-bit
   mode?  If so, any good/bad experiences, comments, etc?  We are considering
   getting this hardware, and would like to know if R can run smoothly on such
   a beast.  Any comment much appreciated.
  
  http://buildd.debian.org/build.php?pkg=r-base&arch=ia64&file=log
  
  has logs of R builds on ia64 since Nov 2001, incl. the outcome of make
  check. We do not run the torture tests -- though I guess we could on some of
  the beefier hardware such as ia64. 
 
 I don't think that's quite the same beast, though. Opterons are the
 x86-64 (or amd64) architecture and ia64 is Intel's, aka Itanium.
 Debian appears to be just warming up to including this architecture:
 http://lists.debian.org/debian-x86-64/2003/debian-x86-64-200308/threads.html
 whereas they have had ia64 out for a while.
 
 SuSE has an Opteron option and Luke said he tried it. Apparently it
 has a functioning 64-bit compiler toolchain - I wasn't sure earlier
 whether they were just running a 64bit kernel and 32bit applications,
 but when Luke says so, I believe it...
 

I wasn't sure either, especially about default settings, but 'file' says

luke/R> file bin/R.bin
bin/R.bin: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), dynamically linked 
(uses shared libs), not stripped

and in R

> Sys.info()["machine"]
  machine 
 "x86_64" 
> .Machine$sizeof.pointer
[1] 8

So it looks like a functional 64-bit setup so far.

luke

-- 
Luke Tierney
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:  [EMAIL PROTECTED]
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



RE: [R] R on Linux/Opteron?

2003-08-27 Thread Liaw, Andy
Thanks to all (and especially Prof. Tierney) for the responses.  The box we
are considering will probably spend over 90% of its CPU time in R, so it's
comforting to know that R compiles and passes all the tests (at least once) on
such a platform.

(I switched my attention from Itanium to Opteron when I read that Itanium is
slower than P4 for number crunching...)

Best,
Andy

 From: Luke Tierney [mailto:[EMAIL PROTECTED]
 [...]




[R] testing if two multivariate samples are from the same distribution

2003-08-27 Thread Jason Liao
Hello, everyone! I wonder if any R package can do the multivariate
Smirnov test. Specifically, let x_1,..,x_n and y_1,...,y_m be
multivariate vectors. I would like to test if the two samples are from
the same underlying multivariate distribution. Thanks in advance.
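For reference, one consistent approach is a permutation test on the energy distance of Székely and Rizzo.  The sketch below is hand-rolled base R, assuming no package; the function names are my own inventions:

```r
## Energy statistic for the two-sample problem (Szekely & Rizzo).
## 'energy.stat' and 'perm.test' are made-up names for this sketch.
energy.stat <- function(x, y) {
  n <- nrow(x); m <- nrow(y)
  d <- as.matrix(dist(rbind(x, y)))      # all pairwise Euclidean distances
  dxy <- mean(d[1:n, (n + 1):(n + m)])   # mean between-sample distance
  dxx <- mean(d[1:n, 1:n])               # within x (zero diagonal included)
  dyy <- mean(d[(n + 1):(n + m), (n + 1):(n + m)])
  (n * m / (n + m)) * (2 * dxy - dxx - dyy)
}

## Permutation p-value: reshuffle the group labels B times.
perm.test <- function(x, y, B = 199) {
  z <- rbind(x, y); n <- nrow(x); N <- nrow(z)
  obs <- energy.stat(x, y)
  stats <- replicate(B, {
    i <- sample(N, n)
    energy.stat(z[i, , drop = FALSE], z[-i, , drop = FALSE])
  })
  mean(c(obs, stats) >= obs)
}

set.seed(1)
x <- matrix(rnorm(60), ncol = 3)
y <- matrix(rnorm(60), ncol = 3)
perm.test(x, y)   # a large p-value is expected when the distributions agree
```

The statistic is consistent against general multivariate alternatives, which the classical Smirnov statistic does not extend to cleanly beyond one dimension.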

Jason


=
Jason G. Liao, Ph.D.
Division of Biometrics
University of Medicine and Dentistry of New Jersey
335 George Street, Suite 2200
New Brunswick, NJ 08903-2688
phone (732) 235-8611, or (732)-235-5429
http://www.geocities.com/jg_liao



Re: [R] Re: diamond graphs

2003-08-27 Thread Ross Ihaka
Alvaro Muñoz wrote:
Drs. Harrell and O'Keefe,

Although it is at odds with your beliefs, University staff working on
licensing and technology transfer believe that a patent may be a vehicle to
achieve wide use. The audience of the proposed methods would be the end
users who are not sophisticated programmers and, therefore, the hope is that
it would be available in widely used software, which is not the case for
high-end software (e.g., R). The proposed graph of 2D equiponderant display
of two predictors is just a display procedure, not an inferential tool. The
sophisticated analyst has little or no need for the proposed method. It does
overcome the pitfalls of 3D bar graphs and, therefore, has the potential of
improving the way we communicate our findings. Needless to say, were the
predictions of Dr. Harrell to be on target, we will change course as the
staff working on the licensing have planned from the start.

Perhaps I can add some personal experience, as opposed to belief.
After Robert Gentleman and I had made some initial progress in 
implementing R, we had to make some decisions about what we would do 
with it.  We looked at a number of options ranging from something 
commercial to free software.  After some research, personal 
introspection and prompting from others (hi Martin :-) we decided to 
release under GPL.

For me personally this turned out to be far harder than I thought it 
would be.  My institution has a particularly diabolical policy on 
intellectual property, especially on software.  While we could have 
quietly released the software and just said "oops" later on, I chose to 
get approval for free release of my work.  This took a number of years, 
several threats of resignation and a couple of salary cuts.

The reason I mention this is not as a part of a personal campaign for 
sainthood, but rather because it has ultimately turned out to have been 
far more than worth the effort.  The effect of making R free has been to 
see it picked up and vastly improved and extended by a very talented 
group of researchers.  We've now reached a point which Robert and I and 
other early R adopters and contributors couldn't have anticipated in our 
wildest imaginings. It's truly amazing to see this software being used 
for all sorts of cool things.  What we are seeing represents the best of 
what being an academic is all about - the free exchange of ideas with 
researchers collaborating and building on each other's work.

On the other hand, I'm currently writing what will possibly become a 
book on visualization and graphics (publication mechanism uncertain). 
The techniques discussed in the book are implemented in a certain dialect 
of a particular computer language developed at Bell Labs.  I intend to 
include code libraries for all the graphical techniques discussed.  The 
fact that you have sought to patent your idea means that, whatever its 
merits, it's pointless for me to even mention it because I can't 
distribute code for it.

I'm sure the licensing gnomes at your institution have expounded on how 
patenting will help achieve wider use, but in reality they are simply 
thinking "revenue stream".  The likely real effect of constraining 
access to your work in this way will be to have it sink into obscurity. 
Take it from one who's been there: the payoff from free dissemination is 
much higher.

--
Ross Ihaka Email:  [EMAIL PROTECTED]
Department of Statistics   Phone:  (64-9) 373-7599 x 85054
University of Auckland Fax:(64-9) 373-7018
Private Bag 92019, Auckland
New Zealand


[R] Newbie graphing questions

2003-08-27 Thread Francisco J. Bido
Hi everyone.  R is new to me and I'm very impressed with its 
capabilities, but I still cannot figure out how to do some basic things.  
There is no lack of documentation, but finding what I need has 
proven difficult.  Perhaps you can help.

Here's what I'm after:

1.  How do I create a new plot without erasing the prior one, i.e., have 
a new window pop up with the new graph?  I'm on MacOS X using the Carbon 
port.

2.  How do I pause between plot renderings, i.e., so that the 
subsequent graph is drawn only after I press the space bar (or any 
other key)?

3.  Illustrating critical regions.  Say I wanted to illustrate the 
critical region of a standard normal.  I would need to draw a vertical 
line from the critical point to the curve and then shade the critical 
region.  How do I do this in R?
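All three can be sketched in base R.  Treat the following as a sketch: the new-window device is platform-dependent (on the Carbon port it may be macintosh() or quartz() rather than the x11() assumed here):

```r
## 1. Open a second device window so the old plot is kept.
##    Device function varies by platform: x11(), quartz(), macintosh(), ...
plot(1:10)
x11()                      # assumption: an X11-capable build
plot(10:1)

## 2. Ask for confirmation before each new page is drawn.
par(ask = TRUE)
plot(rnorm(10))
plot(rnorm(10))            # R prompts before drawing this one
par(ask = FALSE)

## 3. Shade the upper 5% critical region of a standard normal.
z <- qnorm(0.95)
curve(dnorm(x), -4, 4, ylab = "density")
xs <- seq(z, 4, length = 100)
polygon(c(z, xs, 4), c(0, dnorm(xs), 0), col = "grey", border = NA)
abline(v = z, lty = 2)
```

The polygon trick generalizes to any density: trace the curve over the region, then drop down to the axis at both ends to close the shape.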

Thanks!
-Francisco