Re: [R] The hidden costs of GPL software? - None

2004-11-24 Thread Uwe Ligges
Martin,
what about setting up a new mailing list R-hcgs?
(acronym for R - The hidden costs of GPL software?)
Seems worthwhile, given the number of messages in these threads. ;-)
Uwe
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] T-test syntax question

2004-11-24 Thread Vito Ricci
Hi,

You did not specify whether the data are paired or not; if they
are paired, you should use the option paired=TRUE in
t.test(). To use the pooled-variance t.test, the variances of the
two samples should not be significantly different (see ?var.test);
if they are, you should specify var.equal=FALSE.

var.equal: a logical variable indicating whether to
treat the two variances as being equal. If 'TRUE' then
the pooled variance is used to estimate the variance,
otherwise the Welch (or Satterthwaite) approximation
to the degrees of freedom is used.

paired: a logical indicating whether you want a paired
t-test.

If 'paired' is 'TRUE' then both 'x' and 'y' must be
specified and they must be the same length.  Missing
values are removed (in pairs if 'paired' is 'TRUE'). 
If 'var.equal' is 'TRUE' then the pooled estimate of
the variance is used.  By default, if 'var.equal' is
'FALSE' then the variance is estimated separately for
both groups and the Welch modification to the degrees
of freedom is used.
 
From the output you sent I understand that the variances of the
two samples are significantly different (hence the Welch Two
Sample t-test) and that the Delta values also differ
significantly between the groups (mean difference not equal
to 0).

See ? t.test
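
For the question below, a minimal sketch of the call (the names t1,
Delta and Crit are taken from the quoted output; adjust to your data):

t.test(Delta ~ Crit, data = t1)   ## Welch test: var.equal = FALSE is the default
## paired version, only if the observations are matched pairs:
## t.test(t1$Delta[t1$Crit == 1], t1$Delta[t1$Crit == 0], paired = TRUE)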

Regards
Vito


You wrote:

Hi. 

I'd like to do a t-test to compare the Delta values of
items with Crit=1 with Delta values of items with
Crit=0. What is the t.test syntax?

It should produce a result like this below (I can't
get in touch with the person who originally did this
for me)

Welch Two Sample t-test

data:  t1$Delta by Crit
t = -3.4105, df = 8.674, p-value = 0.008173
alternative hypothesis: true
difference in means is not equal to 0
95 percent confidence interval:
 -0.04506155 -0.00899827
sample estimates:
mean in group FALSE  mean in group TRUE 
 0.03331391  0.06034382 

Thanks.

=
Diventare costruttori di soluzioni
Became solutions' constructors

The business of the statistician is to catalyze 
the scientific learning process.  
George E. P. Box


Visitate il portale http://www.modugno.it/
e in particolare la sezione su Palese  http://www.modugno.it/archivio/palese/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Ks.test (was (no subject))

2004-11-24 Thread Vito Ricci
Hi Angela,

I believe you need to give only df as the parameter;
the t distribution has mean 0 by default. See this example:

 x <- rt(100, 10)
 ks.test(x, "pt", 10)

One-sample Kolmogorov-Smirnov test

data:  x 
D = 0.1414, p-value = 0.03671
alternative hypothesis: two.sided 

Ciao
Vito

you wrote:

Good morning,
I have to apply the KS test with the t distribution.
I know I have to write
ks.test(data_name, distribution_name, parameters...)
but I don't know what the name for the t distribution
is, and which parameters to give; maybe mean=0 and
degrees of freedom in my case?
Thank you for helping me.
Angela Re




=
Diventare costruttori di soluzioni
Became solutions' constructors

The business of the statistician is to catalyze 
the scientific learning process.  
George E. P. Box


Visitate il portale http://www.modugno.it/
e in particolare la sezione su Palese  http://www.modugno.it/archivio/palese/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] (no subject)

2004-11-24 Thread Uwe Ligges
[EMAIL PROTECTED] wrote:
Good morning,
I have to apply the KS test with the t distribution.
I know I have to write ks.test(data_name, distribution_name, parameters...)
but I don't know what the name for the t distribution is and which
parameters to give; maybe mean=0 and degrees of freedom in my case?
For example:
  ks.test(x, "pt", df = 4)
See ?ks.test and ?pt
Uwe Ligges

Thank you for helping me.
Angela Re
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Is there a package in R that lets me fit a robust linear mixed model?

2004-11-24 Thread Frank Duan
Dear R people,

Happy Thanksgiving! 

I just wonder if there is an R package that supplies some kind of
robust way to fit a linear mixed model, i.e. assigning small
weights to observations with large residuals, as in an
iteratively-reweighted-least-squares approach.

Many thanks,

Frank

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] The hidden costs of GPL software? - None

2004-11-24 Thread Ted Harding
On 24-Nov-04 John wrote:
 Off hand, the costs of GPL'd software are not hidden at all.
 R for instance demands that a would be user sit down and
 learn the language. This in turn pushes a user into learning
 more about statistics than the simple overview that Stat 1
 presents a student.

I'd see this as less a cost than a benefit!

 In contrast, any program that simplifies use also tends to
 encourage a simplified understanding.

Agreed!

 So, I believe it can be legitimately argued that the real
 hidden costs lurk in easy-to-use software, especially
 commercial software with GUI interfaces.

Well put; though it's not obvious whom these costs fall on.
The people who actually use the easy to use software, or
the organisations that employ them, can all too often get
away with sloppy or invalid analysis. It may often be the
consumer of their results or of products based on them who
ultimately loses.

Best wishes to all,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 24-Nov-04   Time: 09:21:35
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Posting to 2 mailing lists {was How to extract data?}

2004-11-24 Thread Martin Maechler
You posted the referenced message to both
R-help and R-sig-finance.

Please do *not* post to more than one R-list!
Please do *not* post to more than one R-list!
Please do *not* post to more than one R-list!
  
Please decide if something is specific for an R-SIG-foo list
or if it belongs to R-help (or R-devel) and post to one and only
one mailing list!

We had another mess recently with this, caused by a posting to both
R-help and R-sig-gui (and I think another one where
even 3 mailing lists were affected). The whole thing is a
particular impoliteness to all those people -- often the nice
helpers! -- who are subscribed to more than one of the lists
involved.

Choosing one list, the discussion thread will be archived/seen/read
consistently, both in the server archives and in people's mail/news boxes.

- If the thread should be *diverted* to another list, there
  could be *one* overlap message (posting to both), 
  where the move should be announced

- If you deem it relevant, you can still alert the readers of
  one list to a hot topic on another list, e.g., by posting a
  URL to the starting message in an (online) archive.

Martin Maechler, ETH Zurich

PS: Of course, I've been tempted for a moment
to post this to all R- mailing lists  ;-) :-)

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] problem with anova and glmmPQL

2004-11-24 Thread Andrew R. Criswell
Hello:

I am getting an error message when applying anova() to two models
estimated using glmmPQL. I did look through the archives but didn't
find anything relevant to my problem. The R code and results follow.

Hope someone can help.

ANDREW


 fm1 <- glmmPQL(choice ~ day + stereotypy,
+    random = ~ 1 | bear, data = learning, family = binomial)
iteration 1
iteration 2
iteration 3
iteration 4
 fm2 <- glmmPQL(choice ~ day + envir + stereotypy,
+    random = ~ 1 | bear, data = learning, family = binomial)
iteration 1
iteration 2
iteration 3
iteration 4
 anova(fm1)
numDF denDF   F-value p-value
(Intercept) 1  2032   7.95709  0.0048
day 1  2032 213.98391  <.0001
stereotypy  1  2032   0.42810  0.5130

 anova(fm2)
numDF denDF   F-value p-value
(Intercept) 1  2031   5.70343  0.0170
day 1  2031 213.21673  <.0001
envir   1  2031  12.50388  0.0004
stereotypy  1  2031   0.27256  0.6017

 anova(fm1, fm2)
Error in anova.lme(fm1, fm2) : Objects must inherit from classes "gls",
"gnls", "lm", "lmList", "lme", "nlme", "nlsList", or "nls"

 version
 _
platform i586-mandrake-linux-gnu
arch i586
os   linux-gnu
system   i586, linux-gnu
status
major2
minor0.0
year 2004
month10
day  04
language R






-- 
Andrew R. Criswell, Ph.D.
Graduate School, Bangkok University

mailto:[EMAIL PROTECTED]
mailto:[EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] LDA with previous PCA for dimensionality reduction

2004-11-24 Thread Christoph Lehmann
Dear all, not really a R question but:
If I want to check the classification accuracy of an LDA with 
previous PCA for dimensionality reduction by means of the LOOCV method:

Is it ok to do the PCA on the WHOLE dataset ONCE and then run the LDA 
with the CV option set to TRUE (runs LOOCV)

-- OR--
do I need
- to compute for each 'test-bag' (the n-1 observations) a PCA 
(my.princomp.1),
- then run the LDA on the test-bag scores (- my.lda.1)
- then compute the scores of the left-out-observation using 
my.princomp.1 (- my.scores.2)
- and only then use predict.lda(my.lda.1, my.scores.2) on the scores of 
the left-out-observation

?
I have read some articles where they chose procedure 1, but I am not 
sure whether this is really correct.

many thanks for a hint
Christoph
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Grumble ...

2004-11-24 Thread Ted Harding
Hi Folks,

A Grumble ...

The message I just sent to R-help about The hidden costs of GPL ...
has evoked a Challenge response:

  Hi,
  You've just sent a message to [EMAIL PROTECTED]
  In order to confirm the sent message, please click here

  This confirmation is necessary because [EMAIL PROTECTED]
  uses Antispam UOL, a service that avoids unwanted messages like
  advertising, pornography, viruses, and spams.

  Other messages sent to [EMAIL PROTECTED] won't need to
  be confirmed*.
  *If you receive another confirmation request, please ask
  [EMAIL PROTECTED] to include you in his/her authorized
  e-mail list.

I won't be responding to this. Let the recipient simply not receive
the mail. Of no great importance in this case, but a disadvantage
to the recipient in the long run.

I disapprove strongly of this mechanism, and want to oppose it.
There must be a few thousand subscribers to R-help. If the
Challenge mechanism became widespread, then I would receive
thousands of such messages. Rather than respond to all these,
I would quit the list (and of course probably many others).
The Challenge mechanism would destroy the mailing-list community
if it became widely adopted.

One reason I am posting this grumble to R-help is in the hope
that I get a challenge to this one too. In that case, once and
for all, I shall respond, so that the recipient will see this
message and (I hope) do something about it, to eliminate the
Challenge responder (I can't find the true recipient's
email address from the Challenge).

The recipient may be able to recognise themselves from the
fact that they receive this message but not the message which
triggered the response, which began:
===
On 24-Nov-04 John wrote:
 Off hand, the costs of GPL'd software are not hidden at all.
 R for instance demands that a would be user sit down and
 learn the language. This in turn pushes a user into learning
 more about statistics than the simple overview that Stat 1
 presents a student.

I'd see this as less a cost than a benefit!
===

My apologies for bothering you with this if you didn't want to
know about it.

Best wishes to all,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 24-Nov-04   Time: 10:36:35
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] T-test syntax question

2004-11-24 Thread Adaikalavan Ramasamy
As Vito Ricci has already pointed out, the Welch test is for two-group
unpaired data under an unequal-variance assumption.

If you have the original data, say x and y, then you can simply do
t.test( x, y, paired=FALSE, var.equal=FALSE ).

If you do not have the original data, you can still calculate the
relevant statistic and p-value as long as you know the group means,
sizes and variances. 'stats:::t.test.default' shows you the code behind
t.test. I think the relevant bits are as follows:

mx <- 0    # mean of group x (example value)
my <- 2    # mean of group y (example value)
mu <- 0    # hypothesised difference in means

# You will need to fill these in with your observed values
vy <- var(y)
vx <- var(x)
ny <- length(y)
nx <- length(x)


stderrx <- sqrt(vx/nx)
stderry <- sqrt(vy/ny)
stderr  <- sqrt(stderrx^2 + stderry^2)
df      <- stderr^4/(stderrx^4/(nx - 1) + stderry^4/(ny - 1))
tstat   <- (mx - my - mu)/stderr

# for a two-sided alternative
conf.level <- 0.95   # e.g. a 95% confidence level
pval  <- 2 * pt(-abs(tstat), df)
alpha <- 1 - conf.level
cint  <- qt(1 - alpha/2, df)
cint  <- tstat + c(-cint, cint)
cint  <- mu + cint * stderr



On Wed, 2004-11-24 at 04:28, Steve Freeman wrote:
 Hi. 
 
 I'd like to do a t-test to compare the Delta values of items with Crit=1
 with Delta values of items with Crit=0. What is the t.test syntax?
 
 It should produce a result like this below (I can't get in touch with the
 person who originally did this for me)
 
 Welch Two Sample t-test

 data:  t1$Delta by Crit
 t = -3.4105, df = 8.674, p-value = 0.008173 alternative hypothesis: true
 difference in means is not equal to 0
 95 percent confidence interval:
  -0.04506155 -0.00899827
 sample estimates:
 mean in group FALSE  mean in group TRUE 
  0.03331391  0.06034382 
 
 Thanks.
 
 
 Steven F. Freeman * Center for Organizational Dynamics * University of
 Pennsylvania * (215) 898-6967 * Fax: (215) 898-8934 * Cell: (215) 802-4680 *
 [EMAIL PROTECTED] * http://center.grad.upenn.edu/faculty/freeman.html
 
 
   [[alternative HTML version deleted]]
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
-- 
Adaikalavan Ramasamy[EMAIL PROTECTED]
Centre for Statistics in Medicine   http://www.ihs.ox.ac.uk/csm/
Cancer Research UK  Tel : 01865 226 677
Old Road Campus, Headington, Oxford Fax : 01865 226 962

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Automatic file reading

2004-11-24 Thread Anders Malmberg
Hi,
I want to do automatic reading of a number of tables (files) stored in 
ascii format
without having to specify the variable name in R each time.  Below is an 
example
of how I would like to use it (I assume files pair1, ..., pair8 exist in 
the specified directory).

for (i in 1:8){
 name <- paste("pair", i, sep="")
 ? ? ? <- read.table(paste("/home/andersm/tmp/", name, sep=""))
}
after which I want to have pair1,...,pair8 as tables.
But I cannot get it right. Does anybody have a smart solution?
Best regards,
Anders Malmberg
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Re: T-test syntax question

2004-11-24 Thread Vito Ricci
Hi,

In the case of paired data, if you have only the differences
and not the original data, you can do the t-test based on the
differences.

Say d is the vector of differences and suppose
you wish to test whether the mean difference is equal to
zero:

md <- mean(d)  ## sample mean of differences
sdd <- sd(d)   ## sample sd of differences
n <- length(d) ## sample size
t.value <- md/(sdd/sqrt(n)) ## sample t-value with n-1 df
pt(t.value, n-1, lower.tail=FALSE) ## p-value of (one-sided) test

 set.seed(13)
 d <- rnorm(50)
 md <- mean(d)  ## sample mean of differences
 sdd <- sd(d)   ## sample sd of differences
 n <- length(d) ## sample size
 t.value <- md/(sdd/sqrt(n)) ## sample t-value with n-1 df
 pt(t.value, n-1, lower.tail=FALSE) ## p-value of test
[1] 0.5755711
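
For comparison, the same one-sided p-value can be obtained directly from
the built-in one-sample t-test on the differences (a sketch):

t.test(d, mu = 0, alternative = "greater")  ## tests mean(d) > 0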

Best regards,
Vito



Steven F. Freeman wrote:

I'd like to do a t-test to compare the Delta values of
items with Crit=1
with Delta values of items with Crit=0. What is the
t.test syntax?

It should produce a result like this below (I can't
get in touch with the
person who originally did this for me)

Welch Two Sample t-test

data:  t1$Delta by Crit
t = -3.4105, df = 8.674, p-value = 0.008173
alternative hypothesis: true
difference in means is not equal to 0
95 percent confidence interval:
 -0.04506155 -0.00899827
sample estimates:
mean in group FALSE  mean in group TRUE 
 0.03331391  0.06034382 

Thanks.

=
Diventare costruttori di soluzioni
Became solutions' constructors

The business of the statistician is to catalyze 
the scientific learning process.  
George E. P. Box


Visitate il portale http://www.modugno.it/
e in particolare la sezione su Palese  http://www.modugno.it/archivio/palese/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Re: CRLF-terminated Fortran files

2004-11-24 Thread Prof Brian Ripley
I added a check for CRLF termination of Fortran and C++ source files to R
CMD check and found potential problems in packages
BsMD
MCMCpack (C++)
asypow
aws
bayesSurv (C++)
eha
fBasics/fOptions/fSeries
gam
mclust
ncomplete
noverlap
pan
rrcov
subselect (C++)
survrec
I'd be interested to know if C++ gives you problems too.  (Sun cc used to 
object to CRLF, but does not in Forte 7.)  I don't think I've ever tried 
one of the above before on Solaris, but had no CRLF problems with Forte 7, 
just plenty of other problems in MCMCpack and bayesSurv.  Could you please 
try installing subselect.

On Wed, 24 Nov 2004, Prof Brian Ripley wrote:
What did this have to do with GLMM?  I've changed the subject line.
On Wed, 24 Nov 2004, Richard A. O'Keefe wrote:
I was trying to install some more packages and ran into a problem
I hadn't seen before.
We've seen it for C, and test it for C in R CMD check.  I think we should 
check C++ and Fortran files too.

Version:
   platform sparc-sun-solaris2.9
   arch sparc
   os   solaris2.9
   system   sparc, solaris2.9
   status
   major2
   minor0.1
   year 2004
   month11
   day  15
   language R
Fortran compilers available to me:
   f77: Sun WorkShop 6 update 2 FORTRAN 77 5.3 2001/05/15
   f90: Sun WorkShop 6 update 2 Fortran 95 6.2 2001/05/15
   f95: Sun WorkShop 6 update 2 Fortran 95 6.2 2001/05/15
Package:
   gam
   In fact I didn't ask for this one specifically, I had
   dependencies=TRUE in a call to install.packages().
Problem:
   Following the installation instructions for R, I had selected F95
   as my Fortran compiler.
   The f95 compiler complained about nearly every line of
   gam/src/bsplvd.f
   From the error messages as displayed on the screen, I could see no
   reason for complaint.  However, looking at the file with a text
   editor immediately revealed the problem.  The files
bsplvd.f  bvalue.f  bvalus.f  loessf.f
qsbart.f  sgram.f   sinerp.f  sslvrg.f
stxwx.f
   all use CR-LF line termination.  The files
linear.f  lo.f  splsm.f
   all use LF line termination expected on UNIX.
   It turns out that the g77 and f77 compilers don't mind CR at the
   end of a line, but f90 and f95 hate them like poison.
   Removing the CRs makes f90 and f95 happy again.
BTW, in that version of Sun Workshop f90 and f95 are the same compiler, and 
in later versions so is f77.  (I think these compilers are on version 9 now.) 
Even in version 7, there is no problem with line endings.

I did get a warning:
 call dchdc(a,p,p,work,jpvt,job,info)
^
linear.f, Line = 408, Column = 38: WARNING: Procedure DCHDC is defined at 
line 194 (linear.f).  Illegal association of array actual argument with 
scalar dummy argument INFO.

which seems genuine (make it info(1) in the call).

Second-order problem:
   I know how to fix the immediate problem.  What I don't know is how
   to intervene in the installation process.  What I need to do is
- get and unpack files (steps normally done by install.packages)
- make changes (remove CR, edit configuration, whatever)
- resume whatever install.packages normally does
- Use install.packages(destdir=) to retain the tarballs which are
downloaded (see the sketch after this list).
- Unpack the package tarball, make changes.
- Run R CMD INSTALL on the changed sources.
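
In R, the first of those steps might look like this (a sketch only; the
destination path is illustrative and must already exist):

install.packages("gam", destdir = "/tmp/R-src")  # keeps the source tarball
# then, outside R: unpack the tarball, strip the CRs from the *.f files,
# and run  R CMD INSTALL  on the edited sources
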
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Grumble ...

2004-11-24 Thread Duncan Murdoch
On Wed, 24 Nov 2004 10:36:35 - (GMT), (Ted Harding)
[EMAIL PROTECTED] wrote:

Hi Folks,

A Grumble ...

The message I just sent to R-help about The hidden costs of GPL ...
has evoked a Challenge response:

  Hi,
  You've just sent a message to [EMAIL PROTECTED]
  In order to confirm the sent message, please click here

Here's a strategy that I hope subverts this irritating mechanism:
Every now and then I get a challenge about a message that I didn't
send, because someone (or some virus) forged me into the From:
address.  Those are the only ones I confirm.

Duncan Murdoch

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Automatic file reading

2004-11-24 Thread Adaikalavan Ramasamy
for(i in 1:10){ assign( paste("data", i, sep=""), i ) }
 data1
[1] 1
 data5
[1] 5
 data8 + data5
[1] 13

See help(assign) for more details and examples.


On Wed, 2004-11-24 at 11:10, Anders Malmberg wrote:
 Hi,
 
 I want to do automatic reading of a number of tables (files) stored in 
 ascii format
 without having to specify the variable name in R each time.  Below is an 
 example
 of how I would like to use it (I assume files pair1,...,pair8 exist in 
 spec. dire.)
 
 for (i in 1:8){
   name <- paste("pair", i, sep="")
   ? ? ? <- read.table(paste("/home/andersm/tmp/", name, sep=""))
 }
 
 after which I want to have pair1,...,pair8 as tables.
 
 But I can not get it right. Anybody having a smart solution?
 
 Best regards,
 Anders Malmberg
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
-- 
Adaikalavan Ramasamy[EMAIL PROTECTED]
Centre for Statistics in Medicine   http://www.ihs.ox.ac.uk/csm/
Cancer Research UK  Tel : 01865 226 677
Old Road Campus, Headington, Oxford Fax : 01865 226 962

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Automatic file reading

2004-11-24 Thread Uwe Ligges
Anders Malmberg wrote:
Hi,
I want to do automatic reading of a number of tables (files) stored in 
ascii format
without having to specify the variable name in R each time.  Below is an 
example
of how I would like to use it (I assume files pair1,...,pair8 exist in 
spec. dire.)

for (i in 1:8){
 name <- paste("pair", i, sep="")
 ? ? ? <- read.table(paste("/home/andersm/tmp/", name, sep=""))
}
pairlist <- vector(8, mode = "list")
for (i in 1:8){
  name <- paste("pair", i, sep="")
  pairlist[[i]] <- read.table(paste("/home/andersm/tmp/", name, sep=""))
}
or use assign(), but you don't want to do that really.
Uwe Ligges


after which I want to have pair1,...,pair8 as tables.
But I can not get it right. Anybody having a smart solution?
Best regards,
Anders Malmberg
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Automatic file reading

2004-11-24 Thread Arne Henningsen
Hi Andreas,

what about:
pair <- list()
for (i in 1:8){
   name <- paste("pair", i, sep="")
   pair[[ i ]] <- read.table(paste("/home/andersm/tmp/", name, sep=""))
}

Arne

On Wednesday 24 November 2004 12:10, Anders Malmberg wrote:
 Hi,

 I want to do automatic reading of a number of tables (files) stored in
 ascii format
 without having to specify the variable name in R each time.  Below is an
 example
 of how I would like to use it (I assume files pair1,...,pair8 exist in
 spec. dire.)

  for (i in 1:8){
    name <- paste("pair", i, sep="")
    ? ? ? <- read.table(paste("/home/andersm/tmp/", name, sep=""))
  }

 after which I want to have pair1,...,pair8 as tables.

 But I can not get it right. Anybody having a smart solution?

 Best regards,
 Anders Malmberg

 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html

-- 
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
[EMAIL PROTECTED]
http://www.uni-kiel.de/agrarpol/ahenningsen/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Automatic file reading

2004-11-24 Thread Sean Davis
If you simply want to read all files in a given directory, you can do 
something like:

fullpath <- "/home/andersm/tmp"
filenames <- dir(fullpath, pattern = ".*")
pair <- sapply(filenames, function(x) 
    {read.table(paste(fullpath, '/', x, sep = ""))})

Sorry, untested.  But the point is that you can use dir to get all of 
the filenames specified by pattern from a directory specified by 
fullpath.
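
A closely related sketch that keeps the file names attached to the
result (untested here as well; same assumed directory):

fullpath <- "/home/andersm/tmp"
filenames <- dir(fullpath)
pair <- lapply(file.path(fullpath, filenames), read.table)
names(pair) <- filenames    ## access e.g. pair[["pair1"]]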

Sean
On Nov 24, 2004, at 7:31 AM, Arne Henningsen wrote:
Hi Andreas,
what's about:
pair <- list()
for (i in 1:8){
   name <- paste("pair", i, sep="")
   pair[[ i ]] <- read.table(paste("/home/andersm/tmp/", name, sep=""))
}
Arne
On Wednesday 24 November 2004 12:10, Anders Malmberg wrote:
Hi,
I want to do automatic reading of a number of tables (files) stored in
ascii format
without having to specify the variable name in R each time.  Below is 
an
example
of how I would like to use it (I assume files pair1,...,pair8 exist in
spec. dire.)

for (i in 1:8){
  name <- paste("pair", i, sep="")
  ? ? ? <- read.table(paste("/home/andersm/tmp/", name, sep=""))
}
after which I want to have pair1,...,pair8 as tables.
But I can not get it right. Anybody having a smart solution?
Best regards,
Anders Malmberg
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
--
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
[EMAIL PROTECTED]
http://www.uni-kiel.de/agrarpol/ahenningsen/
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] data.frame into vector

2004-11-24 Thread Jan Goebel
Hi,

as others have already pointed out, as.matrix is what you need.
Just one comment:

as.matrix(x[1,]) 

should be much faster for larger data frames compared to

as.matrix(x)[1,]
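
With the poster's example data, either form gives a character matrix for
that row, and indexing it once more yields the desired vector (a sketch):

x <- data.frame(a = factor(c('a', 2, 'b')), b = c(4, 5, 6))
as.matrix(x[1, ])        ## 1 x 2 character matrix: "a" and "4"
as.matrix(x[1, ])[1, ]   ## named character vector: a = "a", b = "4"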

Best 

jan


On Tue, 23 Nov 2004, Tiago R Magalhaes wrote:

 Hi
 
 I want to extract a row from a data.frame but I want that object to 
 be a vector . After trying some different ways I end up always with a 
 data.frame or with the wrong vector. Any pointers?
 
  x <- data.frame(a = factor(c('a',2,'b')), b = c(4,5,6))
 I want to get
 a 4
 
 I tried:
 
 as.vector(x[1,])
   a b
 1 a 4
 (resulting in a data.frame even after in my mind having coerced it 
 into a vector!)
 
 as.vector(c[1,], numeric='character')
 [1] 2 4
 (almost what I want, except that I get 2 instead of a -- I guess this has 
 to do with levels and factors)
 
 Thanks for any help
 
  R.Version()
 $platform
 [1] powerpc-apple-darwin6.8
 
 $arch
 [1] powerpc
 
 $os
 [1] darwin6.8
 
 $system
 [1] powerpc, darwin6.8
 
 $status
 [1] 
 
 $major
 [1] 2
 
 $minor
 [1] 0.1
 
 $year
 [1] 2004
 
 $month
 [1] 11
 
 $day
 [1] 15
 
 $language
 [1] R
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

-- 
+-
 Jan Goebel 
 j g o e b e l @ d i w . d e

 DIW Berlin 
 German Socio-Economic Panel Study (GSOEP) 
 Königin-Luise-Str. 5
 D-14195 Berlin -- Germany --
 phone: 49 30 89789-377
+-

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] timeDate

2004-11-24 Thread Gabor Grothendieck
Gabor Grothendieck ggrothendieck at myway.com writes:

: 
: Yasser El-Zein abu3ammar at gmail.com writes:
: 
: : 
: : I am looking for up to the millisecond resolution. Is there a package
: : that has that?
: : 
: : On Mon, 22 Nov 2004 21:48:20 + (UTC), Gabor Grothendieck
: : ggrothendieck at myway.com wrote:
: :  Yasser El-Zein abu3ammar at gmail.com writes:
: :  
: :  
: :   From the document it is apparent to me that I need as.POSIXct  (I have
: :   a double representing the number of millis since 1/1/1970 and I need
: :   to construct a datetime object). I see it showing how to construct the
: :   time object from a string representing the time but not from a double
: :   of millis. Does anyone know how to do that?
: :  
: :  
: :  If by millis you mean milliseconds (i.e. one thousandths of a second)
: :  then POSIXct does not support that resolution, but if rounding to
: :  seconds is ok then
: :  
: :    structure(round(x/1000), class = c("POSIXt", "POSIXct"))
: :  
: :  should give it to you assuming x is the number of milliseconds.
: 
: There is no package/class that represents times and dates
: internally as milliseoncds since Jan 1, 1970.   You can
: rework your data into chron's internal representation, viz.
: day number plus fraction of day, like this:
: 
:   # x is vector of milliseconds since Jan 1/70
:   # x.chron is corresponding chron date/time
:   # untested
:   library(chron) 
:   ms.in.day <- 1000*24*60*60 
:   day <- floor(x/ms.in.day) 
:   frac <- (x - ms.in.day*day)/ms.in.day
:   x.chron <- chron(day+frac)

Not sure why I made the above so complicated but it can
be written just as:

library(chron)
ms.in.day <- 1000*24*60*60
x.chron <- chron(x/ms.in.day)
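
A quick sanity check of that one-liner (illustrative value only):

library(chron)
ms.in.day <- 1000*24*60*60
x <- 0                  ## e.g. 0 ms since 1970-01-01
chron(x/ms.in.day)      ## should give the chron origin date, 01/01/70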

: If you need to take leap seconds into account (which the above
: does not) then note that R comes with a builtin vector called
: leap.seconds.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Grumble ...

2004-11-24 Thread Martin Maechler
 Ted == Ted Harding [EMAIL PROTECTED]
 on Wed, 24 Nov 2004 10:36:35 - (GMT) writes:

Ted Hi Folks, A Grumble ...

Ted The message I just sent to R-help about The hidden
Ted costs of GPL ...  has evoked a Challenge response:

Ted   Hi, You've just sent a message to
Ted [EMAIL PROTECTED] In order to confirm the sent
Ted message, please click here

Ted   This confirmation is necessary because
Ted [EMAIL PROTECTED] uses Antispam UOL, a service
Ted that avoids unwanted messages like advertising,
Ted pornography, viruses, and spams.

Ted   Other messages sent to [EMAIL PROTECTED]
Ted won't need to be confirmed*.  *If you receive another
Ted confirmation request, please ask
Ted [EMAIL PROTECTED] to include you in his/her
Ted authorized e-mail list.

Ted I won't be responding to this. Let the recipient simply
Ted not receive the mail. Of no great importance in this
Ted case, but a disadvantage to the recipient in the long
Ted run.

Ted I disapprove strongly of this mechanism, and want to
Ted oppose it.  There must be a few thousand subscribers to
Ted R-help. If the Challenge mechanism became widespread,
Ted then I would receive thousands of such messages. Rather
Ted than respond to all these, I would quit the list (and
Ted of course probably many others).  The Challenge
Ted mechanism would destroy the mailing-list community if
Ted it became widely adopted.

Exactly.
I've received such a message myself from the same machine and
-- as mailing list manager -- tried to find out more.

The problem is that [EMAIL PROTECTED] is not subscribed
to R-help. One other person is and I have written e-mail to that
address withOUT getting such a message back..

Again, I completely agree that it is absolutely unacceptable 
to subscribe from such a spam-blocking address.


Ted One reason I am posting this grumble to R-help is in
Ted the hope that I get a challenge to this one too. In
Ted that case, once and for all, I shall respond, so that
Ted the recipient will see this message and (I hope) do
Ted something about it, to eliminate the Challenge
Ted responder (I can't find the true recipient's email
Ted address from the Challenge).

please let me (or R-help too) know what you find out.

Martin Maechler, ETH Zurich
(R-help mailing list maintainer)

Ted The recipient may be able to recognise themselves from
Ted the fact that they receive this message but not the
Ted message which triggered the response, which began:
Ted === On 24-Nov-04
Ted John wrote:
 Off hand, the costs of GPL'd software are not hidden at
 all.  R for instance demands that a would be user sit
 down and learn the language. This in turn pushes a user
 into learning more about statistics than the simple
 overview that Stat 1 presents a student.

Ted I'd see this as less a cost than a benefit!
Ted ===

Ted My apologies for bothering you with this if you didn't
Ted want to know about it.

Ted Best wishes to all, Ted.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Grumble ...

2004-11-24 Thread Peter Dalgaard
Duncan Murdoch [EMAIL PROTECTED] writes:

 On Wed, 24 Nov 2004 10:36:35 - (GMT), (Ted Harding)
 [EMAIL PROTECTED] wrote:
 
 Hi Folks,
 
 A Grumble ...
 
 The message I just sent to R-help about The hidden costs of GPL ...
 has evoked a Challenge response:
 
   Hi,
   You've just sent a message to [EMAIL PROTECTED]
   In order to confirm the sent message, please click here
 
 Here's a strategy that I hope subverts this irritating mechanism:
 Every now and then I get a challenge about a message that I didn't
 send, because someone (or some virus) forged me into the From:
 address.  Those are the only ones I confirm.

Hehe... But don't you risk getting listed as an active spammer or
something that way? Personally I just send them to the bogus folder
for later update to the spamfilter.

Imagine if everyone had this challenge stuff installed and we had to
confirm every message ~1e3 times (how many subscribers are we these
days). The vacation messages are annoying enough. 

I wonder how this guy got on the list in the first place. I suspect
that he couldn't actually have completed the subscription process
unless the mechanism was installed after the subscription.

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] LDA with previous PCA for dimensionality reduction

2004-11-24 Thread Ramon Diaz-Uriarte
Dear Cristoph,

I guess you want to assess the error rate of a LDA that has been fitted to a 
set of currently existing training data, and that in the future you will get 
some new observation(s) for which you want to make a prediction.
Then, I'd say that you want to use the second approach. You might find that 
the first step turns out to be crucial and, after all, your whole subsequent 
LDA is contingent on the PC scores you obtain on the previous step. Somewhat 
similar issues have been discussed in the microarray literature. Two 
references are:


@ARTICLE{ambroise-02,
  author = {Ambroise, C. and McLachlan, G. J.},
  title = {Selection bias in gene extraction on the basis of microarray 
gene-expression data},
  journal = {Proc Natl Acad Sci USA},
  year = {2002},
  volume = {99},
  pages = {6562--6566},
  number = {10},
}


@ARTICLE{simon-03,
  author = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.},
  title = {Pitfalls in the use of DNA microarray data for diagnostic and 
prognostic classification},
  journal = {Journal of the National Cancer Institute},
  year = {2003},
  volume = {95},
  pages = {14--18},
  number = {1},
}


I am not sure, though, why you use PCA followed by LDA. But that's another 
story.

Best,


R.

On Wednesday 24 November 2004 11:16, Christoph Lehmann wrote:
 Dear all, not really a R question but:

 If I want to check for the classification accuracy of a LDA with
 previous PCA for dimensionality reduction by means of the LOOCV method:

 Is it ok to do the PCA on the WHOLE dataset ONCE and then run the LDA
 with the CV option set to TRUE (runs LOOCV)

 -- OR--

 do I need
 - to compute for each 'test-bag' (the n-1 observations) a PCA
 (my.princomp.1),
 - then run the LDA on the test-bag scores (- my.lda.1)
 - then compute the scores of the left-out-observation using
 my.princomp.1 (- my.scores.2)
 - and only then use predict.lda(my.lda.1, my.scores.2) on the scores of
 the left-out-observation

 ?
 I read some articles, where they choose procedure 1, but I am not sure,
 if this is really correct?

 many thanks for a hint

 Christoph

 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Two factor ANOVA in lme

2004-11-24 Thread Fernando Henrique Ferraz P. da Rosa
nat writes:
 I want to specify a two-factor model in lme, which should be easy? 
 Here's what I have:
 
 factor 1 - treatment FIXED (two levels)
 factor 2 - genotype RANDOM (160 genotypes in total)
 
 I need a model that tells me whether the treatment, genotype and 
 interaction terms are significant. I have been reading 'Mixed effects 
 models in S' but in all examples the random factor is not in the main 
 model - it is a nesting factor etc to specify the error structure. Here 
 i need the random factor in the model.
 
 I have tried this:
 
 height.aov <- lme(height ~ trt*genotype, data.reps, random = ~1|genotype, na.action = na.exclude)
 
 but the output is nothing like that from Minitab (my only previous 
 experience of stats software). The results for the interaction term are 
 the same but F values for the factors alone are very different between 
 Minitab and R.
 
 This is a very simple model but I can't figure out how to specify it. 
 Help would be much appreciated.
 
 As background: The data are from a QTL mapping population, which is why 
 I must test to see if genotype is significant and also why genotype is a 
 random factor.
 
 Thanks

It seems your message didn't get any replies (at least none
posted to r-help). 

I recently fitted such a model (two effects, one fixed,
another random, with interaction effects) using lme. I used the
following command: 

 z1 <- lme(reacao ~ posicao, data=memoria, random=~1|subject/posicao)

Where my model is 

 reacao = mu + posicao (fixed) + posicao*subject (random) +
subject (random)

 Beware though that Minitab uses different estimation methods (in lme
itself you may use maximum likelihood or restricted ML) and the
results need not be the same.
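
Translated to the original poster's variables, the analogous call might
look something like this (only a sketch, assuming the same nesting
logic; the variable and data names are taken from the quoted message):

library(nlme)
height.lme <- lme(height ~ trt, data = data.reps,
                  random = ~ 1 | genotype/trt,
                  na.action = na.exclude)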

 


--
Fernando Henrique Ferraz P. da Rosa
http://www.ime.usp.br/~feferraz

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] LDA with previous PCA for dimensionality reduction

2004-11-24 Thread Torsten Hothorn

On Wed, 24 Nov 2004, Ramon Diaz-Uriarte wrote:

 Dear Cristoph,

 I guess you want to assess the error rate of a LDA that has been fitted to a
 set of currently existing training data, and that in the future you will get
 some new observation(s) for which you want to make a prediction.
 Then, I'd say that you want to use the second approach. You might find that
 the first step turns out to be crucial and, after all, your whole subsequent
 LDA is contingent on the PC scores you obtain on the previous step.

Ramon,

as long as one does not use the information in the response (the class
variable, in this case) I don't think that one ends up with an
optimistically biased estimate of the error (although leave-one-out is
a suboptimal choice). Of course, when one starts to tune the method
used for dimension reduction, a selection of the procedure with minimal
error will produce a bias. Or am I missing something important?

Btw, `ipred::slda' implements something not completely unlike the
procedure Christoph is interested in.

Best,

Torsten

 Somewhat
 similar issues have been discussed in the microarray literature. Two
 references are:


 @ARTICLE{ambroise-02,
   author = {Ambroise, C. and McLachlan, G. J.},
   title = {Selection bias in gene extraction on the basis of microarray
 gene-expression data},
   journal = {Proc Natl Acad Sci USA},
   year = {2002},
   volume = {99},
   pages = {6562--6566},
   number = {10},
 }


 @ARTICLE{simon-03,
   author = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.},
   title = {Pitfalls in the use of DNA microarray data for diagnostic and
 prognostic classification},
   journal = {Journal of the National Cancer Institute},
   year = {2003},
   volume = {95},
   pages = {14--18},
   number = {1},
 }


 I am not sure, though, why you use PCA followed by LDA. But that's another
 story.

 Best,


 R.

 On Wednesday 24 November 2004 11:16, Christoph Lehmann wrote:
  Dear all, not really a R question but:
 
  If I want to check for the classification accuracy of a LDA with
  previous PCA for dimensionality reduction by means of the LOOCV method:
 
  Is it ok to do the PCA on the WHOLE dataset ONCE and then run the LDA
  with the CV option set to TRUE (runs LOOCV)
 
  -- OR--
 
  do I need
  - to compute for each 'test-bag' (the n-1 observations) a PCA
  (my.princomp.1),
  - then run the LDA on the test-bag scores (- my.lda.1)
  - then compute the scores of the left-out-observation using
  my.princomp.1 (- my.scores.2)
  - and only then use predict.lda(my.lda.1, my.scores.2) on the scores of
  the left-out-observation
 
  ?
  I read some articles, where they choose procedure 1, but I am not sure,
  if this is really correct?
 
  many thanks for a hint
 
  Christoph
 
  __
  [EMAIL PROTECTED] mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide!
  http://www.R-project.org/posting-guide.html

 --
 Ramón Díaz-Uriarte
 Bioinformatics Unit
 Centro Nacional de Investigaciones Oncológicas (CNIO)
 (Spanish National Cancer Center)
 Melchor Fernández Almagro, 3
 28029 Madrid (Spain)
 Fax: +-34-91-224-6972
 Phone: +-34-91-224-6900

 http://ligarto.org/rdiaz
 PGP KeyID: 0xE89B3462
 (http://ligarto.org/rdiaz/0xE89B3462.asc)

 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] an R function to search on Prof. Baron's site

2004-11-24 Thread Andy Bunn
Using this function with R 2.0.0 on Windows XP and Firefox 1.0 (I've
rediscovered the internet) produces a curious result.

 myString <- RSiteSearch(string = 'Ripley')
 myString
[1]
http://finzi.psych.upenn.edu/cgi-bin/htsearch?config=htdigrun1;restrict=Rhe
lp00/archive|Rhelp01/archive|Rhelp02a/archive;format=builtin-long;sort=score
;words=Ripley;matchesperpage=10
 version
 _
platform i386-pc-mingw32
arch i386
os   mingw32
system   i386, mingw32
status
major2
minor0.0
year 2004
month10
day  04
language R

If no browser is open, then this is the URL that is browsed in Firefox:
http://finzi.psych.upenn.edu/cgi-bin/htsearch?config=htdigrun1;restrict=Rhel
p00/archive

Oddly, these two other windows are opened too:
http://finzi.psych.upenn.edu/R/Rhelp01/archive/1000.html

and:
http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg17461.html

This happens regardless of what the search string is. If a browser window is
open then everything works as planned. The sticky bit, obviously, is parsing
browseURL which has the same behavior if I try:
 browseURL(myString)

However, the searches:
 RSiteSearch(string = 'browseURL Firefox')
 RSiteSearch(string = 'browseURL Mozilla')

don't turn up much help! If I change browseURL to use IE then browseURL
behaves as expected:

 browseURL(myString, browser="C:/Program Files/Internet Explorer/iexplore.exe")

Specifying Firefox explicitly in browseURL doesn't help - It still opens
three windows as above (if no browser is open):

 browseURL(myString, browser="C:/Program Files/Mozilla Firefox/firefox.exe")

So, under Windows the 'NULL' argument in 'browser' which determines the
browser via file association isn't the problem.

Anybody know how I can make Firefox work a little more smoothly?

Thanks, Andy



 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Gabor Grothendieck
 Sent: Tuesday, November 23, 2004 11:56 PM
 To: [EMAIL PROTECTED]
 Subject: Re: [R] an R function to search on Prof. Baron's site


 Liaw, Andy andy_liaw at merck.com writes:

 :
 : Inspired by the functions that Barry Rawlingson and Dave
 Forrest posted for
 : searching Rwiki and R-help archive, I've made up a function
 that does the
 : search on Prof. Baron's site (Thanks to Prof. Baron's help on
 setting up the
 : query string!):

 It would be nice if this and the other search functions recently
 posted were collected into a package or even integrated into
 R itself.  In the case of the Windows Rgui, it would be nice if they
 appeared on a menu with the other search and help functions.

 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] LDA with previous PCA for dimensionality reduction

2004-11-24 Thread Christoph Lehmann
Thank you, Torsten; that's what I thought: as long as one does not use 
the class label as a constraint in the dimension reduction, the 
procedure is OK. Of course it is computationally more demanding, since 
for each new (unknown with respect to the class label) observation one 
has to compute a new PCA as well.
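
A minimal sketch of that per-fold PCA + LDA loop (illustrative only;
it assumes a numeric predictor matrix X and a class factor y, uses
prcomp and MASS::lda, and the number of retained components is just
an example):

library(MASS)
loocv.pred <- factor(rep(NA, nrow(X)), levels = levels(y))
for (i in seq(along = y)) {
    pca.i   <- prcomp(X[-i, ], scale. = TRUE)           ## PCA without obs. i
    keep    <- 1:5                                      ## e.g. first 5 PCs
    lda.i   <- lda(pca.i$x[, keep], grouping = y[-i])   ## LDA on training scores
    score.i <- predict(pca.i, newdata = X[i, , drop = FALSE])[, keep, drop = FALSE]
    loocv.pred[i] <- predict(lda.i, newdata = score.i)$class
}
mean(loocv.pred != y)   ## estimated LOOCV error rate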

Cheers
Christoph
Torsten Hothorn wrote:
On Wed, 24 Nov 2004, Ramon Diaz-Uriarte wrote:

Dear Cristoph,
I guess you want to assess the error rate of a LDA that has been fitted to a
set of currently existing training data, and that in the future you will get
some new observation(s) for which you want to make a prediction.
Then, I'd say that you want to use the second approach. You might find that
the first step turns out to be crucial and, after all, your whole subsequent
LDA is contingent on the PC scores you obtain on the previous step.

Ramon,
as long as one does not use the information in the response (the class
variable, in this case) I don't think that one ends up with an
optimistically biased estimate of the error (although leave-one-out is
a suboptimal choice). Of course, when one starts to tune the method
used for dimension reduction, a selection of the procedure with minimal
error will produce a bias. Or am I missing something important?
Btw, `ipred::slda' implements something not completely unlike the
procedure Christoph is interested in.
Best,
Torsten

Somewhat
similar issues have been discussed in the microarray literature. Two
references are:
@ARTICLE{ambroise-02,
 author = {Ambroise, C. and McLachlan, G. J.},
 title = {Selection bias in gene extraction on the basis of microarray
gene-expression data},
 journal = {Proc Natl Acad Sci USA},
 year = {2002},
 volume = {99},
 pages = {6562--6566},
 number = {10},
}
@ARTICLE{simon-03,
 author = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.},
 title = {Pitfalls in the use of DNA microarray data for diagnostic and
prognostic classification},
 journal = {Journal of the National Cancer Institute},
 year = {2003},
 volume = {95},
 pages = {14--18},
 number = {1},
}
I am not sure, though, why you use PCA followed by LDA. But that's another
story.
Best,
R.
On Wednesday 24 November 2004 11:16, Christoph Lehmann wrote:
Dear all, not really a R question but:
If I want to check for the classification accuracy of a LDA with
previous PCA for dimensionality reduction by means of the LOOCV method:
Is it ok to do the PCA on the WHOLE dataset ONCE and then run the LDA
with the CV option set to TRUE (runs LOOCV)
-- OR--
do I need
- to compute for each 'test-bag' (the n-1 observations) a PCA
(my.princomp.1),
- then run the LDA on the test-bag scores (- my.lda.1)
- then compute the scores of the left-out-observation using
my.princomp.1 (- my.scores.2)
- and only then use predict.lda(my.lda.1, my.scores.2) on the scores of
the left-out-observation
?
I read some articles, where they choose procedure 1, but I am not sure,
if this is really correct?
many thanks for a hint
Christoph
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Respuesta Automatica CorreoDirect.

2004-11-24 Thread infocd
###
This email is generated automatically
###

You have written to a generic CorreoDirect mail account.

If you would like to know our privacy policy, click here.
http://www.correodirect.com/public/nosotros/privacidad.php

If you are registered with our service, you can easily unsubscribe
or modify your profile so that the offers we send you better match
your interests.

If you want to modify your profile, click here
http://www.correodirect.com/usuarios/area/modif.php

If you want to unsubscribe from our service, click here
http://www.correodirect.com/usuarios/area/baja.php

If you believe your data is in our database by mistake, 
write an email to [EMAIL PROTECTED]

If you have questions about how our service works, see
http://www.correodirect.com/public/ayuda/faqusuario.php

Sincerely,
User Support, CorreoDirect

www.correodirect.com

CorreoDirect, the leader in permission email marketing.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Witold Eryk Wolski
Hi,
I want to draw a scatter plot with 1M and more points and save it as pdf.
This makes the pdf file large.
So I tried to save the file first as png and then convert it to pdf. 
This looks OK when printed, but if viewed as a document figure, e.g. with 
Acrobat, the quality is bad.

Anyone knows a way to reduce the size but keep the quality?
/E
--
Dipl. bio-chem. Witold Eryk Wolski
MPI-Moleculare Genetic
Ihnestrasse 63-73 14195 Berlin
tel: 0049-30-83875219 __(_
http://www.molgen.mpg.de/~wolski  \__/'v'
http://r4proteomics.sourceforge.net||/   \
mail: [EMAIL PROTECTED]^^ m m
 [EMAIL PROTECTED]
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] 2GB dataset

2004-11-24 Thread apollo wong
Hi, does anyone have experience with loading a dataset
that is larger than 2GB into R? My organization is a
SAS-oriented shop and I'm in the process of switching
it to R. One of the complaints about R has always been
its inability to handle large (multi-GB) datasets
efficiently. I would like some comments from someone
with experience of working on 2GB datasets in R.
Thanks.
Apollo

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] tsdiag for ar?

2004-11-24 Thread Dr Carbon
Is there a way to have the ar function work with tsdiag for on-the-fly
visualization of ar fits? I have to fit a great many models of varying
order and would like to save the diagnostic graphs.

For instance, 
tsdiag(ar(lh))
tsdiag(arima(lh, order = c(3,0,0)))

Thanks...

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] 2GB dataset

2004-11-24 Thread Prof Brian Ripley
Absolutely no problem on 64-bit OSes with enough memory.  Many 32-bit OSes 
have problems with > 2Gb files.

Please do read the posting guide and tell us basic facts like which OS you 
are running on, so we don't have to speculate to answer your question.

Also, what do you want to do with the dataset?   This matters crucially.
On Wed, 24 Nov 2004, apollo wong wrote:
Hi, do any one have experience with loading dataset
that is larger than 2GB into R. My organization is a
SAS oriented shop and I'm in the process of switching
it to R. One of the complain about R has always been
it's inability to handle large dataset (GB)
efficiently. I would like some comments from someone
with experience of working on 2GB dataset in R.
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] tsdiag for ar?

2004-11-24 Thread Prof Brian Ripley
On Wed, 24 Nov 2004, Dr Carbon wrote:
Is there a way to have the ar function work with tsdiag for on-the-fly
visualization of ar fits? I have to fit a great many models of varying
order and would like to save the diagnostic graphs.
First you have to produce them, surely?
For instance,
tsdiag(ar(lh))
That gives an error. All you have to do is to write an ar method for 
tsdiag -- a good exercise in R programming for you.

tsdiag(arima(lh, order = c(3,0,0)))

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Marc Schwartz
On Wed, 2004-11-24 at 16:34 +0100, Witold Eryk Wolski wrote:
 Hi,
 
 I want to draw a scatter plot with 1M  and more points and save it as pdf.
 This makes the pdf file large.
 So i tried to save the file first as png and than convert it to pdf. 
 This looks OK if printed but if viewed e.g. with acrobat as document 
 figure the quality is bad.
 
 Anyone knows a way to reduce the size but keep the quality?

Hi Eryk!

Part of the problem is that in a pdf file, the vector based instructions
will need to be defined for each of your 10 ^ 6 points in order to draw
them.

When trying to create a simple example:

pdf()
plot(rnorm(100), rnorm(100))
dev.off()

The pdf file is 55 Mb in size.

One immediate thought was to try a ps file and using the above plot, the
ps file was only 23 Mb in size. So note that ps can be more efficient.

Going to a bitmap might result in a much smaller file, but as you note,
the quality does degrade as compared to a vector based image.

I tried the above to a png, then converted to a pdf (using 'convert')
and as expected, the image both viewed and printed was pixelated,
since the pdf instructions are presumably drawing pixels and not vector
based objects.

Depending upon what you plan to do with the image, you may have to
choose among several options, resulting in tradeoffs between image
quality and file size.

If you can create the bitmap file explicitly in the size that you
require for printing or incorporating in a document, that is one way to
go and will preserve, to an extent, the overall fixed size image
quality, while keeping file size small.
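
For instance, a minimal sketch of writing the bitmap at a chosen pixel
size straight from R (the file name and dimensions below are only
placeholders, and x, y stand in for your data):

x <- rnorm(1e6); y <- rnorm(1e6)            # stand-in for your data
png("scatter.png", width = 2000, height = 2000, pointsize = 12)
plot(x, y, pch = ".")
dev.off()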

Another option to consider for the pdf approach, if it does not
compromise the integrity of your plot, is to remove any duplicate data
points if any exist. Thus, you will not need what are in effect
redundant instructions in the pdf file. This may not be possible
depending upon the nature of your data (ie. doubles) without considering
some tolerance level for equivalence.
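
A minimal sketch of that duplicate-removal idea, assuming the
coordinates are in vectors x and y:

xy <- data.frame(x = x, y = y)
xy <- xy[!duplicated(xy), ]     # keep only distinct (x, y) pairs
pdf("scatter-dedup.pdf")
plot(xy$x, xy$y, pch = ".")
dev.off()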

Perhaps others will have additional ideas.

HTH,

Marc Schwartz

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Ted Harding
On 24-Nov-04 Witold Eryk Wolski wrote:
 Hi,
 I want to draw a scatter plot with 1M  and more points
 and save it as pdf.
 This makes the pdf file large.
 So i tried to save the file first as png and than convert
 it to pdf. This looks OK if printed but if viewed e.g. with
 acrobat as document figure the quality is bad.
 
 Anyone knows a way to reduce the size but keep the quality?

If you want the PDF file to preserve the info about all the
1M points then the problem has no solution. The png file
will already have suppressed most of this (which is one
reason for poor quality).

I think you should give thought to reducing what you need
to plot.

Think about it: suppose you plot with a resolution of
1/200 inch per point, i.e. 200 points per inch (about the
limit at which the eye begins to see rough edges). Then you
have 40,000 points per square inch. If your 1M points are
separate but as closely packed as possible, this requires
25 square inches, or a 5x5 inch (= 12.7x12.7 cm) square.
And this would be solid black!

Presumably in your plot there is a very large number of
points which are effectively indistinguishable from other
points, so these could be eliminated without spoiling
the plot.

I don't have an obviously best strategy for reducing what
you actually plot, but perhaps one line to think along
might be the following:

1. Multiply the data by some factor and then round the
   results to an integer (to avoid problems in step 2).
   Factor chosen so that the result of (4) below is
   satisfactory.

2. Eliminate duplicates in the result of (1).

3. Divide by the factor you used in (1).

4. Plot the result; save plot to PDF.

As to how to do it in R: the critical step is (2),
which with so many points could be very heavy unless
done by a well-chosen procedure. I'm not expert enough
to advise about that, but no doubt others are.
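
One way to code steps (1)-(4) in R, assuming vectors x and y and a
purely illustrative scaling factor of 100:

f <- 100                               # step 1: scale so rounding is to 1/100
xy <- unique(round(cbind(x, y) * f))   # steps 1-2: round, drop duplicate rows
xy <- xy / f                           # step 3: undo the scaling
pdf("thinned.pdf")                     # step 4: plot the reduced set to PDF
plot(xy[, 1], xy[, 2], pch = ".")
dev.off()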

Good luck!
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 24-Nov-04   Time: 16:16:28
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] OOT: frailty-multinivel

2004-11-24 Thread Kjetil Brinchmann Halvorsen
Hola!
I started to search for information about multilevel survival models, and
found frailty in R. This seems to be something of the same, is it the same?
Then: why the name frailty (weakness?)
--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
  --  Mahdi Elmandjra
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Liaw, Andy
Marc/Eryk,

I have no experience with it, but I believe the hexbin package in BioC was
there for this purpose: avoid heavy over-plotting lots of points.  You might
want to look into that, if you have not done so yet.

Best,
Andy

 From: Marc Schwartz
 
 On Wed, 2004-11-24 at 16:34 +0100, Witold Eryk Wolski wrote:
  Hi,
  
  I want to draw a scatter plot with 1M  and more points and 
 save it as pdf.
  This makes the pdf file large.
  So i tried to save the file first as png and than convert 
 it to pdf. 
  This looks OK if printed but if viewed e.g. with acrobat as 
 document 
  figure the quality is bad.
  
  Anyone knows a way to reduce the size but keep the quality?
 
 Hi Eryk!
 
 Part of the problem is that in a pdf file, the vector based 
 instructions
 will need to be defined for each of your 10 ^ 6 points in 
 order to draw
 them.
 
 When trying to create a simple example:
 
 pdf()
 plot(rnorm(100), rnorm(100))
 dev.off()
 
 The pdf file is 55 Mb in size.
 
 One immediate thought was to try a ps file and using the 
 above plot, the
 ps file was only 23 Mb in size. So note that ps can be more 
 efficient.
 
 Going to a bitmap might result in a much smaller file, but as 
 you note,
 the quality does degrade as compared to a vector based image.
 
 I tried the above to a png, then converted to a pdf (using 'convert')
 and as expected, the image both viewed and printed was pixelated,
 since the pdf instructions are presumably drawing pixels and 
 not vector
 based objects.
 
 Depending upon what you plan to do with the image, you may have to
 choose among several options, resulting in tradeoffs between image
 quality and file size.
 
 If you can create the bitmap file explicitly in the size that you
 require for printing or incorporating in a document, that is 
 one way to
 go and will preserve, to an extent, the overall fixed size image
 quality, while keeping file size small.
 
 Another option to consider for the pdf approach, if it does not
 compromise the integrity of your plot, is to remove any duplicate data
 points if any exist. Thus, you will not need what are in effect
 redundant instructions in the pdf file. This may not be possible
 depending upon the nature of your data (ie. doubles) without 
 considering
 some tolerance level for equivalence.
 
 Perhaps others will have additional ideas.
 
 HTH,
 
 Marc Schwartz
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Witold Eryk Wolski
Hi,
I tried the ps idea. But I am using pdflatex.
You get an even larger size reduction if you convert the ps into a pdf
using ps2pdf.
But unfortunately there is a quality loss.

I have found almost a working solution:
a) Save the scatterplot without axes and with par(mar=c(0,0,0,0)) as png.
b) Convert it using any program to pnm.
c) Read the pnm file using pixmap.
d) Add axis labels and lines afterwards with par(new=TRUE).
And this looks the way I would like it to look. But unfortunately 
acroread and gv on Windows crash when I try to print the file.

png(file="pepslop.png", width=500, height=500)
par(mar=c(0,0,0,0))
X2 <- rnorm(10)
Y2 <- X2*10 + rnorm(10)
plot(X2, Y2, pch=".", xlab="", ylab="", main="", axes=F)
dev.off()
pdf(file="pepslop.pdf", width=7, height=7)
par(mar=c(3.2,3.2,1,1))
x <- read.pnm("pepslop.pnm")
plot(x)
par(new=TRUE)
par(mar=c(3.2,3.2,1,1))
plot(X2, Y2, pch=".", xlab="", ylab="", main="", type="n")
mtext(expression(m[nominal]), side=1, line=2)
mtext(expression(mod(m[monoisotopic], 1)), side=2, line=2)
legend(1000, 4, expression(paste(lambda[DB], " = ", 0.000495)), col=2, lty=1, lwd=1)
abline(test, col=2, lwd=2)
dev.off()

Marc Schwartz wrote:
On Wed, 2004-11-24 at 16:34 +0100, Witold Eryk Wolski wrote:
 

Hi,
I want to draw a scatter plot with 1M  and more points and save it as pdf.
This makes the pdf file large.
So i tried to save the file first as png and than convert it to pdf. 
This looks OK if printed but if viewed e.g. with acrobat as document 
figure the quality is bad.

Anyone knows a way to reduce the size but keep the quality?
   

Hi Eryk!
Part of the problem is that in a pdf file, the vector based instructions
will need to be defined for each of your 10 ^ 6 points in order to draw
them.
When trying to create a simple example:
pdf()
plot(rnorm(100), rnorm(100))
dev.off()
The pdf file is 55 Mb in size.
One immediate thought was to try a ps file and using the above plot, the
ps file was only 23 Mb in size. So note that ps can be more efficient.
Going to a bitmap might result in a much smaller file, but as you note,
the quality does degrade as compared to a vector based image.
I tried the above to a png, then converted to a pdf (using 'convert')
and as expected, the image both viewed and printed was pixelated,
since the pdf instructions are presumably drawing pixels and not vector
based objects.
Depending upon what you plan to do with the image, you may have to
choose among several options, resulting in tradeoffs between image
quality and file size.
If you can create the bitmap file explicitly in the size that you
require for printing or incorporating in a document, that is one way to
go and will preserve, to an extent, the overall fixed size image
quality, while keeping file size small.
Another option to consider for the pdf approach, if it does not
compromise the integrity of your plot, is to remove any duplicate data
points if any exist. Thus, you will not need what are in effect
redundant instructions in the pdf file. This may not be possible
depending upon the nature of your data (ie. doubles) without considering
some tolerance level for equivalence.
Perhaps others will have additional ideas.
HTH,
Marc Schwartz
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 


--
Dipl. bio-chem. Witold Eryk Wolski
MPI-Moleculare Genetic
Ihnestrasse 63-73 14195 Berlin
tel: 0049-30-83875219 __(_
http://www.molgen.mpg.de/~wolski  \__/'v'
http://r4proteomics.sourceforge.net||/   \
mail: [EMAIL PROTECTED]^^ m m
 [EMAIL PROTECTED]
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Thomas Lumley
On Wed, 24 Nov 2004, Witold Eryk Wolski wrote:
Hi,
I want to draw a scatter plot with 1M  and more points and save it as pdf.
Try the hexbin Bioconductor package, which gives hexagonally-binned 
density scatterplots. Even for tens of thousands of points this is often 
much better than a scatterplot.
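
A minimal sketch of that approach (assuming the hexbin package is
installed, with the points in vectors x and y; the number of bins is an
arbitrary choice):

library(hexbin)
bin <- hexbin(x, y, xbins = 50)   # bin the points into hexagonal cells
plot(bin)                         # plots cell counts, so file size stays small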

-thomas
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Witold Eryk Wolski
Hi,
Yes, indeed the hexbin package generates very cool pix. They look great. 
I was using it already.
But this time I am interested in visualizing exactly the _scatter_ of 
some extreme points.

Eryk
Liaw, Andy wrote:
Marc/Eryk,
I have no experience with it, but I believe the hexbin package in BioC was
there for this purpose: avoid heavy over-plotting lots of points.  You might
want to look into that, if you have not done so yet.
Best,
Andy
 

From: Marc Schwartz
On Wed, 2004-11-24 at 16:34 +0100, Witold Eryk Wolski wrote:
   

Hi,
I want to draw a scatter plot with 1M  and more points and 
 

save it as pdf.
   

This makes the pdf file large.
So i tried to save the file first as png and than convert 
 

it to pdf. 
   

This looks OK if printed but if viewed e.g. with acrobat as 
 

document 
   

figure the quality is bad.
Anyone knows a way to reduce the size but keep the quality?
 

Hi Eryk!
Part of the problem is that in a pdf file, the vector based 
instructions
will need to be defined for each of your 10 ^ 6 points in 
order to draw
them.

When trying to create a simple example:
pdf()
plot(rnorm(100), rnorm(100))
dev.off()
The pdf file is 55 Mb in size.
One immediate thought was to try a ps file and using the 
above plot, the
ps file was only 23 Mb in size. So note that ps can be more 
efficient.

Going to a bitmap might result in a much smaller file, but as 
you note,
the quality does degrade as compared to a vector based image.

I tried the above to a png, then converted to a pdf (using 'convert')
and as expected, the image both viewed and printed was pixelated,
since the pdf instructions are presumably drawing pixels and 
not vector
based objects.

Depending upon what you plan to do with the image, you may have to
choose among several options, resulting in tradeoffs between image
quality and file size.
If you can create the bitmap file explicitly in the size that you
require for printing or incorporating in a document, that is 
one way to
go and will preserve, to an extent, the overall fixed size image
quality, while keeping file size small.

Another option to consider for the pdf approach, if it does not
compromise the integrity of your plot, is to remove any duplicate data
points if any exist. Thus, you will not need what are in effect
redundant instructions in the pdf file. This may not be possible
depending upon the nature of your data (ie. doubles) without 
considering
some tolerance level for equivalence.

Perhaps others will have additional ideas.
HTH,
Marc Schwartz
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

   




--
Dipl. bio-chem. Witold Eryk Wolski
MPI-Moleculare Genetic
Ihnestrasse 63-73 14195 Berlin
tel: 0049-30-83875219 __(_
http://www.molgen.mpg.de/~wolski  \__/'v'
http://r4proteomics.sourceforge.net||/   \
mail: [EMAIL PROTECTED]^^ m m
 [EMAIL PROTECTED]
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Prof Brian Ripley
On Wed, 24 Nov 2004 [EMAIL PROTECTED] wrote:
On 24-Nov-04 Witold Eryk Wolski wrote:
Hi,
I want to draw a scatter plot with 1M  and more points
and save it as pdf.
This makes the pdf file large.
So i tried to save the file first as png and than convert
it to pdf. This looks OK if printed but if viewed e.g. with
acrobat as document figure the quality is bad.
Anyone knows a way to reduce the size but keep the quality?
If you want the PDF file to preserve the info about all the
1M points then the problem has no solution. The png file
will already have suppressed most of this (which is one
reason for poor quality).
I think you should give thought to reducing what you need
to plot.
Think about it: suppose you plot with a resolution of
1/200 points per inch (about the limit at which the eye
begins to see rough edges). Then you have 4 points
per square inch. If your 1M points are separate but as
closely packed as possible, this requires 25 square inches,
or a 5x5 inch (= 12.7x12.7 cm) square. And this would be
solid black!
Presumably in your plot there is a very large number of
points which are effectively indistinguisable from other
points, so these could be eliminated without spoiling
the plot.
I don't have an obviously best strategy for reducing what
you actually plot, but perhaps one line to think along
might be the following:
1. Multiply the data by some factor and then round the
  results to an integer (to avoid problems in step 2).
  Factor chosen so that the result of (4) below is
  satisfactory.
2. Eliminate duplicates in the result of (1).
3. Divide by the factor you used in (1).
4. Plot the result; save plot to PDF.
As to how to do it in R: the critical step is (2),
which with so many points could be very heavy unless
done by a well-chosen procedure. I'm not expert enough
to advise about that, but no doubt others are.
unique will eat that for breakfast
x <- runif(1e6)
system.time(xx <- unique(round(x, 4)))
[1] 0.55 0.09 0.64 0.00 0.00
length(xx)
[1] 10001
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Matt Nelson
Witold,

I have found that plotting more than a few thousand data points at a time
quickly becomes a losing proposition.  That is, the dense overlap of data
points tends to obscure the patterns of interest, with only outliers
distinctly visible.  I typically deal with this in two ways.  

The most straightforward is to select a much smaller subset of data points to
plot, say on the order of 100-1000, depending on the nature of the data and
the features you want to illustrate.  How you sample depends on the
structure of your data set.  E.g. you may want to sample fixed proportions
within subgroups.  You can add loess lines or confidence ellipses estimated
from the complete data.
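
A rough sketch of the subsampling idea, assuming vectors x and y and an
arbitrary subset of 1000 points, with a lowess smooth (standing in for
the loess line mentioned above) fitted to the full data:

idx <- sample(seq(along = x), 1000)   # plot a random subset only
plot(x[idx], y[idx], pch = 20)
lines(lowess(x, y), col = "red")      # smooth estimated from all the points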

Another approach is to estimate the two dimensional density using kde2d()
(MASS package) and represent the result with a contour or image plot.  See
?kde2d for an example.  
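
A minimal sketch of the density approach (assuming MASS is available and
the data are in vectors x and y; the grid size is arbitrary):

library(MASS)
dens <- kde2d(x, y, n = 100)   # 2-d kernel density estimate on a 100 x 100 grid
image(dens)                    # or contour(dens) / filled.contour(dens$x, dens$y, dens$z)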

Both of these will result in much more manageable (and likely more
informative) figures.

Regards,
Matt

Matthew R. Nelson, Ph.D.
Director, Biostatistics
Sequenom, Inc.


 -Original Message-
 From: Witold Eryk Wolski [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 24, 2004 7:35 AM
 To: R Help Mailing List
 Subject: [R] scatterplot of 100000 points and pdf file format
 
 
 Hi,
 
 I want to draw a scatter plot with 1M  and more points and 
 save it as pdf.
 This makes the pdf file large.
 So i tried to save the file first as png and than convert it to pdf. 
 This looks OK if printed but if viewed e.g. with acrobat as document 
 figure the quality is bad.
 
 Anyone knows a way to reduce the size but keep the quality?
 
 
 /E
 
 -- 
 Dipl. bio-chem. Witold Eryk Wolski
 MPI-Moleculare Genetic
 Ihnestrasse 63-73 14195 Berlin
 tel: 0049-30-83875219 __(_
 http://www.molgen.mpg.de/~wolski  \__/'v'
 http://r4proteomics.sourceforge.net||/   \
 mail: [EMAIL PROTECTED]^^ m m
   [EMAIL PROTECTED]
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] coplot =? gannt chart + bargraph

2004-11-24 Thread Jeff D. Hamann
I would like to display some results from simulations in the form of a
Gantt chart (progress) with a barchart (production) of another variable
below (something very similar to coplot charts). I'm not sure if I should
attempt to build this from scratch (using grid or some of the basic
graphics features) or if there's a similar feature in one of the existing
packages.

I need to take the following (truncated) results,

unit,week,machine,volume,pdxratio
0,14,1,925.402525,1.00
0,15,1,925.402525,1.00
0,16,1,925.402525,1.00
0,17,1,702.792425,0.759445
1,46,1,1007.664896,1.00
1,47,1,1007.664896,1.00
1,48,1,1007.664896,1.00
1,49,1,563.005311,0.558723
2,33,1,1019.781108,1.00
2,34,1,1019.781108,1.00
2,35,1,1019.781108,1.00
2,36,1,697.656677,0.684124
3,41,2,1043.451341,1.00
3,42,2,1043.451341,1.00
3,43,2,1043.451341,1.00
3,44,2,741.645977,0.710762
4,7,2,1048.494508,1.00
4,8,2,1048.494508,1.00


and generate charts over unit and week (both as factors?). I think I
should be using aggregate, but wanted to find out if there's a better
method.

Thanks,
Jeff.


-- 
Jeff D. Hamann
Forest Informatics, Inc.
PO Box 1421
Corvallis, Oregon 97339-1421
phone 541-754-1428
fax 541-752-0288
[EMAIL PROTECTED]
http://www.forestinformatics.com

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] OOT: frailty-multinivel

2004-11-24 Thread Thomas Lumley
On Wed, 24 Nov 2004, Kjetil Brinchmann Halvorsen wrote:
Hola!
I started to search for information about multilevel survival models, and
found frailty in R. This seems to be something of the same, is it the same?
More or less. Shared frailty models are the same as hierarchical/mixed 
survival models.  R uses a fitting method that is equivalent to maximum 
likelihood only when exp(random effects) has a Gamma distribution.  The 
survival package can fit random intercept models; the new kinship 
package fits much more general mixed models.
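
For instance, a shared gamma-frailty (random intercept per litter) fit
with the survival package might look like this, assuming the rats
example data that ships with that package:

library(survival)
fit <- coxph(Surv(time, status) ~ rx + frailty(litter), data = rats)
summary(fit)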

[I will put in my usual objection to the term multilevel model
being used to refer solely to models that include unmeasured 
variables]

Then: why the name frailty (weekness?)
The idea is that 'weaker' individuals fail earlier than `stronger' 
individuals for the same values of covariates.  The concept has been used 
both for modelling correlation between survival times and to motivate 
parametric models that give an initially decreasing hazard.  I have a 
vague impression that the term originated in Scandinavia somewhere.

-thomas
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] what does order() stand for in an lme formula?

2004-11-24 Thread Harry Athanassiou
I'm a beginner in R, and trying to fit linear models with different
intercepts per group, of the type y ~ A*x1 + B, where x1 is a numerical
variable. I cannot understand whether I should use
y1 ~ x1 +1
or
y1 ~ order(x1) + 1
Although in the toy example included it makes a small difference, in models
with many groups the models without order() converge slower if at all!
 
Please help

---
R script : START
---
##
# what does order() do in an lme formula?
##
 
# prep data
y1 <- c(rnorm(25, 35, sd=5), rnorm(17, 55, sd=4))  # a line parallel to the x-axis (slope=0) with noise
x1 <- c(sample(1:25, 25, replace=F), sample(1:17, 17, replace=F))  # scramble the x so they do not appear in order
f1 <- c(rep("A", 25), rep("B", 17))
dat1 <- data.frame(y1, x1, f1)
x1 <- NULL
y1 <- NULL
f1 <- NULL
 
# load libraries
require(nlme)
require(gmodels) # for the ci()
 
# fit model with and w/o order()
dat1.lm.1  <- lm(y1 ~ x1 + f1 - 1, data=dat1)
dat1.lm.2  <- lm(y1 ~ order(x1) + f1 - 1, data=dat1)
#
# using lme, and assigning f1 to the random effects; this is different
# than in lm(), but in my larger models f1 is a repeated experiment vs a
# fixed factor
dat1.lme.1 <- lme(y1 ~ x1 + 1, random= ~ 1 | f1, data=dat1, method="ML")
dat1.lme.2 <- update(dat1.lme.1, fixed=y1 ~ order(x1) + 1, random= ~ 1 | f1)

# compare
summary(dat1.lm.1)
summary(dat1.lm.2)
ci(dat1.lm.1)
ci(dat1.lm.2)
#
summary(dat1.lme.1)
summary(dat1.lme.2)
ci(dat1.lme.1)
ci(dat1.lme.2)
 
---
R script : END
---
 
 
 
---
R session: START
---
 # compare
 summary(dat1.lm.1)
 
Call:
lm(formula = y1 ~ x1 + f1 - 1, data = dat1)
 
Residuals:
Min  1Q  Median  3Q Max 
-7.1774 -2.9020 -0.1616  2.3576 10.0103 
 
Coefficients:
    Estimate Std. Error t value Pr(>|t|)    
x1  -0.06173    0.09318  -0.663    0.512    
f1A 35.25704    1.43540  24.563   <2e-16 ***
f1B 54.98193    1.25519  43.804   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
 
Residual standard error: 3.851 on 39 degrees of freedom
Multiple R-Squared: 0.9928, Adjusted R-squared: 0.9923 
F-statistic:  1799 on 3 and 39 DF,  p-value: < 2.2e-16 
 
 summary(dat1.lm.2)
 
Call:
lm(formula = y1 ~ order(x1) + f1 - 1, data = dat1)
 
Residuals:
Min  1Q  Median  3Q Max 
-7.0089 -3.0955  0.1829  2.3387 10.0083 
 
Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
order(x1) -0.002098   0.049830  -0.042    0.967    
f1A       34.502668   1.381577  24.973   <2e-16 ***
f1B       54.466926   1.346118  40.462   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
 
Residual standard error: 3.872 on 39 degrees of freedom
Multiple R-Squared: 0.9927, Adjusted R-squared: 0.9922 
F-statistic:  1779 on 3 and 39 DF,  p-value: < 2.2e-16 
 
 ci(dat1.lm.1)
   Estimate   CI lower   CI upper Std. Error  p-value
x1  -0.06173363 -0.2502001  0.1267329 0.09317612 5.115170e-01
f1A 35.25703803 32.3536752 38.1604009 1.43539619 2.461184e-25
f1B 54.98192838 52.4430771 57.5207797 1.25518501 8.793394e-35
 ci(dat1.lm.2)
  Estimate   CI lowerCI upper Std. Error  p-value
order(x1) -0.002097868 -0.1028889  0.09869315 0.04983016 9.666335e-01
f1A   34.502667919 31.7081648 37.29717105 1.38157694 1.338416e-25
f1B   54.466925648 51.7441455 57.18970575 1.34611772 1.819408e-33
 #
 summary(dat1.lme.1)
Linear mixed-effects model fit by maximum likelihood
 Data: dat1 
   AIC  BIClogLik
  249.2458 256.1965 -120.6229
 
Random effects:
 Formula: ~1 | f1
(Intercept) Residual
StdDev:9.819254 3.802406
 
Fixed effects: y1 ~ x1 + 1 
   Value Std.Error DF   t-value p-value
(Intercept) 45.14347  7.215924 39  6.256090  0.
x1  -0.06517  0.094245 39 -0.691489  0.4934
 Correlation: 
   (Intr)
x1 -0.144
 
Standardized Within-Group Residuals:
Min  Q1 Med  Q3 Max 
-1.90575027 -0.76278244 -0.02205757  0.60302788  2.61718120 
 
Number of Observations: 42
Number of Groups: 2 
 summary(dat1.lme.2)
Linear mixed-effects model fit by maximum likelihood
 Data: dat1 
  AIC  BIClogLik
  249.741 256.6917 -120.8705
 
Random effects:
 Formula: ~1 | f1
(Intercept) Residual
StdDev:9.944285 3.823607
 
Fixed effects: y1 ~ order(x1) 
   Value Std.Error DF   t-value p-value
(Intercept) 44.48952  7.309834 39  6.086256  0.
order(x1)   -0.00297  0.050415 39 -0.058968  0.9533
 Correlation: 
  (Intr)
order(x1) -0.146
 
Standardized Within-Group Residuals:
Min  Q1 Med  Q3 Max 
-1.85021774 -0.81922106  0.03922119  0.61748886  2.60194590 
 
Number of Observations: 42
Number of Groups: 2 
 ci(dat1.lme.1)
   Estimate   CI lower   CI upper Std. Error DF  p-value
(Intercept) 45.14346803 30.5478836 59.7390525  7.2159242 39 2.283608e-07
x1  -0.06516927 -0.2557976  0.1254590  0.0942449 39 

Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread John
On Wednesday 24 November 2004 07:34, Witold Eryk Wolski wrote:
 Hi,

 I want to draw a scatter plot with 1M  and more points and save it as pdf.
 This makes the pdf file large.
 So i tried to save the file first as png and than convert it to pdf.
 This looks OK if printed but if viewed e.g. with acrobat as document
 figure the quality is bad.

 Anyone knows a way to reduce the size but keep the quality?

I would strongly suggest a different method to present the data, such as a 
contour plot or 3D bar plot.  An XY plot with a million points is unlikely to 
be readable unless it is produced as a large format print.  At 200 DPI 
printed, 1,000,000 discrete points require a minimum of a 5 inch (12.7 cm)
by 5 inch area.  Besides, other than being visually overwhelming, what 
information would such a plot offer a viewer?

John

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Sean Davis
Do you have a measure of scatter or can you pick outliers that 
could allow you to produce a mixed plot using either density or 
hexbinned data with only outliers placed after-the-fact using points()?

Sean
-Original Message-
From: Witold Eryk Wolski [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 7:35 AM
To: R Help Mailing List
Subject: [R] scatterplot of 100000 points and pdf file format
Hi,
I want to draw a scatter plot with 1M  and more points and
save it as pdf.
This makes the pdf file large.
So i tried to save the file first as png and than convert it to pdf.
This looks OK if printed but if viewed e.g. with acrobat as document
figure the quality is bad.
Anyone knows a way to reduce the size but keep the quality?
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread james . holtman




Have you tried

plot(...,pch='.')

This will use the period as the plotting character instead of the 'circle'
which is drawn.  This should reduce the size of the PDF file.

I have done scatter plots with 2M points and they are typically meaningless
with that many points overlaid.  Check out 'hexbin' on Bioconductor (you
can download the package from the RGUI window).  This is a much better way
of showing some information since it will plot the number of points that
are within a hexagon.  I have found this to be a better way of looking at
some data.
__
James Holtman            "What is the problem you are trying to solve?"
Executive Technical Consultant  --  Office of Technology, Convergys
[EMAIL PROTECTED]
+1 (513) 723-2929



   
  From:     Witold Eryk Wolski [EMAIL PROTECTED]
  To:       R Help Mailing List [EMAIL PROTECTED]
  cc:
  Sent by:  [EMAIL PROTECTED]
  Date:     11/24/2004 10:34
  Subject:  [R] scatterplot of 100000 points and pdf file format



Hi,

I want to draw a scatter plot with 1M  and more points and save it as pdf.
This makes the pdf file large.
So i tried to save the file first as png and than convert it to pdf.
This looks OK if printed but if viewed e.g. with acrobat as document
figure the quality is bad.

Anyone knows a way to reduce the size but keep the quality?


/E

--
Dipl. bio-chem. Witold Eryk Wolski
MPI-Moleculare Genetic
Ihnestrasse 63-73 14195 Berlin
tel: 0049-30-83875219 __(_
http://www.molgen.mpg.de/~wolski  \__/'v'
http://r4proteomics.sourceforge.net||/   \
mail: [EMAIL PROTECTED]^^ m m
  [EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Re: T-test syntax question

2004-11-24 Thread Jessie F
Actually, you can still use t.test with one vector of data. Say the
differences are in d (a vector of numbers); you can use t.test(d), and
by default it tests whether mu=0. You can also specify the confidence
level by adding conf.level = 0.95, etc.
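
For instance, with a hypothetical vector d of paired differences:

d <- x1 - x2                            # x1, x2: hypothetical paired measurements
t.test(d, mu = 0, conf.level = 0.95)    # same result as t.test(x1, x2, paired = TRUE)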

You can also type ?t.test in R command to get more information with R.help.
Hope this helps!
S Fan

From: Vito Ricci [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: [R] Re: T-test syntax question
Date: Wed, 24 Nov 2004 12:32:06 +0100 (CET)
Hi,
In case of paired data, if you have only the differences
and not the original data, you can get this t test based on
the differences.
Say d is the vector with the difference data and suppose
you wish to test if the mean difference is equal to
zero:
md <- mean(d)       ## sample mean of the differences
sdd <- sd(d)        ## sample sd of the differences
n <- length(d)      ## sample size
t.value <- md/(sdd/sqrt(n))          ## sample t-value with n-1 df
pt(t.value, n-1, lower.tail=FALSE)   ## p-value of test
 set.seed(13)
 d <- rnorm(50)
 md <- mean(d)       ## sample mean of the differences
 sdd <- sd(d)        ## sample sd of the differences
 n <- length(d)      ## sample size
 t.value <- md/(sdd/sqrt(n))         ## sample t-value with n-1 df
 pt(t.value, n-1, lower.tail=FALSE)  ## p-value of test
[1] 0.5755711
Best regards,
Vito

Steven F. Freeman wrote:
I'd like to do a t-test to compare the Delta values of
items with Crit=1
with Delta values of items with Crit=0. What is the
t.test syntax?
It should produce a result like this below (I can't
get in touch with the
person who originally did this for me)
Welch Two Sample t-test
data:  t1$Delta by Crit
t = -3.4105, df = 8.674, p-value = 0.008173
alternative hypothesis: true
difference in means is not equal to 0
95 percent confidence interval:
 -0.04506155 -0.00899827
sample estimates:
mean in group FALSE  mean in group TRUE
 0.03331391  0.06034382
Thanks.
=
Diventare costruttori di soluzioni
Became solutions' constructors
The business of the statistician is to catalyze
the scientific learning process.
George E. P. Box
Visitate il portale http://www.modugno.it/
e in particolare la sezione su Palese  
http://www.modugno.it/archivio/palese/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Barry Rowlingson

I would strongly suggest a different method to present the data such as a 
contour plot or 3D bar plot.  An XY plot with a million points is unlikely to 
be readable unless it is produced as a large format print.  At 200 DPI 
printed, 1,000,000 discrete points requires a minimum of a 5 inch (12.7  
cm) by 5 inch area.  Besides, other than being visually overwhelming, what 
information would such a plot offer a viewer?
 I recall some of our extreme value statistics people printing things 
like this. Several million points on a plot. Most of which were in a 
big, thick block of toner, and then a few hundred at the extremes, which 
was where they were interested in looking.

 Of course these things took an hour to print on a PostScript printer 
at the time. I think I suggested only plotting points for which X > 
someThreshold. Saved on toner and time. Got a bit tricky in the 
bivariate case though, where you really needed to plot points outside 
some ellipse that you knew would otherwise be a big black blob, and then 
you filled that in with a black ellipse.

 Contours or aggregation wasn't any use, since they were interested in 
the point patterns of the extreme value data.

Baz
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Searching for antilog function

2004-11-24 Thread Heather J. Branton
Dear R-users,
I have a basic question about how to determine the antilog of a variable.
Say I have some number, x, which is a power of 2 such that x = 2^y. I 
want to figure out what y is, i.e. I am looking for the antilog base 2 of x.

I have found log2 in the Reference Manual. But I am struggling how to 
get the antilog of that.

Any help will be appreciated!
 version
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    9.1
year     2004
month    06
day      21
language R

...heather
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] what does order() stand for in an lme formula?

2004-11-24 Thread Peter Dalgaard
Harry Athanassiou [EMAIL PROTECTED] writes:

 I'm a beginner in R, and trying to fit linear models with different
 intercepts per group, of the type y ~ A*x1 + B, where x1 is a numerical
 variable. I cannot understand whether I should use
 y1 ~ x1 +1
 or
 y1 ~ order(x1) + 1
 Although in the toy example included it makes a small difference, in models
 with many groups the models without order() converge slower if at all!

Er?

What gave you the idea of using order in the first place? To the best
of my knowledge, order(x) is also in this context just a function,
which for the nth position returns the index of the nth smallest
observation in x. This is not likely to make sense as a predictor in a
model.
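
A quick illustration of what order() returns:

x1 <- c(30, 10, 20)
order(x1)        # 2 3 1 : indices of the smallest, second smallest, ...
x1[order(x1)]    # 10 20 30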


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] SAS or R software

2004-11-24 Thread bogdan romocea
neela v writes:
 Hi all there
  
 Can someone clarify this issue for me: feature-wise, which is
better, R or SAS, leaving aside the commercial aspect associated with it? I
suppose there are a few people who have worked on both R and SAS, and
I wish they would be able to help me in deciding on this.
  
 THank you for the help
 


I very much doubt you can make an informed decision if you leave the
commercial aspect (license) aside. A single Base SAS installation
(server) can cost tens of thousands of [[your currency here; may need
to multiply by 10 or 100 or more]] in the first year, then a
percentage of that in the following years. (SAS software is not
purchased, but licensed on a yearly basis.) Want more than Base SAS?
Prepare your wallet: thousands upon thousands (per year) for
regression, anova, clustering (SAS/Stat), graphics (SAS/Graph), time
series (SAS/ETS), optimizations (SAS/OR) etc. Then, if you want
decision trees and neural networks (Enterprise Miner), I warmly
recommend you to quickly find a chair and sit down before you hear
the price tag. 

Will you always work for an organization that licenses SAS software?
Will the organization license all the modules you'll need? Will those
modules do everything you want? As others have said, R is a lot more
flexible, and the GPL ensures that whatever you can do today will
continue to be expanded and improved (much faster than SAS Institute
would want or be able to expand/improve SAS). 

All in all, if you're primarily interested in data analysis (and
don't want, for example, to get a job as a SAS programmer) and still
choose SAS, you will regret it one day. The benefits are few (such as
robust manipulation of massive data sets - I mean in excess of
hundreds of millions of rows) and the risks are high (whatever you do
is dependent on proprietary, very expensive software). With R, almost
the opposite is true: lots of benefits and no risks (nothing can take
R away from you).

HTH,
b.







__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] SMVs

2004-11-24 Thread stephenc
Hi Everyone
 
I am struggling to get going with support vector machines in R - svm()
and predict() etc.  Does anyone know of a good tutorial covering R and
these things?
 
Stephen

[[alternative HTML version deleted]]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Searching for antilog function

2004-11-24 Thread Liaw, Andy
What's wrong with log2()?

 log2(16)
[1] 4

Isn't that exactly what you asked for?

Andy

 From: Heather J. Branton
 
 Dear R-users,
 
 I have a basic question about how to determine the antilog of 
 a variable.
 
 Say I have some number, x, which is a factor of 2 such that x 
 = 2^y. I 
 want to figure out what y is, i.e. I am looking for the 
 antilog base 2 of x.
 
 I have found log2 in the Reference Manual. But I am struggling how to 
 get the antilog of that.
 
 Any help will be appreciated!
 
   version
 
 platform i386-pc-mingw32
 arch i386  
 os   mingw32   
 system   i386, mingw32 
 status 
 major1 
 minor9.1   
 year 2004  
 month06
 day  21
 language R
 
 ...heather
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Searching for antilog function

2004-11-24 Thread Spencer Graves
 Consider: 

 exp(log(1:11))
[1]  1  2  3  4  5  6  7  8  9 10 11
 2^log(1:11, 2)
[1]  1  2  3  4  5  6  7  8  9 10 11
 2^logb(1:11, 2)
[1]  1  2  3  4  5  6  7  8  9 10 11
 10^log10(1:11)
[1]  1  2  3  4  5  6  7  8  9 10 11
 2^log2(1:11)
[1]  1  2  3  4  5  6  7  8  9 10 11
 Does this answer the question? 

 hope this helps. spencer graves
Heather J. Branton wrote:
Dear R-users,
I have a basic question about how to determine the antilog of a variable.
Say I have some number, x, which is a factor of 2 such that x = 2^y. I 
want to figure out what y is, i.e. I am looking for the antilog base 2 
of x.

I have found log2 in the Reference Manual. But I am struggling how to 
get the antilog of that.

Any help will be appreciated!
 version
platform i386-pc-mingw32
arch i386  os   mingw32   system   i386, mingw32 
status major1 minor9.1   
year 2004  month06day  21
language R

...heather
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Searching for antilog function

2004-11-24 Thread Duncan Murdoch
On Wed, 24 Nov 2004 12:26:46 -0500, Heather J. Branton [EMAIL PROTECTED]
wrote :

Dear R-users,

I have a basic question about how to determine the antilog of a variable.

Say I have some number, x, which is a factor of 2 such that x = 2^y. I 
want to figure out what y is, i.e. I am looking for the antilog base 2 of x.

I have found log2 in the Reference Manual. But I am struggling how to 
get the antilog of that.

You seem to be confusing log with antilog, but log2(x) and 2^y are
inverses of each other, i.e.

log2(2^y) equals y

and 

2^log2(x) equals x

(up to rounding error, of course).
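
For example:

y <- 5;  log2(2^y)    # 5
x <- 37; 2^log2(x)    # 37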

Duncan Murdoch

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Searching for antilog function

2004-11-24 Thread Heather J. Branton
Yes! Somehow I must have made an entry error when I tried that before as 
I was getting something completely different!

Thank you.
...heather
Liaw, Andy wrote:
What's wrong with log2()?
 

log2(16)
   

[1] 4
Isn't that exactly what you asked for?
Andy
 

From: Heather J. Branton
Dear R-users,
I have a basic question about how to determine the antilog of 
a variable.

Say I have some number, x, which is a factor of 2 such that x 
= 2^y. I 
want to figure out what y is, i.e. I am looking for the 
antilog base 2 of x.

I have found log2 in the Reference Manual. But I am struggling how to 
get the antilog of that.

Any help will be appreciated!
 version
platform i386-pc-mingw32
arch i386  
os   mingw32   
system   i386, mingw32 
status 
major1 
minor9.1   
year 2004  
month06
day  21
language R

...heather
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

   



--
___
Heather J. Branton  Public Data Queries
Data Specialist 310 Depot Street, Ste C
734.213.4964 x312  Ann Arbor, MI  48104
  U.S. Census Microdata At Your Fingertips
 http://www.pdq.com
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] seriesMerge

2004-11-24 Thread Yasser El-Zein
Is there a function in R that is equivalent to S-PLUS's
seriesMerge(x1, x2, pos="union")
where x1, and x2 are of class timeSeries

seriesMerge is in S-PLUS's finmetrics. I looked into R's mergeSeries
(in fSeries part of Rmetrics) but I could not make it behave quite the
same. In R it expected a timeSeries object and a matrix of the same
row count. In S-PLUS when using the union option both objects can be
of different lengths.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Greg Snow
How about the following to plot only the 1,000 or so most extreme points
(the outliers):

x <- rnorm(1e6)
y <- 2*x + rnorm(1e6)

plot(x, y, pch='.')

tmp <- chull(x, y)

while( length(tmp) < 1000 ){
    tmp <- c(tmp, seq(along=x)[-tmp][ chull(x[-tmp], y[-tmp]) ] )
}

points(x[tmp], y[tmp], col='red')

now just replace the initial plot with a hexbin or contour plot and you
should have something that takes a lot less room but still shows the
locations of the outer points.



Greg Snow, Ph.D.
Statistical Data Center
[EMAIL PROTECTED]
(801) 408-8111

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Searching for antilog function

2004-11-24 Thread Heather J. Branton
Thank you so much for each of your responses. But to make sure I am 
clear (in my own mind), is this correct?

If  x = 2^y
Then  y = log2(x)
Thanks again. I know this is basic.
...heather
Duncan Murdoch wrote:
On Wed, 24 Nov 2004 12:26:46 -0500, Heather J. Branton [EMAIL PROTECTED]
wrote :
 

Dear R-users,
I have a basic question about how to determine the antilog of a variable.
Say I have some number, x, which is a factor of 2 such that x = 2^y. I 
want to figure out what y is, i.e. I am looking for the antilog base 2 of x.

I have found log2 in the Reference Manual. But I am struggling how to 
get the antilog of that.
   

You seem to be confusing log with antilog, but log2(x) and 2^y are
inverses of each other, i.e.
log2(2^y) equals y
and 

2^log2(x) equals x
(up to rounding error, of course).
Duncan Murdoch
 

--
___
Heather J. Branton  Public Data Queries
Data Specialist 310 Depot Street, Ste C
734.213.4964 x312  Ann Arbor, MI  48104
  U.S. Census Microdata At Your Fingertips
 http://www.pdq.com
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] reshaping of data for barplot2

2004-11-24 Thread Marc Schwartz
On Wed, 2004-11-24 at 19:24 +0100, Jean-Louis Abitbol wrote:
 Dear All,
 
 I have  the following data coming out from 
 
  s <- with(final,
     summarize(norm, llist(gtt, fdiab),
       function(norm) {
        n <- sum(!is.na(norm))
        s <- sum(norm, na.rm=T)
        binconf(s, n)
       }, type='matrix')
  )
 ie 
 
  gtt fdiab   norm.norm  norm.norm2  norm.norm3
 18PLNo  3.70370370  0.18997516 18.28346593
 19PL   Yes  3.57142857  0.18319034 17.71219774
 13TT1   No  9.09090909  3.59221932 21.15923917
 14TT1  Yes  1.81818182  0.09326054  9.60577606
 ...
 10  HIGHNo 26.53061224 16.21128213 40.26228897
 11  HIGH   Yes 10.  4.66428345 20.14946472
 
 I would like to reshape the data so that I can barplot2 treatments (gtt)
 with 2 beside bars for fdiab  yes/no and add CI.
 
 Various attempts have been unsuccessful as I have not understood both the
 logic of beside and the nature of structures to be passed to barplot2.
 Not enough know-how with reshape and transpose either.
 
 Needless to say Dotplot works great with this kind of data but some
 Authority requests side-by-side bars with CI.
 
 Thanks for any help.

Jean-Louis,

For an easy example, see the help in barplot2, which uses the VADeaths
dataset. The dataset looks like:

 VADeaths
  Rural Male Rural Female Urban Male Urban Female
50-54   11.7  8.7   15.4  8.4
55-59   18.1 11.7   24.3 13.6
60-64   26.9 20.3   37.0 19.3
65-69   41.0 30.9   54.6 35.1
70-74   66.0 54.3   71.1 50.0

Now use:

barplot2(VADeaths)

This will yield a stacked bar plot, where there are 4 bars (one for each
column in the matrix). Each bar then consists of 5 stacked sections,
with each section representing the row values in each column.

Now try:

barplot(VADeaths, beside = TRUE)

This now yields 4 groups of bars, with one group for each column. Each
group then consists of 5 bars, one bar for each row value.

Hopefully, that gives you some insight into how the matrix structure
interacts with the 'beside' argument.

In the case of your data above, I read the few rows into a data frame
called 'df'. So 'df' looks like:

 df
   gtt fdiab norm.norm  norm.norm2 norm.norm3
1   PLNo  3.703704  0.18997516  18.283466
2   PL   Yes  3.571429  0.18319034  17.712198
3  TT1No  9.090909  3.59221932  21.159239
4  TT1   Yes  1.818182  0.09326054   9.605776
5 HIGHNo 26.530612 16.21128213  40.262289
6 HIGH   Yes 10.00  4.66428345  20.149465


To follow the VADeaths example above, you need to re-shape the required
columns, each as three column matrices, as follows:

height <- matrix(df$norm.norm, ncol = 3)
ci.l <- matrix(df$norm.norm2, ncol = 3)
ci.u <- matrix(df$norm.norm3, ncol = 3)
bars <- matrix(df$fdiab, ncol = 3)

Now, 'height' looks like:

 height
 [,1] [,2] [,3]
[1,] 3.703704 9.090909 26.53061
[2,] 3.571429 1.818182 10.0

ci.l and ci.u and bars will of course look similar.


So, now you could use barplot2 as follows:

mp <- barplot2(height, plot.ci = TRUE, 
               ci.l = ci.l, ci.u = ci.u, beside = TRUE,
               names.arg = bars)


Note that I save the bar midpoints in 'mp'.

Now, you can go back and put in the bar group labels as follows. First
break out the unique values of 'gtt' keeping the order intact by using
matrix():

labels <- matrix(df$gtt, ncol = 2, byrow = TRUE)
mtext(side = 1, at = colMeans(mp), text = labels[, 1], line = 3)

Note that I use 'byrow = TRUE' in the call to matrix() so that the order
of the matrix is set properly. Thus, each column contains the group
labels and looks like:

 labels
 [,1]   [,2]  
[1,] PL   PL  
[2,] TT1  TT1 
[3,] HIGH HIGH

So we just use the first column above in the call to mtext().


So that should do it and can be extended to your full dataset if the
format is consistent with what you have above.


One final (and important) note.  There is another approach here that can
be used, which is to keep your data in its initial state and specify the
'space' argument explicitly in the call to barplot2. This is actually
less work than what we did above. In this case, we use the 'space'
argument to group the bars explicitly, which is in effect, what the
'beside' argument does internally.

We use each column from 'df' directly and set the 'space' argument to a
repeating sequence of c(1, 0) for each of the 3 groups. Note that here
we need to explicitly define the colors to use, since barplot2 uses
'grey' by default when 'height' is a vector (as does barplot). We also
need to convert df$fdiab to a vector, otherwise the numeric factor codes
will be used.

The sequence then goes like this:

mp <- barplot2(df$norm.norm, plot.ci = TRUE, 
               ci.l = df$norm.norm2, ci.u = df$norm.norm3,
               space = rep(c(1, 0), 3),
               col = rep(c("red", "yellow"), 3),
  

Re: [R] seriesMerge

2004-11-24 Thread Dirk Eddelbuettel
On Wed, Nov 24, 2004 at 03:29:53PM -0500, Yasser El-Zein wrote:
 Is there a function in R that is equivalent to S-PLUS's 
 seriesMerge(x1, x2, pos=union)
 where x1, and x2 are of class timeSeries
 
 seriesMerge is in S-PLUS's finmetrics. I looked into R's mergeSeries
 (in fSeries part of Rmetrics) but I could not make it behave quite the
 same. In R it expected a timeSeries object and a matrix of the same
 row count. In S-PLUS when using the union option both objects can be
 of different lengths.

The its (short for irregular time series) package has union() and
intersect(). The zoo package also has some functions for this, I think.
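
For instance, a union merge with zoo might look like this (a sketch,
assuming two irregular zoo series z1 and z2):

library(zoo)
z1 <- zoo(1:3, as.Date(c("2004-11-01", "2004-11-03", "2004-11-05")))
z2 <- zoo(4:5, as.Date(c("2004-11-02", "2004-11-03")))
merge(z1, z2)   # union of the two time indexes; NA where a series has no value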

Topics like this get discussed a bit on the r-sig-finance list, so you may want
to glance at the archives or do some conditional googling.

Hth, Dirk

-- 
If your hair is standing up, then you are in extreme danger.
  -- http://www.usafa.af.mil/dfp/cockpit-phys/fp1ex3.htm

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] confidence interval of a average...

2004-11-24 Thread Duncan Harris
I have a sample of lung capacities from a population measured against 
height.  I need to know the 95% CI of the lung capacity of a person of 
average height.

I have fitted a regression line.
How do I get a minimum and maximum values of the 95% CI?
My thinking was that this has something to do with covariance, but how?
My other thinking was that I could derive the 0.975 (sqrt 0.95) CI for the 
height.  Then I could take the lower height 0.975 CI value and calculate 
from that the lower 0.975 value from the lung capacity. And then do the same 
for the taller people.  That is bound to be wrong though.

Dunc
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Installing gregmisc under windows 2000

2004-11-24 Thread Robert W. Baer, Ph.D.
I went through the following steps using RGUI menus to install gregmisc from
CRAN.  It appears to install but at the end R does not seem to be able to
find it.  Any idea what I'm doing wrong?

Thanks,
Rob

 local({a <- CRAN.packages()
+ install.packages(select.list(a[,1],,TRUE), .libPaths()[1], available=a,
dependencies=TRUE)})
trying URL `http://cran.r-project.org/bin/windows/contrib/2.0/PACKAGES'
Content type `text/plain; charset=iso-8859-1' length 23113 bytes
opened URL
downloaded 22Kb

trying URL
`http://cran.r-project.org/bin/windows/contrib/2.0/gregmisc_2.0.0.zip'
Content type `application/zip' length 687958 bytes
opened URL
downloaded 671Kb

bundle 'gregmisc' successfully unpacked and MD5 sums checked

Delete downloaded files (y/N)? y

updating HTML package descriptions
 library(gregmisc)
Error in library(gregmisc) : There is no package called 'gregmisc'

 version
 _
platform i386-pc-mingw32
arch i386
os   mingw32
system   i386, mingw32
status
major2
minor0.1
year 2004
month11
day  15
language R

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Modeling censored binomial / Poisson data

2004-11-24 Thread Greg Tarpinian
I have some count data (0 - 1 at each time point for
each test subject) that I would like to model.  Since
the 1's are rather sparse, the Poisson distribution
comes to mind but I would also consider the binomial.

The data are censored as the data come from a clinical
trial and subjects were able to leave the study and
some were therefore lost to follow-up.

I am aware of the capabilities of lme( ) and nlme( )
through the excellent book by Pinheiro and Bates, but
am at a loss as to what to do with these count data.
Ideally, I would like to compare the placebo and 
treatment groups in a meaningful way.

Any input would be greatly appreciated.

Thanks,

 Greg
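
One possible starting point, not from this thread, is a binomial GLMM fitted by penalized quasi-likelihood (glmmPQL from MASS, which also turns up later in this digest); the simulated data below are purely illustrative and ignore the dropout/censoring issue:

library(MASS)   # glmmPQL
library(nlme)

set.seed(1)
dat <- data.frame(id   = factor(rep(1:40, each = 6)),
                  week = rep(1:6, 40),
                  trt  = rep(c("placebo", "active"), each = 120))
dat$y <- rbinom(nrow(dat), 1, plogis(-2 + 0.3 * (dat$trt == "active")))

fit <- glmmPQL(y ~ trt + week, random = ~ 1 | id,
               family = binomial, data = dat)
summary(fit)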




__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] how to remove time series trend in R?

2004-11-24 Thread Terry Mu
I got a set of data which has a seasonal trend in the form of sin(x), cos(x), and I
don't have any idea how to deal with it.

Can you give me a starting point? Thanks,

Terry
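
A minimal sketch of one common starting point is harmonic regression: fit sine and cosine terms and keep the residuals as the deseasonalised series (simulated data; the period 'freq' is an assumption, and decompose() or stl() on a ts object are alternatives):

set.seed(1)
t    <- 1:240
freq <- 12                                  # e.g. monthly data with a yearly cycle
y    <- 5 + 2 * sin(2 * pi * t / freq) + rnorm(240)

fit       <- lm(y ~ sin(2 * pi * t / freq) + cos(2 * pi * t / freq))
detrended <- residuals(fit)
plot(t, detrended, type = "l")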

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Installing gregmisc under windows 2000

2004-11-24 Thread Prof Brian Ripley
gregmisc is a bundle, not a package.  Its description on CRAN is
gregmisc    Bundle of gtools, gdata, gmodels, gplots
so try one of those packages.
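For example, once the bundle has installed (as in the transcript below), any constituent package loads directly:

library(gplots)     # e.g. barplot2() lives here
library(gtools)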
On Wed, 24 Nov 2004, Robert W. Baer, Ph.D. wrote:
I went through the following steps using RGUI menus to install gregmisc from
CRAN.  It appears to install but at the end R does not seem to be able to
find it.  Any idea what I'm doing wrong?
Thankjs,
Rob

local({a <- CRAN.packages()
+ install.packages(select.list(a[,1],,TRUE), .libPaths()[1], available=a,
dependencies=TRUE)})
trying URL `http://cran.r-project.org/bin/windows/contrib/2.0/PACKAGES'
Content type `text/plain; charset=iso-8859-1' length 23113 bytes
opened URL
downloaded 22Kb
trying URL
`http://cran.r-project.org/bin/windows/contrib/2.0/gregmisc_2.0.0.zip'
Content type `application/zip' length 687958 bytes
opened URL
downloaded 671Kb
bundle 'gregmisc' successfully unpacked and MD5 sums checked
 ^^^
Delete downloaded files (y/N)? y
updating HTML package descriptions
library(gregmisc)
Error in library(gregmisc) : There is no package called 'gregmisc'
   ^^^
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Installing gregmisc under windows 2000

2004-11-24 Thread Peter Dalgaard
Robert W. Baer, Ph.D. [EMAIL PROTECTED] writes:

 I went through the following steps using RGUI menus to install gregmisc from
 CRAN.  It appears to install but at the end R does not seem to be able to
 find it.  Any idea what I'm doing wrong?

It's a bundle nowadays, so you need to load one of its constituent
packages. 

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] RE: RODBC and Table views

2004-11-24 Thread Niels Steen Krogh
channel2 <- odbcConnectAccess("C:\\Documents and Settings\\Fælles\\Journal\\DATASUPERMARKED\\DANBIONOVEMBER2004.mdb", uid = "")
sqlQuery(channel2, "select * from Afdelinger_output_tabel1B order by antal desc")

Does take views and tables!
Niels Steen Krogh
Konsulent
ZiteLab

Mail: -- [EMAIL PROTECTED]
Telefon: --- +45 38 88 86 13
Mobil: - +45 22 67 37 38
Adresse: --- Zitelab
 Solsortvej 44
 2000 F.

ZiteLab
-Let's Empower Your Data with Webservices

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Installing gregmisc under windows 2000

2004-11-24 Thread Robert W. Baer, Ph.D.
Thanks for the clarification.

Pursuant to the recent discussion of GUIs promoting ignorance among users, I
plead guilty for CRAN installs, but they have generally saved so much
time.<g>.  This does raise the question as to whether gregmisc and other
bundles should appear on the "install packages from CRAN" pop-up in RGUI.
It also leaves me wondering what exactly was the REAL result of the
apparently successful "gregmisc" install.

I did help.search("bundle") and came away with nada.  I am not sure where
I should head to de-dumb myself.  I found a little in Writing R Extensions,
but this did not clarify the interaction with the RGUI install procedure for
me.

Thanks again.

Rob

-
- Original Message - 
From: Peter Dalgaard [EMAIL PROTECTED]
To: Robert W. Baer, Ph.D. [EMAIL PROTECTED]
Cc: R-Help [EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 4:13 PM
Subject: Re: [R] Installing gregmisc under windows 2000


 Robert W. Baer, Ph.D. [EMAIL PROTECTED] writes:

  I went through the following steps using RGUI menus to install gregmisc
from
  CRAN.  It appears to install but at the end R does not seem to be able
to
  find it.  Any idea what I'm doing wrong?

 It's a bundle nowadays, so you need to load one of it's constituent
 packages.

 -- 
O__   Peter Dalgaard Blegdamsvej 3
   c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
  (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] confidence interval of a average...

2004-11-24 Thread Robert W. Baer, Ph.D.
It depends on whether you want to do 95% confidence intervals on the
prediction or the mean vital capacity.  Try the following and see if it
gets you started:
#Simulate data
height=48:72
vc=height*10+20*rnorm(72-48+1)
# Do regression
lm.vc=lm(vc~height)

# Confidence interval on mean vc
predict.lm(lm.vc,interval="confidence")
#confidence interval on predicted vc
predict.lm(lm.vc,interval="prediction")

#plot everything
plot(vc~height)

matlines(height,predict.lm(lm.vc,interval="c"), lty=c(1,2,2),col='blue')
matlines(height,predict.lm(lm.vc,interval="p"),lty=c(1,3,3),col=c('black','red','red'))
Rob
 --
 From: Duncan Harris [EMAIL PROTECTED]
  I have a sample of lung capacities from a population measured against
  height.  I need to know the 95% CI of the lung capacity of a person of
  average height.
 
  I have fitted a regression line.
 
  How do I get a minimum and maximum values of the 95% CI?
 
  My thinking was that this has something to do with covariance, but how?
 
  My other thinking was that I could derive the 0.975 (sqrt 0.95) CI for
the
  height.  Then I could take the lower height 0.975 CI value and calculate
  from that the lower 0.975 value from the lung capacity. And then do the
 same
  for the taller people.  That is bound to be wrong though.
 
  Dunc
 
  __
  [EMAIL PROTECTED] mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html
 


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] confidence interval of a average...

2004-11-24 Thread Robert W. Baer, Ph.D.
Sorry.  The last code line got destroyed by my emailer and should read:
 matlines(height,predict.lm(lm.vc,interval="p"),
+ lty=c(1,3,3),col=c('black','red','red'))


- Original Message - 
From: Robert W. Baer, Ph.D. [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 4:56 PM
Subject: Re: [R] confidence interval of a average...


 It depends on whether you want to do 95% ocnfidence intervals on the
 predicition or the mean vital capacity.  Try the following and see if it
 gets you started:
 #Simulate data
 height=48:72
 vc=height*10+20*rnorm(72-48+1)
 # Do regression
 lm.vc=lm(vc~height)

 # Confidence interval on mean vc
 predict.lm(lm.vc,interval=confidence)
 #confidence interval on prediced vc
 predict.lm(lm.vc,interval=prediction)

 #plot everything
 plot(vc~height)

 matlines(height,predict.lm(lm.vc,interval=c), lty=c(1,2,2),col='blue')

matlines(height,predict.lm(lm.vc,interval=p),lty=c(1,3,3),col=c('black','r
 ed','red')) Rob
  --
  Fom: Duncan Harris [EMAIL PROTECTED]
   I have a sample of lung capacities from a population measured against
   height.  I need to know the 95% CI of the lung capacity of a person of
   average height.
  
   I have fitted a regression line.
  
   How do I get a minimum and maximum values of the 95% CI?
  
   My thinking was that this has something to do with covariance, but
how?
  
   My other thinking was that I could derive the 0.975 (sqrt 0.95) CI for
 the
   height.  Then I could take the lower height 0.975 CI value and
calculate
   from that the lower 0.975 value from the lung capacity. And then do
the
  same
   for the taller people.  That is bound to be wrong though.
  
   Dunc
  
   __
   [EMAIL PROTECTED] mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide!
  http://www.R-project.org/posting-guide.html
  
 

 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Installing gregmisc under windows 2000

2004-11-24 Thread Yuandan Zhang
That seems not to be the case under Linux in terms of installation. You can install 
this bundle in the same way as installing an individual package.

eg 

R CMD INSTALL  CRAN/contrib/main/gregmisc_2.0.0.tar.gz

to get all constituent packages installed. 
 
 
On Wed, 24 Nov 2004 22:14:14 + (GMT)
Prof Brian Ripley [EMAIL PROTECTED] wrote:

 gregmisc is a bundle, not a package.  Its description on CRAN is
 
 gregmisc  Bundle of gtools, gdata, gmodels, gplots
 
 so try one of those packages.


-- 

--
Yuandan Zhang, PhD

Animal Genetics and Breeding Unit
The University of New England
Armidale, NSW, Australia, 2351

E-mail:   [EMAIL PROTECTED]
Phone:(61) 02 6773 3786
Fax:  (61) 02 6773 3266
http://agbu.une.edu.au

  AGBU is a joint venture of NSW Primary Industries 
  and The University of New England to undertake 
  genetic R&D for Australia's Livestock Industries   


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Patrick Connolly
On Wed, 24-Nov-2004 at 10:22AM -0600, Marc Schwartz wrote:

| On Wed, 2004-11-24 at 16:34 +0100, Witold Eryk Wolski wrote:
|  Hi,
|  
|  I want to draw a scatter plot with 1M  and more points and save it as pdf.
|  This makes the pdf file large.
|  So i tried to save the file first as png and than convert it to pdf. 
|  This looks OK if printed but if viewed e.g. with acrobat as document 
|  figure the quality is bad.
|  
|  Anyone knows a way to reduce the size but keep the quality?
| 
| Hi Eryk!
| 
| Part of the problem is that in a pdf file, the vector based instructions
| will need to be defined for each of your 10 ^ 6 points in order to draw
| them.
| 
| When trying to create a simple example:
| 
| pdf()
| plot(rnorm(1e6), rnorm(1e6))
| dev.off()
| 
| The pdf file is 55 Mb in size.
| 
| One immediate thought was to try a ps file and using the above plot, the
| ps file was only 23 Mb in size. So note that ps can be more efficient.
| 
| Going to a bitmap might result in a much smaller file, but as you note,
| the quality does degrade as compared to a vector based image.
| 
| I tried the above to a png, then converted to a pdf (using 'convert')
| and as expected, the image both viewed and printed was pixelated,
| since the pdf instructions are presumably drawing pixels and not vector
| based objects.

Using bitmap( ... , res = 300), I get a bitmap file of 56 Kb.
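
A minimal sketch of that route (file name, point count and resolution are placeholders; bitmap() drives ghostscript behind the scenes):

x <- rnorm(1e5); y <- rnorm(1e5)
bitmap("scatter.png", type = "png256", res = 300)
plot(x, y, pch = ".")
dev.off()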

It's rather slow, most of the time being taken up by gs, which is
converting the vector image, I suspect.  Time would be much shorter if,
say, a circle of diameter 4 were left unplotted in the middle, and
others have mentioned other ways of reducing redundant points.

A pdf file slightly larger than the png file can be made directly from
OpenOffice that has the png imported into it.  For a plot of 160mm
square, this pdf printed unpixelated.

Depending on what size (dimensions) you need to finish up with, you
might find you could get away with a lower resolution than 300 dpi,
but I usually find 200 too ragged.

HTH

-- 
Patrick Connolly
HortResearch
Mt Albert
Auckland
New Zealand 
Ph: +64-9 815 4200 x 7188
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
I have the world`s largest collection of seashells. I keep it on all
the beaches of the world ... Perhaps you`ve seen it.  ---Steven Wright 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Installing gregmisc under windows 2000

2004-11-24 Thread Liaw, Andy
If I'm not mistaken, bundle is really only useful as a concept for
distribution and installation.  You distribute and install a bundle, but
load the individual packages when you want to use them.  Once you install
the bundle, you won't see the name of the bundle in the list of installed
packages, but you see the constituent packages, and those are what you load
when you want to use them.  [This is the same on all platforms, BTW.]
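
A minimal illustration of that split, assuming a CRAN mirror is reachable:

install.packages("gregmisc")   # install by the bundle name
library(gplots)                # load a constituent package; "gregmisc" itself is not loadable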

Andy

 From: Robert W. Baer, Ph.D.
 Sent: Wednesday, November 24, 2004 5:45 PM
 To: Peter Dalgaard
 Cc: R-Help
 Subject: Re: [R] Installing gregmisc under windows 2000
 
 
 Thanks for the clarification.
 
 Pursuant to the recent dicussion of GUI promoting ignornce 
 among users, I
 plead guilty for CRAN installs  but they have generally saved so much
 time.g.  This does raise the question as to whether 
 gregmisc and other
 bundles should appear on the install packages from CRAN 
 pop-up in RGUI.
 It also leaves me wondering what exactly was the REAL result of the
 apparently successful  gregmisc install .
 
 I did help.search(bundle) and coming away with nada.  I am 
 not sure where
 I should head to de-dumb myself.  I found a little in writing 
 R extensions,
 but this did not clarify the interaction with the RGUI 
 install procedure for
 me.
 
 Thanks again.
 
 Rob
 
 -
 - Original Message - 
 From: Peter Dalgaard [EMAIL PROTECTED]
 To: Robert W. Baer, Ph.D. [EMAIL PROTECTED]
 Cc: R-Help [EMAIL PROTECTED]
 Sent: Wednesday, November 24, 2004 4:13 PM
 Subject: Re: [R] Installing gregmisc under windows 2000
 
 
  Robert W. Baer, Ph.D. [EMAIL PROTECTED] writes:
 
   I went through the following steps using RGUI menus to 
 install gregmisc
 from
   CRAN.  It appears to install but at the end R does not 
 seem to be able
 to
   find it.  Any idea what I'm doing wrong?
 
  It's a bundle nowadays, so you need to load one of it's constituent
  packages.
 
  -- 
 O__   Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
   (*) \(*) -- University of Copenhagen   Denmark  Ph: 
 (+45) 35327918
  ~~ - ([EMAIL PROTECTED]) FAX: 
 (+45) 35327907
 
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Ted Harding
On 24-Nov-04 Prof Brian Ripley wrote:
 On Wed, 24 Nov 2004 [EMAIL PROTECTED] wrote:
 
 1. Multiply the data by some factor and then round the
   results to an integer (to avoid problems in step 2).
   Factor chosen so that the result of (4) below is
   satisfactory.

 2. Eliminate duplicates in the result of (1).

 3. Divide by the factor you used in (1).

 4. Plot the result; save plot to PDF.

 As to how to do it in R: the critical step is (2),
 which with so many points could be very heavy unless
 done by a well-chosen procedure. I'm not expert enough
 to advise about that, but no doubt others are.
 
 unique will eat that for breakfast
 
 x <- runif(1e6)
 system.time(xx <- unique(round(x, 4)))
 [1] 0.55 0.09 0.64 0.00 0.00
 length(xx)
 [1] 10001

'unique' will eat x for breakfast, indeed, but will have some
trouble chewing (x,y).

I still can't think of a neat way of doing that.
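
As the follow-ups below work out, a two-column data frame will chew the pairs; a compact version using duplicated() on rounded coordinates (simulated data, 3 d.p. chosen arbitrarily):

x <- rnorm(1e6); y <- rnorm(1e6)
keep <- !duplicated(data.frame(round(x, 3), round(y, 3)))
plot(x[keep], y[keep], pch = ".")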

Best wishes,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 25-Nov-04   Time: 00:37:15
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Austin, Matt


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of
 [EMAIL PROTECTED]
 Sent: Wednesday, November 24, 2004 16:37 PM
 To: R Help Mailing List
 Subject: RE: [R] scatterplot of 100000 points and pdf file format
 
 
 On 24-Nov-04 Prof Brian Ripley wrote:
  On Wed, 24 Nov 2004 [EMAIL PROTECTED] wrote:
  
  1. Multiply the data by some factor and then round the
results to an integer (to avoid problems in step 2).
Factor chosen so that the result of (4) below is
satisfactory.
 
  2. Eliminate duplicates in the result of (1).
 
  3. Divide by the factor you used in (1).
 
  4. Plot the result; save plot to PDF.
 
  As to how to do it in R: the critical step is (2),
  which with so many points could be very heavy unless
  done by a well-chosen procedure. I'm not expert enough
  to advise about that, but no doubt others are.
  
  unique will eat that for breakfast
  
  x - runif(1e6)
  system.time(xx - unique(round(x, 4)))
  [1] 0.55 0.09 0.64 0.00 0.00
  length(xx)
  [1] 10001
 
 'unique' will eat x for breakfast, indeed, but will have some
 trouble chewing (x,y).
 


  xx <- data.frame(x=round(runif(100),4), y=round(runif(100),4))
  system.time(xx2 <- unique(xx))
[1] 14.23  0.06 14.34NANA

The time does not seem too bad, depending on how many times it has to be
performed.
--Matt

Matt Austin
Statistician

Amgen 
One Amgen Center Drive
M/S 24-2-C
Thousand Oaks CA 93021
(805) 447 - 7431

 I still can't think of a neat way of doing that.
 
 Best wishes,
 Ted.
 
 
 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
 Date: 25-Nov-04   Time: 00:37:15
 -- XFMail --
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Ted Harding
On 25-Nov-04 Ted Harding wrote:
 'unique' will eat x for breakfast, indeed, but will have some
 trouble chewing (x,y).
 
 I still can't think of a neat way of doing that.
 
 Best wishes,
 Ted.

Sorry, I don't want to be misunderstood.
I didn't mean that 'unique' won't work for arrays.
What I meant was:

 X <- round(rnorm(1e6),3); Y <- round(rnorm(1e6),3)
 system.time(unique(X))
[1] 0.74 0.07 0.81 0.00 0.00
 system.time(unique(cbind(X,Y)))
[1] 350.81   4.56 356.54   0.00   0.00

However, still rounding to 3 d.p. we can try packing:

 Z <- 1*X + 1000*Y
 system.time(W <- unique(Z))
[1] 0.83 0.05 0.88 0.00 0.00
 length(W)
[1] 961523

Though the runtime is small we don't get much reduction
and still W has to be unpacked.

With rounding to 2 d.p.

 X <- round(rnorm(1e6),2); Y <- round(rnorm(1e6),2)
 Z <- 1*X + 1000*Y
 system.time(W <- unique(Z))
[1] 1.31 0.01 1.32 0.00 0.00
 length(W)
[1] 209882

so now it's about 1/5, but visible discretisation must be
getting close.

With 1 d.p.

 X <- round(rnorm(1e6),1); Y <- round(rnorm(1e6),1)
 Z <- 1*X + 1000*Y
 system.time(W <- unique(Z))
[1] 0.92 0.01 0.93 0.00 0.00
 length(W)
[1] 4953

there's a good reduction (about 1/200) but the discretisation
would definitely now be visible. However, as I suggested before,
there's an issue of choice of constant (i.e. of the resolution
of the discretisation so that there's a useful reduction and
also the plot is acceptable).

I'd still like to learn of a method which avoids the
above method of packing, which strikes me as clumsy
(but maybe it's the best way after all).

Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 25-Nov-04   Time: 01:45:48
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] confidence interval of a average...

2004-11-24 Thread Robert W. Baer, Ph.D.
Sorry if this was not clear.  This is more of a theoretical question 
rather than an R-coding question.  I need to calculate

The predicted response and 95% prediction interval for a man of average 
height

So I need to predict the average response, which is easily done by taking 
the mean height and using the regression formula.

However, average height has to be calculated from the sample, and thus I 
have confidence in that.  Let's say the mean is 163cm, I think that I 
can't take the 163cm value and calculate the CI from just the sd of the 
lung capacity because that would be too narrow; I think covariance must 
come into it somehow, or can I just do a 97.5% CI on the height and take 
those extreme values and do a 97.5% CI on them?

Then, you want the prediction interval on the mean VC, which is the tighter 
of the two confidence intervals and does not include the extra variability 
of VC about its mean.  As always with confidence intervals, you are free to 
look at either a 95% CI or a 97.5% CI depending on what kind of statement you'd 
like to make about your confidence.  I do not understand your comment 
about covariance at all.

Let me try again with data in your units.  Note that CI varies with height 
and is smallest at the mean height whether you are talking about CI on the 
mean VC or CI on the predicted VC.  For comparison, the red lines are the 
95% CI on mean regression fit VC and the blue lines are 95% CI on 
predicted VC.   The simulated data is set to have a mean height that 
varies around 163 cm.

# Make simulated data with mean height near 163
# vc approximately in liter values with scatter
height=sort(rnorm(50,mean=163,sd=35))
vc=0.03*height+.5*rnorm(50)
#Plot the simulated data
plot(vc~height,ylab='vital capacity (l)',xlab='Height (cm)')
# Set up data frame with values of height you wish a ci on
# column heading must be same as for lm() fit x variable
# in this case, dataframe contains only mean height
mean.height.fit.ci=data.frame(height=mean(height))
#print out the mean height
mean.height.fit.ci
# fit the regression model
vc.lm=lm(vc~height)
# Draw 95% confidence intervals on mean vc at various heights (red); minimum at mean(height)
matlines(height,predict.lm(vc.lm,interval="c"),lty=c(1,2,2),
col=c('black','red','red'))

# Draw 95% confidence intervals on new vc at various heights (blue); minimum again at mean(height)
matlines(height,predict.lm(vc.lm,interval="p"),lty=c(1,3,3),
col=c('black','blue','blue'))

# Determine 95% CI on mean vc at mean height
predict.lm(vc.lm,mean.height.fit.ci,interval="confidence")
# Determine 97.5% CI on mean vc at mean height
predict.lm(vc.lm,mean.height.fit.ci,interval="confidence", level=0.975)

You might wish to read a little more about regression CIs in a good 
statistics book.

HTH,
Rob
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Re:Hi!

2004-11-24 Thread APHA Registration Services
Thank you for your interest in APHA 2004. This is an automated reply 
confirming that we have received your email inquiry.

General questions, requests, modifications and/or new registrations 
received by email, fax or mail will be processed within 7 business 
days. At that time, you will receive either a letter of confirmation 
reflecting your requested modifications or a response to your inquiry 
via email.

Confirmation letters will be sent to the email address you had provided 
on your registration form, or by fax, if no email address was provided.

Meanwhile, the most up-to-date meeting and program information is 
online! Visit www.APHA.org and register at the same time! Registration 
does not get any more convenient than with One-Stop-Registration.

Should you have any questions, please do not hesitate to contact us.
Sincerely,
APHA Registrar.
Phone: (514) 228-3009
Fax: (514) 228-3148
Email: [EMAIL PROTECTED]
APHA c/o Laser Registration
1200 G Street, NW
Suite 800
Washington, DC 20005-3967
On Nov 24, 2004, at 9:03 PM, [EMAIL PROTECTED] wrote:
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Ted Harding
On 25-Nov-04 Austin, Matt wrote:
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of
 [EMAIL PROTECTED]
 Sent: Wednesday, November 24, 2004 16:37 PM
 To: R Help Mailing List
 Subject: RE: [R] scatterplot of 100000 points and pdf file format
 
 
 On 24-Nov-04 Prof Brian Ripley wrote:
  On Wed, 24 Nov 2004 [EMAIL PROTECTED] wrote:
  
  1. Multiply the data by some factor and then round the
results to an integer (to avoid problems in step 2).
Factor chosen so that the result of (4) below is
satisfactory.
 
  2. Eliminate duplicates in the result of (1).
 
  3. Divide by the factor you used in (1).
 
  4. Plot the result; save plot to PDF.
 
  As to how to do it in R: the critical step is (2),
  which with so many points could be very heavy unless
  done by a well-chosen procedure. I'm not expert enough
  to advise about that, but no doubt others are.
  
  unique will eat that for breakfast
  
  x - runif(1e6)
  system.time(xx - unique(round(x, 4)))
  [1] 0.55 0.09 0.64 0.00 0.00
  length(xx)
  [1] 10001
 
 'unique' will eat x for breakfast, indeed, but will have some
 trouble chewing (x,y).
 
 
 
  xx - data.frame(x=round(runif(100),4),
  y=round(runif(100),4))
  system.time(xx2 - unique(xx))
 [1] 14.23  0.06 14.34NANA
 
 The time does not seem too bad, depending on how many times it
 has to be performed.
 --Matt

Interesting! Let's see:

Starting again,

  X <- round(rnorm(1e6),3); Y <- round(rnorm(1e6),3)
  XY <- cbind(X,Y)
  system.time(unique(XY))
  [1] 288.22   3.00 291.38   0.00   0.00

  XY <- data.frame(x=X,y=Y)
  system.time(unique(XY))
  [1] 72.38  0.84 74.44  0.00  0.00


Data Frames Are Fast Food!!!

Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 25-Nov-04   Time: 02:12:20
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Error in anova(): objects must inherit from classes

2004-11-24 Thread Andrew Criswell
Hello:
Let me rephrase my question to attract interest in the problem I'm having. When 
I apply anova() to two equations
estimated using glmmPQL, I get a complaint,
anova(fm1, fm2)
Error in anova.lme(fm1, fm2) : Objects must inherit from classes "gls",
"gnls" "lm","lmList", "lme","nlme","nlsList", or "nls"

The two equations I estimated are these:
fm1 <- glmmPQL(choice ~ day + stereotypy,
+    random = ~ 1 | bear, data = learning, family = binomial)
fm2 <- glmmPQL(choice ~ day + envir + stereotypy,
+    random = ~ 1 | bear, data = learning, family = binomial)
Individually, I get results from anova():
anova(fm1)
            numDF denDF   F-value p-value
(Intercept)     1  2032   7.95709  0.0048
day             1  2032 213.98391  <.0001
stereotypy      1  2032   0.42810  0.5130
anova(fm2)
            numDF denDF   F-value p-value
(Intercept)     1  2031   5.70343  0.0170
day             1  2031 213.21673  <.0001
envir           1  2031  12.50388  0.0004
stereotypy      1  2031   0.27256  0.6017

I did look through the archives but didn't finding anything relevant to my 
problem.
Hope someone can help.
ANDREW

_
platform i586-mandrake-linux-gnu
arch i586
os   linux-gnu
system   i586, linux-gnu
status
major2
minor0.0
year 2004
month10
day  04
language R

--
Andrew R. Criswell, Ph.D.
Graduate School, Bangkok University
mailto:[EMAIL PROTECTED] 
http://email.bu.ac.th/src/compose.php?send_to=andrew.c%40bu.ac.th
mailto:[EMAIL PROTECTED] 
http://email.bu.ac.th/src/compose.php?send_to=andrew%40arcriswell.com
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread Liaw, Andy
 From: [EMAIL PROTECTED]
 
 On 25-Nov-04 Ted Harding wrote:
  'unique' will eat x for breakfast, indeed, but will have some
  trouble chewing (x,y).
  
  I still can't think of a neat way of doing that.
  
  Best wishes,
  Ted.
 
 Sorry, I don't want to be misunderstood.
 I didn't mean that 'unique' won't work for arrays.
 What I meant was:
 
  X-round(rnorm(1e6),3);Y-round(rnorm(1e6),3)
  system.time(unique(X))
 [1] 0.74 0.07 0.81 0.00 0.00
  system.time(unique(cbind(X,Y)))
 [1] 350.81   4.56 356.54   0.00   0.00

Do you know if the majority of that time is spent in unique() itself?  If so,
which method?  What I see is:

 X <- round(rnorm(1e6),3); Y <- round(rnorm(1e6),3)
 system.time(unique(X), gcFirst=TRUE)
[1] 0.25 0.01 0.26   NA   NA
 system.time(unique(cbind(X,Y)), gcFirst=TRUE)
[1] 101.80   0.34 104.61 NA NA
 system.time(dat <- data.frame(x=X, y=Y), gcFirst=TRUE)
[1] 10.17  0.00 10.24NANA
 system.time(unique(dat), gcFirst=TRUE)
[1] 23.94  0.11 24.15NANA

Andy

 
 However, still rounding to 3 d.p. we can try packing:
 
  Z-1*X + 1000*Y
  system.time(W-unique(Z))
 [1] 0.83 0.05 0.88 0.00 0.00
  length(W)
 [1] 961523
 
 Though the runtime is small we don't get much reduction
 and still W has to be unpacked.
 
 With rounding to 2 d.p.
 
  X-round(rnorm(1e6),2);Y-round(rnorm(1e6),2)
  Z-1*X + 1000*Y
  system.time(W-unique(Z))
 [1] 1.31 0.01 1.32 0.00 0.00
  length(W)
 [1] 209882
 
 so now it's about 1/5, but visible discretisation must be
 getting close.
 
 With 1 d.p.
 
  X-round(rnorm(1e6),1);Y-round(rnorm(1e6),1)
  Z-1*X + 1000*Y
  system.time(W-unique(Z))
 [1] 0.92 0.01 0.93 0.00 0.00
  length(W)
 [1] 4953
 
 there's a good reduction (about 1/200) but the discretisation
 would definitely now be visible. However, as I suggested before,
 there's an issue of choice of constant (i.e. of the resolution
 of the discretisation so that there's a useful reduction and
 also the plot is acceptable).
 
 I'd still like to learn of a method which avoids the
 above method of packing, which strikes me as clumsy
 (but maybe it's the best way after all).
 
 Ted.
 
 
 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
 Date: 25-Nov-04   Time: 01:45:48
 -- XFMail --
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] scatterplot of 100000 points and pdf file format

2004-11-24 Thread hadley wickham
Another possibility might be to use a 2d kernel density estimate (e.g.
kde2d from library(MASS)).  Then for the high-density areas plot the
density contours; for the low-density areas plot the individual
points.

Hadley
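
A rough sketch of that idea on simulated data (with crude random thinning of the raw points rather than the density-based selection described above):

library(MASS)
set.seed(1)
x <- rnorm(1e5); y <- rnorm(1e5)
dens <- kde2d(x, y, n = 100)
idx  <- sample(length(x), 2000)        # keep only a subsample of raw points
plot(x[idx], y[idx], pch = ".", col = "grey")
contour(dens, add = TRUE)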

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] logistic regression and 3PL model

2004-11-24 Thread Michael Lau
Hello colleagues,

 

I am a novice with R and am stuck with an analysis I am trying to conduct.
Any suggestions or feedback would be very much appreciated.

 

I am analyzing a data set of psi (ESP) ganzfeld trials.  The response
variable is binary (correct/incorrect), with a 25% base rate.  I've looked
around the documentation and other online resources and cannot find how I
can correct for that base rate when I conduct a logistic regression.  I
understand that the correction would be equivalent to the three parameter
logistic model (3PL) in IRT but am unsure how to best fit it from a logistic
regression in R.
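
One way to build in the fixed 25% guessing rate outside dedicated IRT software is direct maximum likelihood for a logistic model with a fixed lower asymptote; the sketch below uses simulated data, and the variable names are hypothetical:

negll <- function(beta, y, x) {
  p <- 0.25 + 0.75 * plogis(beta[1] + beta[2] * x)   # guessing floor at 0.25
  -sum(dbinom(y, 1, p, log = TRUE))
}
set.seed(1)
x   <- rnorm(200)                                    # a stand-in predictor
hit <- rbinom(200, 1, 0.25 + 0.75 * plogis(-1 + 0.5 * x))
fit <- optim(c(0, 0), negll, y = hit, x = x, hessian = TRUE)
fit$par                                              # intercept and slope on the logit scale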

 

Thanks much,

 

Mike Lau

 

__ 
Michael Y. Lau, M.A. 
118 Haggar Hall 
Department of Psychology 
University of Notre Dame 
Notre Dame, IN 46556 
 
  

 


[[alternative HTML version deleted]]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Error in anova(): objects must inherit from classes

2004-11-24 Thread Austin, Matt
The lme method for anova() checks the inheritance of the object when a
single object is supplied, which is why there is no error when you use one
object at a time.  When two objects are supplied, the method uses the class
of the object by invoking the data.class function (which does not list the
glmmPQL class).  If you replace the check of the class with a check of
inheritance, it should work.

Following is a check from the example listed in MASS (Venables and Ripley)

  library(MASS)
  library(nlme) 
  x1 <- glmmPQL(y ~ I(week > 2), random = ~ 1 | ID,
+  family = binomial, data = bacteria)
iteration 1 
iteration 2 
iteration 3 
iteration 4 
iteration 5 
iteration 6 
  x2 <- glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID,
+  family = binomial, data = bacteria)
iteration 1 
iteration 2 
iteration 3 
iteration 4 
iteration 5 
iteration 6 
  anova(x1)
            numDF denDF F-value p-value
(Intercept)     1   169      35  <.0001
I(week > 2)     1   169      21  <.0001
  anova(x2)
            numDF denDF F-value p-value
(Intercept)     1   169      35  <.0001
trt             2    47       2    0.22
I(week > 2)     1   169      20  <.0001

  anova(x1, x2)
Error in anova.lme(x1, x2) : Objects must inherit from classes "gls", "gnls"
"lm","lmList", "lme","nlme","nlsList", or "nls"

After replacement:

  anovaLME(x1, x2)
   Model df  AIC  BIC logLik   Test L.Ratio p-value
x1     1  4 1107 1121   -550
x2     2  6 1114 1134   -551 1 vs 2     2.6    0.28
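
Another way to get past the class check, without copying anova.lme, is simply to drop the extra class so that data.class() sees an lme fit; a hedged sketch (and note that PQL does not maximise a true likelihood, so such comparisons deserve caution):

x1lme <- x1; x2lme <- x2
class(x1lme) <- class(x2lme) <- "lme"
anova(x1lme, x2lme)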


Matt Austin
Statistician

Amgen 
One Amgen Center Drive
M/S 24-2-C
Thousand Oaks CA 93021
(805) 447 - 7431


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Andrew Criswell
 Sent: Wednesday, November 24, 2004 18:47 PM
 To: R-help
 Subject: [R] Error in anova(): objects must inherit from classes
 
 
 Hello:
 
 Let me rephrase my question to attract interest in the 
 problem I'm having. When I appply anova() to two equations
 estimated using glmmPQL, I get a complaint,
 
  anova(fm1, fm2)
 Error in anova.lme(fm1, fm2) : Objects must inherit from 
 classes gls,
 gnls lm,lmList, lme,nlme,nlsList, or nls
 
 
 The two equations I estimated are these:
 
  fm1 - glmmPQL(choice ~ day + stereotypy,
 +random = ~ 1 | bear, data = learning, family 
 = binomial)
  fm2 - glmmPQL(choice ~ day + envir + stereotypy,
 +random = ~ 1 | bear, data = learning, family 
 = binomial)
 
 Individually, I get results from anova():
 
  anova(fm1)
 numDF denDF   F-value p-value
 (Intercept) 1  2032   7.95709  0.0048
 day 1  2032 213.98391  .0001
 stereotypy  1  2032   0.42810  0.5130
 
  anova(fm2)
 numDF denDF   F-value p-value
 (Intercept) 1  2031   5.70343  0.0170
 day 1  2031 213.21673  .0001
 envir   1  2031  12.50388  0.0004
 stereotypy  1  2031   0.27256  0.6017
 
 
 I did look through the archives but didn't finding anything 
 relevant to my problem.
 
 Hope someone can help.
 
 ANDREW
 
  _
 platform i586-mandrake-linux-gnu
 arch i586
 os   linux-gnu
 system   i586, linux-gnu
 status
 major2
 minor0.0
 year 2004
 month10
 day  04
 language R
 
 
 
 -- 
 Andrew R. Criswell, Ph.D.
 Graduate School, Bangkok University
 
 mailto:[EMAIL PROTECTED] 
 http://email.bu.ac.th/src/compose.php?send_to=andrew.c%40bu.ac.th
 mailto:[EMAIL PROTECTED] 
 http://email.bu.ac.th/src/compose.php?send_to=andrew%40arcris
well.com

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] seriesMerge

2004-11-24 Thread Gabor Grothendieck
Yasser El-Zein abu3ammar at gmail.com writes:

: 
: Is there a function in R that is equivalent to S-PLUS's 
: seriesMerge(x1, x2, pos=union)
: where x1, and x2 are of class timeSeries
: 
: seriesMerge is in S-PLUS's finmetrics. I looked into R's mergeSeries
: (in fSeries part of Rmetrics) but I could not make it behave quite the
: same. In R it expected a timeSeries object and a matrix of the same
: row count. In S-PLUS when using the union option both objects can be
: of different lengths.

merge.zoo in package zoo handles union, intersection, left
and right join of unequal length time series according to the 
setting of the all= argument.  zoo can also work with chron dates 
and times which would allow you to work with your millisecond data
and can also merge more than two series at a time.   (The its 
package (see ?itsJoin) and for regular time series, cbind.ts, also 
support merging unequal length series, but neither of these supports 
chron, which I gather is a requirement for you.)

eg. zoo example.
In the following x has  length 8 and y has length 6 and
they overlap for chron(5:8).  chron(1:4) only belongs
to x and chron(9:10) only belongs to y.

library(chron)
library(zoo)
x <- zoo(1:8, chron(1:8)) 
y <- zoo(5:10, chron(5:10))
merge(x,y) # union
merge(x,y,all=FALSE) # intersection
merge(x,y,all=c(FALSE, TRUE)) # right join
merge(x,y,all=c(TRUE, FALSE)) # left join

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Searching for antilog function

2004-11-24 Thread Gabor Grothendieck
Heather J. Branton hjb at pdq.com writes:

: 
: Thank you so much for each of your responses. But to make sure I am 
: clear (in my own mind), is this correct?
: 
: If  x = 2^y
: Then  y = log2(x)
: 
: Thanks again. I know this is basic.

Although it's not a proof, you can still use R to help you
verify such hypotheses.  Just use actual vectors of numbers 
and check that your hypothesis, in this case y equals log2(x),
holds.

For example,

R # try it out with the vector 1,2,3,...,10
R y <- 1:10
R y
 [1]  1  2  3  4  5  6  7  8  9 10
R # now calculate x
R x <- log2(y)
R # lets see what 2^x looks like:
R 2^x 
 [1]  1  2  3  4  5  6  7  8  9 10
R # it gave back y!

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Installing gregmisc under windows 2000

2004-11-24 Thread Prof Brian Ripley
On Thu, 25 Nov 2004, Yuandan Zhang wrote:
That seems not the case under linux in term of installation. You can install 
this bundle in the same way as installing an individul package.
eg
R CMD INSTALL  CRAN/contrib/main/gregmisc_2.0.0.tar.gz
to get all constituent packages installed.
And the same under Windows.  Please read the rest of the message you 
silently excised.

As gregmisc is not one of the constituent packages, library(gregmisc) does 
not work under Linux.

What exactly were you trying to `correct'?

On Wed, 24 Nov 2004 22:14:14 + (GMT)
Prof Brian Ripley [EMAIL PROTECTED] wrote:
gregmisc is a bundle, not a package.  Its description on CRAN is
gregmisc    Bundle of gtools, gdata, gmodels, gplots
so try one of those packages.
[Important details removed here.]

--
--
Yuandan Zhang, PhD
Animal Genetics and Breeding Unit
The University of New England
Armidale, NSW, Australia, 2351
E-mail:   [EMAIL PROTECTED]
Phone:(61) 02 6773 3786
Fax:  (61) 02 6773 3266
http://agbu.une.edu.au

 AGBU is a joint venture of NSW Primary Industries
 and The University of New England to undertake
 genetic R&D for Australia's Livestock Industries

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html