[R] Are there ANOVA for compositional data?

2006-10-17 Thread S.Q. WEN
The compositional data are x_i = (x_i1, x_i2, ..., x_in); for each fixed i,
x_ij > 0 and sum_j(x_ij) = 1.


I want to compare the means (u_i) of several groups,
i.e.
H0: u_1 = u_2 = ... = u_N
or, for each component j,
H_j0: u_1j = u_2j = ... = u_Nj

Are there any ANOVA-type tools to do this in R?

Thanks,
WEN S Q

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About compositional data analysis

2006-10-17 Thread Frede Aakmann Tøgersen
Well, one place to start is to read the following vignette

http://finzi.psych.upenn.edu/R/library/compositions/doc/UsingCompositions.pdf

This was found using the search function

RSiteSearch("compositional data")

in R.

You may also want to study

@Article{aitchison82,
  author  = {J. Aitchison},
  title   = {The statistical analysis of compositional data},
  journal = jrssb,
  year    = 1982,
  volume  = 44,
  number  = 2,
  pages   = {139--177},
  annote  = {With discussion.} }

@Book{aitchison86,
  author    = {J. Aitchison},
  title     = {The Statistical Analysis of Compositional Data},
  publisher = {Chapman and Hall},
  year      = 1986,
  series    = {Monographs on Statistics and Applied Probability},
  address   = {London},
  annote    = {A greatly expanded version of the original 1982 paper, with lots of
               examples of hypothesis testing.} }
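
If it helps, here is a minimal sketch (not from the vignette; data and group
labels are made up) of the classical Aitchison approach: transform the
compositions to log-ratios and compare the group means with a MANOVA:

set.seed(1)
grp <- factor(rep(1:3, each = 10))        # 3 groups, 10 observations each
x   <- matrix(rexp(30 * 4), ncol = 4)     # 4-part compositions
x   <- x / rowSums(x)                     # close each row so it sums to 1

alr <- log(x[, -4] / x[, 4])              # additive log-ratios vs. last part
fit <- manova(alr ~ grp)                  # multivariate ANOVA on the alr scale
summary(fit, test = "Wilks")              # test of H0: equal group means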


Best regards
Frede Aakmann Tøgersen
 

 

 -----Original message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On behalf of S.Q. WEN
 Sent: 17 October 2006 06:50
 To: R-help@stat.math.ethz.ch
 Subject: [R] About compositional data analysis
 
 The compositional data are x_i = (x_i1, x_i2, ..., x_in); for each 
 fixed i, x_ij > 0
 and sum_j(x_ij) = 1.
 
 
 I want to compare the means (u_i) of several groups, i.e.
 H0: u_1 = u_2 = ... = u_N
 or
 H0: u_11 = u_21 = ... = u_N1
 
 Are there any ANOVA-type tools to do this in R?
 
 Thanks,
 WEN S Q
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error Correcting Codes, Simplex

2006-10-17 Thread Richard Graham
On 10/16/06, Björn Egert [EMAIL PROTECTED] wrote:
 On 10/8/06, Egert, Bjoern [EMAIL PROTECTED] wrote:
  Hello,
 
Is there a way in R to construct an (error correcting) binary code
e.g. for a source alphabet containing integers from 1 to say 255
with the property that each pair of distinct codewords of length m
is at Hamming distance exactly m/2 ?
 
It was suggested that I use so-called simplex codes, which should be
fairly standard, but I haven't found a direct way via R packages
to do so; that's why I ask whether there might be an indirect way
to solve this problem.
 
Example:
v1  =c(1,2,3,4)
v2  =c(1,2,5,6)
similarity(v1,v2)=0.5, (because 2 out of 4 elements are equal).
Obviously, a binary representation would yield a different
similarity of:
binary(v1) = 001 010 011 100
binary(v2) = 001 010 101 110
similarity(binary(v1), binary(v2)) = 9/12
 
Remark: The focus here is not on error correction, but rather on a
binary encoding that retains the similarity of the elements of the vectors.
 
  Many thanks,
  Bjoern
 
  Bjoern,
 
  NB:  I'm an R newbie and I only know a bit about error correcting codes.
 
  I haven't seen any responses to your questions and I don't know if you
  still
  have a need, but it is certainly possible to construct forward error
  correction
  codes with all the great math capability in R.
 
  It seems you want to generate code words that still have the original
  bits
  present.  These are systematic codes and there are lots of them available
  to use.  Many codes are specified by the code word length (n), number
  of original data
  bits in each code word (k), and the minimum Hamming distance of the
  code words (d)
  as a [n,k,d] code. Simplex Codes have these parameters: [2^k - 1, k,
  2^(k - 1)].  These
  codes could be generated as a simple matrix multiply in R, but are you
  sure that's what
  you want?  The code words will be quite long.
 
  Regards,
  Richard Graham


 Hello,

   thank you.

   yes, basically, that's what I want.
   Just a binary encoding of an arbitrary integer value (or vector of
 integers)
   with the property that each pair of distinct integer values has an
   equal Hamming distance (m/2), so as to be able to do a similarity search

   I got the idea from: Gionis: Efficient and Tunable Similar Set
 Retrieval (Chap 3.2)

 regards
 Bjoern



Bjoern,

I read only the section of the paper you mention and I'll trust that
the stated properties of Simplex Codes are true.  I haven't researched
or verified it.

[from http://magma.maths.usyd.edu.au/]
Magma is a large, well-supported software package designed to solve
computationally hard problems in algebra, number theory, geometry and
combinatorics. It provides a mathematically rigorous environment for
computing with algebraic, number-theoretic, combinatoric and geometric
objects.

I don't understand a fraction of its capability but I still find it to
be very useful.  In fact, they have an online calculator that will
give you the generator matrix you want.  The online Magma calculator
is at:

http://magma.maths.usyd.edu.au/calc/

To calculate the generator matrix I think you are asking for, go to
the above URL and cut/paste the following command:

ExtendCode(SimplexCode(8));

Click Evaluate and the output window will contain a [256, 8, 128]
Linear Code over GF(2).  You'll need to massage this a bit to use
it as a matrix for R.  I'd use Ruby to do this, but anything will do.
If you want to encode more/less than 8 bits, you can modify the above
argument to SimplexCode.

I used ExtendCode so that the codeword length == Dmin * 2

The Gionis claim I'll research or verify sometime is that _every_ pair
of Simplex Code words of length m has Hamming distance == m/2.  If
you have a reference to a proof, I'd like to read it (like I said, I
only know a bit about ECC).
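
For what it's worth, here is a rough base-R sketch (variable names made up,
not Magma output) of the matrix-multiply idea: build the simplex generator
matrix from all non-zero binary k-vectors and check the pairwise distances.
Every pair of distinct codewords comes out at 2^(k-1), so the extended code
of length m = 2^k has pairs at exactly m/2:

k    <- 4
cols <- as.matrix(expand.grid(rep(list(0:1), k)))  # all binary k-vectors
G    <- t(cols[rowSums(cols) > 0, ])               # generator: k x (2^k - 1)

msgs <- as.matrix(expand.grid(rep(list(0:1), k)))  # all 2^k messages
code <- (msgs %*% G) %% 2                          # one codeword per row

d <- as.matrix(dist(code, method = "manhattan"))   # Hamming distance on 0/1 rows
unique(d[upper.tri(d)])                            # single value: 2^(k-1) = 8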

Good Luck with your work!
Richard Graham

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Review process for new packages

2006-10-17 Thread Andreas Wittmann
Hi all, 

I'm currently working on a creditmetrics package which includes functions for
computing the CreditMetrics credit risk model. I guess it will be finished in
a few days.

My question now is: does there exist some review process before sending it to
CRAN, or is it reviewed after it has been sent?


best regards
Andreas
-- 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lda

2006-10-17 Thread Uwe Ligges


Pieter Vermeesch wrote:
 I'm trying to do a linear discriminant analysis on a dataset of three
 classes (Affinities), using the MASS library:
 
 data.frame2 <- na.omit(data.frame1)

 data.ld = lda(AFFINITY ~ ., data.frame2, prior = c(1,1,1)/3)
 
 Error in var(x - group.means[g, ]) : missing observations in cov/cor
 
 What does this error message mean and how can I get rid of it?

What does str(data.frame2) tell us?

Uwe Ligges


 Thanks!
 
 Pieter
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lda

2006-10-17 Thread Martin Maechler
 Pieter == Pieter Vermeesch [EMAIL PROTECTED]
 on Mon, 16 Oct 2006 19:15:59 +0200 writes:

Pieter I'm trying to do a linear discriminant analysis on a
Pieter dataset of three classes (Affinities), using the
Pieter MASS library:
 ^^^
No, no!MASS *package*   (please!)

 data.frame2 <- na.omit(data.frame1)
 
 data.ld = lda(AFFINITY ~ ., data.frame2, prior = c(1,1,1)/3)

Pieter Error in var(x - group.means[g, ]) : missing observations in cov/cor

Pieter What does this error message mean and how can I get rid of it?

You have (+ or -) 'Inf' data values which na.omit() does not
omit and  'x - group.means[g, ]' contains 'Inf - Inf' which is NaN.

Ideally, MASS:::lda.default() would check for such a case and
give a more user-friendly error message.
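
A rough sketch of one possible work-around (data.frame1 is the poster's data
frame): drop rows containing non-finite values before calling lda():

num <- sapply(data.frame1, is.numeric)
ok  <- apply(data.frame1[, num, drop = FALSE], 1,
             function(r) all(is.finite(r)))   # FALSE for NA, NaN and +/-Inf
data.frame2 <- data.frame1[ok, ]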

Pieter Thanks!

you're welcome.
Martin Maechler, ETH Zurich

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variance of fitted value in lm

2006-10-17 Thread David Barron
You can get these via the predict function.
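
For example, a minimal sketch with made-up data:

set.seed(1)
d   <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(20)

fit <- lm(y ~ x1 + x2, data = d)
p   <- predict(fit, se.fit = TRUE)   # fitted values and their standard errors
head(p$se.fit^2)                     # variances of the fitted values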

On 17/10/06, Li Zhang [EMAIL PROTECTED] wrote:
 Hi,

 I am wondering if a linear model

 lm(y~ x1+x2) calculates the variance of a fitted
 value.

 Thank you

 Li

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
=
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lda

2006-10-17 Thread Pieter Vermeesch
Dear Martin and Uwe,

I did indeed have a few -Inf values in my data frame. Few enough that
I didn't notice them when I inspected my data.

Thanks a lot for helping me better understand the MASS *package* :-)

Pieter

On 10/17/06, Martin Maechler [EMAIL PROTECTED] wrote:
  Pieter == Pieter Vermeesch [EMAIL PROTECTED]
  on Mon, 16 Oct 2006 19:15:59 +0200 writes:

 Pieter I'm trying to do a linear discriminant analysis on a
 Pieter dataset of three classes (Affinities), using the
 Pieter MASS library:
  ^^^
 No, no!MASS *package*   (please!)

   data.frame2 <- na.omit(data.frame1)
 
  data.ld = lda(AFFINITY ~ ., data.frame2, prior = c(1,1,1)/3)

 Pieter Error in var(x - group.means[g, ]) : missing observations in 
 cov/cor

 Pieter What does this error message mean and how can I get rid of it?

 You have (+ or -) 'Inf' data values which na.omit() does not
 omit and  'x - group.means[g, ]' contains 'Inf - Inf' which is NaN.

 Ideally, MASS:::lda.default() would check for such a case and
 give a more user-friendly error message.

 Pieter Thanks!

 you're welcome.
 Martin Maechler, ETH Zurich



-- 
Pieter Vermeesch
ETH Zürich, Isotope Geology and Mineral Resources
Clausiusstrasse 25, NW C 85, CH-8092 Zurich, Switzerland
email: [EMAIL PROTECTED], tel: +41 44 632 4643

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question about managing searching path

2006-10-17 Thread Tong Wang
Hi all,
 I'm having sometrouble with managing the seach path, in a function , I 
need to attach some data set at the begining
and detach them at the end,  say, myfunction- function() { attach(mylist); 
 .detach(mylist) } ,
the problem is, since I am still debugging this code,  sometimes it got error 
and ended before reaching the end, thus
the data is left in the searching path.  
  What 's the right way to make mylist detached no matter what ?

Thanks a lot.

best

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about managing searching path

2006-10-17 Thread vito muggeo
Dear Tong
I think on.exit() does the job. Namely:

attach(YourData); on.exit(detach(YourData))
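
In other words, a minimal sketch (mylist is the data set from the question):

myfunction <- function() {
    attach(mylist)
    on.exit(detach(mylist))   # registered now; runs even if an error occurs below
    ## ... computations that may fail ...
}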

vito


Tong Wang wrote:
 Hi all,
  I'm having some trouble with managing the search path. In a function, I 
  need to attach some data set at the beginning
  and detach it at the end, say, myfunction <- function() { 
  attach(mylist); ... ; detach(mylist) },
  the problem is, since I am still debugging this code, it sometimes hits an error 
  and ends before reaching the end, thus
  the data is left on the search path.  
    What's the right way to make sure mylist is detached no matter what?
 
 Thanks a lot.
 
 best
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 

Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 6626240
fax: 091 485726/485612

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] New package Ryacas

2006-10-17 Thread Gabor Grothendieck
Maybe there are timing problems using that setup with sockets?  I once tried
VMware (not with Ryacas but just to try it out) and found it slow as can
be expected with an emulated environment. Since you have Windows XP just
use the Windows version of Ryacas directly.

On 10/17/06, Simon Blomberg [EMAIL PROTECTED] wrote:
 Hi Gabor,

 I'm running Quantian (Debian) inside a VMware virtual machine, on a
 Windows XP host.

 I installed the latest version of yacas from the source tarball. I
 remembered to ./configure --enable-server to allow server connections.
 make and make install worked ok, after some fiddling. I checked that the
 yacas server option worked, by doing yacas --server , and then
 telnet'ing to 127.0.0.1  to check. It worked fine. I installed
 Ryacas. I then tried it out and got the following error:

   library(Ryacas)
 Loading required package: XML
   yacas('Integrate(x)x;')
 [1] Starting Yacas!
 Error in socketConnection(host = 127.0.0.1, port = 9734, server =
 FALSE,  :
   unable to open connection
 In addition: Warning message:
 127.0.0.1:9734 cannot be opened
   Accepting requests from port 9734

 I tried again (stubborn, I guess):

   yacas('Integrate(x)x;')
  [1] Starting Yacas!
  Accepting requests from port 9734
  YacasServer Could not bind to the socket
  : Address already in use
  /usr/local/lib/R/site-library/Ryacas/yacdir/R.ys(1) : File not found
  CommandLine(1) : Expecting ) closing bracket for sub-expression, but
 got x instead

 Any ideas where I may be going wrong? I don't know anything about
 sockets. I've cross-posted to r-sig-debian. They may be interested.

 Cheers,

 Simon.

 --
 Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
 Centre for Resource and Environmental Studies
 The Australian National University
 Canberra ACT 0200
 Australia
 T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
 F: +61 2 6125 0757
 CRICOS Provider # 00120C

 The combination of some data and an aching desire for
 an answer does not ensure that a reasonable answer
 can be extracted from a given body of data.
 - John Tukey.



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RODBC and NULL values

2006-10-17 Thread Mark Wardle
Dear All,

Writing sooner than I thought I'd need to.

I'm using R 2.4 on Mac OS X, with RODBC, PostgreSQL 8.1 and Actual's
ODBC driver. I have all my data in Filemaker 8.5, but it is
automatically exported into PostgreSQL for analysis as Filemaker's ODBC
and JDBC access is awful, slow and has a tendency to crash.

I have disability data where for each patient there is a survival time
in years from disease onset to a particular disease stage, namely
unilateral support, bilateral support, wheelchair use, and death. Valid
values may include NULL (patient hasn't reached that stage), 0 (for
example, patient needed support immediately at disease onset), and any
positive integer.

When I query the database manually using psql, it is clear there are
NULL values.
  3 |  3 | 18 |    |   27 | 1
    |    |    |    |   13 | 1
  1 |  5 |    |    |   10 | 0
 10 | 13 | 13 |    |   22 | 0

However, these are all converted to zeros when I use RODBC's sqlQuery(),
making interpretation impossible. I have tried using the nullstring and
na.strings options, but these don't seem to have any effect. I have
tried various combinations of "NULL", "NA" and "". Forgive my awkward SQL.

 channel = odbcConnect("ataxia", uid="mark")
 disease = sqlQuery(channel, "select calc_survival_unilateral_support
as unlateral, calc_survival_bilateral_support as bilateral,
calc_survival_wheelchair as wheelchair, calc_survival_death as death,
calc_follow_up as followup, has_family_history_ataxia as familial from
clinical, patient where clinical.patient_fk = patient_id and excluded=0
and calc_walking_disability_valid=1")
 disease   # and show results

127  3  3 18  0 27 1
128  0  0  0  0 13 1
129  1  5  0  0 10 0
130 10 13 13  0 22 0

It doesn't seem to be the old repeating rows NULL bug talked about
here: http://tolstoy.newcastle.edu.au/R/help/04/07/0803.html

Is this because my ODBC driver is not returning the correct values for
RODBC to parse? Is there any way of debugging this (the intricacies of
ODBC are beyond my skill) and is my only alternative to store a
non-valid number in the database (999?) and use my query or R to remove
those datapoints afterwards?

Looking in the archives, there are lots of people asking about how to
convert NAs to numeric, but I want the NAs passed through unaltered!

Many thanks in advance,

Mark



-- 
Dr. Mark Wardle
Clinical research fellow and Specialist Registrar in Neurology,
C2-B2 link, Cardiff University, Heath Park, CARDIFF, CF14 4XN. UK

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] New package Ryacas

2006-10-17 Thread Gabor Grothendieck
Just one other comment.  If you want to try running Linux over Windows
you might want to check out how the AndLinux project (google to find)
is progressing.  I had tried it about a year ago and it was much faster
than VMware although at that time it was still a bit immature.

On 10/17/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Maybe there are timing problems using that setup with sockets?  I once tried
 VMware (not with Ryacas but just to try it out) and found it slow as can
 be expected with an emulated environment. Since you have Windows XP just
 use the Windows version of Ryacas directly.

 On 10/17/06, Simon Blomberg [EMAIL PROTECTED] wrote:
  Hi Gabor,
 
  I'm running Quantian (Debian) inside a VMware virtual machine, on a
  Windows XP host.
 
  I installed the latest version of yacas from the source tarball. I
  remembered to ./configure --enable-server to allow server connections.
  make and make install worked ok, after some fiddling. I checked that the
  yacas server option worked, by doing yacas --server , and then
  telnet'ing to 127.0.0.1  to check. It worked fine. I installed
  Ryacas. I then tried it out and got the following error:
 
library(Ryacas)
  Loading required package: XML
yacas('Integrate(x)x;')
  [1] Starting Yacas!
  Error in socketConnection(host = 127.0.0.1, port = 9734, server =
  FALSE,  :
unable to open connection
  In addition: Warning message:
  127.0.0.1:9734 cannot be opened
Accepting requests from port 9734
 
  I tried again (stubborn, I guess):
 
yacas('Integrate(x)x;')
   [1] Starting Yacas!
   Accepting requests from port 9734
   YacasServer Could not bind to the socket
   : Address already in use
   /usr/local/lib/R/site-library/Ryacas/yacdir/R.ys(1) : File not found
   CommandLine(1) : Expecting ) closing bracket for sub-expression, but
  got x instead
 
  Any ideas where I may be going wrong? I don't know anything about
  sockets. I've cross-posted to r-sig-debian. They may be interested.
 
  Cheers,
 
  Simon.
 
  --
  Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
  Centre for Resource and Environmental Studies
  The Australian National University
  Canberra ACT 0200
  Australia
  T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
  F: +61 2 6125 0757
  CRICOS Provider # 00120C
 
  The combination of some data and an aching desire for
  an answer does not ensure that a reasonable answer
  can be extracted from a given body of data.
  - John Tukey.
 
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC and NULL values

2006-10-17 Thread Mark Wardle
Mark Wardle wrote:

 ...
 Is this because my ODBC driver is not returning the correct values for
 RODBC to parse? Is there anyway of debugging this (the intricacies of
 ODBC are beyond my skill) and is my only alternative to store a
 non-valid number in the database (999?) and use my query or R to remove
 those datapoints afterwards?
 ...
 
Actually, it appears that the Actual ODBC driver isn't returning the
data properly. I've just tested it using Excel and it returns zeros for
NULLs. Wasn't able to use iodbctest as it got very confused and tried to
connect to a MySQL database (which I don't have). There is no magic RODBC
can do to fix this. It's a bit odd, as I use Filemaker to export data
via raw SQL commands against the ODBC driver, and that does cope with
NULLs, but it appears fetching, at least with Excel and RODBC, does not.

I was just going to try installing Rdbi to see whether that has better
luck, but I can't access CRAN this morning. Hopefully the 403
Forbidden message will be temporary!

So unless anyone knows a better alternative, I shall have to store
nonsense values rather than NULLs in the database (or fix it within the
SELECT query as a quick hack solution instead).

Best wishes,

Mark

-- 
Dr. Mark Wardle
Clinical research fellow and Specialist Registrar in Neurology,
C2-B2 link, Cardiff University, Heath Park, CARDIFF, CF14 4XN. UK

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how can i compute the average of three blocks for each column ?

2006-10-17 Thread Petr Pikal
Hi

I haven't seen any answer yet so I try

From your not very clear explanation I suspect you want to do some 
block aggregation

 test
  block x1 x2 x3 x4 x5
1 1 23 22 23 24 23
2 1 21 25 26 21 39
3 1 23 24 22 23 23
4 2 20 21 23 24 28
5 2 32 23 34 24 26
6 2 19 34 34 13 34
7 3 12 32 23 34 19
8 3 23 24 25 26 27
9 3 12 78 23 24 24
 by(test[,-1], test$block, mean)
test$block: 1
  x1   x2   x3   x4   x5 
22.3 23.7 23.7 22.7 28.3 
-
 
test$block: 2
  x1   x2   x3   x4   x5 
23.7 26.0 30.3 20.3 29.3 
-
 
test$block: 3
  x1   x2   x3   x4   x5 
15.7 44.7 23.7 28.0 23.3 
 aggregate(test[,-1], list(test$block), mean)
  Group.1   x1   x2   x3   x4   x5
1   1 22.3 23.7 23.7 22.7 28.3
2   2 23.7 26.0 30.3 20.3 29.3
3   3 15.7 44.7 23.7 28.0 23.3


Regarding your second question about plotting, see the arguments of par(),
especially mar or mai.
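
For example, a rough sketch with made-up data (the margin values are arbitrary):

op <- par(mfrow = c(3, 1), mar = c(2, 4, 2, 1))   # c(bottom, left, top, right)
for (j in 1:3) boxplot(rnorm(50), main = paste("Block", j))
par(op)                                           # restore the previous settings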

HTH
Petr

On 15 Oct 2006 at 22:22, Yen Ngo wrote:

Date sent:  Sun, 15 Oct 2006 22:22:19 +0200 (CEST)
From:   Yen Ngo [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Subject:[R] how can i compute the average of three blocks for 
each column ?

 Dear all, 
 
 
   I want to compute the average of the three blocks for each
   x-variable (which corresponds to slide in the code below). How can I do
   that?
 
 
   block  x1  x2  x3  x4  x5
   1      23  22  23  24  23
   1      21  25  26  21  39
   1      23  24  22  23  23
   2      20  21  23  24  28
   2      32  23  34  24  26
   2      19  34  34  13  34
   3      12  32  23  34  19
   3      23  24  25  26  27
   3      12  78  23  24  24
 
 
   # read table of data for this slide = (x1)
   a <- read.table(file = slide[i], header=T, sep='\t', na.strings="NA")
   #length(a$ID)
   # Eliminate Neg. and Pos. controls from the dataset. The logical negation
   # of the %in% function tells subset to only select those rows where the ID
   # column does not contain either "empty" or "none"
   new <- subset(a, !ID %in% c("empty", "none", ""))
   #length(new$ID) #new[1:20,c(1,4,5,9)]
 
 
   # first five columns give position identifiers, including a column with block
   layout=new[,1:5]
   layout[1:30,]

   # 9th column gives the median foreground values of the x-variables
   fg1=as.matrix(new[,9])
   length(fg1)
   mean(fg1)  # calculate the mean of x1
 
 
 
    I try to do something like :##
 
   block1=fg1[layout$Block==1,]
   block2=fg1[layout$Block==1,]
   block2=fg1[layout$Block==1,]
   average=(block1+block2+block3)/3
 
   but it did not work.
 
   ## How can i calculate the means of remaining
   x_variables? #   Read data for the remaining slides
   =x2,x3,x4,x5  ###
 
   for (i in 2:num.slides){
   na1 <- strsplit(na[[i]][k], ".txt")
   na2 <- strsplit(na1[[1]][1], "-")
   bat=na2[[1]][1]
   sli=na2[[1]][2]
   nslide <- cbind(nslide, as.numeric(sli))
   # nslide is a vector giving the number of the slide in the batch
   # read table of data for this slide
   a <- read.table(file=slide[i], header=T, sep='\t', na.strings="NA")
   new <- subset(a, !ID %in% c("empty", "none", ""))
   # append FG data to the matrices containing the slides already read
   fg1=cbind(fg1, as.matrix(new[,9])) }
 
   colnames(fg1)=nslide
   fg <- data.frame(peptide=c(new$Name), fg1)
   fg <- edit(fg)
 
 
 
   # Another question: I have three graphs which are displayed one
   after another with a large space between them. Can I move these graphs
   closer to each other by making them bigger, and how? Below is the code
   that I have written for plotting the graphs.

   par(mfrow=c(3,1))
 for (j in 1:3)
 {
 boxplot(split(pos$y[pos$Block==j], pos$Slide[pos$Block==j]),
 col="lightgray", cex=.65, outline=TRUE,
 main=paste("Positive Controls Block", j)) }
 
 
   Thank you for your help,
   Regards,
 
   Yen
 
  [[alternative HTML version deleted]]
 
 

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grep function with patterns list...

2006-10-17 Thread Martin Maechler
 Anupam == Anupam Tyagi [EMAIL PROTECTED]
 on Mon, 16 Oct 2006 18:15:06 + (UTC) writes:

Anupam Hi Stephane,
Anupam Stéphane CRUVEILLER scruveil at genoscope.cns.fr writes:
 is there a way to pass a list of patterns to the grep function? I
 vaguely remember something with %in% operator...

Anupam I think you are looking for the %in% and %nin% which
Anupam are part of Design package, and also in Hmisc
Anupam library. You have to install and load these packages
Anupam to access these functions.

Hmm,  '%in%' has been part of standard R for years ...
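
For the original question, a minimal base-R sketch (the example strings are
made up): collapse the patterns into one regular expression for grep(), or
use %in% for exact matching rather than pattern matching:

x        <- c("apple", "banana", "cherry", "date")
patterns <- c("an", "err")

grep(paste(patterns, collapse = "|"), x)   # 2 3: matches any of the patterns
x %in% c("banana", "date")                 # exact membership test
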
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Convert Contingency Table to Flat File

2006-10-17 Thread Marco LO
Hello All,
   
  Is there any R function out there to turn a multi-way contingency table back 
to a flat file table of individual rows and attribute columns?
   
  Thanks!
  marco
   


-


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC and NULL values

2006-10-17 Thread Prof Brian Ripley
What sqltype(s) are your variables?

For numeric types, RODBC merely maps values the ODBC driver says are NULL 
to NA.  Since you appear not to have character data,

nullstring: character string to be used when reading 'SQL_NULL_DATA'
   character items from the database.

na.strings: character string(s) to be mapped to 'NA' when reading
   character data.

are not relevant to you.

At least on Windows and Linux the PostgreSQL 8.1 ODBC driver works 
correctly, and NULLs in numeric columns are mapped to NAs in R.  (There is 
an example in my test suite.)

On Tue, 17 Oct 2006, Mark Wardle wrote:

 Dear All,

 Writing sooner than I thought I'd need to.

 I'm using R 2.4 on Mac OS X, with RODBC, PostgreSQL 8.1 and Actual's
 ODBC driver. I have all my data in Filemaker 8.5, but it is
 automatically exported into PostgreSQL for analysis as Filemaker's ODBC
 and JDBC access is awful, slow and has a tendency to crash.

 I have disability data where for each patient there is a survival time
 in years from disease onset to a particular disease stage, namely
 unilateral support, bilateral support, wheelchair use, and death. Valid
 values may include NULL (patient hasn't reached that stage), 0 (for
 example, patient needed support immediately at disease onset), and any
 positive integer.

 When I query the database manually using psql, it is clear there are
 NULL values.
   3 |  3 | 18 |    |   27 | 1
     |    |    |    |   13 | 1
   1 |  5 |    |    |   10 | 0
  10 | 13 | 13 |    |   22 | 0

No, it is not clear.  It is clear that there are values which are printed 
as blank or empty strings.

 However, these are all converted to zeros when I use RODBC's sqlQuery(),
 making interpretation impossible. I have tried using the nullstring and
 na.strings options, but these don't seem to have any effect. I have
 tried various combinations of NULL, NA and . Forgive my awkward SQL.

 channel = odbcConnect("ataxia", uid="mark")
 disease = sqlQuery(channel, "select calc_survival_unilateral_support
 as unlateral, calc_survival_bilateral_support as bilateral,
 calc_survival_wheelchair as wheelchair, calc_survival_death as death,
 calc_follow_up as followup, has_family_history_ataxia as familial from
 clinical, patient where clinical.patient_fk = patient_id and excluded=0
 and calc_walking_disability_valid=1")
 disease   # and show results

 127  3  3 18  0 27 1
 128  0  0  0  0 13 1
 129  1  5  0  0 10 0
 130 10 13 13  0 22 0

 It doesn't seem to be the old repeating rows NULL bug talked about
 here: http://tolstoy.newcastle.edu.au/R/help/04/07/0803.html

That was about R 1.9.1, about a problem solved long before then.  Let's 
not drag up ancient history 

 Is this because my ODBC driver is not returning the correct values for
 RODBC to parse? Is there anyway of debugging this (the intricacies of
 ODBC are beyond my skill) and is my only alternative to store a
 non-valid number in the database (999?) and use my query or R to remove
 those datapoints afterwards?

Find out what the types involved are.  Perhaps try as.is=FALSE?
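
If it helps, a rough sketch of how the declared column types could be
inspected via RODBC (connection and table names taken from your post;
untested against the Actual driver):

library(RODBC)
channel <- odbcConnect("ataxia", uid = "mark")
sqlColumns(channel, "clinical")[, c("COLUMN_NAME", "TYPE_NAME")]
odbcClose(channel)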

 Looking in the archives, there are lots of people asking about how to
 convert NAs to numeric, but I want the NAs passed through unaltered!

Since the mapping of NULLs to NAs works in other examples, I find it hard 
to see how this can be an RODBC issue.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculate NAs from known data: how to?

2006-10-17 Thread Torleif Markussen Lunde
Hi

In a dataset I have length and age for cod. The age, however, is only
given for 40-100% of the fish. What I need to do is to fill in the NAs
in a correct way, so that age has a value for each length. This is to be
done for each sample separately (there are 324 samples), meaning the NAs
for sampleno 1 shall be calculated from the known values from sampleno 1.

Since, for example, length 55 cm can be both 4 and 5 years, I guess a fish
with NA age and length 55 cm should be given a random age with a given
probability: for example, 55 cm = 4 years has p=75%, while 55 cm = 5
years has p=25%. Those p-values should be calculated from the real
data.

How can this be done in R, and what is the right way to do it?

Sample number 1 is given below.

Best regards
Torleif Markussen Lunde

length  age sampleno
55  5   1
45  4   1
55  4   1
55  5   1
60  6   1
45  5   1
52  5   1
48  4   1
51  6   1
53  4   1
54  5   1
48  5   1
50  6   1
55  6   1
55  4   1
50  5   1
49  5   1
40  4   1
50  6   1
36  4   1
46  6   1
35  3   1
41  3   1
44  5   1
36  3   1
29  2   1
28  2   1
32  2   1
31  2   1
30  2   1
29  2   1
32  2   1
28  2   1
25  2   1
27  2   1
27  2   1
24  2   1
27  2   1
24  2   1
19  1   1
23  1   1
23  1   1
20  1   1
23  1   1
19  1   1
17  1   1
53  5   1
58  5   1
52  4   1
42  3   1
50  5   1
94  7   1
35  3   1
71  7   1
52  6   1
50  6   1
45  4   1
52  5   1
37  3   1
45  4   1
59  5   1
47  4   1
48  4   1
39  3   1
37  3   1
31  3   1
39  2   1
39  2   1
31  2   1
40  3   1
52  5   1
62  5   1
72  5   1
53  5   1
61  5   1
54  6   1
54  5   1
63  6   1
58  5   1
45  4   1
43  4   1
55  4   1
39  3   1
39  3   1
58  5   1
65  6   1
52  6   1
48  3   1
49  3   1
44  3   1
45  4   1
35  2   1
38  3   1
30  2   1
29  1   1
27  1   1
44  NA  1
48  NA  1
37  NA  1
27  NA  1
30  NA  1
67  NA  1
28  NA  1
65  NA  1
42  NA  1
27  NA  1
37  NA  1
30  NA  1
28  NA  1
26  NA  1
36  NA  1
29  NA  1
32  NA  1
45  NA  1
39  NA  1
27  NA  1
29  NA  1
28  NA  1
27  NA  1
53  NA  1
21  NA  1
15  NA  1
23  NA  1

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC and NULL values

2006-10-17 Thread Mark Wardle
Prof Brian Ripley wrote:
 What sqltype(s) are your variables?
 

The variables are all numeric.

 For numeric types, RODBC merely maps values the ODBC driver says are
 NULL to NA.  Since you appear not to have character data,
 
 nullstring: character string to be used when reading 'SQL_NULL_DATA'
   character items from the database.
 
 na.strings: character string(s) to be mapped to 'NA' when reading
   character data.
 
 are not relevant to you.

I thought that, but was grasping at straws because at that point I
didn't know whether it was a problem with the ODBC driver misinforming
RODBC about the correct character types.


 
 At least on Windows and Linux the PostgreSQL 8.1 ODBC driver works
 correctly, and NULLs in numeric columns are mapped to NAs in R.  (There
 is an example in my test suite.)

I'm using Actual's ODBC driver. In my previous email, I did a test with
another ODBC client (Microsoft Excel/Query) and found it too was
misinterpreting NULL values as zero, concluding it was an issue with the
ODBC driver itself. However, I was wrong - using the iodbctest program,
the ODBC driver *is* successfully returning NULLs. It is only Microsoft
Excel/Query and R that I am having the problem with these empty
spaces/NULL characters being converted to zeros.

 ...
 When I query the database manually using psql, it is clear there are
 NULL values.
3 | 3 | 18 |   |   27 |1
  |   ||   |   13 |1
1 | 5 ||   |   10 |0
   10 |13 | 13 |   |   22 |0
 
 No, it is not clear.  It is clear that there are values which are
 printed as blank or empty strings.
 

I *think* postgresql is regarding them as NULL values. I don't know
whether this proves it? (The first two must be functionally equivalent.)

ataxia=#select count(calc_survival_bilateral_support) from clinical;
 count
---
53
(1 row)

ataxia=#select count(calc_survival_bilateral_support) from clinical
where calc_survival_bilateral_support is NOT NULL;
 count
---
53
(1 row)


ataxia=# select count(*) from clinical;
 count
---
   140
(1 row)


 Find out what the types involved are.  Perhaps try as.is=FALSE?
 
Have done, and I'm afraid it doesn't change anything.

 Since the mapping of NULLs to NAs works in other examples, I find it
 hard to see how this can be an RODBC issue.
 
Perhaps it is a peculiarity in my set-up, or I'm missing something
obvious and making some assumption somewhere. I will retrace my steps!
Perhaps I should use a different approach, but I always have difficulty
giving up on a problem unsolved!

-- 
Dr. Mark Wardle
Clinical research fellow and Specialist Registrar in Neurology,
C2-B2 link, Cardiff University, Heath Park, CARDIFF, CF14 4XN. UK

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about managing searching path

2006-10-17 Thread Duncan Murdoch
On 10/17/2006 3:49 AM, Tong Wang wrote:
 Hi all,
  I'm having some trouble with managing the search path. In a function, I 
  need to attach some data set at the beginning
  and detach it at the end, say, myfunction <- function() { 
  attach(mylist); ... ; detach(mylist) },
  the problem is, since I am still debugging this code, it sometimes hits an error 
  and ends before reaching the end, thus
  the data is left on the search path.  
    What's the right way to make sure mylist is detached no matter what?

on.exit, as Vito said.

But you may find that doing the calculations in with does a better 
job, i.e.

with(mylist,  [do something])

The advantages of with() are:

  - it takes precedence over local variables; attach (by default) comes 
behind local variables and the global environment.  This may mean your 
code fails when a user happens to have variables with the same name defined.

  - it is a temporary change, so no detach is needed.
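
A minimal sketch of the contrast (mylist is the data set from the question;
x stands for a hypothetical element of it):

f1 <- function() {
    attach(mylist)
    on.exit(detach(mylist))   # cleanup guaranteed on exit or error
    mean(x)                   # x found via the attached list
}

f2 <- function() {
    with(mylist, mean(x))     # temporary evaluation environment; nothing to detach
}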

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Review process for new packages

2006-10-17 Thread Duncan Murdoch
On 10/17/2006 2:22 AM, Andreas Wittmann wrote:
 Hi all, 
 
 i'm currently working on a creditmetrics package which includes functions for 
 computing the credit risk model creditmetrics. I guess it would be finished 
 in a few days. 
 
  My question now is: does there exist some review process before sending it to 
  CRAN, or is it reviewed after it has been sent?

You should read the instructions in the Writing R Extensions manual, and 
make sure it passes R CMD check without errors or warnings, before you 
send it.

CRAN will run its own checks on a number of different platforms, and if 
your package doesn't pass, they'll probably ask you to fix it -- but you 
should do your best to make their job easier by getting it right before 
you send it.

If your package passes those checks, it will likely be posted to CRAN. 
(There are exceptions, e.g. if they notice your license is not 
compatible with CRAN, etc.)

There's no review process to decide whether your package is useful or 
well-written.  If you want that kind of review you should submit it to 
the Journal of Statistical Software.

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Finding out about objects and classes

2006-10-17 Thread michael watson \(IAH-C\)
When R help simply states something like:

Value:

 An object of class 'loess'.

How do I find out more about that class?  Shouldn't there be a link in
the help file or something?

ATB
Mick

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert Contingency Table to Flat File

2006-10-17 Thread Philipp Pagel
On Tue, Oct 17, 2006 at 03:08:49AM -0700, Marco LO wrote:
   Is there any R function out there to turn a multi-way contingency
   table back to a flat file table of individual rows and attribute
   columns.?

Are you looking for something like this?

# generate some data
x = sample(c(0,1), 100, replace=T)
y = sample(c(0,1), 100, replace=T)
z = sample(c(0,1), 100, replace=T)
# contingency table
mytab = table(x,y,z)
# flat contingency table
as.data.frame( mytab )
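
And if individual rows (one per original observation) are wanted rather than
cell frequencies, the flat table can be expanded by its Freq column, e.g.:

flat       <- as.data.frame(mytab)
individual <- flat[rep(seq_len(nrow(flat)), flat$Freq), c("x", "y", "z")]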

cu
Philipp

-- 
Dr. Philipp PagelTel.  +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics  Fax.  +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

 and

Institute for Bioinformatics / MIPS  Tel.  +49-89-3187 3675
GSF - National Research Center   Fax.  +49-89-3187 3585
  for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Review process for new packages

2006-10-17 Thread Gabor Grothendieck
One thing you might want to do is an R CMD CHECK with both the development
and released versions of R since CRAN will check it against both:
   http://cran.r-project.org/src/contrib/checkSummary.html

On 10/17/06, Duncan Murdoch [EMAIL PROTECTED] wrote:
 On 10/17/2006 2:22 AM, Andreas Wittmann wrote:
  Hi all,
 
  i'm currently working on a creditmetrics package which includes functions 
  for computing the credit risk model creditmetrics. I guess it would be 
  finished in a few days.
 
   My question now is: does there exist some review process before sending it 
   to CRAN, or is it reviewed after it has been sent?

 You should read the instructions in the Writing R Extensions manual, and
 make sure it passes R CMD check without errors or warnings, before you
 send it.

 CRAN will run its own checks on a number of different platforms, and if
 your package doesn't pass, they'll probably ask you to fix it -- but you
 should do your best to make their job easier by getting it right before
 you send it.

 If your package passes those checks, it will likely be posted to CRAN.
 (There are exceptions, e.g. if they notice your license is not
 compatible with CRAN, etc.)

 There's no review process to decide whether your package is useful or
 well-written.  If you want that kind of review you should submit it to
 the Journal of Statistical Software.

 Duncan Murdoch

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding out about objects and classes

2006-10-17 Thread Gabor Grothendieck
apropos("loess")
help.search("loess")
methods(class = "loess")
class?loess # in this case it does not return anything but sometimes it does
RSiteSearch("loess")

On 10/17/06, michael watson (IAH-C) [EMAIL PROTECTED] wrote:
 When R help simply states something like:

 Value:

 An object of class 'loess'.

 How do I find out more about that class?  Shouldn't there be a link in
 the help file or something?

 ATB
 Mick

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate NAs from known data: how to?

2006-10-17 Thread Brian G. Peterson
Torleif Markussen Lunde wrote:
 In a dataset I have length and age for cod. The age, however, is only
 given for 40-100% of the fish. What I need to do is to fill in the NAs
 in a correct way, so that age has a value for each length. This is to be
 done for each sample separately (there are 324 samples), meaning the NAs
 for sampleno 1 shall be calculated from the known values from sampleno 1.
 
 Since, for example, length 55 cm can be both 4 and 5 years, I guess a fish
 with NA age and length 55 cm should be given a random age with a given
 probability: for example, 55 cm = 4 years has p=75%, while 55 cm = 5
 years has p=25%. Those p-values should be calculated from the real
 data.
 
 How can this be done in R, and what is the right way to do it?

Given the size of your sample, wouldn't it be more statistically valid to
set the age of the NA records to the mean age of records of matching
length?  I suppose you could also use resampling or a bootstrap, but I'm
not sure that adding randomization will give results that are any more 
statistically valid than using the mean.
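
For what it's worth, a rough sketch of the random-draw idea from the original
post (column names as in the posted data; 'fish' is a hypothetical data frame
holding all samples, and using the mean instead would be a one-line change):

impute_ages <- function(d) {
    for (i in which(is.na(d$age))) {
        pool <- d$age[!is.na(d$age) & d$length == d$length[i]]
        if (length(pool) > 0)
            d$age[i] <- pool[sample.int(length(pool), 1)]  # draw in proportion to observed frequencies
    }
    d
}
## applied sample by sample:
## fish2 <- do.call(rbind, lapply(split(fish, fish$sampleno), impute_ages))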

Regards,

   - Brian

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ANOVA and Levene's test in nested model

2006-10-17 Thread Emilia Pippola

Dear All,

I already sent a message before concerning Levene's test in a nested model,
but I didn't get any answer. Optimistically I hope to get an answer this
time. I also pose a new question related to the whole model, because I
haven't found any definite answer as to whether I am analysing it in a
suitable way or not. I really have tried to do my homework.

I have a response variable (y) and four factors (a, b, c, d). One of
these four factors (d) is nested within another factor (c). In addition,
I would like to take into account only 2nd degree interactions in my
model.

I tried to analyse this model in the following ways (both gave same
results):

 model1 <- aov(y ~ (a+b+c)^2 + Error(d))

 model2 <- aov(y ~ (a+b+c)^2 + Error(d %in% c))

Is this correct?


I guess another option would be lme in package nlme

 model3 <- lme(y ~ (a+b+c)^2, random = ~1|d)
 anova(model3)


I also want to test homogeneity of variances in this model using
Levene's test. How can I do that in this kind of case?


I'll appreciate all advices.

With kind regards

-Emilia

-
Emilia Pippola, research assistant
University of Oulu

Personal address (NEW!):
Rautalammintie 3 B 306
FIN-00550 Helsinki, Finland

Mobile: +358-50-5402551
E-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mixed effect model in R

2006-10-17 Thread Lina Jansen
Hi,

I am analysing an experiment that has one fixed (6 conditions) and two
random factors (11 subjects, 24 images in the conditions). I read somewhere
else that you can also see such a design as a nested experiment with the
hierarchy: subjects - condition - image. For some analyses I have one
response variable and for others I have more. The response variables are
non-normally distributed. Now the question:

Is there a package that can deal with such a design? I would like to use a
generalized linear model. Are there GLMs that are extended to do
multivariate analysis (for the 2 random + 1 fixed variable design)? And what
do you call such a design?

Last question: Can you suggest some literature about such a problem? I am
quite unsure concerning the analysis.

Thanks for any advice
lisra

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] CTRL-C behaviour with RODBC on Solaris2.8

2006-10-17 Thread Jagat.K.Sheth
After loading the RODBC package version 1.1-7, Ctrl-C changes its
behaviour: it quits R and returns to the (Unix) command prompt
on the Solaris 2.8 platform here. Here's what happened before and after
loading RODBC

 for (i in 1:10^5) rnorm(10)
^C
 library(RODBC)
 for (i in 1:10^5) rnorm(10)
^C
bash-3.00$

   
platform   sparc-sun-solaris2.8
arch   sparc
os solaris2.8
system sparc, solaris2.8
status
major  2
minor  3.1
year   2006
month  06
day01
svn rev38247
language   R
version.string Version 2.3.1 (2006-06-01)

This version of R was built with gcc-3.3 (too old?) and ODBC_INCLUDE,
ODBC_LIBS pointing to non-standard locations
/quant/temp/jagat/usr/local/include, /quant/temp/jagat/usr/local/lib,
respectively. Will be glad to provide further details.

Any ideas on how to correct this would be greatly appreciated. 

Thanks 

--
Jagat K. Sheth   
Prepayment Modeling and Economics  
Wells Fargo Home Mortgage 
7911 Forsyth Boulevard
Suite 500, M5001-061
Clayton, MO 63105

Tel: (314)-726-4496
Fax: (314)-726-4483
[EMAIL PROTECTED]


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Book recommendation for newbie to stats and R?

2006-10-17 Thread Zembower, Kevin
I'm trying to learn statistics and R at the same time. I have an
undergraduate science degree and one year of calculus (30 years ago),
but never took a stats course. I hope to take some stats courses in the
next year, but thought I would start to see how much I could teach
myself.

I work for an organization that analyses behavior change communication
programs regarding HIV/AIDS and reproductive health. A typical question
we're trying to answer is, Watching which television programs in South
Africa is related to an increased use of condoms? All of our work is in
the social sciences, I'd say. I'd like to help analyze our data using R.

I found these titles that may teach me both stats and R:
--Data Analysis and Graphics Using R by John Maindonald, John Braun
--Introductory Statistics with R by Peter Dalgaard
--Statistics: An Introduction using R by Michael J. Crawley
--Using R for Introductory Statistics by John Verzani

I recognize some of the authors by their postings here.

Can anyone recommend any of these books over the others? I'm interested
in a book that I can learn statistics by reading the chapters and
working out the exercises and problems, therefore having access to many
or all of the problem solutions is important.

Do you have any other recommendations for me in learning both R and
stats? Is it an impossible quest to learn enough stats by myself to be
useful in analyzing real data sets?

Thanks so much for your advice and suggestions.

Kevin Zembower
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
www.jhuccp.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Some questions on Rpart algorithm

2006-10-17 Thread Marcus, Jeffrey
Hello:
  I am using rpart and would like more background on how the splits are made
and how to interpret results - also how to properly use text(.rpart). I have
looked through Venables and Ripley and through the rpart help and still have
some questions. If there is a source (say, Breiman et al)  on decision trees
that would clear this all up,  please let me know. The questions below
pertain to a classification task (i.e., I'm using the "class" method). Many
thanks in advance. 


(1)  I'd like text(.rpart) to print percentages of each class rather then
counts. I don't see an option for this so would like to modify the
text.rpart. However, I can't find the source since it is a method that's
hidden. How can I find the source? 

(2) printcp prints a table with columns cp, nsplit, rel error, xerror, xstd.
I am guessing that cp is complexity, nsplit is the number of the split, rel
error is the error on test set, xerror is cross-validation error and xstd is
standard deviation of error across the cross-validation sets. Is there any
documentation on this? For instance, how exactly is complexity computed? 

(3)  What's a loss matrix? Is it the cost place on each type of
misclassification? 

(4) [More of a methodology question] In practice, when would one use
different costs on different splitting variables?

Thanks for any help on this.

  Jeff

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Book recommendation for newbie to stats and R?

2006-10-17 Thread BBands
On 10/17/06, Zembower, Kevin [EMAIL PROTECTED] wrote:
 I work for an organization that analyses behavior change communication
 programs regarding HIV/AIDS and reproductive health. A typical question
 we're trying to answer is, Watching which television programs in South
 Africa is related to an increased use of condoms? All of our work is in
 the social sciences, I'd say. I'd like to help analyze our data using R.

I recently bought Peter Dalgaard's book and have found it to be quite helpful.

  jab
-- 
John Bollinger, CFA, CMT
www.BollingerBands.com

If you advance far enough, you arrive at the beginning.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Book recommendation for newbie to stats and R?

2006-10-17 Thread Ben Fairbank
Kevin --

There are at least two that I recommend:
Using R for Introductory Statistics, John Verzani, published by
Chapman & Hall, 2005, and Introductory Statistics with R, by Peter
Dalgaard (a frequent contributor to this list), published by Springer (in
paperback) 2002.  Of these, IMHO you will find more basic, fundamental,
ground level stat in Verzani (which is also longer by about 40%), but
more elegant, insightful use of R and more creative ideas in Dalgaard.
These two together with the R Introduction that comes with R and maybe
Jon Baron's notes on the use of R in psychology will get you off on the
right foot.  Good luck!

Ben Fairbank




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Zembower, Kevin
Sent: Tuesday, October 17, 2006 9:08 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Book recommendation for newbie to stats and R?

I'm trying to learn statistics and R at the same time. I have an
undergraduate science degree and one year of calculus (30 years ago),
but never took a stats course. I hope to take some stats courses in the
next year, but thought I would start to see how much I could teach
myself.

I work for an organization that analyses behavior change communication
programs regarding HIV/AIDS and reproductive health. A typical question
we're trying to answer is, Watching which television programs in South
Africa is related to an increased use of condoms? All of our work is in
the social sciences, I'd say. I'd like to help analyze our data using R.

I found these titles that may teach me both stats and R:
--Data Analysis and Graphics Using R by John Maindonald, John Braun
--Introductory Statistics with R by Peter Dalgaard
--Statistics: An Introduction using R by Michael J. Crawley
--Using R for Introductory Statistics by John Verzani

I recognize some of the authors by their postings here.

Can anyone recommend any of these books over the others? I'm interested
in a book that I can learn statistics by reading the chapters and
working out the exercises and problems, therefore having access to many
or all of the problem solutions is important.

Do you have any other recommendations for me in learning both R and
stats? Is it an impossible quest to learn enough stats by myself to be
useful in analyzing real data sets?

Thanks so much for your advice and suggestions.

Kevin Zembower
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
www.jhuccp.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split-plot analysis with lme()

2006-10-17 Thread i.m.s.white
Thanks, that clarifies things. And this gets all 5 interaction degrees
of freedom:

oats <- read.table("testlme.dat", head=T)
# This is a subset of the standard data set with
# the combination Variety=Golden Rain, nitro=0 deleted
oats$nitro <- factor(oats$nitro)
attach(oats)
library(nlme)
M <- model.matrix(~Variety*nitro)
fit <- lme(yield ~ Variety+nitro+M[,7:11], random=~1|Block/Variety)
anova(fit)


On Sun, Oct 15, 2006 at 08:28:43AM -0700, Spencer Graves wrote:
  The problem in your example is that 'lme' doesn't know how to 
 handle the Variety*nitro interaction when all 12 combinations are not 
 present.  The error message "Singularity in backsolve" means that with 
 data for only 11 combinations, which is what you have in your example, 
 you can only estimate 11 linearly independent fixed-effect coefficients, 
 not the 12 required by this model:  1 for intercept + (3-1) for Variety 
 + (4-1) for nitro + (3-1)*(4-1) for Variety*nitro = 12. 
 
  Since 'nitro' is a fixed effect only, you can get what you want by 
 keeping it as a numeric factor and manually specifying the (at most 5, 
 not 6) interaction contrasts  you want, something like the following: 
 
 fit2. <- lme(yield ~ Variety+nitro+I(nitro^2)+I(nitro^3)
 +Variety:(nitro+I(nitro^2)), data=Oats,
 random=~1|Block/Variety,
 subset=!(Variety == "Golden Rain" & nitro == 0))
 
  NOTE:  This gives us 4 degrees of freedom for the interaction.  
 With all the data, we can estimate 6.  Therefore, there should be some 
 way to get 5, but so far I haven't figured out an easy way to do that.  
 Perhaps someone else will enlighten us both. 
 
  Even without a method for estimating an interaction term with 5 
 degrees of freedom, I hope I've at least answered your basic question. 
 
  Best Wishes,
  Spencer Graves  
 
 i.m.s.white wrote:
 Dear R-help,
 
 Why can't lme cope with an incomplete whole plot when analysing a 
 split-plot
 experiment? For example:
 
 R : Copyright 2006, The R Foundation for Statistical Computing
 Version 2.3.1 (2006-06-01)
 
   
 library(nlme)
 attach(Oats)
 nitro <- ordered(nitro)
 fit <- lme(yield ~ Variety*nitro, random=~1|Block/Variety)
 anova(fit)
 
                numDF denDF   F-value p-value
 (Intercept)        1    45 245.14333  <.0001
 Variety            2    10   1.48534  0.2724
 nitro              3    45  37.68560  <.0001
 Variety:nitro      6    45   0.30282  0.9322
 
 # Excellent! However ---
 
   
 fit2 <- lme(yield ~ Variety*nitro, random=~1|Block/Variety, subset=
 + !(Variety == "Golden Rain" & nitro == 0))
 Error in MEEM(object, conLin, control$niterEM) : 
  Singularity in backsolve at level 0, block 1
   

-- 

*I.White   *
*University of Edinburgh   *
*Ashworth Laboratories, West Mains Road*
*Edinburgh EH9 3JT *
*Fax: 0131 650 6564   Tel: 0131 650 5490   *
*E-mail: [EMAIL PROTECTED]  *

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error Correcting Codes, Simplex

2006-10-17 Thread Thomas Lumley

On Tue, 17 Oct 2006, Richard Graham wrote:


On 10/16/06, Björn Egert [EMAIL PROTECTED] wrote:

On 10/8/06, Egert, Bjoern [EMAIL PROTECTED] wrote:

Hello,

  Is there a way in R to construct an (error correcting) binary code
  e.g. for an source alphabet containing integers from 1 to say 255
  with the property that each pair of distinct codewords of length m
  is at Hamming distance exactly m/2 ?

  I was suggested to use so called simplex codes, which should be
  fairly standard, but I haven't found a direct way via R packages
  to do so, that's why I ask whether there might be in indirect way
  to solve this problem.



The survey package has a function hadamard() to construct Hadamard 
matrices, which are what simplex codes come from.
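
As a rough sketch of how that could address the stated requirement (the
interface details below are assumptions, so please check ?hadamard): any two
distinct rows of a Hadamard matrix of order m differ in exactly m/2 positions,
so taken as 0/1 codewords they are all at Hamming distance exactly m/2.

library(survey)
H <- hadamard(255)        # assumed: a Hadamard matrix with more than 255 rows
H01 <- if (all(H %in% c(0, 1))) H else (H + 1)/2   # normalise +/-1 entries to 0/1 if needed
d <- as.matrix(dist(H01, method = "manhattan"))    # pairwise Hamming distances
range(d[upper.tri(d)])    # a single value, m/2, if the distance property holds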


-thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixed effect model in R

2006-10-17 Thread Stefan Grosse
Interesting packages for you might be the nlme and lme4 packages, and, as
a book, Pinheiro/Bates, Mixed-Effects Models in S and S-PLUS.

Lina Jansen schrieb:
 Hi,

 I am analysing an experiment that has one fixed (6 conditions) and two
 random factors (11 subjects, 24 images in the conditions). I read somewhere
 else that you can also see such a design as a nested experiment with the
 hierarchy: subjects - condition - image. For some analysis I have one
 respond variable and for others I have more. The response variables are
 non-normally distributed. Now the question:

 Is there a package that can deal with such a design? I would like to use a
 generalized linear model. Are there glms that are extended to do
 multivariate analysis (for the 2 random + 1 fixed variable design)? And how
 do you call such a design?

 Last question: Can you suggest me some literature about such a problem? I am
 quite unsure concerning the analysis.

 Thanks for any advice
 lisra

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC and NULL values

2006-10-17 Thread Jerome Asselin
On Tue, 2006-10-17 at 11:49 +0100, Mark Wardle wrote:
 Prof Brian Ripley wrote:
  What sqltype(s) are your variables?
  
 
 The variables are all numeric.

I don't think this is an RODBC issue. I've had similar problems with
numeric variables in FileMaker without using RODBC.

I have exported from FileMaker to MySQL numeric variables containing
non-numeric strings. Since MySQL won't allow non-numeric characters into
numeric variables, empty strings and other non-numeric values were
replaced by the default (0). I suppose that's the same with PostgreSQL.

You might have more luck if you first convert your variables to TEXT in
FileMaker and then import them into PostgreSQL, where you can reconvert
them to numeric after fixing the NULL values. It's a bit of extra
work...
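
(Once the column has come across as text, the reconversion on the R side is
simple; a small sketch with a made-up vector -- as.numeric() turns empty
strings and other non-numeric values into NA, with a coercion warning:)

x <- c("1.5", "", "2.3", "n/a")   # a numeric column as it might arrive as text
as.numeric(x)                     # 1.5 NA 2.3 NA, plus a warning about NAs introduced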

In my case, the strategy I used was to export the FileMaker data into
XML format and then run a XSLT script to insert the data as TEXT into
MySQL where I could detect and fix non-numeric strings.

Regards,
Jerome

-- 
Jerome Asselin, M.Sc., Agent de recherche, RHCE
CHUM -- Centre de recherche
3875 rue St-Urbain, 3e etage // Montreal QC  H2W 1V1
Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] New package Ryacas

2006-10-17 Thread Rob J Goedman
Simon,

  library(Ryacas)
Loading required package: XML
  yacas("Integrate(x) x")
[1] Starting Yacas!
Accepting requests from port 9734
expression(x^2/2)
  yacas('Integrate(x) x')
expression(x^2/2)
  yacas('Integrate(x)x')
expression(x^2/2)
  yacas('Integrate(x)x;')
CommandLine(1) : Expecting ) closing bracket for sub-expression, but  
got ; instead
 

The 'Accepting ...' message shows yacas was already running. Which is  
also the case
on your system (hence the socket already in use message). You can  
ignore that.
On Mac OS and I guess several other unix/linux environments, yacas  
will remain
running.

If you leave off the ;, it should work.

As this is quite Ryacas/yacas specific, for future messages I'll  
respond directly.

Regards,
Rob


On Oct 17, 2006, at 1:39 AM, Gabor Grothendieck wrote:

 Maybe there are timing problems using that setup with sockets?  I  
 once tried
 VMware (not with Ryacas but just to try it out) and found it slow  
 as can
 be expected with an emulated environment. Since you have Windows XP  
 just
 use the Windows version of Ryacas directly.

 On 10/17/06, Simon Blomberg [EMAIL PROTECTED] wrote:
 Hi Gabor,

 I'm running Quantian (Debian) inside a VMware virtual machine, on a
 Windows XP host.

 I installed the latest version of yacas from the source tarball. I
 remembered to ./configure --enable-server to allow server  
 connections.
 make and make install worked ok, after some fiddling. I checked  
 that the
 yacas server option worked, by doing yacas --server , and then
 telnet'ing to 127.0.0.1  to check. It worked fine. I installed
 Ryacas. I then tried it out and got the following error:

 library(Ryacas)
 Loading required package: XML
 yacas('Integrate(x)x;')
 [1] Starting Yacas!
 Error in socketConnection(host = 127.0.0.1, port = 9734, server =
 FALSE,  :
   unable to open connection
 In addition: Warning message:
 127.0.0.1:9734 cannot be opened
 Accepting requests from port 9734

 I tried again (stubborn, I guess):

 yacas('Integrate(x)x;')
  [1] Starting Yacas!
  Accepting requests from port 9734
  YacasServer Could not bind to the socket
  : Address already in use
  /usr/local/lib/R/site-library/Ryacas/yacdir/R.ys(1) : File not found
  CommandLine(1) : Expecting ) closing bracket for sub-expression, but
 got x instead

 Any ideas where I may be going wrong? I don't know anything about
 sockets. I've cross-posted to r-sig-debian. They may be interested.

 Cheers,

 Simon.

 --
 Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
 Centre for Resource and Environmental Studies
 The Australian National University
 Canberra ACT 0200
 Australia
 T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
 F: +61 2 6125 0757
 CRICOS Provider # 00120C

 The combination of some data and an aching desire for
 an answer does not ensure that a reasonable answer
 can be extracted from a given body of data.
 - John Tukey.



 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error: STRING_ELT() can only be applied to a 'character vector', not a 'builtin'

2006-10-17 Thread Brahm, David
I have a daily job that attaches hundreds of pseudo-packages containing
data as promise objects (DDP's, ref: g.data package), and plots the
results to a multi-page pdf device.  Sometimes it fails.  Under R-2.2.1
it just gave segfaults.  Under R-2.3.1 it gave this error message:

   *** caught segfault ***
  address (nil), cause 'memory not mapped'
  Traceback:
   1: load(system.file("data", paste(i, "RData", sep = "."), package =
pkg), env)
   2: g.data.load(tm.time, hist.20051012)
   3: g.inorder(93500, tm.time, 16)
  aborting ...
  Segmentation fault

Under R-2.4.0, it now gives this message:

  Error: STRING_ELT() can only be applied to a 'character vector', not a
'builtin'

(which appears to be generated inside main/memory.c).

I'm sorry I can't give a reproducible example, because it seems to
happen randomly, and at different points in the process.  So this is
just a shot in the dark -- does anybody recognize this behavior?  TIA.

-- David Brahm ([EMAIL PROTECTED])

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] barplot error

2006-10-17 Thread Farrel Buchinsky

I created a dataframe called OSA
here is what it looks like
  no.surgery surgery
0        0.4     6.9
6        0.2     0.3

I have also attached it as an R data file

I cannot understand why I am getting the following error.


barplot(OSA)

Error in barplot.default(OSA) : 'height' must be a vector or a matrix

OSA is a data.frame which means R should see it as a matrix.
What am I not understanding?

--
Farrel Buchinsky
Mobile: (412) 779-1073
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : Book recommendation for newbie to stats and R?

2006-10-17 Thread justin bem
You can try the statistics book by T. H. Wonnacott. It is a good introduction to 
statistics for the social sciences, including economics, medicine, ... 
 
Justin BEM
Elève Ingénieur Statisticien Economiste
BP 294 Yaoundé.
Tél (00237)9597295.



- Original message 
From: Ben Fairbank [EMAIL PROTECTED]
To: Zembower, Kevin [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Sent: Tuesday, 17 October 2006, 15:18:56
Subject: Re: [R] Book recommendation for newbie to stats and R?


Kevin --

There are at least two that I recommend:
Using R for Introductory Statistics, John Verzani, published by
Chapman  Hall, 2005, and Introductory Statistics with R, by Peter
Dalgaard (a frequent contributor to this list)published by Springer (in
paperback) 2002.  Of these, IMHO you will find more basic, fundamental,
ground level stat in Verzani (which is also longer by about 40%), but
more elegant, insightful use of R and more creative ideas in Dalgaard.
These two together with the R Introduction that comes with R and maybe
Jon Baron's notes on the use of R in psychology will get you off on the
right foot.  Good luck!

Ben Fairbank




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Zembower, Kevin
Sent: Tuesday, October 17, 2006 9:08 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Book recommendation for newbie to stats and R?

I'm trying to learn statistics and R at the same time. I have an
undergraduate science degree and one year of calculus (30 years ago),
but never took a stats course. I hope to take some stats courses in the
next year, but thought I would start to see how much I could teach
myself.

I work for an organization that analyses behavior change communication
programs regarding HIV/AIDS and reproductive health. A typical question
we're trying to answer is, Watching which television programs in South
Africa is related to an increased use of condoms? All of our work is in
the social sciences, I'd say. I'd like to help analyze our data using R.

I found these titles that may teach me both stats and R:
--Data Analysis and Graphics Using R by John Maindonald, John Braun
--Introductory Statistics with R by Peter Dalgaard
--Statistics: An Introduction using R by Michael J. Crawley
--Using R for Introductory Statistics by John Verzani

I recognize some of the authors by their postings here.

Can anyone recommend any of these books over the others? I'm interested
in a book that I can learn statistics by reading the chapters and
working out the exercises and problems, therefore having access to many
or all of the problem solutions is important.

Do you have any other recommendations for me in learning both R and
stats? Is it an impossible quest to learn enough stats by myself to be
useful in analyzing real data sets?

Thanks so much for your advice and suggestions.

Kevin Zembower
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
www.jhuccp.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.








[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] crush in edit()

2006-10-17 Thread crazybuddy Vincent
Dear all,

I am new to the R system. When I tried to edit data read from a csv file, R
crashed, and I got an error message as follows:

 edit(data)
*** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated
=== Backtrace: =
/lib/libc.so.6(__chk_fail+0x41)[0x49d020b1]
/lib/libc.so.6[0x49d034a2]
/usr/lib/R/modules//R_X11.so[0x33ed7a]
/usr/lib/R/modules//R_X11.so[0x34050d]
/usr/lib/R/modules//R_X11.so[0x341858]
/usr/lib/R/modules//R_X11.so(RX11_dataentry+0xa25)[0x342f45]
/usr/lib/R/lib/libR.so[0xa34675]
/usr/lib/R/lib/libR.so[0x954ed6]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x925b23]
/usr/lib/R/lib/libR.so[0x929ed8]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x925b23]
/usr/lib/R/lib/libR.so[0x926a37]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x925b23]
/usr/lib/R/lib/libR.so(Rf_applyClosure+0x2a7)[0x928117]
/usr/lib/R/lib/libR.so[0x95661f]
/usr/lib/R/lib/libR.so(Rf_usemethod+0x609)[0x957a89]
/usr/lib/R/lib/libR.so[0x95825e]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x925b23]
/usr/lib/R/lib/libR.so(Rf_applyClosure+0x2a7)[0x928117]
/usr/lib/R/lib/libR.so(Rf_eval+0x2f4)[0x925994]
/usr/lib/R/lib/libR.so(Rf_ReplIteration+0x311)[0x945361]
/usr/lib/R/lib/libR.so[0x945571]
/usr/lib/R/lib/libR.so(run_Rmainloop+0x60)[0x9458c0]
/usr/lib/R/lib/libR.so(Rf_mainloop+0x1c)[0x9458ec]
/usr/lib/R/bin/exec/R(main+0x46)[0x80486f6]
/lib/libc.so.6(__libc_start_main+0xdc)[0x49c3b4e4]
/usr/lib/R/bin/exec/R[0x80485f1]
=== Memory map: 
00111000-0012f000 r-xp  fd:00 16943095
/usr/lib/R/library/grDevices/libs/grDevices.so
0012f000-0013 rwxp 0001d000 fd:00 16943095
/usr/lib/R/library/grDevices/libs/grDevices.so
0013-00181000 r-xp  fd:00 16976568
/usr/lib/R/library/stats/libs/stats.so
00181000-00183000 rwxp 00051000 fd:00 16976568
/usr/lib/R/library/stats/libs/stats.so
00339000-00352000 r-xp  fd:00 15959326   /usr/lib/R/modules/R_X11.so
00352000-00353000 rwxp 00018000 fd:00 15959326   /usr/lib/R/modules/R_X11.so
00353000-0035f000 rwxp 00353000 00:00 0
0048-00496000 r-xp  fd:00 15303387   /usr/lib/gconv/SJIS.so
00496000-00498000 rwxp 00015000 fd:00 15303387   /usr/lib/gconv/SJIS.so
0056e000-00598000 r-xp  fd:00 16452204   /usr/lib/R/lib/libRblas.so
00598000-00599000 rwxp 00029000 fd:00 16452204   /usr/lib/R/lib/libRblas.so
00848000-00851000 r-xp  fd:00 15204401   /lib/libnss_files-2.4.so
00851000-00852000 r-xp 8000 fd:00 15204401   /lib/libnss_files-2.4.so
00852000-00853000 rwxp 9000 fd:00 15204401   /lib/libnss_files-2.4.so
00885000-00abd000 r-xp  fd:00 16452203   /usr/lib/R/lib/libR.so
00abd000-00aca000 rwxp 00238000 fd:00 16452203   /usr/lib/R/lib/libR.so
00aca000-00b61000 rwxp 00aca000 00:00 0
00c47000-00c4d000 r-xp  fd:00 16944203
/usr/lib/R/library/methods/libs/methods.so
00c4d000-00c4e000 rwxp 5000 fd:00 16944203
/usr/lib/R/library/methods/libs/methods.so
00eb6000-00f31000 r-xp  fd:00 15242987
/usr/lib/libgfortran.so.1.0.0
00f31000-00f32000 rwxp 0007b000 fd:00 15242987
/usr/lib/libgfortran.so.1.0.0
00f44000-00f45000 r-xp  fd:00 15303344   /usr/lib/gconv/ISO8859-1.so
00f45000-00f47000 rwxp  fd:00 15303344   /usr/lib/gconv/ISO8859-1.so
08048000-08049000 r-xp  fd:00 15796032   /usr/lib/R/bin/exec/R
08049000-0804a000 rwxp  fd:00 15796032   /usr/lib/R/bin/exec/R
09ef7000-0af9f000 rwxp 09ef7000 00:00 0  [heap]
49c08000-49c09000 r-xp 49c08000 00:00 0  [vdso]
49c09000-49c22000 r-xp  fd:00 15206828   /lib/ld-2.4.so
49c22000-49c23000 r-xp 00018000 fd:00 15206828   /lib/ld-2.4.so
49c23000-49c24000 rwxp 00019000 fd:00 15206828   /lib/ld-2.4.so
49c26000-49d53000 r-xp  fd:00 15206829   /lib/libc-2.4.so
49d53000-49d55000 r-xp 0012d000 fd:00 15206829   /lib/libc-2.4.so
49d55000-49d56000 rwxp 0012f000 fd:00 15206829   /lib/libc-2.4.so
49d56000-49d59000 rwxp 49d56000 00:00 0
49d5b000-49d7e000 r-xp  fd:00 15206830   /lib/libm-2.4.so
49d7e000-49d7f000 r-xp 00022000 fd:00 15206830   /lib/libm-2.4.so
49d7f000-49d8 rwxp 00023000 fd:00 15206830   /lib/libm-2.4.so
49d82000-49d84000 r-xp  fd:00 15206831   /lib/libdl-2.4.so
49d84000-49d85000 r-xp 1000 fd:00 15206831   /Aborted

I am using R 2.4.0 i386 on Fedora Core 5. Can anyone please help me with this?

Thank you very much.

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot error

2006-10-17 Thread Gabor Grothendieck
On 10/17/06, Farrel Buchinsky [EMAIL PROTECTED] wrote:
 I created a dataframe called OSA
 here is what it looks like
   no.surgery surgery
 0        0.4     6.9
 6        0.2     0.3

 I have also attached it as an R data file

 I cannot understand why I am getting the following error.

  barplot(OSA)
 Error in barplot.default(OSA) : 'height' must be a vector or a matrix

 OSA is a data.frame which means R should see it as a matrix.
 What am I not understanding?

A data.frame is not the same as a matrix.

Try one of these using the builtin BOD data frame:

barplot(as.matrix(BOD))

barplot(data.matrix(BOD))

barplot.data.frame <- function(height, ...) barplot(as.matrix(height), ...)
barplot(BOD)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : Book recommendation for newbie to stats and R?

2006-10-17 Thread justin bem
Exact reference is: 
 Wonnacott, T. H., Wonnacott, R., 
 Introductory Statistics for Business and Economics, 
 New York, 1990

 
Justin BEM
Elève Ingénieur Statisticien Economiste
BP 294 Yaoundé.
Tél (00237)9597295.



- Original message 
From: Ben Fairbank [EMAIL PROTECTED]
To: Zembower, Kevin [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Sent: Tuesday, 17 October 2006, 15:18:56
Subject: Re: [R] Book recommendation for newbie to stats and R?


Kevin --

There are at least two that I recommend:
Using R for Introductory Statistics, John Verzani, published by
Chapman  Hall, 2005, and Introductory Statistics with R, by Peter
Dalgaard (a frequent contributor to this list)published by Springer (in
paperback) 2002.  Of these, IMHO you will find more basic, fundamental,
ground level stat in Verzani (which is also longer by about 40%), but
more elegant, insightful use of R and more creative ideas in Dalgaard.
These two together with the R Introduction that comes with R and maybe
Jon Baron's notes on the use of R in psychology will get you off on the
right foot.  Good luck!

Ben Fairbank




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Zembower, Kevin
Sent: Tuesday, October 17, 2006 9:08 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Book recommendation for newbie to stats and R?

I'm trying to learn statistics and R at the same time. I have an
undergraduate science degree and one year of calculus (30 years ago),
but never took a stats course. I hope to take some stats courses in the
next year, but thought I would start to see how much I could teach
myself.

I work for an organization that analyses behavior change communication
programs regarding HIV/AIDS and reproductive health. A typical question
we're trying to answer is, Watching which television programs in South
Africa is related to an increased use of condoms? All of our work is in
the social sciences, I'd say. I'd like to help analyze our data using R.

I found these titles that may teach me both stats and R:
--Data Analysis and Graphics Using R by John Maindonald, John Braun
--Introductory Statistics with R by Peter Dalgaard
--Statistics: An Introduction using R by Michael J. Crawley
--Using R for Introductory Statistics by John Verzani

I recognize some of the authors by their postings here.

Can anyone recommend any of these books over the others? I'm interested
in a book that I can learn statistics by reading the chapters and
working out the exercises and problems, therefore having access to many
or all of the problem solutions is important.

Do you have any other recommendations for me in learning both R and
stats? Is it an impossible quest to learn enough stats by myself to be
useful in analyzing real data sets?

Thanks so much for your advice and suggestions.

Kevin Zembower
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
www.jhuccp.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.








[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : [PS] Re : Book recommendation for newbie to stats and R?

2006-10-17 Thread justin bem
There was a debate about this on the forum a few weeks ago. It is not efficient 
to study both in a single book. Statistics is so vast that a single book could not 
cover all of its twists and turns, and the same goes for R. And keep in mind that 
R is programming: some basics in algorithmics are needed. 

Have you tried "R for Beginners" by Emmanuel Paradis? ( 
http://cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf ) That is where 
I started. There is also MASS by Venables and Ripley, which is very rich, 
but it assumes some prerequisites.

 
Justin BEM
Elève Ingénieur Statisticien Economiste
BP 294 Yaoundé.
Tél (00237)9597295.



- Original message 
From: Ben Fairbank [EMAIL PROTECTED]
To: justin bem [EMAIL PROTECTED]
Sent: Tuesday, 17 October 2006, 16:57:17
Subject: RE: [PS] Re : [R] Book recommendation for newbie to stats and R?


Justin –
 
Many thanks.  Is Mr. Wonnacott's book about statistics 
only, or about statistics _and_ R?
 
Ben Fairbank



From: justin bem [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 17, 2006 10:49 AM
To: Ben Fairbank; Zembower, Kevin; r-help@stat.math.ethz.ch
Subject: [PS] Re : [R] Book recommendation for newbie to stats and R?
 
You can try the statistic book of T H WONNACOT. It is a good introduction to 
statistic for social sciences including economics, medecine, ... 
 
Justin BEM
Elève Ingénieur Statisticien Economiste
BP 294 Yaoundé.
Tél (00237)9597295. 
 
- Original message 
From: Ben Fairbank [EMAIL PROTECTED]
To: Zembower, Kevin [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Sent: Tuesday, 17 October 2006, 15:18:56
Subject: Re: [R] Book recommendation for newbie to stats and R?
Kevin --

There are at least two that I recommend:
Using R for Introductory Statistics, John Verzani, published by
Chapman  Hall, 2005, and Introductory Statistics with R, by Peter
Dalgaard (a frequent contributor to this list)published by Springer (in
paperback) 2002.  Of these, IMHO you will find more basic, fundamental,
ground level stat in Verzani (which is also longer by about 40%), but
more elegant, insightful use of R and more creative ideas in Dalgaard.
These two together with the R Introduction that comes with R and maybe
Jon Baron's notes on the use of R in psychology will get you off on the
right foot.  Good luck!

Ben Fairbank




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Zembower, Kevin
Sent: Tuesday, October 17, 2006 9:08 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Book recommendation for newbie to stats and R?

I'm trying to learn statistics and R at the same time. I have an
undergraduate science degree and one year of calculus (30 years ago),
but never took a stats course. I hope to take some stats courses in the
next year, but thought I would start to see how much I could teach
myself.

I work for an organization that analyses behavior change communication
programs regarding HIV/AIDS and reproductive health. A typical question
we're trying to answer is, Watching which television programs in South
Africa is related to an increased use of condoms? All of our work is in
the social sciences, I'd say. I'd like to help analyze our data using R.

I found these titles that may teach me both stats and R:
--Data Analysis and Graphics Using R by John Maindonald, John Braun
--Introductory Statistics with R by Peter Dalgaard
--Statistics: An Introduction using R by Michael J. Crawley
--Using R for Introductory Statistics by John Verzani

I recognize some of the authors by their postings here.

Can anyone recommend any of these books over the others? I'm interested
in a book that I can learn statistics by reading the chapters and
working out the exercises and problems, therefore having access to many
or all of the problem solutions is important.

Do you have any other recommendations for me in learning both R and
stats? Is it an impossible quest to learn enough stats by myself to be
useful in analyzing real data sets?

Thanks so much for your advice and suggestions.

Kevin Zembower
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
www.jhuccp.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
 
 










Re: [R] Calculate NAs from known data: how to?

2006-10-17 Thread Ted Harding
On 17-Oct-06 Torleif Markussen Lunde wrote:
 Hi
 
 In a dataset I have length and age for cod. The age, however,
 is only given for 40-100% of the fish. What I need to do is
 to fill in the NAs in a correct way, so that age has a value
 for each length. This is to be done for each sample seperately
 (there are 324 samples), meaning the NAs for sample no 1 shall
 be calculated from the known values from sample no 1.
 
 As for example length 55 cm can be both 4 and 5 years, I guess
 a fish with NA age and length 55 cm should be given a random
  age given a probability, for example 55 cm = 4 years has a p=75%,
  while 55 cm = 5 years has a p=25%. Those p-values should be
 calculated from the real data.
 
 How can this be done in R, and what is the right way to do it?
 
 Sample number 1 is given below.
[snip]

A question with many ramifications!

First of all, there are several possible approaches to imputing
missing values. You are wise in recognising that there is
uncertainty in this, in general and also for your data set.
For this, I would normally recommend a Multiple Imputation
approach, since this would proceed by sampling from a posterior
distribution for Age given Length, as estimated by Maximum
Likelihood from your data. The differing results of successive
imputations then exhibit a variability corresponding to the
uncertainty about what value to impute. Furthermore, when
subsequent analyses (such as estimating the parameters of a
growth curve from Age and Length, or estimating population
dynamics from Age distributions in successive years) are
carried out, these can be done for each of the imputations
in the multiple set, and the results (and estimated standard
errors) can be combined to give an overall estimate and the
uncertainty in this--which not only includes the variability
in the complete data, but also the uncertainty due to imputation.

For this approach, I would be inclined to start with Schafer's
norm or mix packages, available in R. But see below.
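
(For comparison, here is a minimal sketch of the naive single-imputation
approach the question itself describes -- drawing an age for each NA from
the ages observed for similarly sized fish in the same sample. It assumes
a data frame D with columns length, age and sample as read below, and a
hypothetical length tolerance of 2.5 cm; the caveats that follow apply to
it with full force.)

impute.age <- function(D, tol = 2.5) {
  for (i in which(is.na(D$age))) {
    # ages observed in the same sample for fish of similar length
    pool <- D$age[D$sample == D$sample[i] &
                  abs(D$length - D$length[i]) <= tol &
                  !is.na(D$age)]
    # draw one age at random from that empirical distribution, if any
    if (length(pool) > 0) D$age[i] <- pool[sample.int(length(pool), 1)]
  }
  D
}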

However, I have had a look at the data for Sample 1 which
you included. This throws up several features which should
be taken into account, and which indicate that blind use
of an imputation package may not be the best approach.

First, I made a CSV file from your table (3 columns: length,
age, sample). Then:

  D <- read.csv("LengthAge.csv")
  A <- D$age
  L <- D$length

Now: index which lines have NA for age, then a histogram of
Length when Age is present:

  ix0 <- is.na(A)
  hist(L[!ix0],breaks=5*(0:20))

Superimpose a histogram of Length when Age is NA:

  hist(L[ix0],add=TRUE,breaks=5*(0:20),col="red")
  hist(L[!ix0],add=TRUE,breaks=5*(0:20))

(the repetition of L[!ix0] is done because the red has overlaid
it for one length range, and the repetition restores it).

This immediately shows that, in general, missing Age is unusual,
except for Length in the range (25:50) in which it is the
majority. An alternative picture of the same scene appears
if you first make the histogram of all lengths:

  X11()  ## to get a second graphics window
  hist(L,breaks=5*(0:20))
  hist(L[ix0],add=TRUE,breaks=5*(0:20),col="red")

Now you can ask why there should be so many NAs for Age
in the Length range (25:30) and, indeed (2nd histogram)
why there are so many specimens anyway in that Length
range compared with their neighbours. Comparing the two
histograms indicates that the excess in that Length range
arises from the fish with NA Ages: in the first histogram
the number of non-NA Ages in (25:30) is very comparable
with the numbers of all fish in other Length ranges.

So my immediate suspicion is that there is something
special about the Length range (25:30) in relation to
whether Age is recorded or not.

The next thing that emerges from the first histogram
is the sharp decline in numbers above Length=55.

I therefore wonder why this also happens.

Is 55cm a magic (e.g. legal) length threshold for cod?
And is 30cm also special in some way? If so, could there
be some pressure on whoever records the data not to take
measurements of Age when the fish is near but under 30cm,
or to record a value of Length just below the threshold
(i.e. incorrectly)? Does this happen in your other data
sets?

As well as these why questions, however, there is a
a technical issue arising from the fact that many missing
Ages occur in a very limited range of Length:

  sum(ix0)
  [1] 27  ### total number of Age = NA

  sum((L[ix0]>25)&(L[ix0]<=30))
  [1] 12  ### Age = NA in (25:30)
  sum((L[!ix0]>25)&(L[!ix0]<=30))
  [1] 11  ### Age is known in (25:30)

This is: whether Age=NA is uninformative about Age given
recorded Length. More precisely, whether

  Prob[Age=NA | true Age, recorded Length]

  = Prob[Age= NA | recorded Length]

since many imputation techniques depend on this. If it is
not true, then being missing is informative about Age
(i.e. the distribution of Age at Length is different for
Age=NA cases than for Age != NA cases), though you would
not be able to ascertain this without a suitable model
for 

[R] Variance of Y_hat in a linear model

2006-10-17 Thread Li Zhang
   
      Y     X     Z 
   42.0   7.0  33.0
   33.0   4.0  41.0
   75.0  16.0   7.0
   28.0   3.0  49.0
   91.0  21.0   5.0
   55.0   8.0  31.0


data <- read.table("d.txt", header=TRUE)
mod <- lm(data$Y ~ data$X + data$Z)
predict(mod)
       1        2        3        4        5        6 
44.69961 34.22997 76.63735 29.32986 91.09000 48.01321 


In the lm, the predicted (fitted) Y_1_hat is 44.69961,

is there a function to give me the variance of
y_1_hat?

Neither anova nor summary gives this value.

Thank You

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixed effect model in R

2006-10-17 Thread Stefan Grosse
Please always reply to the list as well, as there might always be someone
who can answer faster or better (or it could be that I am wrong, so someone
might correct me).

Indeed, Pinheiro/Bates assume Gaussian error terms... but I am not really
sure whether that is what you meant by a non-normally distributed response
variable, i.e. by non-normal data.

however:
/ Mixed-effects models: / The recommended nlme
http://cran.r-project.org/src/contrib/Descriptions/nlme.html package,
associated with Pinheiro and Bates, / Mixed-Effects Models in S and
S-PLUS / (Springer, 2000), fits linear and nonlinear mixed-effects
models, commonly used in the social sciences for hierarchical and
longitudinal data. Generalized linear mixed-effects models may be fit by
the glmmPQL function in the MASS package, and by the lmer function in
the Matrix
http://cran.r-project.org/src/contrib/Descriptions/Matrix.html package
(related to the lme4
http://cran.r-project.org/src/contrib/Descriptions/lme4.html package,
which largely supersedes nlme
http://cran.r-project.org/src/contrib/Descriptions/nlme.html for /
linear / mixed models). Also see the lmeSplines
http://cran.r-project.org/src/contrib/Descriptions/lmeSplines.html and
lmm http://cran.r-project.org/src/contrib/Descriptions/lmm.html
packages. [
http://cran.r-project.org/src/contrib/Views/SocialSciences.html ]
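
For the design described in the original post -- one fixed condition factor
with random effects for subject and image -- the model formula would look
roughly like the sketch below. The data frame and column names are assumed,
and for a non-normal response the appropriate family argument (or glmmPQL
from MASS) would be needed, so treat this only as an outline:

library(lme4)
## assumed data frame 'mydata' with columns resp, condition, subject, image
fit <- lmer(resp ~ condition + (1 | subject) + (1 | image), data = mydata)
summary(fit)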

Lina Jansen schrieb:


 2006/10/17, Stefan Grosse [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED]:

 Interesting packages for you might be the nlme and lme4 packages
 and as
 a book Pinheiro/Bates, Mixed-Effects Models in S and S-Plus


 Thank you for the answer. I am always unsure concerning the
 non-normality. Can I use the nlme and lme4 with non-normal data?
 First, I thought they would work like an ANOVA but with random and
 fixed effects.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variance of Y_hat in a linear model

2006-10-17 Thread Gabor Grothendieck
Using the builtin BOD data set try this:

predict(lm(demand ~., BOD), se.fit = TRUE)
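
(The se.fit component returned above is the standard error of each fitted
value, so its square is the variance being asked about; a small sketch:)

p <- predict(lm(demand ~ ., BOD), se.fit = TRUE)
p$se.fit^2    # estimated variance of each fitted value y_hat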


On 10/17/06, Li Zhang [EMAIL PROTECTED] wrote:

Y   X Z
   42.07.0   33.0
   33.04.0   41.0
   75.0   16.07.0
   28.03.0   49.0
   91.0   21.05.0
   55.08.0   31.0


 data-read.table(d.txt,header=TRUE)
 mod-lm(data$Y~data$X+data$Z)
 predict(mod)
   123456
 44.69961 34.22997 76.63735 29.32986 91.09000 48.01321


 In the lm, the predicted(fitted) Y_1_hat is 44.6991,

 is there a function to give me the variance of
 y_1_hat?

 Neither anova nor summary gives this value.

 Thank You

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: STRING_ELT() can only be applied to a 'character vector', not a 'builtin'

2006-10-17 Thread Prof Brian Ripley
I suspect you have a protection problem.  The specific message you quote 
indicates that STRING_ELT is being called on an object of inappropriate 
type: but it is quite likely that it is being called on uninitialized 
memory as the intended object has been garbage-collected.  Messages from a 
corrupted R session do not always make sense: see the debugging info in 
`Writing R Extensions' and especially the use of gctorture and valgrind.
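
(For example, something along these lines tends to make such protection bugs
surface much closer to their cause, at the cost of very slow execution:)

gctorture(TRUE)    # force a garbage collection at every allocation
## ... run the code that intermittently fails ...
gctorture(FALSE)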

Followups to R-devel, please: this looks very like a programming issue.

On Tue, 17 Oct 2006, Brahm, David wrote:

 I have a daily job that attaches hundreds of pseudo-packages containing
 data as promise objects (DDP's, ref: g.data package), and plots the
 results to a multi-page pdf device.  Sometimes it fails.  Under R-2.2.1
 it just gave segfaults.  Under R-2.3.1 it gave this error message:

   *** caught segfault ***
  address (nil), cause 'memory not mapped'
  Traceback:
   1: load(system.file(data, paste(i, RData, sep = .), package =
 pkg), env)
   2: g.data.load(tm.time, hist.20051012)
   3: g.inorder(93500, tm.time, 16)
  aborting ...
  Segmentation fault

 Under R-2.4.0, it now gives this message:

  Error: STRING_ELT() can only be applied to a 'character vector', not a
 'builtin'

 (which appears to be generated inside main/memory.c).

 I'm sorry I can't give a reproducible example, because it seems to
 happen randomly, and at different points in the process.  So this is
 just a shot in the dark -- does anybody recognize this behavior?  TIA.

 -- David Brahm ([EMAIL PROTECTED])

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] if statement error

2006-10-17 Thread Jenny Stadt
Hi List,

I was not able to make this work. I know it is a simple one, sorry to bother. 
Give me some hints pls. Thanks!

Jen





if(length(real.d)>=30 && length(real.b)>=30 && beta1*beta2*theta1*theta2>0 )

{ r - 1;  corr - 1;  }


real.d and real.b are two vectors, beta1,beta2,theta1,and theta2 are constants. 
The error occurred like this:


Error in if (length(real.d) >= 30 && length(real.b) >= 30 && beta1 * beta2 *  : 
missing value where TRUE/FALSE needed

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if statement error

2006-10-17 Thread Alberto Monteiro
Jenny Stadt wrote:
 
 I was not able to make this work. I know it is a simple one, sorry 
 to bother. Give me some hints pls. Thanks!
 
Are you a C programmer? :-)

 if(length(real.d)>=30 && length(real.b)>=30 && 
 beta1*beta2*theta1*theta2>0 )
 
 { r - 1;  corr - 1;  }
 
I _think_ you should use & instead of &&. And drop the second ;.

Also, don't forget that return x is wrong [it took me a long
time to figure out that R != C, and it's just return(x)]

Alberto Monteiro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if statement error

2006-10-17 Thread Dieter Menne
Jenny Stadt jennystadt at yahoo.ca writes:

 if(length(real.d)>=30 && length(real.b)>=30 && 
   beta1*beta2*theta1*theta2>0 )
 
 { r <- 1;  corr <- 1;  }
 
 real.d and real.b are two vectors, beta1,beta2,theta1,and theta2 are
 constants. The error occurred like this:
 
 Error in 
 if (length(real.d) >= 30 && length(real.b) >= 30 && beta1 * beta2 *  : 
 missing value where TRUE/FALSE needed

Please follow the advice and provide a full example, where beta1 really is
a vector. The code below works for me, but it gives the message you mentioned if 
you uncomment the second line.

Dieter

-
beta1 = beta2 =  theta1 = theta2 = 1.0
#beta1 = NULL
real.d = runif(35)
real.b = runif(35)
r=corr=0
if(
  length(real.d)>=30 && 
  length(real.b)>=30 && 
  beta1*beta2*theta1*theta2>0 ) { 
  r <- 1;  
  corr <- 1;  
}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if statement error

2006-10-17 Thread mike waters
Jenny,
are there any missing values in your vectors? If so, what effect do you
think this will have on an expression like that required by the if statement
that must resolve fully to either true or false?

Regards,

Mike

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jenny Stadt
Sent: 17 October 2006 18:19
To: r-help@stat.math.ethz.ch
Subject: [R] if statement error

Hi List,

I was not able to make this work. I know it is a simple one, sorry to
bother. Give me some hints pls. Thanks!

Jen





if(length(real.d)>=30 && length(real.b)>=30 && beta1*beta2*theta1*theta2>0 )

{ r <- 1;  corr <- 1;  }


real.d and real.b are two vectors, beta1,beta2,theta1,and theta2 are
constants. The error occurred like this:


Error in if (length(real.d) >= 30 && length(real.b) >= 30 && beta1 * beta2 *
: 
missing value where TRUE/FALSE needed

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if statement error

2006-10-17 Thread Lucke, Joseph F
Jenny

This following example works: 
 real.d <- rep(NA,30)
 real.b <- rep(NA,30)
 b1=runif(1); b2=runif(1); t1=runif(1); t2=runif(1)
 if (length(real.d)>=30 && length(real.b)>=30 &&
b1*b2*t1*t2>0){bool=TRUE}
 bool
[1] TRUE

But this one doesn't:
 real.d <- rep(NA,30)
 real.b <- rep(NA,30)
 b1=runif(1); b2=runif(1); t1=runif(1); t2=NA
 if (length(real.d)>=30 && length(real.b)>=30 &&
b1*b2*t1*t2>0){bool=TRUE}
Error in if (length(real.d) >= 30 && length(real.b) >= 30 && b1 * b2 *
: 
missing value where TRUE/FALSE needed
  

NA's in the vector make no difference.  && is correct.
So, it appears at least one of your scalars is missing

JFL

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jenny Stadt
Sent: Tuesday, October 17, 2006 12:19 PM
To: r-help@stat.math.ethz.ch
Subject: [R] if statement error

Hi List,

I was not able to make this work. I know it is a simple one, sorry to
bother. Give me some hints pls. Thanks!

Jen





if(length(real.d)>=30 && length(real.b)>=30 &&
beta1*beta2*theta1*theta2>0 )

{ r <- 1;  corr <- 1;  }


real.d and real.b are two vectors, beta1,beta2,theta1,and theta2 are
constants. The error occurred like this:


Error in if (length(real.d) >= 30 && length(real.b) >= 30 && beta1 *
beta2 *  : 
missing value where TRUE/FALSE needed

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if statement error

2006-10-17 Thread Alex Brown

On 17 Oct 2006, at 18:34, Alberto Monteiro wrote:

 Jenny Stadt wrote:

 I was not able to make this work. I know it is a simple one, sorry
 to bother. Give me some hints pls. Thanks!

 Are you a C programmer? :-)

  if(length(real.d)>=30 && length(real.b)>=30 &&
  beta1*beta2*theta1*theta2>0 )

  { r <- 1;  corr <- 1;  }

  I _think_ you should use & instead of &&. And drop the second ;.


The && is correct in this case.
& is the vector logical AND operator in R (and analogously the & 
bitwise logical AND in C).
&& is the lazy scalar (atomic) logical AND operator in C and R.  If  
it operates on a vector in R, it ignores all but the first element.   
see help("&")
since if() in R is scalar (atomic) the && is appropriate.
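
(A two-line illustration of the difference:)

c(TRUE, TRUE) & c(TRUE, FALSE)    # element-wise vector AND: TRUE FALSE
TRUE && FALSE                     # scalar, lazy AND: FALSE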

The second ';' is syntactically correct in R and C, although optional  
in R.

-Alex

Out of interest, for a vector equivalent to if, see help(ifelse)

 Also, don't forget that return x is wrong [it took me a long
 time to figure out that R != C, and it's just return(x)]

 Alberto Monteiro

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert Contingency Table to Flat File

2006-10-17 Thread Marc Schwartz
On Tue, 2006-10-17 at 13:09 +0200, Philipp Pagel wrote:
 On Tue, Oct 17, 2006 at 03:08:49AM -0700, Marco LO wrote:
Is there any R function out there to turn a multi-way contingency
table back to a flat file table of individual rows and attribute
columns.?
 
 Are you looking for something like this?
 
 # generate some data
 x = sample(c(0,1), 100, replace=T)
 y = sample(c(0,1), 100, replace=T)
 z = sample(c(0,1), 100, replace=T)
 # contingency table
 mytab = table(x,y,z)
 # flat contingency table
 as.data.frame( mytab )


This thread reminds me of a discussion a while back, but which I cannot
seem to find at the moment in the archives.

The steps elucidated by Philipp result in a flattened contingency table,
which contains the various cross-classifying factors as unique rows and
the addition of a frequency column indicating the number of occurrences
of each unique row.

It does not however result in what might be considered the original 'raw
data frame' containing a single row per observation, if that is what one
desires.

In other words, we get the following:

set.seed(1)
x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)
 
# contingency table
mytab <- table(x, y, z)
 
 mytab
, , z = 0

   y
x    0  1
  0 17 19
  1 11 15

, , z = 1

   y
x    0  1
  0  6 10
  1 12 10

 
# flattened contingency table
FCT <- as.data.frame(mytab)
 
 FCT
  x y z Freq
1 0 0 0   17
2 1 0 0   11
3 0 1 0   19
4 1 1 0   15
5 0 0 16
6 1 0 1   12
7 0 1 1   10
8 1 1 1   10



In order to take 'FCT' and convert it to 'raw data rows', we can do the
following:

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Take each row in the source data frame table and replicate it
  # using the Freq value
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
   simplify = FALSE)

  # Take the above list and rbind it to create a single DF
  # Also subset the result to eliminate the Freq column
  DF <- subset(do.call(rbind, DF), select = -Freq)

  # Now apply type.convert to the character coerced factor columns
  # to facilitate data type selection for each column
  DF <- as.data.frame(lapply(DF,
 function(x) 
 type.convert(as.character(x),
  na.strings = na.strings,
  as.is = as.is,
  dec = dec)))

  # Return data frame
  DF
}


# Now use expand.dft() on the table from above
new.DF <- expand.dft(FCT)

 str(new.DF)
'data.frame':   100 obs. of  3 variables:
 $ x: int  0 0 0 0 0 0 0 0 0 0 ...
 $ y: int  0 0 0 0 0 0 0 0 0 0 ...
 $ z: int  0 0 0 0 0 0 0 0 0 0 ...


# Re-create the multi-way table
new.tab <- table(new.DF)

 new.tab
, , z = 0

   y
x    0  1
  0 17 19
  1 11 15

, , z = 1

   y
x    0  1
  0  6 10
  1 12 10


# Compare to initial mytab
 identical(new.tab, mytab)
[1] TRUE



So, if one needs it, expand.dft() can be used to take a multi-way
contingency table that has been coerced to a data frame and convert it
back to the raw data frame.

I'm not sure if this functionality is available elsewhere, but thought
that it might be helpful.

I included the use of type.convert() in order to make a reasonable
attempt at restoring original data types, as the lack of this step
results in all columns as factors.

I wonder if it might make sense to add an 'expand' argument to
as.data.frame.table(), which would default to FALSE. It could be then
set to TRUE and utilize expand.dft() to take the additional step and
return the raw data frame as above.

Anyway, I hope that this might be helpful.

Regards,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if statement error

2006-10-17 Thread Jenny Stadt
Thank you all for the advice here. I followed the suggestion to check the 
values of the parameters, and found that there were two possible causes of 
the problem. The first was a missing value in real.d / real.b; the 
second was that beta2 was NA. I fixed the data set and the error no longer 
shows up.

Thank you very much!

Jen

-Original Message-
From:Alberto Monteiro ,   [EMAIL PROTECTED]
Sent: 2006-10-17,  11:36:40
To: r-help@stat.math.ethz.ch
CC:
Subject: Re: [R] if statement error
Jenny Stadt wrote:
 
 I was not able to make this work. I know it is a simple one, sorry 
 to bother. Give me some hints pls. Thanks!
 
Are you a C programmer? :-)

 if(length(real.d) >=30 && length(real.b) >=30 && 
 beta1*beta2*theta1*theta2 >0 )
 
 { r <- 1;  corr <- 1;  }
 
I _think_ you should use & instead of &&. And drop the second ;.

Also, don't forget that return x is wrong [it took me a long
time to figure out that R != C, and it's just return(x)]

Alberto Monteiro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Some questions on Rpart algorithm

2006-10-17 Thread bogdan romocea
With regards to your first question, here's a function I used a couple
of times to get plots similar to those you're looking for. (Search the
list for how to find the source code. Also, there's a reference other
than MASS on the ?rpart page.)
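
(On the hidden-source point: the source of a registered but unexported S3
method can usually be pulled up with something like the following, shown
here only as a hint:)

library(rpart)
getAnywhere("text.rpart")   # shows the method even though it is not exported
rpart:::text.rpart          # equivalent, using the ::: operator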

#bogdan romocea 2006-06
#adapted source code from
#  - text.rpart() from package mvpart
#  - functions$text from rpart()
#  to get acceptable plots of classification trees
#the tweaked tree plots show the following:
#  - size of each node (counts and percentages)
#  - splitting rules
#  - % cases in each node, or counts
#  - targets with more than 3 categories are properly labelled through colors
#(unlike in text.rpart() from mvpart)
#example:
#  x <- rpart(...,method="class")
#  plot(x,uniform=TRUE,margin=0.02)
#  my.tree.text(x,ncomp.offset=4)

my.tree.text <- function(x, percent=TRUE, pct.decimals=0, ncomp.offset=2,
   clr=c("red","yellow","blue","green","brown","purple","navy"))
{
frame <- x$frame ; col <- names(frame)
method <- x$method ; ylevels <- attr(x, "ylevels")
xy <- rpartco(x) ; node <- as.numeric(row.names(x$frame))
leaves <- rep(TRUE, nrow(frame))
bar.vals <- x$functions$bar(yval2 = frame$yval2)
node.size <- rowSums(bar.vals)
node.title <- paste(node.size, "/",
round(100*node.size/node.size[1]), "%", sep="")
#---the node barplots
sub.barplot(xy$x,xy$y,bar.vals,leaves,xadj=1,yadj=1,bord=TRUE,line=TRUE,col=clr)
rx <- range(xy$x) ; ry <- range(xy$y)
#---the legend
if (!is.null(ylevels)) bar.labs <- ylevels else bar.labs <- dimnames(x$y)[[2]]
legend(min(xy$x) - 0.1 * rx, max(xy$y) + 0.05 * ry, bar.labs, col =
clr, pch = 15, bty = "n")
text(xy$x[leaves],xy$y[leaves],labels=node.title,pos=3,cex=1.5,offset=1)
#---the splitting rules
cxy <- par("cxy")
left.child <- match(2 * node, node)
right.child <- match(node * 2 + 1, node)
rows <- labels(x, pretty = pretty)
text(xy$x,xy$y + 0.5 * cxy[2],rows[left.child],pos=2,col="navy")
text(xy$x,xy$y + 0.5 * cxy[2],rows[right.child],pos=4,col="navy")
#---target composition per node (% or counts)
if (is.null(frame$yval2)) yval <- frame$yval[leaves] else yval <-
frame$yval2[leaves,]
nclass <- (ncol(yval) - 1)/2
counts <- yval[, 1 + (1:nclass)]
group <- yval[, 1]
if (!is.null(bar.labs)) group <- bar.labs[group]
if (percent) {
   #identical(counts / rowSums(counts),prop.table(counts,1))
   nbr <- round(100*prop.table(counts,1),pct.decimals)
   #t1 <- apply(matrix(nbr,ncol=nclass),2,paste,"%",sep="")
   #t2 <- apply(matrix(t1,ncol=nclass),1,paste,collapse="/")
   t2 <- apply(matrix(nbr,ncol=nclass),1,paste,collapse="|")
   nlab <- paste(format(group,justify="left"),"\n%: ",t2,sep = "")
} else {
   t2 <- apply(matrix(counts,ncol=nclass),1,paste,collapse="|")
   nlab <- paste(format(group,justify="left"),"\nN: ",t2,sep = "")
}
text(xy$x[leaves],xy$y[leaves],labels=nlab,pos=1,offset=ncomp.offset)
}


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Marcus, Jeffrey
 Sent: Tuesday, October 17, 2006 10:03 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Some questions on Rpart algorithm

 Hello:
   I am using rpart and would like more background on how the
 splits are made
 and how to interpret results - also how to properly use
 text(.rpart). I have
 looked through Venables and Ripley and through the rpart help
 and still have
 some questions. If there is a source (say, Breiman et al)  on
 decision trees
 that would clear this all up,  please let me know. The questions below
 pertain to a classification task (ie., I'm using the class
 method). Many
 thanks in advance.


 (1)  I'd like text(.rpart) to print percentages of each class
 rather then
 counts. I don't see an option for this so would like to modify the
 text.rpart. However, I can't find the source since it is a
 method that's
 hidden. How can I find the source?

 (2) printcp prints a table with columns cp, nsplit, rel
 error, xerror, xstd.
 I am guessing that cp is complexity, nsplit is the number of
 the split, rel
 error is the error on test set, xerror is cross-validation
 error and xstd is
 standard deviation of error across the cross-validation sets.
 Is there any
 documentation on this? For instance, how exactly is
 complexity computed?

 (3)  What's a loss matrix? Is it the cost place on each type of
 misclassification?

 (4) [More of a methodology question] In practice, when would one use
 different costs on different splitting variables?

 Thanks for any help on this.

   Jeff

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Merging tables of differing lengths

2006-10-17 Thread David R. Gagnon

   Hi,
   I  am  using the table() function on two different vectors to obtain a
   frequency distribution for each:
    tabtyp1 <- table(wintype1)
    tabtyp2 <- table(wintype2)
    The resulting tables look like this:
    > tabtyp1
    wintype1
        0     2     3     4     5     6     7     8
    16826 10031  1636   797   239   399    63     6
    > tabtyp2
    wintype2
        0     2     3     4     5     6     7     8    10
    16857 10012  1703   788   171   375    77    14     3
   What I want to do is merge these two tables into a 2X10 table in order
   to  do a chi-square test.  Given the unequal number of columns, all my
   attempts  are failing.  I am also having instances where the number of
   columns is the same, but the values are different, e.g., one table has
   values  for  7  and  not  8,  while the other lacks 7s but has 8s. Any
   suggestions?
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding per-panel text to panel strips in lattice xyplot

2006-10-17 Thread Deepayan Sarkar
On 10/13/06, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 I would like to add auxiliary information to the bottom of two strips on
 each panel that comes from a table look-up using the values of two
 variables that define the panel.  For example I might panel on sex and
 race, showing 3 randomly chosen time series in each panel and want to
 add (n=100) in the bottom strip to indicate the 3 curves were sampled
 from 100.  Is there a not-too-hard way to do that?

 I would like to do this both with and without groups= and superposition,
 but especially with.

There might be, but it might be easier with some changes to lattice.
Can you give a minimal example so that we can try out ideas?

Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Generate a random bistochastic matrix

2006-10-17 Thread Martin Maechler
Thank you, Ravi,

 Ravi == Ravi Varadhan [EMAIL PROTECTED]
 on Mon, 16 Oct 2006 18:54:16 -0400 writes:

Ravi Martin, I don't think that a doubly stochastic matrix
Ravi can be obtained from an arbitrary positive rectangular
Ravi matrix.  There is a theorem by Sinkhorn (Am Math Month
Ravi 1967) on the diagonal equivalence of matrices with
Ravi prescribed row and column sums.  It shows that given a
Ravi positive matrix A(m x n), there is a unique matrix DAE
Ravi (where D and E are m x m and n x n diagonal matrices)
Ravi with rows, k*r_i (i = 1, ..., m), and column sums, c_j
Ravi (j=1,...,n) such that k = \sum_j c_j / \sum_i r_i.
Ravi Therefore, the alternative row and column
Ravi normalization algorithm (same as the iterative
Ravi proportional fitting algorithm for contingency tables)
Ravi will alternate between row and column sums being
Ravi unity, while the other sum alternates between k and
Ravi 1/k.

Ravi Here is a slight modification of your algorithm for
Ravi the rectangular case:


Ravi bistochMat.rect <- function(m, n, tol = 1e-7, maxit = 1000) {
Ravi ## Purpose: Random bistochastic *square* matrix (M_{ij}):
Ravi ##    M_{ij} >= 0;  sum_i M_{ij} = sum_j M_{ij} = 1   (for all i, j)
Ravi ## ------------------------------------------------------------------
Ravi ## Arguments: n: (n * n) matrix dimension;
Ravi ## ------------------------------------------------------------------
Ravi ## Author: Martin Maechler, Date: 16 Oct 2006, 14:47
Ravi stopifnot(maxit >= 1, tol >= 0)
Ravi M <- matrix(runif(m*n), m, n)
Ravi for(i in 1:maxit) {
Ravi rM <- rowSums(M)
Ravi M <- M / rep((rM), n)
Ravi cM <- colSums(M)
Ravi M <- M / rep((cM), each = m)
Ravi if(all(abs(c(rM, cM) - 1) < tol))
Ravi break
Ravi }
Ravi ## cat("needed", i, "iterations\n")
Ravi ## or rather
Ravi attr(M, "iter") <- i
Ravi M
Ravi }

Ravi Using this algorithm we get for an 8 x 4 matrix, for example, we get:

 M <- bistochMat.rect(8,4)
 apply(M,1,sum)
Ravi [1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 apply(M,2,sum)
Ravi [1] 1 1 1 1

Of course I had tried that too, before I posted and limited
the problem to square matrices.

Ravi Clearly the algorithm didn't converge according to
Ravi your convergence criterion, but the row sums oscillate
Ravi between 1 and 0.5, and the column sums oscillate
Ravi between 2 and 1, respectively.

indeed, and I had tried similar examples.

The interesting thing is really the theorem you mention
a consequence of which seems to be that indeed, simple row and
column scaling iterations would not converge.

Intuitively, I'd still expect that relatively simple
modification of the algorithm should lead to convergence.

Your following statement seems to indicate so,
or do I misunderstand its consequences?

Ravi It is interesting to note that the above algorithm
Ravi converges if we use the infinity norm, instead of the
Ravi 1-norm, to scale the rows and columns, i.e. we divide
Ravi rows and columns by their maxima.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Predicted value at a new level in lm

2006-10-17 Thread Li Zhang
Y   X Z 
   42.0    7.0   33.0
   33.0    4.0   41.0
   75.0   16.0    7.0
   28.0    3.0   49.0
   91.0   21.0    5.0
   55.0    8.0   31.0


data <- read.table("d.txt", header=TRUE)
mod <- lm(data$Y ~ data$X + data$Z)
---

I would like to know the predict value at a new level,
say 

X=10 Z=30

Is there a function available to calculate it
directly?

Thank you

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding per-panel text to panel strips in lattice xyplot

2006-10-17 Thread Frank E Harrell Jr
Deepayan Sarkar wrote:
 On 10/13/06, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 I would like to add auxiliary information to the bottom of two strips on
 each panel that comes from a table look-up using the values of two
 variables that define the panel.  For example I might panel on sex and
 race, showing 3 randomly chosen time series in each panel and want to
 add (n=100) in the bottom strip to indicate the 3 curves were sampled
 from 100.  Is there a not-too-hard way to do that?

 I would like to do this both with and without groups= and superposition,
 but especially with.
 
 There might be, but it might be easier with some changes to lattice.
 Can you give a minimal example so that we can try out ideas?
 
 Deepayan
 

Thanks for your note Deepayan.  The difficulty is that the quantity to 
add may need to be obtained by a table look-up given current panel strip 
values.  I have gotten around this by duplicating the lookup values 
(here sizecluster) to correspond to x and y then using subscripts.  The 
code snippet below does not put the extra value in a strip but right 
under the bottom strip.  Better would be inside the bottom strip.

 textfun <- function(subscripts) {
   if(!length(subscripts)) return()
   size <- sizecluster[subscripts[1]]
   txt <- paste('N=',size,sep='')
   grid.text(txt, x=.005, y=.99, just=c(0,1),
 gp=gpar(fontsize=9, col=gray(.25)))
 }

xyplot(Y ~ X | distribution*cluster, groups=curve,
  xlab=xlab, ylab=ylab,
  xlim=xlim, ylim=ylim,
  as.table=TRUE,
  panel=function(x, y, subscripts, ...) {
 panel.superpose(x, y, subscripts, ...)
 textfun(subscripts)
   })

Thanks
-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] looking for a cleaner way to do something

2006-10-17 Thread Leeds, Mark \(IED\)
I have two numeric vectors each of length 17 and each is named the exact
same way.
 
so 
         obsnum   p     m     pppmp   . dot dot dot.. 
temp1 is 1417     52    63    85
 
         obsnum   p     m     pppmp   . dot dot dot.. 
temp2 is 1213     41    50    97
 
what i want to have is a resultant matrix with 2 rows and 16 columns
where the 16 columns are the 2:17 columns divided by the respective
first element in each vector.
( so 52, 63 and 85 should all get divided by 1417  and 41, 50 and 97
should all be divided by 1213 ).
 
it doesn't have to retain the column names because i can just assign
them again when i am assigning the rownames.
 
below is my  code :
 
resultmatrix <- rbind(temp1,temp2)
resultmatrix <- resultmatrix[,2:17]/resultmatrix[,1]
 
colnames(resultmatrix) <- colnames(temp1)
rownames(resultmatrix) <- c("group1","group2")
 
I'm pretty sure above will work but it seemed kind of ugly and I
wondering if there is a better way because i am trying to improve in R.
if there isn't, that's fine. thanks.



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Generate a random bistochastic matrix

2006-10-17 Thread Ravi Varadhan
Martin,

Sinkhorn's theorem excludes the possibility of obtaining a doubly stochastic
matrix of the form D%*%A%*%E, which is diagonally equivalent to a given
positive rectangular matrix A.  But it doesn't say that one can't obtain a
doubly stochastic matrix B from A by some other set of operations, other
than multiplying by diagonal matrices. This throws up a number of issues:
does a B always exist, is it unique in some sense, and if so, what is its
relationship to A?  This seems like a really hard problem.  If this problem
can be set up as an optimization problem, perhaps, then we could establish
conditions under which a solution would exist.

In the iterative proportional fitting for contingency tables, we have the
row sums = column sums = grand total, so there is no problem.  

Also, in the case of infinity norm, the constraints are much looser so
convergence is easy.
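
A minimal sketch of that variant (scaling rows and then columns by their
maxima rather than their sums; purely illustrative of the convergence
behaviour described above, not a doubly stochastic generator):

maxscale.rect <- function(m, n, tol = 1e-7, maxit = 1000) {
  M <- matrix(runif(m * n), m, n)
  for (i in 1:maxit) {
    rM <- apply(M, 1, max)        # row maxima
    M  <- M / rM                  # divide row i by rM[i]
    cM <- apply(M, 2, max)        # column maxima
    M  <- M / rep(cM, each = m)   # divide column j by cM[j]
    if (all(abs(c(rM, cM) - 1) < tol)) break
  }
  attr(M, "iter") <- i
  M
}
# M <- maxscale.rect(8, 4); range(apply(M, 1, max)); range(apply(M, 2, max))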

I also wonder about the physical reality of this problem - i.e. is there a
physical problem that can give rise to a rectangular doubly stochastic
matrix?  In the Markov chain problems, one always gets a square matrix.  I
am not familiar with graph theory applications, where doubly stochastic
matrices play a useful role.

Ravi.


---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: [EMAIL PROTECTED]

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 





-Original Message-
From: Martin Maechler [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 17, 2006 3:31 PM
To: Ravi Varadhan
Cc: R-help@stat.math.ethz.ch; 'Florent Bresson'
Subject: Re: [R] Re : Generate a random bistochastic matrix

Thank you, Ravi,

 Ravi == Ravi Varadhan [EMAIL PROTECTED]
 on Mon, 16 Oct 2006 18:54:16 -0400 writes:

Ravi Martin, I don't think that a doubly stochastic matrix
Ravi can be obtained from an arbitrary positive rectangular
Ravi matrix.  There is a theorem by Sinkhorn (Am Math Month
Ravi 1967) on the diagonal equivalence of matrices with
Ravi prescribed row and column sums.  It shows that given a
Ravi positive matrix A(m x n), there is a unique matrix DAE
Ravi (where D and E are m x m and n x n diagonal matrices)
Ravi with rows, k*r_i (i = 1, ..., m), and column sums, c_j
Ravi (j=1,...,n) such that k = \sum_j c_j / \sum_i r_i.
Ravi Therefore, the alternative row and column
Ravi normalization algorithm (same as the iterative
Ravi proportional fitting algorithm for contingency tables)
Ravi will alternate between row and column sums being
Ravi unity, while the other sum alternates between k and
Ravi 1/k.

Ravi Here is a slight modification of your algorithm for
Ravi the rectangular case:


Ravi bistochMat.rect <- function(m, n, tol = 1e-7, maxit = 1000) {
Ravi ## Purpose: Random bistochastic *square* matrix (M_{ij}):
Ravi ##    M_{ij} >= 0;  sum_i M_{ij} = sum_j M_{ij} = 1   (for all i, j)
Ravi ## ------------------------------------------------------------------
Ravi ## Arguments: n: (n * n) matrix dimension;
Ravi ## ------------------------------------------------------------------
Ravi ## Author: Martin Maechler, Date: 16 Oct 2006, 14:47
Ravi stopifnot(maxit >= 1, tol >= 0)
Ravi M <- matrix(runif(m*n), m, n)
Ravi for(i in 1:maxit) {
Ravi rM <- rowSums(M)
Ravi M <- M / rep((rM), n)
Ravi cM <- colSums(M)
Ravi M <- M / rep((cM), each = m)
Ravi if(all(abs(c(rM, cM) - 1) < tol))
Ravi break
Ravi }
Ravi ## cat("needed", i, "iterations\n")
Ravi ## or rather
Ravi attr(M, "iter") <- i
Ravi M
Ravi }

Ravi Using this algorithm we get for an 8 x 4 matrix, for example, we
get:

 M <- bistochMat.rect(8,4)
 apply(M,1,sum)
Ravi [1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 apply(M,2,sum)
Ravi [1] 1 1 1 1

Of course I had tried that too, before I posted and limited
the problem to square matrices.

Ravi Clearly the algorithm didn't converge according to
Ravi your convergence criterion, but the row sums oscillate
Ravi between 1 and 0.5, and the column sums oscillate
Ravi between 2 and 1, respectively.

indeed, and I had tried similar examples.

The interesting thing is really the theorem you mention
a consequence of which seems to be that indeed, simple row and
column scaling iterations would not converge.

Intuitively, I'd still expect that relatively simple
modification of the algorithm should lead to convergence.

Your following statement seems to indicate so,
or do I misunderstand its consequences?

Ravi It is interesting to note that the above algorithm
Ravi converges if we use the infinity norm, instead of the
Ravi 

Re: [R] looking for a cleaner way to do something

2006-10-17 Thread Gabor Grothendieck
Try this:

X <- structure(11:15, .Names = letters[1:5])
Y <- structure(21:25, .Names = letters[1:5])

rbind(group1 = X, group2 = Y)
tab <- rbind(group1 = X, group2 = Y)
tab[,-1] / tab[,1]

On 10/17/06, Leeds, Mark (IED) [EMAIL PROTECTED] wrote:
 I have two numeric vectors each of length 17 and each is named the exact
 same way.

 so
obsnum p  m pppmp . dot dot dot..
 temp1is141752 63   85

 obsnum p  m pppmp . dot dot dot..
 temp2 is   1213 4150   97

 what i want to have is a resultant matrix with 2 rows and 16 columns
 where the 16 columns are the 2:17 columns divided by the respective
 first element in each vector.
 ( so 52, 63 and 85 should all get divided by 1417  and 41, 50 and 97
 should all be divided by 1213 ).

 it doesn't have to retain the column names because i can just assign
 them again when i am assigning the rownames.

 below is my  code :

 resultmatrix <- rbind(temp1,temp2)
 resultmatrix <- resultmatrix[,2:17]/resultmatrix[,1]

 colnames(resultmatrix) <- colnames(temp1)
 rownames(resultmatrix) <- c("group1","group2")

 I'm pretty sure above will work but it seemed kind of ugly and I
 wondering if there is a better way because i am trying to improve in R.
 if there isn't, that's fine. thanks.
 


[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] looking for a cleaner way to do something

2006-10-17 Thread Gabor Grothendieck
Sorry there was an extra line in there.  It should be:

X <- structure(11:15, .Names = letters[1:5])
Y <- structure(21:25, .Names = letters[1:5])

tab <- rbind(group1 = X, group2 = Y)
tab[,-1] / tab[,1]
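
If the row-wise division reads more clearly that way, sweep() is an
equivalent alternative (same tab as above; just a sketch):

sweep(tab[, -1], 1, tab[, 1], "/")   # divide columns 2:n of each row by column 1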


On 10/17/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Try this:

 X <- structure(11:15, .Names = letters[1:5])
 Y <- structure(21:25, .Names = letters[1:5])

 rbind(group1 = X, group2 = Y)
 tab <- rbind(group1 = X, group2 = Y)
 tab[,-1] / tab[,1]

 On 10/17/06, Leeds, Mark (IED) [EMAIL PROTECTED] wrote:
  I have two numeric vectors each of length 17 and each is named the exact
  same way.
 
  so
 obsnum p  m pppmp . dot dot dot..
  temp1is141752 63   85
 
  obsnum p  m pppmp . dot dot dot..
  temp2 is   1213 4150   97
 
  what i want to have is a resultant matrix with 2 rows and 16 columns
  where the 16 columns are the 2:17 columns divided by the respective
  first element in each vector.
  ( so 52, 63 and 85 should all get divided by 1417  and 41, 50 and 97
  should all be divided by 1213 ).
 
  it doesn't have to retain the column names because i can just assign
  them again when i am assigning the rownames.
 
  below is my  code :
 
  resultmatrix <- rbind(temp1,temp2)
  resultmatrix <- resultmatrix[,2:17]/resultmatrix[,1]
 
  colnames(resultmatrix) <- colnames(temp1)
  rownames(resultmatrix) <- c("group1","group2")
 
  I'm pretty sure above will work but it seemed kind of ugly and I
  wondering if there is a better way because i am trying to improve in R.
  if there isn't, that's fine. thanks.
  
 
 
 [[alternative HTML version deleted]]
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predicted value at a new level in lm

2006-10-17 Thread Alexandre
You may use predict.lm method!

-- Início da mensagem original ---
De: [EMAIL PROTECTED]
Para: r-help@stat.math.ethz.ch
Cc:
Data: Tue, 17 Oct 2006 12:34:12 -0700 (PDT)
Assunto: [R] Predicted value at a new level in lm
 Y X Z
 42.0 7.0 33.0
 33.0 4.0 41.0
 75.0 16.0 7.0
 28.0 3.0 49.0
 91.0 21.0 5.0
 55.0 8.0 31.0


 data <- read.table("d.txt", header=TRUE)
 mod <- lm(data$Y ~ data$X + data$Z)
 ---

 I would like to know the predict value at a new level,
 say

 X=10 Z=30

 Is there a function available to calculate it
 directly?

 Thank you

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding per-panel text to panel strips in lattice xyplot

2006-10-17 Thread Deepayan Sarkar
On 10/17/06, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 Deepayan Sarkar wrote:
  On 10/13/06, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
  I would like to add auxiliary information to the bottom of two strips on
  each panel that comes from a table look-up using the values of two
  variables that define the panel.  For example I might panel on sex and
  race, showing 3 randomly chosen time series in each panel and want to
  add (n=100) in the bottom strip to indicate the 3 curves were sampled
  from 100.  Is there a not-too-hard way to do that?
 
  I would like to do this both with and without groups= and superposition,
  but especially with.
 
  There might be, but it might be easier with some changes to lattice.
  Can you give a minimal example so that we can try out ideas?
 
  Deepayan
 

 Thanks for your note Deepayan.  The difficulty is that the quantity to
 add may need to be obtained by a table look-up given current panel strip
 values.  I have gotten around this by duplicating the lookup values
 (here sizecluster) to correspond to x and y then using subscripts.  The
 code snippet below does not put the extra value in a strip but right
 under the bottom strip.  Better would be inside the bottom strip.

  textfun <- function(subscripts) {
    if(!length(subscripts)) return()
    size <- sizecluster[subscripts[1]]
    txt <- paste('N=',size,sep='')
grid.text(txt, x=.005, y=.99, just=c(0,1),
  gp=gpar(fontsize=9, col=gray(.25)))
  }

 xyplot(Y ~ X | distribution*cluster, groups=curve,
   xlab=xlab, ylab=ylab,
   xlim=xlim, ylim=ylim,
   as.table=TRUE,
   panel=function(x, y, subscripts, ...) {
  panel.superpose(x, y, subscripts, ...)
  textfun(subscripts)
})

Well, the strip function has always been passed an argument called
'which.panel' (and the latest lattice has a function called
'which.packet()' which gives the same information inside a panel
function as well). It seems to me that that's the thing you want to
use. E.g.

> library(lattice)
> dotplot(variety ~ yield | site * year, data = barley,
+ strip = function(..., which.panel) print(which.panel))
[1] 1 1
[1] 1 1
[1] 2 1
...
[1] 5 2
[1] 6 2
[1] 6 2
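
Building on that, an (untested) sketch of a strip function that looks up a
count for the current panel and appends it to the strip; here 'n.tab' is a
hypothetical matrix of counts indexed by the levels of the two conditioning
variables:

my.strip <- function(which.panel, ...) {
    strip.default(which.panel = which.panel, ...)   # draw the usual strip first
    n <- n.tab[which.panel[1], which.panel[2]]      # table look-up for this panel
    grid::grid.text(paste("(n=", n, ")", sep = ""),
                    x = 0.98, just = "right",
                    gp = grid::gpar(cex = 0.7))
}
## xyplot(Y ~ X | distribution * cluster, groups = curve, strip = my.strip, ...)
## (this draws the text in every strip; test which.given if only one is wanted)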

-Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merging tables of differing lengths

2006-10-17 Thread hadley wickham
What I want to do is merge these two tables into a 2X10 table in order
to  do a chi-square test.  Given the unequal number of columns, all my

You might want to try something like:

levs <- unique(c(wintype1, wintype2))
table(factor(wintype1, levels=levs))
table(factor(wintype2, levels=levs))
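
A possible next step (just a sketch, reusing levs from above) is to bind the
two padded tables into a 2 x k matrix and run the test:

both <- rbind(type1 = table(factor(wintype1, levels = levs)),
              type2 = table(factor(wintype2, levels = levs)))
chisq.test(both)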


Hadley

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predicted value at a new level in lm

2006-10-17 Thread Gavin Simpson
On Tue, 2006-10-17 at 12:34 -0700, Li Zhang wrote:
 Y   X Z 
     42.0    7.0   33.0
     33.0    4.0   41.0
     75.0   16.0    7.0
     28.0    3.0   49.0
     91.0   21.0    5.0
     55.0    8.0   31.0
  
  
  data <- read.table("d.txt", header=TRUE)
  mod <- lm(data$Y ~ data$X + data$Z)
 ---
 
 I would like to know the predict value at a new level,
 say 
 
 X=10 Z=30

> dat <- scan()
1: 42.0    7.0   33.0
4: 33.0    4.0   41.0
7: 75.0   16.0    7.0
10: 28.0    3.0   49.0
13: 91.0   21.0    5.0
16: 55.0    8.0   31.0
19:
Read 18 items
> dat <- as.data.frame(matrix(dat, ncol = 3, byrow = TRUE))
> names(dat) <- c("X","Y","Z")
> dat
   X  Y  Z
1 42  7 33
2 33  4 41
3 75 16  7
4 28  3 49
5 91 21  5
6 55  8 31
> mod <- lm(Y ~ X + Z, data = dat) # note use of data argument
> pred <- predict(mod, newdata = list(X = 10, Z = 30))
> pred
[1] -0.003469617

or

> pred <- predict(mod, newdata = data.frame(X = 10, Z = 30))
> pred
[1] -0.003469617

HTH

G

 
 Is there a function available to calculate it
 directly?
 
 Thank you
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 *Note new Address and Fax and Telephone numbers from 10th April 2006*
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC  [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK[w] http://www.ucl.ac.uk/~ucfagls/cv/
WC1E 6BT  [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] barplot question

2006-10-17 Thread Leeds, Mark \(IED\)
i'm doing a bar plot and there are 16 column variables. is there a way
to make the variable names go down instead of across when you do the
barplot ?
because the names are so long, the barplot just shows 3 names and leaves
the rest out. if i could rotate the names 90 degrees, it would probably
fit a lot more.
or maybe i can use space to make the horizontal width longer ? I looked
up ?barplot but i'm not sure. when 1st and 2nd are on the bottom,
things look fine but i'm not as interested in those 2 barplots.  
 
i didn't use any special options. i just did
 
barplot(probsignmatrix)
barplot(t(probsignmatrix))
 
barplot(probsignmatrix,beside=T)
barplot(t(probsignmatrix),beside=T)
 
 
 
i put probsignmatrix below in case someone wants to see what i mean
because it may not be clear. i don't expect anyone to type it in but
rounding would still show what i mean.
thanks a lot.
 
 
 pcount pmpppcount pmmppcount pmmmpcount pcount mcount
pppmmcount ppmmmcount ppmppcount ppmmpcount pppmpcount ppmpmcount
pmpmpcount pmpmmcount pmmpmcount pmppmcount
1st 0.03477157 0.02842640 0.03157360 0.03365482 0.04010152 0.03553299
0.03989848 0.04182741 0.02817259 0.03203046 0.02781726 0.02218274
0.01771574 0.02289340 0.02583756 0.02390863
2nd 0.04648895 0.02901495 0.03092490 0.03064044 0.04108420 0.03998700
0.03958062 0.04059655 0.03039662 0.03027471 0.02901495 0.02170026
0.01601105 0.02287874 0.02165962 0.02267555



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] cluster in R

2006-10-17 Thread Weiwei Shi
hi,

is there some good summary on clustering methods in R? It seems there
are many packages involving it.

And I have two questions on clustering here:

1. Is there a way to evaluate the effectiveness (or separation) of
clustering (rather than by visualization)?

2. Is there a search method (like genetic search) which can help find
the best subset of attributes which gives the best separation?

Thanks,

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot question

2006-10-17 Thread Francisco J. Zagmutt
Try RSiteSearch(rotate barplot labels)
Then read the first thread for an example of what you want to do.

Cheers

Francisco

Dr. Francisco J. Zagmutt
College of Veterinary Medicine and Biomedical Sciences
Colorado State University




From: Leeds, Mark (IED) [EMAIL PROTECTED]
To: R-help@stat.math.ethz.ch
Subject: [R] barplot question
Date: Tue, 17 Oct 2006 17:15:43 -0400

i'm doing a bar plot and there are 16 column variables. is there a way
to make the variable names go down instead of across when you do the
barplot ?
because the names are so long, the barplot just shows 3 names and leaves
the rest out. if i could rotate the names 90 degrees, it would probably
fit a lot more.
or maybe i can use space to make the horizontal width longer ? I looed
up ?barlot but i'm not sure. when 1st and 2nd are on the bottom,
things look fine but i'm not as interesed in those 2 barplots.

i didn't use any special options. i just did

barplot(probsignmatrix)
barplot(t(probsignmatrix))

barplot(probsignmatrix,beside=T)
barplot(t(probsignmatrix),beside=T)



i put probsignmatrix below in case someone wants to see what i mean
because it may not be clear. i don't expect anyone to type it in but
rounding would still show what i mean.
thanks a lot.


  pcount pmpppcount pmmppcount pmmmpcount pcount mcount
pppmmcount ppmmmcount ppmppcount ppmmpcount pppmpcount ppmpmcount
pmpmpcount pmpmmcount pmmpmcount pmppmcount
1st 0.03477157 0.02842640 0.03157360 0.03365482 0.04010152 0.03553299
0.03989848 0.04182741 0.02817259 0.03203046 0.02781726 0.02218274
0.01771574 0.02289340 0.02583756 0.02390863
2nd 0.04648895 0.02901495 0.03092490 0.03064044 0.04108420 0.03998700
0.03958062 0.04059655 0.03039662 0.03027471 0.02901495 0.02170026
0.01601105 0.02287874 0.02165962 0.02267555



   [[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot question

2006-10-17 Thread Peter Alspach

Mark

 i'm doing a bar plot and there are 16 column variables. is
 there a way to make the variable names go down instead of
 across when you do the barplot ?
 because the names are so long, the barplot just shows 3 names
 and leaves the rest out. if i could rotate the names 90
 degrees, it would probably fit a lot more.

Is this the sort of thing you mean:

temp <- barplot(rnorm(16, 3))
text(temp, rep(-0.2, 16), paste('trt', 1:16), srt=90, adj=1)
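
Another option that is often enough by itself is perpendicular axis labels
via las (a quick sketch; shrink cex.names as needed for 16 long names):

barplot(probsignmatrix, beside = TRUE, las = 2, cex.names = 0.7)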

Peter Alspach

 or maybe i can use space to make the horizontal width longer
 ? I looed up ?barlot but i'm not sure. when 1st and 2nd are
 on the bottom, things look fine but i'm not as interesed in
 those 2 barplots. 
 
 i didn't use any special options. i just did
 
 barplot(probsignmatrix)
 barplot(t(probsignmatrix))
 
 barplot(probsignmatrix,beside=T)
 barplot(t(probsignmatrix),beside=T)
 
 
 
 i put probsignmatrix below in case someone wants to see what
 i mean because it may not be clear. i don't expect anyone to
 type it in but rounding would still show what i mean.
 thanks a lot.
 
 
  pcount pmpppcount pmmppcount pmmmpcount pcount
 mcount pppmmcount ppmmmcount ppmppcount ppmmpcount
 pppmpcount ppmpmcount pmpmpcount pmpmmcount pmmpmcount
 pmppmcount 1st 0.03477157 0.02842640 0.03157360 0.03365482
 0.04010152 0.03553299
 0.03989848 0.04182741 0.02817259 0.03203046 0.02781726 0.02218274
 0.01771574 0.02289340 0.02583756 0.02390863 2nd 0.04648895
 0.02901495 0.03092490 0.03064044 0.04108420 0.03998700
 0.03958062 0.04059655 0.03039662 0.03027471 0.02901495 0.02170026
 0.01601105 0.02287874 0.02165962 0.02267555
 

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] EM Algorthm help library norm

2006-10-17 Thread downunder

Hello, i need some help concerning the library norm. i have to impute some
missing values using the em algorithm.
The help offered for the library doesn't really help me, maybe somebody
already worked on em algorithm or multiple imputation. 

some fictive Data
x1 x2

50 60
24   .
26 20 
87   .
21   .

Problem: Em Algorithm in R calculating the missings in x2.
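
For what it's worth, the usual sequence of calls in the norm package is
roughly the following (a sketch from memory, so please check it against the
package help pages before relying on it):

library(norm)
x <- matrix(c(50, 60,
              24, NA,
              26, 20,
              87, NA,
              21, NA), ncol = 2, byrow = TRUE)
s <- prelim.norm(x)               # preliminary summaries of the data
thetahat <- em.norm(s)            # ML estimates of mu and Sigma via EM
rngseed(1234)                     # set the imputation RNG seed first
ximp <- imp.norm(s, thetahat, x)  # x with the missing x2 values imputed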

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/EM-Algorthm-help-library-norm-tf2462486.html#a6864754
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cluster in R

2006-10-17 Thread BBands
On 10/17/06, Weiwei Shi [EMAIL PROTECTED] wrote:

 is there some good summary on clustering methods in R? It seems there
 are many packages involving it.

Gabor provided this very useful link a couple of days back.

http://cran.r-project.org/src/contrib/Views/Cluster.html
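
For your question (1), one widely used numeric check of separation is the
average silhouette width; a minimal sketch with the cluster package
(assuming a numeric data matrix X and a chosen k):

library(cluster)
d   <- dist(X)                      # X: your numeric data matrix
cl  <- pam(d, k = 3)                # any partitioning would do here
sil <- silhouette(cl$clustering, d)
mean(sil[, "sil_width"])            # closer to 1 means better separated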

jab
-- 
John Bollinger, CFA, CMT
www.BollingerBands.com

If you advance far enough, you arrive at the beginning.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cluster in R

2006-10-17 Thread Weiwei Shi
hi,
I just happened to find that page. But it seems too brief to me. For
example, my project involves non-determined cluster number and
non-determined attributes for the would-be-clustered samples. What
kind of methods should I start with?

Thanks a lot for the prompty reply.

W.

On 10/17/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Go the R home page (google for R), click on CRAN in left pane, choose
 a mirror, click on Task Views in left pane and choose
 Cluster.

 On 10/17/06, Weiwei Shi [EMAIL PROTECTED] wrote:
  hi,
 
  is there some good summary on clustering methods in R? It seems there
  are many packages involving it.
 
  And I have two questions on clustering here:
 
  1. Is there a way of evaluate the effecitives (or seperation) of
  clustering (rather than by visualization)?
 
  2. Is there a search method (like genetic search) which can help find
  the best subset of attributes which gives best seperation?
 
  Thanks,
 
  --
  Weiwei Shi, Ph.D
  Research Scientist
  GeneGO, Inc.
 
  Did you always know?
  No, I did not. But I believed...
  ---Matrix III
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cluster in R

2006-10-17 Thread Gabor Grothendieck
Go to the R home page (google for R), click on CRAN in the left pane, choose
a mirror, click on Task Views in left pane and choose
Cluster.

On 10/17/06, Weiwei Shi [EMAIL PROTECTED] wrote:
 hi,

 is there some good summary on clustering methods in R? It seems there
 are many packages involving it.

 And I have two questions on clustering here:

 1. Is there a way of evaluate the effecitives (or seperation) of
 clustering (rather than by visualization)?

 2. Is there a search method (like genetic search) which can help find
 the best subset of attributes which gives best seperation?

 Thanks,

 --
 Weiwei Shi, Ph.D
 Research Scientist
 GeneGO, Inc.

 Did you always know?
 No, I did not. But I believed...
 ---Matrix III

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot question

2006-10-17 Thread Deepayan Sarkar
On 10/17/06, Leeds, Mark (IED) [EMAIL PROTECTED] wrote:
 i'm doing a bar plot and there are 16 column variables. is there a way
 to make the variable names go down instead of across when you do the
 barplot ?
 because the names are so long, the barplot just shows 3 names and leaves
 the rest out. if i could rotate the names 90 degrees, it would probably
 fit a lot more.
 or maybe i can use space to make the horizontal width longer ? I looed
 up ?barlot but i'm not sure. when 1st and 2nd are on the bottom,
 things look fine but i'm not as interesed in those 2 barplots.

 i didn't use any special options. i just did

 barplot(probsignmatrix)
 barplot(t(probsignmatrix))

 barplot(probsignmatrix,beside=T)
 barplot(t(probsignmatrix),beside=T)



 i put probsignmatrix below in case someone wants to see what i mean
 because it may not be clear. i don't expect anyone to type it in but
 rounding would still show what i mean.
 thanks a lot.


  pcount pmpppcount pmmppcount pmmmpcount pcount mcount
 pppmmcount ppmmmcount ppmppcount ppmmpcount pppmpcount ppmpmcount
 pmpmpcount pmpmmcount pmmpmcount pmppmcount
 1st 0.03477157 0.02842640 0.03157360 0.03365482 0.04010152 0.03553299
 0.03989848 0.04182741 0.02817259 0.03203046 0.02781726 0.02218274
 0.01771574 0.02289340 0.02583756 0.02390863
 2nd 0.04648895 0.02901495 0.03092490 0.03064044 0.04108420 0.03998700
 0.03958062 0.04059655 0.03039662 0.03027471 0.02901495 0.02170026
 0.01601105 0.02287874 0.02165962 0.02267555
 

Don't know if you want to go this way, but try

library(lattice)
barchart(t(probsignmatrix))

-Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Input buffer overflow

2006-10-17 Thread Gabor Grothendieck
No one answered this question but to answer my own question I did notice
that since I  posted this, there have been changes to parse.Rd in the
development version of R:

  https://svn.r-project.org/R/trunk/src/library/base/man/parse.Rd

indicating:

  a limit of 8192 bytes on the size of strings which can be parsed.

On 10/15/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 In gsubfn I replace matches with strings that represent calls to a function
 and then perform paste(eval(parse(text= ...)), collapse = ) on the result.
 One user of gsubfn is using it with very long strings (over 20,000 characters)
 and the parse is giving an input buffer overflow.  Here is an
 artificial example:

s <- paste(rep("X", 25000), collapse = "")
out <- parse(text = shQuote(s))
   Error in parse(text = shQuote(s)) : input buffer overflow

 Is there a way to increase the limit?


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Semi-definite programming

2006-10-17 Thread simon gatehouse
Dear R users

Are any R users aware of implementations of semi-definite optimization 
in R?
For example, has anybody implemented any of the numerous public-domain C 
libraries for SDP in R, and would they be willing to share?
My objective here is to implement variants of Maximum Variance 
Unfolding as outlined in, for example,  
http://www.icml2006.org/icml_documents/camera-ready/131_A_Duality_View_of_Sp.pdf

Any ideas would be warmly appreciated.

Simon Gatehouse
School of Biological Earth  Environmental Sciences
University of New South Wales

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding per-panel text to panel strips in lattice xyplot

2006-10-17 Thread Richard M. Heiberger
This looks like an application for bottom.strips and right.strips,
in addition to the current left.strips and strips, in each panel.
The new feature is that these new strips might be associated with
different information than the regular strips that reflect
the levels of the conditioning variables.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] overlapping intervals

2006-10-17 Thread Giovanni Coppola
Hello Jim,
thank you very much for this elegant code. It does exactly what I need.
I'll check findInterval as well.
Thanks again
Giovanni


On Oct 16, 2006, at 7:22 AM, jim holtman wrote:

 Here is a more general way that looks for the transitions:

 series1 <- cbind(Start=c(10,21,40,300), End=c(20,26,70,350))
 series2 <- cbind(Start=c(25,60,210,500), End=c(40,100,400,1000))
 x <- rbind(series1, series2)
 # create +1 for start and -1 for end
 x.s <- rbind(cbind(x[,1], 1), cbind(x[,2], -1))
 # sort by start times
 x.s <- x.s[order(x.s[,1]),]
 # cumsum is a count of the transitions
 x.s <- cbind(x.s, cumsum(x.s[,2]))
 # c(1,2) is start and c(-1,1) is the end of an overlap
 cbind(x.s[x.s[,2] == 1 & x.s[,3] == 2, 1], x.s[x.s[,2] == -1 & x.s[,3] == 1, 1])
 [,1] [,2]
 [1,]   25   26
 [2,]   40   40
 [3,]   60   70
 [4,]  300  350


 On 10/15/06, Giovanni Coppola [EMAIL PROTECTED] wrote:
 Hello everybody,

 I have two series of intervals, and I'd like to output the shared
 regions.
 For example:
 series1 <- cbind(Start=c(10,21,40,300), End=c(20,26,70,350))
 series2 <- cbind(Start=c(25,60,210,500), End=c(40,100,400,1000))

   series1
  Start End
 [1,]10  20
 [2,]21  26
 [3,]40  70
 [4,]   300 350
   series2
  Start  End
 [1,]25   40
 [2,]60  100
 [3,]   210  400
 [4,]   500 1000

 I'd like to have something like this as result:
   shared
  Start End
 [1,]25  26
 [2,]60  70
 [3,]   300 350

 I found this post, but the solution finds the regions shared across
 all the intervals.
 http://finzi.psych.upenn.edu/R/Rhelp02a/archive/59594.html
 Can anybody help me with this?
 Thanks
 Giovanni

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.



 -- 
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] strange error in mtrace

2006-10-17 Thread Vladimir Eremeev
Dear useRs,

I am experiencing very strange error with Mark Bravington's package debug.
I haven't seen them before.

Here is the sample session

> library(debug)
Loading required package: mvbutils
MVBUTILS: no tasks vector found in ROOT
Loading required package: tcltk
Loading Tcl/Tk interface ... done
> x <- function() return(1)
> mtrace(x)
> x()
Error in attr(value, "row.names") <- rlabs : 
row names must be 'character' or 'integer', not 'double'
> mtrace(x, FALSE)
> x()
[1] 1
> mtrace(x)
> x()
Error in attr(value, "row.names") <- rlabs : 
row names must be 'character' or 'integer', not 'double'
 

This happened with any function, which I tried to debug.

I use R 2.4.0 on Linux and on Windows.

debug version is 1.1.0,
mvbutils version is  1.1.1

on both systems.

Linux R was compiled from sources.

The packages were installed from the Internet repositories using 
install.packages
function on both systems.

update.packages only asks for repository for current session, and does
nothing, which means, that everything is of the latest version.

What could be wrong?

---
Best regards,
Vladimirmailto:[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tcltk crashes with bad color with text widget

2006-10-17 Thread Duncan Murdoch
On 10/17/2006 8:36 AM, Peter Dalgaard wrote:
 Duncan Murdoch [EMAIL PROTECTED] writes:
 
 On 10/16/2006 10:47 PM, Alex Couture-Beil wrote:
 Hello

 I have been playing with tcl/tk in R 2.4.0 on windows XP and have 
 managed to crash R by supplying tcl/tk with an incorrect color.
 Is this a bug? is there a way for me to test the color to see if it is a 
 valid tcl/tk color, to avoid this?

tt=tktoplevel()
tklabel(parent=tt, text="hello world", foreground="reed")
Error in structure(.External("dotTclObjv", objv, PACKAGE = "tcltk"), 
class = "tclObj") :
[tcl] unknown color name "reed".
An error is displayed as one would expect, however when I try
tktext(parent=tt, foreground="blaaack")
R crashes, rather than displaying an error as tklabel did.

 This, however, does not happen on my FreeBSD machine, which displays an 
 error similar to the one for tklabel and does not crash.
 I see the same crash in Windows, occurring deep in one of the TCL 
 routines, where it tries to work with a font, but the font has not been 
 assigned.

 TK on Windows uses a different display driver than FreeBSD does, so this 
 could be a TK bug, rather than an R bug, and it does look like that. 
 Alternatively, we might be ignoring an error generated in TK, in which 
 case it is our bug:  but the tklabel example makes that sound wrong.

 To verify, it would be nice to try the same commands in wish (or some 
 other TCL/TK platform).  Do you know the pure TCL equivalent?
 
 Should be close to this
 
 toplevel .1
label .1.1 -text "hello world" -foreground reed
 text .1.2 -foreground blaack
 
 (and it doesn't crash on my machine, in R or wish)

I only see the crash in R, not in wish.  It happens when, in the midst 
of destroying the partially created text widget, TK needs to create the 
Windows window corresponding to it.  Windows sends some messages to the 
newly created window (including WM_NCCREATE); these lead to TK servicing 
idle events, and one of those involves looking at the incomplete text 
object, and that's when things crash.

I don't know why wish can handle the error properly; maybe it just got 
lucky, or maybe it handles the error differently.

I think I'm going to have to give up on this one.

Duncan Murdoch
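
[Editorial sketch, not part of the thread] One way to avoid the crash from the
R side is to check the colour name before handing it to tktext(). The sketch
below assumes that the withdrawn Tk root window "." exists once tcltk is
loaded and that asking Tk to resolve the colour with 'winfo rgb' raises a
catchable error for unknown names; is.tk.colour() is a made-up helper, not
part of the tcltk package.

library(tcltk)

## Hedged sketch: TRUE if Tk can resolve 'col' to RGB values, FALSE otherwise.
is.tk.colour <- function(col) {
    !inherits(try(tcl("winfo", "rgb", ".", col), silent = TRUE), "try-error")
}

is.tk.colour("red")      # TRUE
is.tk.colour("blaaack")  # FALSE, so the tktext() call can be skipped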

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert Contingency Table to Flat File

2006-10-17 Thread Marc Schwartz
Just a quick update on this thread.

The version of expand.dft() that I posted earlier has a bug in it.

This is the result of the use of lapply() and the evaluation of the
additional arguments passed to type.convert().

I noted this when testing the function on the UCBAdmissions data set,
which is a multi-way table used in some help file examples such
as ?as.data.frame.table.

Here is a corrected version:

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  DF <- subset(do.call(rbind, DF), select = -Freq)

  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}


Thus if we now take the UCBAdmissions multi-way table data and convert
it to a flat contingency table:

FCT <- as.data.frame(UCBAdmissions)

> FCT
  Admit Gender Dept Freq
1  Admitted   MaleA  512
2  Rejected   MaleA  313
3  Admitted FemaleA   89
4  Rejected FemaleA   19
5  Admitted   MaleB  353
6  Rejected   MaleB  207
7  Admitted FemaleB   17
8  Rejected FemaleB8
9  Admitted   MaleC  120
10 Rejected   MaleC  205
11 Admitted FemaleC  202
12 Rejected FemaleC  391
13 Admitted   MaleD  138
14 Rejected   MaleD  279
15 Admitted FemaleD  131
16 Rejected FemaleD  244
17 Admitted   MaleE   53
18 Rejected   MaleE  138
19 Admitted FemaleE   94
20 Rejected FemaleE  299
21 Admitted   MaleF   22
22 Rejected   MaleF  351
23 Admitted FemaleF   24
24 Rejected FemaleF  317


Thus, there should be:

> sum(FCT$Freq)
[1] 4526

rows in the final 'raw' data frame.


> DF <- expand.dft(FCT)

> str(DF)
'data.frame':   4526 obs. of  3 variables:
 $ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
 $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Dept  : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...


Note that the three columns are coerced back to factors, which is of
course the default behavior for data frames.

If we now use:

> DF <- expand.dft(FCT, as.is = TRUE)

> str(DF)
'data.frame':   4526 obs. of  3 variables:
 $ Admit : chr  "Admitted" "Admitted" "Admitted" "Admitted" ...
 $ Gender: chr  "Male" "Male" "Male" "Male" ...
 $ Dept  : chr  "A" "A" "A" "A" ...


The three columns stay as character vectors. It was this behavior that
did not work properly in the first version.

HTH,

Marc Schwartz
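
[Editorial aside, not from the thread] For readers who only need the expansion
step and not the type.convert() handling, repeating the row indices directly
gives a shorter equivalent; expand.dft2() is a hypothetical name used only for
this sketch.

## Hedged sketch: expand a frequency table by repeating each row Freq times.
expand.dft2 <- function(x, freq = "Freq") {
    DF <- x[rep(1:nrow(x), x[[freq]]),
            setdiff(names(x), freq), drop = FALSE]
    rownames(DF) <- NULL
    DF
}

## Should again give 4526 rows for the UCBAdmissions example above.
nrow(expand.dft2(as.data.frame(UCBAdmissions)))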

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple histograms in one plot

2006-10-17 Thread Johann Hibschman
Hi all,

I'm trying to plot multiple histograms in one plot (cross-validation
values of model parameters), but I cannot seem to reduce the margins
enough to fit as many of them in as I would like.

I'm using split.screen to divide the window into a 5x4 grid, then
plotting with hist.  I've tried explicitly reducing the margins with
par(mar=c(1,1,1,1)), but it doesn't seem to have any effect.
Visually, there is a lot of whitespace and very little histogram in my
results.

Can anyone suggest either a better method to visualize these results,
a better way to plot histograms, or a way to actually reduce the
margins used?  The intent is to give a sense of how well-constrained
the various model parameters are.

Thanks,

Johann
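
[Editorial sketch, not a reply from the thread] With a plain par(mfrow=...)
layout the mar= setting is respected, so one possible workaround is to drop
split.screen() altogether; the data below are simulated stand-ins for the
cross-validation values.

## Hedged sketch: 20 compact histograms in a 5 x 4 grid with tight margins.
set.seed(1)
vals <- replicate(20, rnorm(100), simplify = FALSE)  # hypothetical parameter draws

op <- par(mfrow = c(5, 4), mar = c(2, 2, 1.5, 0.5), mgp = c(1, 0.4, 0))
for (i in seq(along = vals))
    hist(vals[[i]], main = paste("parameter", i),
         xlab = "", ylab = "", col = "grey")
par(op)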

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] crush in edit()

2006-10-17 Thread Ei-ji Nakama
It is a problem caused by the stack smashing protector.
--- src/modules/X11/dataentry.c.orig2006-09-04 23:41:34.0 +0900
+++ src/modules/X11/dataentry.c 2006-10-18 11:31:43.0 +0900
@@ -1046,7 +1046,7 @@
for(j=0;*(wcspc+j)!=L'\0';j++)wcs[j]=*(wcspc+j);
wcs[j]=L'\0';
w_p=wcs;
-   cnt=wcsrtombs(s,(const wchar_t **)w_p,sizeof(wcs),NULL);
+   cnt=wcsrtombs(s,(const wchar_t **)w_p,sizeof(s)-1,NULL);
s[cnt]='\0';
 if (textwidth(s, strlen(s)) < (bw - text_offset)) break;
 *(++wcspc) = L'<';
@@ -1056,7 +1056,7 @@
for(j=0;*(wcspc+j)!=L'\0';j++)wcs[j]=*(wcspc+j);
wcs[j]=L'\0';
w_p=wcs;
-   cnt=wcsrtombs(s,(const wchar_t **)w_p,sizeof(wcs),NULL);
+   cnt=wcsrtombs(s,(const wchar_t **)w_p,sizeof(s)-1,NULL);
s[cnt]='\0';
 if (textwidth(s, strlen(s)) < (bw - text_offset)) break;
 *(wcspbuf + i - 2) = L'>';
@@ -1066,7 +1066,7 @@
 for(j=0;*(wcspc+j)!=L'\0';j++) wcs[j]=*(wcspc+j);
 wcs[j]=L'\0';
 w_p=wcs;
-cnt=wcsrtombs(s,(const wchar_t **)w_p,sizeof(wcs),NULL);
+cnt=wcsrtombs(s,(const wchar_t **)w_p,sizeof(s)-1,NULL);

 drawtext(x_pos + text_offset, y_pos + box_h - text_offset, s, cnt);

@@ -2398,6 +2398,7 @@
 int cnt;
 char last_mbs[8];
 char *mbs;
+size_t bytes;

 mbs = (str == NULL) ? buf : str;

@@ -2411,8 +2412,8 @@
 if(wcs[0] == L'\0') return 0;

 memset(last_mbs, 0, sizeof(last_mbs));
-wcrtomb(last_mbs, wcs[cnt-1], mb_st);
-return(strlen(last_mbs));
+bytes=wcrtomb(last_mbs, wcs[cnt-1], mb_st); /* -Wall */
+return(bytes);
 #else
 return(1);
 #endif


2006/10/18, crazybuddy Vincent [EMAIL PROTECTED]:
 Dear all,

 I am new to the R system. When I tried to edit data read from a CSV file, the
 R system crashed, and I got an error message as follows:

  > edit(data)
 *** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated
 === Backtrace: =
 /lib/libc.so.6(__chk_fail+0x41)[0x49d020b1]
 /lib/libc.so.6[0x49d034a2]
 /usr/lib/R/modules//R_X11.so[0x33ed7a]
 /usr/lib/R/modules//R_X11.so[0x34050d]
 /usr/lib/R/modules//R_X11.so[0x341858]
 /usr/lib/R/modules//R_X11.so(RX11_dataentry+0xa25)[0x342f45]
 /usr/lib/R/lib/libR.so[0xa34675]
 /usr/lib/R/lib/libR.so[0x954ed6]
 /usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x925b23]
 /usr/lib/R/lib/libR.so[0x929ed8]
 /usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x925b23]
 /usr/lib/R/lib/libR.so[0x926a37]
 /usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x925b23]
 /usr/lib/R/lib/libR.so(Rf_applyClosure+0x2a7)[0x928117]
 /usr/lib/R/lib/libR.so[0x95661f]
 /usr/lib/R/lib/libR.so(Rf_usemethod+0x609)[0x957a89]
 /usr/lib/R/lib/libR.so[0x95825e]
 /usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x925b23]
 /usr/lib/R/lib/libR.so(Rf_applyClosure+0x2a7)[0x928117]
 /usr/lib/R/lib/libR.so(Rf_eval+0x2f4)[0x925994]
 /usr/lib/R/lib/libR.so(Rf_ReplIteration+0x311)[0x945361]
 /usr/lib/R/lib/libR.so[0x945571]
 /usr/lib/R/lib/libR.so(run_Rmainloop+0x60)[0x9458c0]
 /usr/lib/R/lib/libR.so(Rf_mainloop+0x1c)[0x9458ec]
 /usr/lib/R/bin/exec/R(main+0x46)[0x80486f6]
 /lib/libc.so.6(__libc_start_main+0xdc)[0x49c3b4e4]
 /usr/lib/R/bin/exec/R[0x80485f1]
 === Memory map: 
 00111000-0012f000 r-xp  fd:00 16943095
 /usr/lib/R/library/grDevices/libs/grDevices.so
 0012f000-0013 rwxp 0001d000 fd:00 16943095
 /usr/lib/R/library/grDevices/libs/grDevices.so
 0013-00181000 r-xp  fd:00 16976568
 /usr/lib/R/library/stats/libs/stats.so
 00181000-00183000 rwxp 00051000 fd:00 16976568
 /usr/lib/R/library/stats/libs/stats.so
 00339000-00352000 r-xp  fd:00 15959326   /usr/lib/R/modules/R_X11.so
 00352000-00353000 rwxp 00018000 fd:00 15959326   /usr/lib/R/modules/R_X11.so
 00353000-0035f000 rwxp 00353000 00:00 0
 0048-00496000 r-xp  fd:00 15303387   /usr/lib/gconv/SJIS.so
 00496000-00498000 rwxp 00015000 fd:00 15303387   /usr/lib/gconv/SJIS.so
 0056e000-00598000 r-xp  fd:00 16452204   /usr/lib/R/lib/libRblas.so
 00598000-00599000 rwxp 00029000 fd:00 16452204   /usr/lib/R/lib/libRblas.so
 00848000-00851000 r-xp  fd:00 15204401   /lib/libnss_files-2.4.so
 00851000-00852000 r-xp 8000 fd:00 15204401   /lib/libnss_files-2.4.so
 00852000-00853000 rwxp 9000 fd:00 15204401   /lib/libnss_files-2.4.so
 00885000-00abd000 r-xp  fd:00 16452203   /usr/lib/R/lib/libR.so
 00abd000-00aca000 rwxp 00238000 fd:00 16452203   /usr/lib/R/lib/libR.so
 00aca000-00b61000 rwxp 00aca000 00:00 0
 00c47000-00c4d000 r-xp  fd:00 16944203
 /usr/lib/R/library/methods/libs/methods.so
 00c4d000-00c4e000 rwxp 5000 fd:00 16944203
 /usr/lib/R/library/methods/libs/methods.so
 00eb6000-00f31000 r-xp  fd:00 15242987
 /usr/lib/libgfortran.so.1.0.0
 00f31000-00f32000 rwxp 0007b000 fd:00 15242987
 /usr/lib/libgfortran.so.1.0.0
 00f44000-00f45000 r-xp  fd:00 15303344   /usr/lib/gconv/ISO8859-1.so
 00f45000-00f47000 rwxp  fd:00 15303344   

[R] MARS help?

2006-10-17 Thread Spencer Graves
  I'm trying to use mars{mda} to model functions that look fairly 
close to a sequence of straight line segments.  Unfortunately, 'mars' 
seems to totally miss the obvious places for the knots in the apparent 
first order spline model, and I wonder if someone can suggest a better 
way to do this.  The following example consists of a slight downward 
trend followed by a jump up after t1=4, followed by a more marked 
downward trend after t1=5: 

Dat0 <- cbind(t1=1:10,
              x=c(1, 0, 0, 90, 99, 95, 90, 87, 80, 77))
library(mda)
fit0 <- mars(Dat0[, 1, drop=FALSE], Dat0[, 2],
             penalty=.001)
plot(Dat0, type="l")
lines(Dat0[, 1], fit0$fitted.values,
      lty=2, col="red")

  Are there 'mars' options I'm missing or other software I should be 
using? 

  I've got thousands of traces crudely like this of different 
lengths, and I want an automated way of summarizing similar traces in 
terms of a fixed number of knots and associated slopes for each linear 
spline segment max(0, t1-t.knot). 

  Thanks,
  Spencer Graves
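
[Editorial sketch, not part of the original post] If candidate knot locations
can be proposed explicitly, an ordinary lm() fit of a linear spline basis built
from max(0, t1 - t.knot) terms reproduces the piecewise-linear shape; the knots
at 3 and 5 below are chosen by eye from the example data, not by an automated
search.

## Hedged sketch: linear spline with hand-picked knots via lm().
Dat0 <- data.frame(t1 = 1:10,
                   x  = c(1, 0, 0, 90, 99, 95, 90, 87, 80, 77))
hinge <- function(t, k) pmax(0, t - k)   # the max(0, t1 - t.knot) basis term

fit.lm <- lm(x ~ t1 + hinge(t1, 3) + hinge(t1, 5), data = Dat0)
plot(Dat0, type = "l")
lines(Dat0$t1, fitted(fit.lm), lty = 2, col = "blue")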

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqlSave, fast=F option, bug?

2006-10-17 Thread Paul MacManus
Hi,

Using the fast=F option, sqlSave saves without matching column names.
It looks like a bug to me.

Here's a simple (artificial) example.
-
Create a dataframe and save it to a database table test as follows:

df <- data.frame(T=1, S=10)
sqlSave(channel, df, "test", rownames=F)

The table now looks like

T  S
1  10

If I create another dataframe and save as follows

df <- data.frame(S=20, T=2)
sqlSave(channel, df, "test", rownames=F, append=T)

Then table test now looks like

T  S
1  10
2  20

The important point is that although S was the first column of df,
sqlSave checked the column names and matched the corresponding columns
of df and table test.

However, if I now create another dataframe and save it using the
fast=F option as follows

df <- data.frame(S=30, T=3)
sqlSave(channel, df, "test", rownames=F, append=T, fast=F)

the table test now looks like

T  S
1  10
2  20
30 3

In other words, sqlSave didn't check column names, it simply mapped
column 1 to column 1 and column 2 to column 2.
-

This cannot be right. Opinions?

I'm using R 2.3.0 and package RODBC 1.1-7 on Windows XP with MS SQL Server
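
[Editorial sketch, not part of the original post] Until the behaviour is
settled, one defensive workaround is to reorder the data frame's columns to
match the table before appending with fast=F; this assumes sqlColumns()
reports the table's columns in their stored order.

## Hedged sketch: align data frame columns with the "test" table before appending.
library(RODBC)
tab.cols <- sqlColumns(channel, "test")$COLUMN_NAME  # column names in table order
df <- df[, tab.cols, drop = FALSE]                   # reorder to match the table
sqlSave(channel, df, "test", rownames = FALSE, append = TRUE, fast = FALSE)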

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Book recommendation for newbie to stats and R?

2006-10-17 Thread Anupam Tyagi
Hi Kevin,

justin bem justin_bem at yahoo.fr writes:

 
 Exact reference is : 
  Wonnacott, T., Wonnacott, R., 
  Introductory Statistics for Business and Economics, 
  New York, 1990
 

Though not about R, a good book to read for analyzing non-experimental data (and
even experimental data) is Identification Problems in the Social Sciences by
Charles Manski. It is a small, clearly written book, with examples. Providing a
reasonable answer (including caveats) to the kind of typical problem you
described in your initial post will benefit from it. You should at least
consider it an important supplement. See the link below. Anupam.
http://www.hup.harvard.edu/catalog/MANIDE.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Review process for new packages

2006-10-17 Thread Anupam Tyagi
Hello,

Duncan Murdoch murdoch at stats.uwo.ca writes:

 
 On 10/17/2006 2:22 AM, Andreas Wittmann wrote:
  Hi all, 
  
  I'm currently working on a creditmetrics package which includes functions
  for computing the CreditMetrics credit risk model. I guess it will be
  finished in a few days.
  
  My question now is, does there exist some review process before sending it
  to CRAN, or is it reviewed after having sent it?
 
 There's no review process to decide whether your package is useful or 
 well-written.  If you want that kind of review you should submit it to 
 the Journal of Statistical Software.

Although this is a sensitive issue, it is unfortunate that such a review (or
comment, if that is a more suitable word) process is not available for R. Is it
possible to have some process where people can provide comments, even if it is
not a journal review? It could help improve the quality of packages submitted
to R, reduce bugs, and catch errors (coding and non-coding) that the author may
have overlooked. Would contributing something to R on a provisional basis, then
asking for comments, and then submitting a final version work?

It may also help to require the author to include a mathematical description of
what has been submitted, if it is a statistical function. This is because most
new users find it difficult to read R code at the level of functions. They may
also not be familiar with the statistical concept by name, but may know about it
mathematically, because different disciplines have differentiated their
specialized terminology (with some variation) as discipline-specific statistical
applications have evolved. I think this would make R more accessible to a wider
user base.
---Anupam.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strange error in mtrace

2006-10-17 Thread Francisco J. Zagmutt
I vaguely remember seeing a bug report for the debug library a few days ago 
but I can't find the thread. I think it was a compatibility issue with 2.4.0 
but I can't find the thread. I think it was a compatibility issue with 2.4.0 
a link to the specific posting?

I am sorry I can't be of more assistance.

Best regards,

Francisco

Dr. Francisco J. Zagmutt
College of Veterinary Medicine and Biomedical Sciences
Colorado State University




From: Vladimir Eremeev [EMAIL PROTECTED]
Reply-To: Vladimir Eremeev [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Subject: [R] strange error in mtrace
Date: Tue, 17 Oct 2006 18:33:45 -0800

Dear useRs,

I am experiencing a very strange error with Mark Bravington's package 
debug.
I haven't seen it before.

Here is the sample session

> library(debug)
Loading required package: mvbutils
MVBUTILS: no tasks vector found in ROOT
Loading required package: tcltk
Loading Tcl/Tk interface ... done
> x <- function() return(1)
> mtrace(x)
> x()
Error in attr(value, "row.names") <- rlabs :
 row names must be 'character' or 'integer', not 'double'
> mtrace(x,FALSE)
> x()
[1] 1
> mtrace(x)
> x()
Error in attr(value, "row.names") <- rlabs :
 row names must be 'character' or 'integer', not 'double'
> 

This happened with any function that I tried to debug.

I use R 2.4.0 on Linux and on Windows.

debug version is 1.1.0,
mvbutils version is  1.1.1

on both systems.

Linux R was compiled from sources.

The packages were installed from the Internet repositories using 
install.packages
function on both systems.

update.packages only asks for the repository for the current session and then does
nothing, which means that everything is already the latest version.

What could be wrong?

---
Best regards,
Vladimirmailto:[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.