[R] Interpreting Rprof output

2005-01-20 Thread Aleš Žiberna
Hello!

I have run Rprof on a function of mine and the results look very strange, 
to say the least. At the end of this email is the output of summaryRprof. Can 
someone help me interpret this output? I have read the appropriate section in 
the manual "Writing R Extensions" and the help pages. 

 

If I understand this output correctly, it is saying that unlist has been 
active in every interval, and that all functions (or the functions they have 
called) have been active in every interval. Is that correct?

It seems strange that the largest time is 0.02, since the function that was 
Rprof-ed ran for at least an hour, if not more.

 

I am also surprised that some functions are not listed here, especially for, 
if, as.character, and some others.

 

Could it be that this is a result of the structure of my functions? The function 
that was originally called is opt.random.par. This function in turn called the 
function opt.par.check.to.skip.iter via do.call 100 times in a for loop. 
This latter function calls the function crit.fun from about 70 to 300 times 
within a double for loop, and usually does 1 to 10 iterations. 
crit.fun itself is quite quick; on these data it takes about 0.04s. 

 

I know it would be helpful to provide the functions and the data that produced 
these results; however, I cannot disclose them at this point. 

 

Thank you in advance for any suggestions.

Ales Ziberna




output of summaryRprof:
$by.self
                           self.time self.pct total.time total.pct
unlist                          0.02      100       0.02       100
any                             0.00        0       0.02       100
crit.fun                        0.00        0       0.02       100
diag                            0.00        0       0.02       100
do.call                         0.00        0       0.02       100
opt.par.check.to.skip.iter      0.00        0       0.02       100
opt.random.par                  0.00        0       0.02       100
sapply                          0.00        0       0.02       100
sum                             0.00        0       0.02       100

$by.total
                           total.time total.pct self.time self.pct
unlist                           0.02       100      0.02      100
any                              0.02       100      0.00        0
crit.fun                         0.02       100      0.00        0
diag                             0.02       100      0.00        0
do.call                          0.02       100      0.00        0
opt.par.check.to.skip.iter       0.02       100      0.00        0
opt.random.par                   0.02       100      0.00        0
sapply                           0.02       100      0.00        0
sum                              0.02       100      0.00        0

$sampling.time
[1] 0.02

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Creating a custom connection to read from multiple files

2005-01-20 Thread Tomas Kalibera
Hello,
is it possible to create my own connection which I could use with
read.table or scan ? I would like to create a connection that would read
from multiple files in sequence (like if they were concatenated),
possibly with an option to skip first n lines of each file. I would like
to avoid using platform specific scripts for that... (currently I invoke
/bin/cat from R to create a concatenation of all those files).
Thanks,
Tomas
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Creating a custom connection to read from multiple files

2005-01-20 Thread Prof Brian Ripley
On Thu, 20 Jan 2005, Tomas Kalibera wrote:
is it possible to create my own connection which I could use with
Yes.  In a sense, all the connections are custom connections written by 
someone.

read.table or scan ? I would like to create a connection that would read
from multiple files in sequence (like if they were concatenated),
possibly with an option to skip first n lines of each file. I would like
to avoid using platform specific scripts for that... (currently I invoke
/bin/cat from R to create a concatenation of all those files).
I would use pipes, but a pure R solution is to process the files to an 
anonymous file() connection and then read that.

However, what is wrong with reading a file at a time and combining the 
results in R using rbind?
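
A minimal sketch of the anonymous-file approach, assuming the filenames are
in a character vector 'fnames' and the first line of each file is to be
skipped:

con <- file()                          # anonymous file connection, opened "w+"
for (f in fnames)
    writeLines(readLines(f)[-1], con)  # drop the first line of each file
dat <- read.table(con)                 # read the concatenation in one pass
close(con)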

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Creating a custom connection to read from multiple files

2005-01-20 Thread Tomas Kalibera
Dear Prof Ripley,
thanks for your suggestions; it's very nice that one can create custom 
connections directly in R, and I think it is what I need just now.

However, what is wrong with reading a file at a time and combining the 
results in R using rbind?

Well, the problem is performance. If I concatenate all those files, they 
total around 8MB, and can grow to tens of MBs in the near future.

Both concatenating and reading from a single file by scan takes 5 
seconds (which is almost OK).

However, reading individual files by read.table and rbinding them one by 
one (samples <- rbind(samples, newSamples)) takes minutes. The same is 
true when I concatenate lists manually. Scan does not help significantly. 
I guess there is some overhead in detecting dimensions of objects in 
rbind (?), or re-allocation, or copying data?

Best regards,
Tomas Kalibera
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Creating a custom connection to read from multiple files

2005-01-20 Thread Prof Brian Ripley
On Thu, 20 Jan 2005, Tomas Kalibera wrote:
Dear Prof Ripley,
thanks for your suggestions, it's very nice one can create custom connections 
directly in R and I think it is what I need just now.

However, what is wrong with reading a file at a time and combining the 
results in R using rbind?

Well, the problem is performance. If I concatenate all those files, they have 
around 8MB, can grow to tens of MBs in near future.

Both concatenating and reading from a single file by scan takes 5 seconds 
(which is almost OK).

However, reading individual files by read.table and rbinding them one by 
one (samples <- rbind(samples, newSamples)) takes minutes. The same is 
true when I concatenate lists manually. Scan does not help significantly. 
I guess there is some overhead in detecting dimensions of objects in 
rbind (?), or re-allocation, or copying data?
rbind is vectorized so you are using it (way) suboptimally.
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Problem loading a library

2005-01-20 Thread Marco Sandri
Hi.
I have R (Ver 2.0) correctly running on a Suse 9.0
Linux machine.

I correctly installed the Logic Regression LogicReg library
(by the command: R CMD INSTALL LogicReg)
developed by Ingo Ruczinski and Charles Kooperberg :
http://bear.fhcrc.org/~ingor/logic/html/program.html

When I try to load the library in R by the command:
library(LogicReg)
I get the following error:
Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library
/usr/lib/R/library/LogicReg/libs/LogicReg.so:
/usr/lib/R/library/LogicReg/libs/LogicReg.so: cannot map zero-fill pages:
Cannot allocate memory
Error in library(LogicReg) : .First.lib failed

How could I solve the problem?
Thanks in advance for your kind help.
Marco

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Creating a custom connection to read from multiple files

2005-01-20 Thread Liaw, Andy
 From: Prof Brian Ripley
 
 On Thu, 20 Jan 2005, Tomas Kalibera wrote:
 
  Dear Prof Ripley,
 
  thanks for your suggestions, it's very nice one can create 
 custom connections 
  directly in R and I think it is what I need just now.
 
  However, what is wrong with reading a file at a time and 
 combining the 
  results in R using rbind?
  
  Well, the problem is performance. If I concatenate all 
 those files, they have 
  around 8MB, can grow to tens of MBs in near future.
 
  Both concatenating and reading from a single file by scan 
 takes 5 seconds 
  (which is almost OK).
 
  However, reading individual files by read.table and 
 rbinding one by one ( 
 samples <- rbind(samples, newSamples)) takes minutes. The same 
 is when I 
  concatenate lists manually. Scan does not help 
 significantly. I guess there 
  is some overhead in detecting dimensions of objects in rbind (?) or 
  re-allocation or copying data ?
 
 rbind is vectorized so you are using it (way) suboptimally.

Here's an example:

> ## Create a 500 x 100 data matrix.
> x <- matrix(rnorm(5e4), 500, 100)
> ## Generate 50 filenames.
> fname <- paste("f", formatC(1:50, width=2, flag="0"), ".txt", sep="")
> ## Write the data to files 50 times.
> for (f in fname) write(t(x), file=f, ncol=ncol(x))
>
> ## Read the files into a list of data frames.
> system.time(datList <- lapply(fname, read.table, header=FALSE),
+             gcFirst=TRUE)
[1] 11.91  0.05 12.33    NA    NA
> ## Specify colClasses to speed up.
> system.time(datList <- lapply(fname, read.table,
+                               colClasses=rep("numeric", 100)),
+             gcFirst=TRUE)
[1] 10.69  0.07 10.79    NA    NA
> ## Stack them together.
> system.time(dat <- do.call("rbind", datList), gcFirst=TRUE)
[1] 5.34 0.09 5.45   NA   NA
>
> ## Use matrices instead of data frames.
> system.time(datList <- lapply(fname,
+     function(f) matrix(scan(f), ncol=100, byrow=TRUE)), gcFirst=TRUE)
Read 50000 items
...
Read 50000 items
[1]  9.49  0.08 15.06    NA    NA
> system.time(dat <- do.call("rbind", datList), gcFirst=TRUE)
[1] 0.09 0.03 0.12   NA   NA
> ## Clean up the files.
> unlink(fname)

A couple of points:

- Usually specifying colClasses will make read.table() quite a bit 
  faster, even though it's only marginally faster here.  Look back
  in the list archive to see examples.

- If your data files are all numerics (as in this example), 
  storing them in matrices will be much more efficient.  Note
  the difference in rbind()ing the 50 data frames and 50 
  matrices (5.34 seconds vs. 0.09!).  rbind.data.frame()
  needs to ensure that the resulting data frame has unique
  rownames (a requirement for a legit data frame), and
  that's probably taking a big chunk of the time.

Andy

 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Constructing Matrices

2005-01-20 Thread Doran, Harold
Dear List:

I am working to construct a matrix of a particular form. For the most
part, developing the matrix is simple and is built as follows:

vl.mat <- matrix(c(0,0,0,0,0,64,0,0,0,0,64,0,0,0,0,64), nc=4)

Now to expand this matrix to be block-diagonal, I do the following:

sample.size <- 100 # number of individual students
I <- diag(sample.size)
bd.mat <- kronecker(I, vl.mat)

This creates a block-diagonal matrix with variances along the diagonal
and within-student covariances of zero (I am working with longitudinal
student achievement data). However, across students, I want the
correlation to equal 1 for each variance term. To illustrate, here is a
matrix for 2 students. The goal is for the second variance term for
student 1 to be perfectly correlated with the corresponding variance
term for student 2. In other words, I need to plug in 64 at positions
(6,2) and (2,6), another 64 at positions (7,3) and (3,7), and another 64
at positions (8,4) and (4,8). I'm having some difficulty conceptualizing
how to construct this part of the matrix and would appreciate any
thoughts.

Thank you,
Harold


     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    0    0    0    0    0    0    0
[2,]    0   64    0    0    0    0    0    0
[3,]    0    0   64    0    0    0    0    0
[4,]    0    0    0   64    0    0    0    0
[5,]    0    0    0    0    0    0    0    0
[6,]    0    0    0    0    0   64    0    0
[7,]    0    0    0    0    0    0   64    0
[8,]    0    0    0    0    0    0    0   64

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Problem loading a library

2005-01-20 Thread Prof Brian Ripley
On Thu, 20 Jan 2005, Marco Sandri wrote:
Hi.
I have R (Ver 2.0) correctly running on a Suse 9.0
Linux machine.
32- or 64-bit?
I correctly installed the Logic Regression LogicReg library
(by the command: R CMD INSTALL LogicReg)
developed by Ingo Ruczinski and Charles Kooperberg :
http://bear.fhcrc.org/~ingor/logic/html/program.html
When I try to load the library in R by the command:
library(LogicReg)
I get the following error:
Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library
/usr/lib/R/library/LogicReg/libs/LogicReg.so:
/usr/lib/R/library/LogicReg/libs/LogicReg.so: cannot map zero-fill pages:
Cannot allocate memory
Error in library(LogicReg) : .First.lib failed
How could I solve the problem?
Use a different machine?  That package works on all of mine, 32- and 
64-bit.

BTW, the posting guide does suggest you contact the package authors first, 
so what do they say?

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Constructing Matrices

2005-01-20 Thread Liaw, Andy
I'm still not clear on exactly what your question is.  If you can plug in
the numbers you want in, say, the lower triangular portion, you can copy
those to the upper triangular part easily; something like:

m[upper.tri(m)] <- m[lower.tri(m)]

Is that what you're looking for?
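
Note that m[lower.tri(m)] extracts values in column order, so for a truly
symmetric result one usually copies from the transpose instead; a minimal
sketch:

m <- matrix(0, 4, 4)
m[2, 1] <- m[3, 2] <- 5                # fill some lower-triangle entries
m[upper.tri(m)] <- t(m)[upper.tri(m)]  # mirror them into the upper triangle
isSymmetric(m)                         # TRUE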

Andy

 From: Doran, Harold
 
 Dear List:
 
 I am working to construct a matrix of a particular form. For the most
 part, developing the matrix is simple and is built as follows:
 
 vl.mat <- matrix(c(0,0,0,0,0,64,0,0,0,0,64,0,0,0,0,64), nc=4)
 
 Now to expand this matrix to be block-diagonal, I do the following:
 
 sample.size <- 100 # number of individual students
 I <- diag(sample.size)
 bd.mat <- kronecker(I, vl.mat)
 
 This creates a block-diagonal matrix with variances along the diagonal
 and covariances within-student to be zero (I am working with
 longitudinal student achievement data). However, across 
 student, I want
 to have the correlation equal to 1 for each variance term. To
 illustrate, here is a matrix for 2 students. The goal is for the
 correlation between the second variance term for student 1 to be
 perfectly correlated with the variance term for student 2. In other
 words, I need to plug in 64 at position (6,2) and (2,6), another 64 at
 position (7,3) and (3,7) and another 64 at positions (8,4) and (4,8).
 I'm having some difficulty conceptualizing how to construct 
 this part of
 the matrix and would appreciate any thoughts.
 
 Thank you,
 Harold
 
 
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]    0    0    0    0    0    0    0    0
 [2,]    0   64    0    0    0    0    0    0
 [3,]    0    0   64    0    0    0    0    0
 [4,]    0    0    0   64    0    0    0    0
 [5,]    0    0    0    0    0    0    0    0
 [6,]    0    0    0    0    0   64    0    0
 [7,]    0    0    0    0    0    0   64    0
 [8,]    0    0    0    0    0    0    0   64
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] References

2005-01-20 Thread kolluru ramesh
Where can I get the literature on Multiple Imputation using Additive 
Regression, Bootstrapping, and Predictive Mean Matching?

__



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Subsetting a data frame by a factor, using the level that occurs the most times

2005-01-20 Thread michael watson \(IAH-C\)
I think that title makes sense... I hope it does...

I have a data frame, one of the columns of which is a factor.  I want
the rows of data that correspond to the level in that factor which
occurs the most times.  

I can get a list by doing:

by(data,data$pattern,subset)

And go through each element of the list counting the rows, to find the
maximum

BUT I can't help thinking there's a more elegant way of doing this

The second part is figuring out the rows which have the maximum number
of consecutive patterns which are the same... Now that I would love some
help with... :-)
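
One sketch for the second part uses rle(), which encodes the lengths of
consecutive runs of equal values:

r <- rle(as.character(data$pattern))  # run lengths of consecutive patterns
i <- which.max(r$lengths)             # first longest run
end <- cumsum(r$lengths)[i]
start <- end - r$lengths[i] + 1
data[start:end, ]                     # rows of the longest run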

Thanks
Mick

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Reference Material for Multiple Imputation

2005-01-20 Thread kolluru ramesh
Is there a particular site where I can get some examples of and references on 
Multiple Imputation using Bootstrapping?




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Constructing Matrices

2005-01-20 Thread Doran, Harold
I should probably have explained my data and model a little better.
Assume I have student achievement scores across four time points. I
estimate the model using gls() as follows

fm1 <- gls(score ~ time, long, correlation=corAR1(form=~1|stuid),
method='ML')

I can now extract the variance-covariance matrix for this model as
follows:

var.mat <- getVarCov(fm1)

Assume for sake of argument I have a sample size of 100 students. I can
expand this to the full matrix as follows
I <- diag(100)
V <- kronecker(I, var.mat)

For my particular model, the scores within each student are assumed
correlated (AR1), but across student are uncorrelated. Now, for a
particular problem I am dealing with I need to make some adjustments to
this matrix, V,  and reestimate the gls(). The adjustments I need to
make cannot be done using any of the existing varFunc classes, so I am
having to do this manually.

What I need to do is create a new matrix manually, add it to V, and then
reestimate the gls. Creating this new matrix is the challenge I
currently face; let's call it v.prime. 

The issue at hand is creating v.prime to have non-zero covariance terms
across students in very specific places. The matrix I used below is only
for two students. But assume I am doing this for thousands of students.
My goal is to create a full block-diagonal covariance matrix where the
correlation across students at time two is always perfectly correlated
and the correlation at time three is always perfectly correlated across
students. So, within each block of v.prime, the variances are
uncorrelated, but across each block the variances are correlated. 

So, I need to construct v.prime such that it is of the same order as V
to make them conformable for addition. More importantly, I need the
off-diagonal elements across students to represent a perfect correlation
in very specific places. In the example below, if there was a 64 at
position (2,6) this would represent a perfect correlation between
student 1 and 2 at this point in time since the variance along the
diagonal at time 2 is 64. Since I am doing this for many students, there
would need to be a 64 between student 1 and all other students (not just
student 2) and so on.
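
A sketch of one way to build such a v.prime with kronecker(), assuming (as
in the earlier two-student example) a shared variance of 64 at times 2
through 4; adjust the diagonal of 'cross' to whichever times should be
linked:

n <- 100                                  # number of students
cross <- diag(c(0, 64, 64, 64))           # cross-student block
J <- matrix(1, n, n)                      # all-ones matrix
v.prime <- kronecker(J - diag(n), cross)  # 'cross' in every off-diagonal block
V.new <- V + v.prime                      # conformable with V above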

From here I can use R's matrix facilities to reestimate the gls.

I hope this clarifies a bit.

Harold

-Original Message-
From: Liaw, Andy [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 20, 2005 8:41 AM
To: Doran, Harold; r-help@stat.math.ethz.ch
Subject: RE: [R] Constructing Matrices

I'm still not clear on exactly what your question is.  If you can plug
in the numbers you want in, say, the lower triangular portion, you can
copy those to the upper triangular part easily; something like:

m[upper.tri(m)] <- m[lower.tri(m)]

Is that what you're looking for?

Andy

 From: Doran, Harold
 
 Dear List:
 
 I am working to construct a matrix of a particular form. For the most 
 part, developing the matrix is simple and is built as follows:
 
 vl.mat <- matrix(c(0,0,0,0,0,64,0,0,0,0,64,0,0,0,0,64), nc=4)
 
 Now to expand this matrix to be block-diagonal, I do the following:
 
 sample.size <- 100 # number of individual students
 I <- diag(sample.size)
 bd.mat <- kronecker(I, vl.mat)
 
 This creates a block-diagonal matrix with variances along the diagonal

 and covariances within-student to be zero (I am working with 
 longitudinal student achievement data). However, across student, I 
 want to have the correlation equal to 1 for each variance term. To 
 illustrate, here is a matrix for 2 students. The goal is for the 
 correlation between the second variance term for student 1 to be 
 perfectly correlated with the variance term for student 2. In other 
 words, I need to plug in 64 at position (6,2) and (2,6), another 64 at

 position (7,3) and (3,7) and another 64 at positions (8,4) and (4,8).
 I'm having some difficulty conceptualizing how to construct this part 
 of the matrix and would appreciate any thoughts.
 
 Thank you,
 Harold
 
 
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]    0    0    0    0    0    0    0    0
 [2,]    0   64    0    0    0    0    0    0
 [3,]    0    0   64    0    0    0    0    0
 [4,]    0    0    0   64    0    0    0    0
 [5,]    0    0    0    0    0    0    0    0
 [6,]    0    0    0    0    0   64    0    0
 [7,]    0    0    0    0    0    0   64    0
 [8,]    0    0    0    0    0    0    0   64
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
 




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Cauchy's theorem

2005-01-20 Thread Robin Hankin
In complex analysis, Cauchy's integral theorem states (loosely speaking)
that the path integral of an entire (everywhere-differentiable) function
around any closed curve is zero.

I would like to see this numerically, using R (and indeed I would like
to use the residue theorem as well).

Has anyone coded up path integration?
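
A sketch of the direct approach: parameterize the curve as z(t) and
approximate the integral of f(z(t)) z'(t) dt, here with a midpoint rule
around the unit circle:

path.integral <- function(f, n = 1000) {
    t <- seq(0, 2 * pi, length.out = n + 1)
    z <- exp(1i * t)                 # closed curve: the unit circle
    mid <- (z[-1] + z[-(n + 1)]) / 2 # midpoint of each segment
    sum(f(mid) * diff(z))            # midpoint-rule approximation
}
path.integral(function(z) z^2)       # ~ 0, as Cauchy's theorem predicts
path.integral(function(z) 1/z)       # ~ 2*pi*1i, per the residue theorem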

--
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
European Way, Southampton SO14 3ZH, UK
 tel  023-8059-7743
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] (no subject)

2005-01-20 Thread Virginie Rondeau
Hello
I would like to compare the results obtained with a classical 
non-parametric proportional hazards model with a parametric proportional 
hazards model using a Weibull.

How can we obtain the equivalence of the parameters using coxph (the 
non-parametric model) and survreg (the parametric model)?

Thanks
Virginie
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] (no subject)

2005-01-20 Thread Göran Broström
On Thu, Jan 20, 2005 at 03:18:53PM +0100, Virginie Rondeau wrote:
 Hello
 I would like to compare the results obtained with a classical 
 non-parametric proportional hazards model with a parametric proportional 
 hazards model using a Weibull.
 
 How can we obtain the equivalence of the parameters using coxph (the 
 non-parametric model) and survreg (the parametric model)?

One way of avoiding this problem is to fit the Weibull model with 'weibreg'
in the package eha.


-- 
 Göran Broströmtel: +46 90 786 5223
 Department of Statistics  fax: +46 90 786 6614
 Umeå University   http://www.stat.umu.se/egna/gb/
 SE-90187 Umeå, Sweden e-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Johnson transformation

2005-01-20 Thread
Hello,
I'm Carla, an Italian student. I'm looking for a package to transform 
non-normal data to normality. I tried to use Box-Cox, but it is not OK. Is 
there a package for the Johnson families of transformations? Can you give me 
any suggestions for finding free software, such as R, that uses this transform?
Thank you very much
Carla






__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] (no subject)

2005-01-20 Thread Frank E Harrell Jr
Virginie Rondeau wrote:
Hello
I would like to compare the results obtained with a classical 
non-parametric proportional hazards model with a parametric proportional 
hazards model using a Weibull.

How can we obtain the equivalence of the parameters using coxph (the 
non-parametric model) and survreg (the parametric model)?

Thanks
Virginie
In the Design package look at the pphsm function that converts a survreg 
Weibull fit (fitted by the psm function which is an adaptation of 
survreg) to PH form.
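
A sketch of that route, with made-up variables time, status and x:

library(Design)                        # provides psm() and pphsm()
f.aft <- psm(Surv(time, status) ~ x, dist = "weibull")
f.ph <- pphsm(f.aft)                   # Weibull AFT fit re-expressed in PH form
f.cox <- coxph(Surv(time, status) ~ x) # compare coef(f.ph) with coef(f.cox)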

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Johnson transformation

2005-01-20 Thread Charles Annis, P.E.
Greetings, Carla:

While it is possible to map any proper density into a normal through their
CDFs, that may not be useful in your case.

I suggest that you first plot your data.
?qqnorm

(Type ?qqnorm on the R command line and hit Enter.)

Are your data continuous, or do they occur in groups?  Do the data curve?
Do they look like two (or more) distinct lines?

If your data have only one mode and if they are smooth then the Box-Cox
transform should provide a symmetrical result.  Not all symmetrical
densities are normal, of course.  And if your data are discrete then using a
continuous density like the normal (or Johnson family) is inappropriate.

The purpose of fitting a distribution to data is usually to permit some
probability statement, like Prob(x > X) = alpha.  Why do you want to use the
Johnson family?  I am not aware of convenient methods for making such
probability statements for them.

Best wishes.


Charles Annis, P.E.
 
[EMAIL PROTECTED]
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 10:16 AM
To: r-help
Subject: [R] Johnson transformation

Hello,
I'm Carla, an Italian student. I'm looking for a package to transform
non-normal data to normality. I tried to use Box-Cox, but it is not OK. Is
there a package for the Johnson families of transformations? Can you give me
any suggestions for finding free software, such as R, that uses this
transform?
Thank you very much
Carla






__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Johnson transformation

2005-01-20 Thread Bob Wheeler
[EMAIL PROTECTED] wrote:
Hello,
I'm Carla, an Italian student. I'm looking for a package to transform
non-normal data to normality. I tried to use Box-Cox, but it is not OK. Is
there a package for the Johnson families of transformations? Can you give me
any suggestions for finding free software, such as R, that uses this
transform?
Thank you very much
Carla


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
The Johnson system is in the SuppDists package.
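
A sketch of its use (see ?JohnsonFit there; the sample 'x' is made up):

library(SuppDists)
x <- rexp(200)                   # a skewed sample
parms <- JohnsonFit(x)           # select and fit a Johnson curve
z <- qnorm(pJohnson(x, parms))   # map to approximate normality via the CDF
qqnorm(z); qqline(z)             # check the result
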
--
Bob Wheeler --- http://www.bobwheeler.com/
ECHIP, Inc. ---
Randomness comes in bunches.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Cauchy's theorem

2005-01-20 Thread dr mike
I don't know about the 'in R' bit, but ISTR that Monte-Carlo (or pseudo
Monte-Carlo) integration is a way of doing this 'numerically'. I know that
Mathematica implements the (pseudo Monte-Carlo)
Halton-Hammersley-Wozniakowski algorithm as NIntegrate. Perhaps something
equivalent has been coded by someone for WinBUGS (OpenBUGS), accessible from
R via the BRugs package.

HTH

Mike

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Robin Hankin
Sent: 20 January 2005 14:14
To: R-help@stat.math.ethz.ch
Subject: [R] Cauchy's theorem

In complex analysis, Cauchy's integral theorem states (loosely
speaking) that the path integral
of any entire differentiable function, around any closed curve, is zero.

I would like to see this numerically, using R (and indeed I would like to
use the residue theorem as well).

Has anyone coded up path integration?




--
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] glm and percentage data with many zero values

2005-01-20 Thread Christian Kamenik
Dear all,
I am interested in correctly testing effects of continuous environmental 
variables and ordered factors on bacterial abundance. Bacterial 
abundance is derived from counts and expressed as percentage. My problem 
is that the abundance data contain many zero values:
Bacteria <- 
c(2.23,0,0.03,0.71,2.34,0,0.2,0.2,0.02,2.07,0.85,0.12,0,0.59,0.02,2.3,0.29,0.39,1.32,0.07,0.52,1.2,0,0.85,1.09,0,0.5,1.4,0.08,0.11,0.05,0.17,0.31,0,0.12,0,0.99,1.11,1.78,0,0,0,2.33,0.07,0.66,1.03,0.15,0.15,0.59,0,0.03,0.16,2.86,0.2,1.66,0.12,0.09,0.01,0,0.82,0.31,0.2,0.48,0.15)

First I tried transforming the data (e.g., logit) but because of the 
zeros I was not satisfied. Next I converted the percentages into integer 
values by round(Bacteria*10) or ceiling(Bacteria*10) and calculated a 
glm with a Poisson error structure; however, I am not very happy with 
this approach because it changes the original percentage data 
substantially (e.g., 0.03 becomes either 0 or 1). The same is true for 
converting the percentages into factors and calculating a multinomial or 
proportional-odds model (anyway, I do not know if this would be a 
meaningful approach).
I was searching the web and the best answer I could get was 
http://www.biostat.wustl.edu/archives/html/s-news/1998-12/msg00010.html 
in which several persons suggested quasi-likelihood. Would it be 
reasonable to use a glm with quasipoisson? If yes, how can I find the 
appropriate variance function? Any other suggestions?
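
For what it is worth, quasipoisson() in glm() accepts non-negative,
non-integer responses, so the percentages would not need to be rounded; a
sketch with a made-up predictor 'env':

env <- rnorm(length(Bacteria))   # placeholder environmental variable
fit <- glm(Bacteria ~ env, family = quasipoisson(link = "log"))
summary(fit)                     # dispersion parameter estimated from the data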

Many thanks in advance, Christian

Christian Kamenik
Institute of Plant Sciences
University of Bern
Altenbergrain 21
3013 Bern
Switzerland
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Subsetting a data frame by a factor, using the level that occurs the most times

2005-01-20 Thread Liaw, Andy
 From: Douglas Bates
 
 michael watson (IAH-C) wrote:
  I think that title makes sense... I hope it does...
  
  I have a data frame, one of the columns of which is a 
 factor.  I want
  the rows of data that correspond to the level in that factor which
  occurs the most times.  
 
 So first you want to determine the mode (in the sense of the most 
 frequently occuring value) of the factor.   One way to do this is
 
 names(which.max(table(fac)))
 
 Use this comparison for the subset as
 
 subset(data, pattern == names(which.max(table(pattern))))

Just be careful that if there are ties (i.e., more than one level having the
max) which.max() will randomly pick one of them.  That may or may not be
what's desired.  If that is a possibility, Mick will need to think what he
wants in such cases.
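
A sketch that instead keeps the rows of every tied level, should that be
what is wanted:

tab <- table(data$pattern)
subset(data, pattern %in% names(tab)[tab == max(tab)])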

Andy

 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] ROracle error

2005-01-20 Thread Sabrina Carpentier
I am running R 2.0.0 on a SunOS 5.9 machine and using Oracle 8i (8.1.7.0.0, 
enterprise edition)

and when I  try to load ROracle I receive the following error:

require(ROracle)
Loading required package: ROracle 
Loading required package: DBI 
Error in dyn.load(x, as.logical(local), as.logical(now)) : 
unable to load shared library 
/bioinfo/local/R/lib/R/library/ROracle/libs/ROracle.so:
  ld.so.1: /bioinfo/local/R/lib/R/bin/exec/R: fatal: relocation error: file 
/bioinfo/local/R/lib/R/library/ROracle/libs/ROracle.so: symbol ncrov: 
referenced symbol not found
[1] FALSE

 Any help is appreciated 

Regards,
Sabrina

Sabrina Carpentier
Service Bioinformatique
Institut Curie - Bat. Trouillet Rossignol (4e étage)
26 rue d'Ulm - 75248 Paris Cedex 5 - FRANCE
[EMAIL PROTECTED]
Tel : +33 1 42 34 65 21 
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] easing out of Excel

2005-01-20 Thread Greg Snow
 Paul Sorenson [EMAIL PROTECTED] 01/19/05 03:18PM 
 I know enough about R to be dangerous and our marketing people have
 asked me to automate some reporting.  Data comes from an SQL
source
 and graphs and various summaries are currently created manually in
 Excel.  The raw information is invoicing records and the reporting
is
 basically summaries by customer, region, product line etc.
 
 With function such as aggregate(), hist() and pareto() (which
someone
 on this list kindly pointed me at) I can produce something roughly
 equivalent to the current reports.
 
 My question is, are there any neat R lock out features people
here
 like to use on this kind of info, particularly when the output is
very
 visual (report is intended for marketing people).
 
 Another way of looking at this is, What kind of hidden
information
 can I extract with R that the Excel solution hasn't touched?

Since you are looking for summaries within groups, you should look at the
lattice package and some of the plots that you can produce with it (maybe
for each product line you can produce a lattice/trellis graph, with each
panel representing a region and different colors/symbols within panels to
represent different customers).
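
Something along these lines, assuming a data frame 'inv' with made-up
columns month, sales, region and customer:

library(lattice)
xyplot(sales ~ month | region, data = inv, groups = customer,
       type = "b", auto.key = list(columns = 2))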

If we had more of an idea of what you are looking for, we could give
better suggestions.


 For example, even the pareto plot mentioned earlier is something
the
 Excel guys haven't thought of or can't easily produce.
 
 regards
 
 BTW the tool chain I am using goes something like:
  Production (run daily):
  DB - SQL/python - CSV - R/python - images -
network
  Presentation:
  network - CGI/python - browser

It looks like you want the reports fully automated and the final result as
HTML (to be viewed with a browser). I suggest you look at the R2HTML package
and the Sweave function (this lets you write a report in HTML with R code in
place of graphs and output; then a quick run through Sweave and you have a
final report in HTML, ready to be viewed).

There are also several tools available for running R through CGI; go to
http://www.r-project.org/ and click on "R web-servers" under the "Related
Projects" heading in the left column to get details.

Hope this helps,


Greg Snow, Ph.D.
Statistical Data Center
[EMAIL PROTECTED]
(801) 408-8111

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] ROracle error

2005-01-20 Thread Prof Brian Ripley
On Thu, 20 Jan 2005, Sabrina Carpentier wrote:
I am running R 2.0.0 on a SunOs 5.9 machine and using Oracle 8i.1.7.0.0 
(enterprise edition)
and when I  try to load ROracle I receive the following error:
require(ROracle)
Loading required package: ROracle
Loading required package: DBI
Error in dyn.load(x, as.logical(local), as.logical(now)) :
   unable to load shared library 
/bioinfo/local/R/lib/R/library/ROracle/libs/ROracle.so:
 ld.so.1: /bioinfo/local/R/lib/R/bin/exec/R: fatal: relocation error: file 
/bioinfo/local/R/lib/R/library/ROracle/libs/ROracle.so: symbol ncrov: 
referenced symbol not found
[1] FALSE
It's not an R issue, so please ask your sysadmins for help.  But
ldd /bioinfo/local/R/lib/R/library/ROracle/libs/ROracle.so
would be a good start as I suspect your Oracle client libraries are not 
being found.

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Subsetting a data frame by a factor, using the level that occurs the most times

2005-01-20 Thread Douglas Bates
Liaw, Andy wrote:
From: Douglas Bates
michael watson (IAH-C) wrote:
I think that title makes sense... I hope it does...
I have a data frame, one of the columns of which is a 
factor.  I want
the rows of data that correspond to the level in that factor which
occurs the most times.  
So first you want to determine the mode (in the sense of the most 
frequently occuring value) of the factor.   One way to do this is

names(which.max(table(fac)))
Use this comparison for the subset as
subset(data, pattern == names(which.max(table(pattern))))

Just be careful that if there are ties (i.e., more than one level having the
max) which.max() will randomly pick one of them.  That may or may not be
what's desired.  If that is a possibility, Mick will need to think what he
wants in such cases.
According to the documentation it picks the first one.  Also, that's 
what Martin Maechler told me and he wrote the code so I trust him on 
that.  I figure that if you have to trust someone to be meticulous and 
precise then a German-speaking Swiss is a good choice.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Valasz: patched

2005-01-20 Thread info
Dear Customer!

Your message to [EMAIL PROTECTED] has been received by our system, and one of
our colleagues will reply to it shortly. If you wrote to us about an ordering,
shipping or other administrative problem, please re-send your message to
[EMAIL PROTECTED] so that the colleagues who process orders can help you as
soon as possible!

Best regards,

NetPiac Customer Service
---
 NetPiac DVD and VHS Online Store
 http://www.netpiac.hu  
---
[EMAIL PROTECTED]  | Tel.: 239 4517
1135 Budapest, Szt. László út 60-64.
Postal address: 1325 Budapest Pf. 222.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Subsetting a data frame by a factor, using the level that occurs the most times

2005-01-20 Thread Liaw, Andy
 From: Douglas Bates
 
 Liaw, Andy wrote:
 From: Douglas Bates
 
 michael watson (IAH-C) wrote:
 
 I think that title makes sense... I hope it does...
 
 I have a data frame, one of the columns of which is a 
 
 factor.  I want
 
 the rows of data that correspond to the level in that factor which
 occurs the most times.  
 
 So first you want to determine the mode (in the sense of the most 
 frequently occuring value) of the factor.   One way to do this is
 
 names(which.max(table(fac)))
 
 Use this comparison for the subset as
 
 subset(data, pattern == names(which.max(table(pattern))))
  
  
  Just be careful that if there are ties (i.e., more than one 
 level having the
  max) which.max() will randomly pick one of them.  That may 
 or may not be
  what's desired.  If that is a possibility, Mick will need 
 to think what he
  wants in such cases.
 
 According to the documentation it picks the first one.  Also, that's 
 what Martin Maechler told me and he wrote the code so I trust him on 
 that.  I figure that if you have to trust someone to be 
 meticulous and 
 precise then a German-speaking Swiss is a good choice.

My apologies!  I got it mixed up with max.col, which does the tie-breaking. 

Andy

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Windows Front end-crash error

2005-01-20 Thread Doran, Harold
Dear List:

First, many thanks to those who offered assistance while I constructed
code for the simulation. I think I now have code that resolves most of
the issues I encountered with memory.

While the code works perfectly for smallish datasets with small sample
sizes, it triggers a Windows-based error with samples of 5,000 and 250
datasets. The error is a dialogue box with the following:

R for Windows terminal front-end has encountered a problem and needs to
close.  We are sorry for the inconvenience. If you were in the middle of
something, the information you were working on might be lost.

The new code is below. Can anyone suggest whether this error derives
from inefficient code, or from a Windows-specific issue that can somehow
be resolved, and if so, how?

Thanks
Harold



library(MASS)
library(nlme)
mu <- c(100,150,200,250)
Sigma <- matrix(c(400,80,80,80, 80,400,80,80, 80,80,400,80, 80,80,80,400), 4, 4)
mu2 <- c(0,0,0)
LE <- 8^2 # Linking Error
Sigma2 <- diag(LE,3)
sample.size <- 5000
N <- 100 # Number of datasets
# Take a single draw from VL distribution
vl.error <- mvrnorm(n=N, mu2, Sigma2)

intercept1 <- 0
slope1 <- 0
intercept2 <- 0
slope2 <- 0

for(i in 1:N){
temp <- data.frame(ID=seq(1:sample.size), mvrnorm(n=sample.size, mu, Sigma))

temp$X5 <- temp$X1
temp$X6 <- temp$X2 + vl.error[i,1]
temp$X7 <- temp$X3 + vl.error[i,2]
temp$X8 <- temp$X4 + vl.error[i,3]

long <- reshape(temp, idvar="ID",
varying=list(c("X1","X2","X3","X4"), c("X5","X6","X7","X8")),
v.names=c("score.1","score.2"), direction='long')

glsrun1 <- gls(score.1~I(time-1), data=long,
correlation=corAR1(form=~1|ID), method='ML')

glsrun2 <- gls(score.2~I(time-1), data=long,
correlation=corAR1(form=~1|ID), method='ML')

intercept1[[i]] <- glsrun1$coefficient[1]
slope1[[i]] <- glsrun1$coefficient[2]
intercept2[[i]] <- glsrun2$coefficient[1]
slope2[[i]] <- glsrun2$coefficient[2]
}

cat("Sample Size =", sample.size, "\n")
cat("Number of Datasets =", N, "\n")
cat("Vertical Linking Error =", LE, "\n")
cat("Original Standard Errors", "\n", "Intercept", "\t",
sd(intercept1), "\n", "Slope", "\t", "\t", sd(slope1), "\n")
cat("Modified Standard Errors", "\n", "Intercept", "\t",
sd(intercept2), "\n", "Slope", "\t", "\t", sd(slope2), "\n")

rm(list=ls())
gc()

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Cross-validation accuracy in SVM

2005-01-20 Thread Ton van Daelen
Hi all -

I am trying to tune an SVM model by optimizing the cross-validation
accuracy. Maximizing this value doesn't necessarily seem to minimize the
number of misclassifications. Can anyone tell me how the
cross-validation accuracy is defined? In the output below, for example,
cross-validation accuracy is 92.2%, while the number of correctly
classified samples is (1476+170)/(1476+170+4) = 99.7% !?

Thanks for any help.

Regards - Ton

---
Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
   cost:  8 
  gamma:  0.007 

Number of Support Vectors:  1015

 ( 148 867 )

Number of Classes:  2 

Levels: 
 false true

5-fold cross-validation on training data:

Total Accuracy: 92.24242 
Single Accuracies:
 90 93.3 94.84848 92.72727 90.30303 

Contingency Table
           predclasses
origclasses false true
      false  1476    0
      true      4  170

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Cross-validation accuracy in SVM

2005-01-20 Thread Liaw, Andy
The 99.7% accuracy you quoted, I take it, is the accuracy on the training
set.  If so, that number hardly means anything (other than, perhaps,
self-fulfilling prophecy).  Usually what one would want is for the model to
be able to predict data that weren't used to train the model with high
accuracy.  That's what cross-validation tries to emulate.  It gives you an
estimate of how well you can expect your model to do on data that the model
has not seen.
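
A hand-rolled sketch of the same idea, assuming a data frame 'dat' with
factor response 'y' (svm() in e1071 also does this internally via its
'cross' argument):

library(e1071)
k <- 5
fold <- sample(rep(1:k, length.out = nrow(dat)))   # random fold labels
acc <- sapply(1:k, function(i) {
    fit <- svm(y ~ ., data = dat[fold != i, ], cost = 8, gamma = 0.007)
    mean(predict(fit, dat[fold == i, ]) == dat$y[fold == i])
})
mean(acc)   # accuracy on held-out folds, not on the training data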

Andy

 From: Ton van Daelen
 
 Hi all -
 
 I am trying to tune an SVM model by optimizing the cross-validation
 accuracy. Maximizing this value doesn't necessarily seem to 
 minimize the
 number of misclassifications. Can anyone tell me how the
 cross-validation accuracy is defined? In the output below, 
 for example,
 cross-validation accuracy is 92.2%, while the number of correctly
 classified samples is (1476+170)/(1476+170+4) = 99.7% !?
 
 Thanks for any help.
 
 Regards - Ton
 
 ---
 Parameters:
SVM-Type:  C-classification 
  SVM-Kernel:  radial 
cost:  8 
   gamma:  0.007 
 
 Number of Support Vectors:  1015
 
  ( 148 867 )
 
 Number of Classes:  2 
 
 Levels: 
  false true
 
 5-fold cross-validation on training data:
 
 Total Accuracy: 92.24242 
 Single Accuracies:
  90 93.3 94.84848 92.72727 90.30303 
 
 Contingency Table
            predclasses
 origclasses false true
       false  1476    0
       true      4  170
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Straight-line fitting with errors in both coordinates

2005-01-20 Thread Pan, Chongle
Hi All,
I want to fit a straight line to a group of two-dimensional data points with 
errors in both the x and y coordinates. I found there is an algorithm provided 
in Numerical Recipes in C: http://www.library.cornell.edu/nr/bookcpdf/c15-3.pdf

I'm wondering if there is a similar function for this implemented in R. And 
how can I change the objective function, for example, from the sum of squared 
errors to the sum of absolute errors?
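
In the absence of a packaged function, a sketch via optim(), minimizing
perpendicular distances under the assumption of equal error variances in x
and y; swapping sum(d^2) for sum(abs(d)) switches the objective to absolute
error:

fit.ortho <- function(x, y, p0 = c(0, 1)) {
    obj <- function(p) {  # p = c(intercept, slope)
        d <- (y - p[1] - p[2] * x) / sqrt(1 + p[2]^2)  # orthogonal distances
        sum(d^2)          # or sum(abs(d)) for an L1 fit
    }
    optim(p0, obj)$par
}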

Regards,
Chongle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] easing out of Excel

2005-01-20 Thread Shawn Way
Definitely check out the lattice package.

One other option is to use sweave/latex mixed with RODBC.  This can be
used to produce PDF's for easy distribution as well.  I would also
consider operating this in a batch mode, the R/sweave/latex works very
well this way.


Shawn Way, PE
Engineering Manager

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Greg Snow
Sent: Thursday, January 20, 2005 10:52 AM
To: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] easing out of Excel

 Paul Sorenson [EMAIL PROTECTED] 01/19/05 03:18PM 
 I know enough about R to be dangerous and our marketing people have
 asked me to automate some reporting.  Data comes from an SQL
source
 and graphs and various summaries are currently created manually in
 Excel.  The raw information is invoicing records and the reporting
is
 basically summaries by customer, region, product line etc.
 
 With function such as aggregate(), hist() and pareto() (which
someone
 on this list kindly pointed me at) I can produce something roughly
 equivalent to the current reports.
 
 My question is, are there any neat R lock out features people
here
 like to use on this kind of info, particularly when the output is
very
 visual (report is intended for marketing people).
 
 Another way of looking at this is, What kind of hidden
information
 can I extract with R that the Excel solution hasn't touched?

Since you are looking for summaries within groups, you should look at
the
lattice package and some of the plots that you can produce with it
(maybe
for each product line you can produce a lattice/trellis graph with each
panel
representing a region and different colors/symbols within panels to
represent
different customers).

If we had more of an idea of what you are looking for, we could give
better
suggestions.


 For example, even the pareto plot mentioned earlier is something
the
 Excel guys haven't thought of or can't easily produce.
 
 regards
 
 BTW the tool chain I am using goes something like:
  Production (run daily):
  DB - SQL/python - CSV - R/python - images -
network
  Presentation:
  network - CGI/python - browser

It looks like you want the reports fully automated and the final result
as HTML
(to be viewed with a browser), I suggest you look at the R2HTML package
and
the sweave function (this lets you write a report in HTML with r-code
in place of 
graphs and output, then a quick run through sweave and you have a final
report
in HTML ready to be viewed).

There are also several tools available for running R through CGI, go
to: 
http://www.r-project.org/ and click on R web-servers under the
Related Projects
heading in the left column to get details.

Hope this helps,


Greg Snow, Ph.D.
Statistical Data Center
[EMAIL PROTECTED]
(801) 408-8111

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Cross-validation accuracy in SVM

2005-01-20 Thread Frank E Harrell Jr
Ton van Daelen wrote:
Hi all -
I am trying to tune an SVM model by optimizing the cross-validation
accuracy. Maximizing this value doesn't necessarily seem to minimize the
number of misclassifications. Can anyone tell me how the
cross-validation accuracy is defined? In the output below, for example,
cross-validation accuracy is 92.2%, while the number of correctly
classified samples is (1476+170)/(1476+170+4) = 99.7% !?
Thanks for any help.
Regards - Ton
Percent correctly classified is an improper scoring rule.  The percent 
is maximized when the predicted values are bogus.  In addition, one can 
add a very important predictor and have the % actually decrease.

Frank Harrell
---
Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
   cost:  8 
  gamma:  0.007 

Number of Support Vectors:  1015
 ( 148 867 )
Number of Classes:  2 

Levels: 
 false true

5-fold cross-validation on training data:
Total Accuracy: 92.24242 
Single Accuracies:
 90 93.3 94.84848 92.72727 90.30303 

Contingency Table
           predclasses
origclasses false true
      false  1476    0
      true      4  170
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Successful installation of R 2.0.1 on SUSE 9.1

2005-01-20 Thread Min-Han Tan
Hi,

We managed to compile R 2.0.1 on 64-bit SUSE Linux 9.1 on a HP
Proliant setup fairly uneventfully by following instructions on the R
installation guide. We did encounter a minor hiccup in setting up x11,
a problem which we note has been raised 4 or 5 times previously, but
this was overcome thanks to this recent post by Peter Dalgaard on SUSE
9.1 and R: https://stat.ethz.ch/pipermail/r-help/2005-January/062397.html,
as well as previous comments on the mailing list.

One clarification on that post may be helpful: there are only 3
additional developmental packages required for successful X11
installation.

XFree86-devel-4.3.99.902-30
fontconfig-devel
freetype2-devel

These were not available in YAST (9.1 SUSE), but were located in

http://ftp.suse.com/pub/suse/ 

Once again, thanks to the assorted R gurus and wizards for making this
mailing list such a great resource.

Regards,
Min-Han Tan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] how to call R in delphi?

2005-01-20 Thread Earl F. Glynn
Dieter Menne [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 To call R from Delphi, you may try
 http://www.menne-biomed.de/download/RDComDelphi.zip.

I downloaded this file and tried to compile the RDCom project using Delphi 5
and Delphi 7, but I get this message from both compilers:

[Fatal Error] STATCONNECTORCLNTLib_TLB.pas(406): Could not create output
file 'c:\program
files\borland\delphi7\Twain\d5\dcu\STATCONNECTORCLNTLib_TLB.dcu'



The "\d5\dcu" in the path in this error message was a bit curious, so I
looked at

Project | Options | Directories/Conditionals

Unit output directory:
$(DELPHI)\Twain\d5\dcu

Search path:
$(DELPHI)\Compon;C:\D2Pr\CascCont\COMPON;$(DELPHI)\Source\Toolsapi

On my vanilla Delphi 5 and Delphi 7 installations all of the directories
for the Unit output directory and Search path are invalid for the RDCom.dpr
project.

If I delete the Unit output directory, I then get 17 compilation errors, all
like this:
[Error] RCom.pas(115): Undeclared identifier: 'VarType'
[Error] RCom.pas(141): Undeclared identifier: 'VarArrayDimCount'
[Error] RCom.pas(123): Undeclared identifier: 'VarArrayHighBound'
. . .

All of the above seems to happen whether or not I install
STATCONNECTORCLNTLib_TLB.pas and STATCONNECTORSRVLib_TLP.pas as components
(i.e., Component | Install Component | browse to .pas file | Open | OK |
Compile).  Am I supposed to do this at some point?

Can you give me any clues how to make this work?  Something seems to be
missing.

   Example program showing use of R from Delphi.
   Connecting to R via COM using Neuwirth's StatConnectorSrvLib
   Uses RCom.pas, which is a simple Delphi wrapper for passing
   commands, integer and double arrays.
   See http://cran.r-project.org/contrib/extra/dcom
   By:  [EMAIL PROTECTED]

I'm not sure I understand this either.  I went to
http://cran.r-project.org/contrib/extra/dcom

I read this documentation:
http://cran.r-project.org/contrib/extra/dcom/RSrv135.html

I downloaded and installed the R(COM) server (and rebooted)
http://cran.r-project.org/contrib/extra/dcom/RSrv135.exe

So, how I can I call R from Delphi using R(COM)?  Something seems to be
missing.

Duncan Murdoch's suggestion about direct calls to R.dll looks interesting,
but a complete working example would be nice.

Thanks for any help with this.

efg
Earl F. Glynn
Scientific Programmer
Stowers Institute for Medical Research

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Barplot at the axes of another plot

2005-01-20 Thread Robin Gruna
Hi,
I want to draw a barplot at the axes of another plot. I saw this, with two 
histograms and a scatterplot, in an R graphics tutorial somewhere on the net; 
it seemed to be a 2D histogram. Can someone figure out what I mean and give me 
a hint for creating such a graphic? Thank you very much, 
Robin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Barplot at the axes of another plot

2005-01-20 Thread Marc Schwartz
On Thu, 2005-01-20 at 23:53 +0100, Robin Gruna wrote:
 Hi,
 I want to draw a barplot at the axes of another plot. I saw that with
 two histogramms and a scatterplot in a R graphics tutorial somewhere
 on the net, seemed to be a 2d histogramm. Can someone figure out what
 I mean and give me a hint to create such a graphic? Thank you very
 much, 
 Robin  


See the examples in ?layout, which has the scatterplot with the marginal
histograms.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Windows Front end-crash error

2005-01-20 Thread Duncan Murdoch
On Thu, 20 Jan 2005 13:16:13 -0500, Doran, Harold [EMAIL PROTECTED]
wrote:

Dear List:

First, many thanks to those who offered assistance while I constructed
code for the simulation. I think I now have code that resolves most of
the issues I encountered with memory.

While the code works perfectly for smallish datasets with small sample
sizes, it triggers a Windows-based error with samples of 5,000 and 250
datasets. The error is a dialogue box with the following:

R for Windows terminal front-end has encountered a problem and needs to
close.  We are sorry for the inconvenience. If you were in the middle of
something, the information you were working on might be lost.

The new code is below. Can anyone suggest whether this error derives
from inefficient code, or from a Windows-specific issue that can somehow
be resolved, and if so, how?

It looks to me like an nlme bug.  I get the error in R-patched (built
Jan 15).  DrMingw shows this at the time of the crash:

Rgui.exe caused an Access Violation at location 01c8ae4b in module
nlme.dll Reading from location 7f1e8f18.

Registers:
eax=7f210020 ebx= ecx=01368c50 edx=b1df esi=4e20
edi=01108918
eip=01c8ae4b esp=0022d1d0 ebp=0022d208 iopl=0 nv up ei ng nz
ac po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=
efl=0296

Call stack:
01C8AE4B  nlme.dll:01C8AE4B  gls_loglik
004E5E77  R.dll:004E5E77  do_dotCode
...

I changed the loop to print some status lines, and it failed after the
first time it printed gls2...

library(MASS)
library(nlme)

set.seed(123)

mu <- c(100,150,200,250)
Sigma <- matrix(c(400,80,80,80,80,400,80,80,80,80,400,80,80,80,80,400),4,4)
mu2 <- c(0,0,0)
LE <- 8^2  # Linking Error
Sigma2 <- diag(LE,3)
sample.size <- 5000
N <- 100  # Number of datasets
# Take a single draw from VL distribution
vl.error <- mvrnorm(n=N, mu2, Sigma2)

intercept1 <- 0
slope1 <- 0
intercept2 <- 0
slope2 <- 0

for(i in 1:N){
    print(i)
    flush.console()
    temp <- data.frame(ID=seq(1:sample.size),
                       mvrnorm(n=sample.size, mu, Sigma))

    temp$X5 <- temp$X1
    temp$X6 <- temp$X2 + vl.error[i,1]
    temp$X7 <- temp$X3 + vl.error[i,2]
    temp$X8 <- temp$X4 + vl.error[i,3]

    print("reshape...")
    flush.console()

    long <- reshape(temp, idvar="ID",
                    varying=list(c("X1","X2","X3","X4"),
                                 c("X5","X6","X7","X8")),
                    v.names=c("score.1","score.2"), direction='long')

    print("gls1...")
    flush.console()

    glsrun1 <- gls(score.1~I(time-1), data=long,
                   correlation=corAR1(form=~1|ID), method='ML')

    print("gls2...")
    flush.console()

    glsrun2 <- gls(score.2~I(time-1), data=long,
                   correlation=corAR1(form=~1|ID), method='ML')

    intercept1[[i]] <- glsrun1$coefficient[1]
    slope1[[i]] <- glsrun1$coefficient[2]
    intercept2[[i]] <- glsrun2$coefficient[1]
    slope2[[i]] <- glsrun2$coefficient[2]
}

Hopefully this will let someone more familiar with nlme track it down.

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Plots with same x-axes

2005-01-20 Thread Robin Gruna
Hi,
I want to plot two graphics on top of each other with layout(), a scatterplot 
and a barplot. The problem is that the two plots have different x-axis ranges. 
How can I align the two x-axes?  Thank you very much,
Robin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Plotting points from two vectors onto the same graph

2005-01-20 Thread K Fernandes
Hello,

I have three vectors defined as follows:

x <- c(10,20,30,40,50)
y1 <- c(154,143,147,140,148)
y2 <- c(178,178,171,188,180)

I would like to plot y1 vs x and y2 vs x on the same graph.  How might I do
this?  I have looked through a help file on plots but could not find the
answer to plotting multiple plots on the same graph.

Thank you for your help,
K

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Need help to transform data into co-occurence matix

2005-01-20 Thread Judie Z
Dear R experts,
I have the data in the following format (from some kind of card sorting process):
 
ID  Category   Card numbers
1   1          1,2,5
1   2          3,4
2   1          1,2
2   2          3
2   3          4,5
 
I want to transform this data into two co-occurrence matrices (one for each ID)
-- For ID 1
   1 2 3 4 5
1 1 1 0 0 1
2 1 1 0 0 1
3 0 0 1 1 0
4 0 0 1 1 0
5 1 1 0 0 1
 
-- For ID 2
   1 2 3 4 5
1  1 1 0 0 0
2  1 1 0 0 0
3  0 0 1 0 0 
4  0 0 0 1 1
5  0 0 0 1 1
 
The columns and rows represent the card numbers. A 0 means the two card 
numbers are not in the same category, and a 1 means they are.
 
Is there any way I can do this in R?
I would really appreciate your help. 
 
Judie, Tie
 
 



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Plotting points from two vectors onto the same graph

2005-01-20 Thread Mulholland, Tom
?points has this example

plot(-4:4, -4:4, type = "n")   # setting up coord. system
points(rnorm(200), rnorm(200), col = "red")
points(rnorm(100)/2, rnorm(100)/2, col = "blue", cex = 1.5)

In general you might want to check out the keyword section of the help, in 
particular the Graphics section which has an entry called aplot for ways to add 
to existing plots.
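For the data in the original post, one such approach (a minimal sketch; the
y-range is widened so both series fit on one set of axes) is:

x  <- c(10,20,30,40,50)
y1 <- c(154,143,147,140,148)
y2 <- c(178,178,171,188,180)

# Plot the first series, widening ylim to make room for the second
plot(x, y1, type = "b", col = "red", ylim = range(y1, y2),
     xlab = "x", ylab = "y")
# Add the second series to the existing plot
points(x, y2, type = "b", col = "blue")
legend("topleft", legend = c("y1", "y2"),
       col = c("red", "blue"), lty = 1, pch = 1)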

Tom

 -Original Message-
 From: K Fernandes [mailto:[EMAIL PROTECTED]
 Sent: Friday, 21 January 2005 9:51 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Plotting points from two vectors onto the same graph
 
 
 Hello,
 
 I have three vectors defined as follows:
 
  x <- c(10,20,30,40,50)
  y1 <- c(154,143,147,140,148)
  y2 <- c(178,178,171,188,180)
 
 I would like to plot y1 vs x and y2 vs x on the same graph.  
 How might I do
 this?  I have looked through a help file on plots but could 
 not find the
 answer to plotting multiple plots on the same graph.
 
 Thank you for your help,
 K
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Plotting points from two vectors onto the same graph

2005-01-20 Thread Marc Schwartz
On Thu, 2005-01-20 at 20:51 -0500, K Fernandes wrote: 
 Hello,
 
 I have three vectors defined as follows:
 
  x <- c(10,20,30,40,50)
  y1 <- c(154,143,147,140,148)
  y2 <- c(178,178,171,188,180)
 
 I would like to plot y1 vs x and y2 vs x on the same graph.  How might I do
 this?  I have looked through a help file on plots but could not find the
 answer to plotting multiple plots on the same graph.
 
 Thank you for your help,
 K

First, when posting a new query, please do not do so by replying to an
existing post. Your post is now listed in the archive linked to an
entirely different thread.

The easiest way to do this is to use the matplot() function:

x <- c(10,20,30,40,50)
y1 <- c(154,143,147,140,148)
y2 <- c(178,178,171,188,180)

# now do the plot. cbind() the two sets of y values
# and the x values will be recycled for each
matplot(x, cbind(y1, y2), col = c("red", "blue"))

See ?matplot for more information.
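A possible refinement (a sketch, not from the original post): set matplot's
type and plotting character explicitly, and add a legend by hand. The lty
values below match matplot's default of cycling line types across columns.

matplot(x, cbind(y1, y2), type = "b", pch = 1,
        col = c("red", "blue"), xlab = "x", ylab = "y")
legend("topleft", legend = c("y1", "y2"),
       col = c("red", "blue"), lty = 1:2, pch = 1)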

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Plots with same x-axes

2005-01-20 Thread Marc Schwartz
On Fri, 2005-01-21 at 01:48 +0100, Robin Gruna wrote:
 Hi,
 I want to plot two graphics on top of each other with layout(), a
 scatterplot and a barplot. The problems are the different x-axes
 ratios of the plots. How can I align the two x-axes?   Thank you very
 much,
 Robin


Robin,

Here is an example:

# Set the layout, smaller plot on top for the 
# barplot region
nf <- layout(c(2, 1), heights = c(1, 3))
layout.show(nf)

# Create the data
x <- rnorm(50)
y <- rnorm(50)

# Set the margins for the scatterplot so that they will match with the
# barplot settings
par(mar = c(3, 3, 0, 3))

# now do the scatterplot
plot(x, y)

# Get the hist data for x
xhist <- hist(x, plot = FALSE)

# Set the margins for the barplot to use more of the plot
# region
par(mar = c(0, 3, 1, 3))

# now plot that barplot on top
# Set the 'space' argument to 0 so that the bars are
# next to each other
barplot(xhist$counts, axes = FALSE, space = 0)
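One caveat: barplot() draws the bars on a 0..nbars scale, so they only line
up with the scatterplot's data if the scatterplot is forced onto the
histogram's break range, as the ?layout example does. A sketch of that
adjustment, reusing x, y, and xhist from above:

par(mar = c(3, 3, 0, 3))
# xaxs = "i" makes the plot region span exactly range(xhist$breaks),
# so each bar sits over its break interval
plot(x, y, xlim = range(xhist$breaks), xaxs = "i")

par(mar = c(0, 3, 1, 3))
barplot(xhist$counts, axes = FALSE, space = 0)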

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] easing out of Excel

2005-01-20 Thread Paul Sorenson
Thanks for the responses to this question; I fully realise it is a rather open 
question, and the open-ended pointers are the kind of thing I am looking for.

I will look into the lattice package and layout.

Regarding the HTML output, the current tool-chain assets that I have have 
been refactored over time and are almost totally driven by config files, so they 
suit my purposes very well.  I will look into other possibilities at a later 
date.

For those looking for a more rigorous specification of the problem, you are 
well justified in this.  I was deliberately fuzzy, since managers just want 
"stuff" and I thought casting a wide net would pay off.  The problem is to 
summarise information which is nothing more than sales data.  The kinds of 
columns I am dealing with look like:

date, customer, invoice_no, product, amount, sales_region, etc etc.

Managers want to know things like:
- which products are doing well
- which regions are doing well
- who are good customers
- etc

To me these are simple aggregates and sorts, with visual presentations to match.

I figure that with a bit of effort, R can extract considerably more useful 
information from the data.
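For instance, the "which products are doing well" question reduces to an
aggregate-and-sort. A minimal sketch, assuming a data frame named sales with
the columns above (the sample values are made up):

sales <- data.frame(
    product      = c("A", "A", "B", "C", "B"),
    sales_region = c("N", "S", "N", "S", "S"),
    amount       = c(100, 250, 80, 40, 300))

# Total sales by product, best sellers first
by.product <- aggregate(sales$amount,
                        by = list(product = sales$product), FUN = sum)
by.product[order(-by.product$x), ]

# Same idea by region
aggregate(sales$amount, by = list(region = sales$sales_region), FUN = sum)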

To be honest, I am just evolving it as I go, using an existing spreadsheet as a 
basis.  I try something, and if it is useful then great; if not, I put it down 
to learning.

cheers

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Need help to transform data into co-occurence matix

2005-01-20 Thread Liaw, Andy
 From: Judie Z
 
 Dear R experts,
 I have the data in the following fomat(from some kind of card 
 sorting process)
  
 ID  Category   Card numbers
 1   1   1,2,5
 1   2   3,4
 2   1   1,2
 2   2   3
 2   3   4,5
  
 I want to transform this data into two co-occurence matrix 
 (one for each ID)
 -- For ID 1
1 2 3 4 5
 1 1 1 0 0 1
 2 1 1 0 0 1
 3 0 0 1 1 0
 4 0 0 1 1 0
 5 1 1 0 0 1
  
 -- For ID 2
1 2 3 4 5
 1  1 1 0 0 0
 2  1 1 0 0 0
 3  0 0 1 0 0 
 4  0 0 0 1 1
 5  0 0 0 1 1
  
 The columns and rows are representing the card numbers. All 
 0s mean the card numbers are not in the same category, vice versa.
  
 Is there any way I can to this in R?
 I would really appreciate your help. 

It depends on how the data are structured in R.  Here's an example (I'm sure
others can come up with more clever/efficient ways):

> cardlist <- list(c(1,2,5), c(3,4))
> indicator <- function(i, n=max(i)) { x <- rep(0, n); x[i] <- 1; x }
> matrix(rowSums(sapply(cardlist,
+     function(i) crossprod(t(indicator(i, 5))))), nrow=5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    0    1
[2,]    1    1    0    0    1
[3,]    0    0    1    1    0
[4,]    0    0    1    1    0
[5,]    1    1    0    0    1

which is the matrix for ID 1 in your example.
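To go from the tabular input to one matrix per ID, a sketch (assuming the
data sit in a data frame with one row per ID/category and a comma-separated
card column; the column names here are made up, and indicator() is the
helper defined above):

dat <- data.frame(ID       = c(1, 1, 2, 2, 2),
                  Category = c(1, 2, 1, 2, 3),
                  Cards    = c("1,2,5", "3,4", "1,2", "3", "4,5"),
                  stringsAsFactors = FALSE)

cooccur <- function(cards, n = 5) {
    cardlist <- lapply(strsplit(cards, ","), as.numeric)
    matrix(rowSums(sapply(cardlist,
        function(i) crossprod(t(indicator(i, n))))), nrow = n)
}

# One co-occurrence matrix per ID
by.id <- lapply(split(dat$Cards, dat$ID), cooccur)
by.id[["1"]]   # matrix for ID 1
by.id[["2"]]   # matrix for ID 2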

HTH,
Andy
  
 Judie, Tie
   
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Need help to transform data into co-occurence matix

2005-01-20 Thread Tim F Liao
Judie,

You may want to see if the MedlineR library, which has a
program for constructing co-occurrence matrices, will work for
you.  The program can be found at:

http://dbsr.duke.edu/pub/MedlineR/

Have fun with it,

Tim Liao
Professor of Sociology & Statistics
University of Illinois
Urbana, IL 61801

 Original message 
Date: Thu, 20 Jan 2005 18:19:19 -0800 (PST)
From: Judie Z [EMAIL PROTECTED]  
Subject: [R] Need help to transform data into co-occurence
matix  
To: r-help@stat.math.ethz.ch

Dear R experts,
I have the data in the following fomat(from some kind of card
sorting process)
 
ID  Category   Card numbers
1   1   1,2,5
1   2   3,4
2   1   1,2
2   2   3
2   3   4,5
 
I want to transform this data into two co-occurence matrix
(one for each ID)
-- For ID 1
   1 2 3 4 5
1 1 1 0 0 1
2 1 1 0 0 1
3 0 0 1 1 0
4 0 0 1 1 0
5 1 1 0 0 1
 
-- For ID 2
   1 2 3 4 5
1  1 1 0 0 0
2  1 1 0 0 0
3  0 0 1 0 0 
4  0 0 0 1 1
5  0 0 0 1 1
 
The columns and rows are representing the card numbers. All
0s mean the card numbers are not in the same category, vice
versa.
 
Is there any way I can to this in R?
I would really appreciate your help. 
 
Judie, Tie
 
 

   

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] easing out of Excel

2005-01-20 Thread Mulholland, Tom
I hesitate to add this comment since it either completely confuses people or 
they take to it very quickly.

The data that you are using is mostly categorical. I expect that tables will 
have been used in the past and that, to a certain extent, the graphics are 
supposed to help with getting a quick understanding of the data.

There is a package called vcd (Visualizing Categorical Data) which is useful 
for analysing this type of data. I like the use of the mosaicplot, and in 
particular the shade parameter (which is based on standardized residuals). If 
set up properly, it can be used to very quickly identify sales regions that are 
doing significantly better than they were last year, or customers who have 
significantly reduced their purchases. Basically, if you can produce a table that 
would give this information, then a shaded mosaicplot can efficiently highlight 
the significant parts of the table.
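As a minimal illustration of the shading idea, using base R's mosaicplot()
on a built-in dataset rather than sales data (vcd's mosaic() offers similar
shading):

# Two-way table from the built-in Titanic data: Class x Survived
tab <- margin.table(Titanic, c(1, 4))

# shade = TRUE colours cells by their standardized (Pearson) residuals,
# flagging cells that depart significantly from independence
mosaicplot(tab, shade = TRUE, main = "Class vs Survived")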

They take a little bit of getting used to at first, but if you need to analyse 
this type of data, they take a lot of the guesswork out of making commentary on 
the data. How useful they are depends upon the users, who, as I have said, seem 
to be polarised in their reactions to the output.

Tom

 -Original Message-
 From: Paul Sorenson [mailto:[EMAIL PROTECTED]
 Sent: Friday, 21 January 2005 11:33 AM
 To: r-help@stat.math.ethz.ch
 Subject: RE: [R] easing out of Excel
 
 
 Thanks for the responses to this question, I fully realise it 
 is a rather open question and the open pointers are the 
 kind of thing I am looking for.
 
 I will look into the lattice package and layout.
 
 Regarding the HTML output, the current tool chain assets 
 that I have have been refactored over time and are almost 
 totally driven by config files so they suit my purposes very 
 well.  I will look into other possibilities at a later date.
 
 For those looking for a more rigorous specification of the 
 problem, you are well justified in this.  I was deliberately 
 fuzzy since managers just want stuff and I thought casting 
 a wide net would pay off.  The problem is to summarise 
 information which is nothing more than sales data.  The kinds 
 of columns I am dealing with look like:
 
 date, customer, invoice_no, product, amount, sales_region, etc etc.
 
 Managers want to know things like:
   - which products are doing well
   - which regions are doing well
   - who are good customers
   - etc
 
 To me these are simple aggregates and sorts, with visual 
 presentations to match.
 
 I figure a bit of effort, R can extract considerably more 
 useful information from the data.
 
 To be honest I am just evolving it as I go, using an existing 
 spreadsheet as a basis.  I try something and if it is useful 
 then great, if not, put it down to learning.
 
 cheers
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] dim vs length for vectors

2005-01-20 Thread Olivia Lau
Hi all,
I'm not sure if this is a feature or a bug (and I did read the 
FAQ and the posting guide, but am still not sure).  Some of my 
students have been complaining and I thought I just might ask: 
Let K be a vector of length k.  If one types dim(K), you get 
NULL rather than [1] k.  Is this logical?

Here's the way I explain it (and maybe someone can provide a 
more accurate explanation of what's going on):  R has several 
types of scalar (atomic) values, the most common of which are 
numeric, integer, logical, and character values.  Arrays are 
data structures which hold only one type of atomic value. 
Arrays can be one-dimensional (vectors), two-dimensional 
(matrices), or n-dimensional.

(We generally use arrays of n-1 dimensions to populate 
n-dimensional arrays -- thus, we generally use vectors to 
populate matrices, and matrices to populate 3-dimensional 
arrays, but could use any array of dimension < n-1 to populate 
an n-dimensional array.)

It logically follows that when one does dim() on a vector, one 
should *not* get NULL, but should get the length of the vector 
(which one *could* obtain by doing length(), but I think this is 
less logical).  I think that R should save length() for lists 
that have objects of different dimension and type.

Does this make sense?  Or is there a better explanation?
Thanks in advance!  Yours,
Olivia Lau
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] dim vs length for vectors

2005-01-20 Thread Gabor Grothendieck
Olivia Lau olau at fas.harvard.edu writes:

: 
: Hi all,
: 
: I'm not sure if this is a feature or a bug (and I did read the 
: FAQ and the posting guide, but am still not sure).  Some of my 
: students have been complaining and I thought I just might ask: 
: Let K be a vector of length k.  If one types dim(K), you get 
: NULL rather than [1] k.  Is this logical?
: 
: Here's the way I explain it (and maybe someone can provide a 
: more accurate explanation of what's going on):  R has several 
: types of scalar (atomic) values, the most common of which are 
: numeric, integer, logical, and character values.  Arrays are 
: data structures which hold only one type of atomic value. 
: Arrays can be one-dimensional (vectors), two-dimensional 
: (matrices), or n-dimensional.
: 
: (We generally use arrays of n-1 dimensions to populate 
: n-dimensional arrays -- thus, we generally use vectors to 
: populate matrices, and matrices to populate 3-dimensional 
: arrays, but could use any array of dimension  n-1 to populate 
: an n-dimensional array.)
: 
: It logically follows that when one does dim() on a vector, one 
: should *not* get NULL, but should get the length of the vector 
: (which one *could* obtain by doing length(), but I think this is 
: less logical).  I think that R should save length() for lists 
: that have objects of different dimension and type.
: 

In R, vectors are not arrays:

R> v <- 1:4
R> dim(v)
NULL
R> is.array(v)
[1] FALSE

R> a <- array(1:4)
R> dim(a)
[1] 4
R> is.array(a)
[1] TRUE

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] dim vs length for vectors

2005-01-20 Thread miguel manese
I think the more intuitive way to think of it is that dim works only
for matrices (an array being a 1-column matrix), and vectors are not
matrices.

> x <- 1:5
> class(x)   # numeric
> dim(x) <- 5
> class(x)   # array
> dim(x) <- c(5,1)
> class(x)   # matrix
> dim(x) <- c(1,5)
> class(x)   # matrix


On Fri, 21 Jan 2005 05:35:11 + (UTC), Gabor Grothendieck
[EMAIL PROTECTED] wrote:
 Olivia Lau olau at fas.harvard.edu  writes:
 
 :
 : Hi all,
 :
 : I'm not sure if this is a feature or a bug (and I did read the
 : FAQ and the posting guide, but am still not sure).  Some of my
 : students have been complaining and I thought I just might ask:
 : Let K be a vector of length k.  If one types dim(K), you get
 : NULL rather than [1] k.  Is this logical?
 :
 : Here's the way I explain it (and maybe someone can provide a
 : more accurate explanation of what's going on):  R has several
 : types of scalar (atomic) values, the most common of which are
 : numeric, integer, logical, and character values.  Arrays are
 : data structures which hold only one type of atomic value.
 : Arrays can be one-dimensional (vectors), two-dimensional
 : (matrices), or n-dimensional.
 :
 : (We generally use arrays of n-1 dimensions to populate
 : n-dimensional arrays -- thus, we generally use vectors to
 : populate matrices, and matrices to populate 3-dimensional
:  : arrays, but could use any array of dimension < n-1 to populate
 : an n-dimensional array.)
 :
 : It logically follows that when one does dim() on a vector, one
 : should *not* get NULL, but should get the length of the vector
 : (which one *could* obtain by doing length(), but I think this is
 : less logical).  I think that R should save length() for lists
 : that have objects of different dimension and type.
 :
 
 In R, vectors are not arrays:
 
  R> v <- 1:4
  R> dim(v)
  NULL
  R> is.array(v)
  [1] FALSE
  
  R> a <- array(1:4)
  R> dim(a)
  [1] 4
  R> is.array(a)
  [1] TRUE
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help 
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Cholesky Decomposition

2005-01-20 Thread kolluru ramesh
Can we do Cholesky decomposition in R for any matrix?
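Base R's chol() handles the symmetric positive-definite case; a minimal
sketch (not from the original post):

A <- matrix(c(4, 2,
              2, 3), nrow = 2)
R <- chol(A)        # upper-triangular factor
crossprod(R)        # t(R) %*% R recovers A

For matrices that are only positive semi-definite, chol() has a pivot
argument; for a general matrix a Cholesky factorisation does not exist, and
other decompositions (e.g. qr() or svd()) apply.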



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] dim vs length for vectors

2005-01-20 Thread Gabor Grothendieck

More generally, anything that has a dim attribute is an array,
including 1d, 2d, and 3d structures. Matrices have a dim attribute,
so matrices are arrays, and is.array(m) will be TRUE if m is a matrix.
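A short demonstration of that point (a sketch; this is standard base-R
behaviour):

m <- matrix(1:6, nrow = 2)
attributes(m)    # $dim is c(2, 3)
is.array(m)      # TRUE: a matrix is a 2d array

v <- 1:6
is.array(v)      # FALSE: no dim attribute yet
dim(v) <- 6      # adding a dim attribute makes it a 1d array
is.array(v)      # TRUE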

miguel manese jjonphl at gmail.com writes:

: 
: I think the more intuitive way to think of it is that dim works only
: for matrices (an array being a 1 column matrix). and vectors are not
: matrices.
: 
: > x <- 1:5
: > class(x)   # numeric
: > dim(x) <- 5
: > class(x)   # array
: > dim(x) <- c(5,1)
: > class(x)   # matrix
: > dim(x) <- c(1,5)
: > class(x)   # matrix
: 
: On Fri, 21 Jan 2005 05:35:11 + (UTC), Gabor Grothendieck
: ggrothendieck at myway.com wrote:
:  Olivia Lau olau at fas.harvard.edu  writes:
:  
:  :
:  : Hi all,
:  :
:  : I'm not sure if this is a feature or a bug (and I did read the
:  : FAQ and the posting guide, but am still not sure).  Some of my
:  : students have been complaining and I thought I just might ask:
:  : Let K be a vector of length k.  If one types dim(K), you get
:  : NULL rather than [1] k.  Is this logical?
:  :
:  : Here's the way I explain it (and maybe someone can provide a
:  : more accurate explanation of what's going on):  R has several
:  : types of scalar (atomic) values, the most common of which are
:  : numeric, integer, logical, and character values.  Arrays are
:  : data structures which hold only one type of atomic value.
:  : Arrays can be one-dimensional (vectors), two-dimensional
:  : (matrices), or n-dimensional.
:  :
:  : (We generally use arrays of n-1 dimensions to populate
:  : n-dimensional arrays -- thus, we generally use vectors to
:  : populate matrices, and matrices to populate 3-dimensional
:  : arrays, but could use any array of dimension < n-1 to populate
:  : an n-dimensional array.)
:  :
:  : It logically follows that when one does dim() on a vector, one
:  : should *not* get NULL, but should get the length of the vector
:  : (which one *could* obtain by doing length(), but I think this is
:  : less logical).  I think that R should save length() for lists
:  : that have objects of different dimension and type.
:  :
:  
:  In R, vectors are not arrays:
:  
:  R> v <- 1:4
:  R> dim(v)
:  NULL
:  R> is.array(v)
:  [1] FALSE
:  
:  R> a <- array(1:4)
:  R> dim(a)
:  [1] 4
:  R> is.array(a)
:  [1] TRUE
:  
:  __
:  R-help at stat.math.ethz.ch mailing list
:  https://stat.ethz.ch/mailman/listinfo/r-help 
:  PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html 
: 
: 
: __
: R-help at stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
: 
:

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html